Data Science Seminar
Hosted by the Department of Mathematics and Statistics

Abstract


This talk includes two recent studies. Study 1 is a methodology work: Gene-based association tests are widely used in Genome-wide Association Studies (GWAS). The power of a test is often limited by the sample size, the effect size, and the number of causal genetic variants or their directions in a gene. In addition, access to individual-level data is often limited. To address these limitations, we proposed an optimally weighted combination (OWC) test based on summary statistics from GWAS. We analytically proved that aggregating the variants in one gene is equivalent to using a weighted combination of the Z-scores of the individual variants based on the proposed score test. Several popular methods are special cases of this test. We also numerically illustrated that the proposed test outperforms comparison methods via simulation studies. Furthermore, we utilized schizophrenia GWAS data and fasting glucose GWAS meta-analysis data to demonstrate that our method outperforms comparison methods in real data analyses.

Study 2 is an application work: We used a carefully curated list of 87 previously published genetic variants to determine whether incorporation of genetic variants with non-genetic variables could improve identification of cancer survivors at risk for anthracycline-related cardiomyopathy. We used anthracycline-exposed childhood cancer survivors from a Children’s Oncology Group study (COG-ALTE03N1: 146 cases; 195 matched controls) as the discovery set. Replication was performed in two anthracycline-exposed survivor populations: i) childhood cancer survivors from the Childhood Cancer Survivor Study (CCSS: 126 cases; 250 controls); ii) autologous blood or marrow transplantation (BMT) survivors from the BMT Survivor Study (BMTSS: 80 cases; 78 controls). The Clinical+Genetic Model performed better than the Clinical Model in COG-ALTE03N1 (AUC of Clinical+Genetic Model = 0.88 vs. AUC of Clinical Model = 0.81) and BMTSS (AUC of Clinical+Genetic Model = 0.72 vs. AUC of Clinical Model = 0.64), but not in CCSS (AUC of Clinical+Genetic Model = 0.88 vs. AUC of Clinical Model = 0.89). However, the Clinical+Genetic Model performed marginally better in CCSS patients without cardiovascular risk factors where cardiomyopathy developed within 30 years of anthracycline exposure (AUC of Clinical+Genetic Model = 0.90 vs. AUC of Clinical Model = 0.85). Conclusions: Adding a comprehensively assembled genetic profile to clinical characteristics improves identification of cancer survivors at risk for anthracycline-related cardiomyopathy.


Biography of the speaker: Dr. Wang is Professor of Biostatistics at Florida International University and Professor (Adjunct) of Biostatistics in the School of Medicine at the University of Alabama at Birmingham. She has extensive experience in designing studies around specific research goals, providing protocols for sample data collection, performing quality control of large datasets, supervising postdocs and graduate students in methods development and data analysis, and interpreting study findings. In the past 10 years, she has worked with a number of principal investigators on grant applications, proposing study designs, power estimation, methods, and procedures for data analysis. Dr. Wang has led data analysis for numerous projects in cancer etiology and treatment-related adverse outcomes, ranging from candidate-gene and genome-wide association studies to next-generation sequencing data analysis. In addition, she has developed many powerful statistical methods and computational tools for genetic association studies, which contribute to the identification of genetic susceptibility to complex diseases.