Statistical Issues in Genome-Wide Association Studies

Title:

Statistical Issues in Genome-Wide Association Studies

Author:

Jiang, Wei, author.

ISBN:

9780438130685

Physical Description:

1 electronic resource (183 pages)

General Note:

Source: Masters Abstracts International, Volume: 57-06M(E).

Advisors: Weichuan Yu.

Abstract:

Genome-wide association studies (GWASs) are widely used to discover single nucleotide polymorphisms (SNPs) associated with diseases. A multi-stage design is commonly used both to discover associations and to validate the findings. Associations are discovered in a primary study and validated in a replication study, and only associations that are statistically significant in both studies are regarded as true findings. In this dissertation, we study three statistical issues in multi-stage GWASs. A related issue is how to improve power when multiple GWAS data sets are available, so this dissertation also proposes a novel joint analysis method that uses summary statistics from multiple GWASs.

First, we study how to estimate the power of replication studies in multi-stage GWASs. The traditional approach estimates power by plugging the observed effect sizes into the power calculation. However, we are interested only in primary associations (i.e., associations that are statistically significant in the primary study), and their observed effect sizes tend to overestimate the true effects. This is the "winner's curse". Plugging these inflated estimates into the power calculation therefore overestimates power and leaves the designed replication study underpowered. In this dissertation, we propose an Empirical Bayes (EB)-based method to estimate the power of a replication study for each association. Simulation experiments show that our method overcomes the winner's curse better than plug-in estimators and estimates power more accurately. Experiments on data of six diseases from the Wellcome Trust Case Control Consortium (WTCCC) show that the replication sample sizes determined from power estimated by our method are larger than those determined by the traditional approach.
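To illustrate why the plug-in approach leaves replication studies underpowered, the following is a minimal sketch under a simplified, hypothetical model. It is not the EB-based method of the dissertation. We assume each SNP's primary z-statistic is N(mu, 1), that the replication z-statistic has noncentrality mu * sqrt(n_replication / n_primary) (same effect, same design), and that a two-sided z-test is used. The names `replication_power` and `simulate_winners_curse`, the 10% associated fraction, the N(0, 2^2) effect distribution, and the primary threshold of 1e-3 are all illustrative assumptions. Among SNPs that pass the primary threshold, the average power obtained by plugging in the observed z is compared with the average power under the true mu.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def replication_power(mu_primary, n_primary, n_replication, alpha=0.05):
    """Power of a two-sided z-test in the replication study.

    Assumption: the replication z-statistic is N(mu_rep, 1) with
    mu_rep = mu_primary * sqrt(n_replication / n_primary).
    """
    mu_rep = mu_primary * math.sqrt(n_replication / n_primary)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # P(|Z| > z_crit) for Z ~ N(mu_rep, 1)
    return STD_NORMAL.cdf(mu_rep - z_crit) + STD_NORMAL.cdf(-mu_rep - z_crit)


def simulate_winners_curse(n_snps=20000, prop_assoc=0.1, effect_sd=2.0,
                           alpha_primary=1e-3, n_primary=2000,
                           n_replication=2000, seed=1):
    """Compare plug-in and true replication power among primary associations.

    Hypothetical generative model: a fraction prop_assoc of SNPs has a
    noncentrality mu ~ N(0, effect_sd^2); the rest are null (mu = 0).
    Returns (mean plug-in power, mean true power, number selected).
    """
    rng = random.Random(seed)
    z_thresh = STD_NORMAL.inv_cdf(1 - alpha_primary / 2)
    plug_in_power, true_power = [], []
    for _ in range(n_snps):
        mu = rng.gauss(0.0, effect_sd) if rng.random() < prop_assoc else 0.0
        z = mu + rng.gauss(0.0, 1.0)  # observed primary z-statistic
        if abs(z) > z_thresh:  # keep only the primary associations
            # Plug-in: treat the observed z as if it were the true mu.
            plug_in_power.append(replication_power(z, n_primary, n_replication))
            true_power.append(replication_power(mu, n_primary, n_replication))
    n_selected = len(true_power)
    return sum(plug_in_power) / n_selected, sum(true_power) / n_selected, n_selected
```

Calling `simulate_winners_curse()` returns the two averages and the number of selected SNPs. In this setup the plug-in average is higher than the true average, because selection on |z| favours SNPs whose noise pushed the observed statistic upward. A replication sample size chosen from the plug-in value would therefore be too small.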

Second, we study the probability that a primary association will be validated in the replication study. To answer this question, this dissertation proposes a Bayesian probabilistic measure named the replication rate (RR). We further provide a method for estimating RR from the summary statistics of the primary study. The estimated RR can be used to determine the sample size of the replication study and to check the consistency between the results of the primary and replication studies. Simulation and real-data experiments show that the estimated RR has good prediction and calibration performance, and we also use these experiments to demonstrate the usefulness of RR.
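As a rough illustration of a Bayesian replication probability of this kind, the following sketch computes the posterior predictive probability that a primary association reaches two-sided significance in the replication study. The model is a hypothetical two-group prior with known parameters. The dissertation's exact definition of RR and its estimation from summary statistics are not given here. We assume a primary z-statistic with z | mu ~ N(mu, 1), where mu = 0 with probability pi0 and mu ~ N(0, tau^2) otherwise, and a replication z-statistic N(r * mu, 1) with r = sqrt(n_replication / n_primary). The function name `replication_rate` and the parameters `pi0` and `tau` are illustrative. In practice they would have to be estimated, for example empirically from the primary summary statistics.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def replication_rate(z_primary, pi0, tau, n_primary, n_replication, alpha=0.05):
    """Posterior predictive P(|z_rep| > critical value | z_primary).

    Hypothetical model: mu = 0 w.p. pi0, else mu ~ N(0, tau^2);
    z_primary ~ N(mu, 1); z_rep ~ N(r * mu, 1), r = sqrt(n_rep / n_pri).
    """
    r = math.sqrt(n_replication / n_primary)
    c = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Marginal density of the primary z under the null and the alternative.
    f_null = NormalDist(0.0, 1.0).pdf(z_primary)
    f_alt = NormalDist(0.0, math.sqrt(1.0 + tau ** 2)).pdf(z_primary)
    # Posterior probability that the SNP is truly associated.
    w_alt = (1 - pi0) * f_alt / (pi0 * f_null + (1 - pi0) * f_alt)
    # Under the alternative, mu | z ~ N(shrink * z, shrink) (normal-normal update).
    shrink = tau ** 2 / (1.0 + tau ** 2)
    post_mean, post_var = shrink * z_primary, shrink
    # Predictive distribution of the replication z under the alternative.
    pred_alt = NormalDist(r * post_mean, math.sqrt(r ** 2 * post_var + 1.0))
    p_sig_alt = pred_alt.cdf(-c) + (1.0 - pred_alt.cdf(c))
    # Under the null, the replication test rejects with probability alpha.
    return w_alt * p_sig_alt + (1 - w_alt) * alpha
```

For example, `replication_rate(5.0, pi0=0.9, tau=2.0, n_primary=2000, n_replication=2000)` gives a probability noticeably below the plug-in power for z = 5. The shrinkage of mu toward zero is what counteracts the winner's curse in this sketch.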

Third, we study how to determine significance levels in multi-stage settings. In traditional methods, the significance levels of the primary and replication studies are determined separately. We argue that this separate-determination strategy reduces the power of the overall multi-stage study, and we therefore propose a novel method that determines the significance levels jointly. Our method is a reanalysis method that requires summary statistics from both studies. It finds the most powerful significance levels while controlling the false discovery rate (Fdr) of the multi-stage study. To benefit from the power improvement of the joint-determination method, we suggest selecting SNPs for replication at a less stringent significance level. Simulation experiments show that our method provides more power than traditional methods while keeping the Fdr well controlled. Empirical experiments on data sets of five diseases/traits demonstrate that our method helps identify more associations.
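The idea of choosing the two thresholds jointly can be sketched as a grid search. Under a hypothetical two-group model with known parameters, we compute the Fdr and power of every pair of |z| thresholds (t1 for the primary study, t2 for the replication study) and keep the most powerful pair whose Fdr does not exceed q. This is not the dissertation's reanalysis procedure, whose details are not given in the abstract. We assume null SNPs have independent N(0, 1) statistics in both studies, while associated SNPs have mu ~ N(0, tau^2), z1 = mu + e1 and z2 = r * mu + e2. The function names, the grid, and the values pi0 = 0.99 and tau = 2 are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_tail(mean, t):
    """P(|X| > t) for X ~ N(mean, 1)."""
    return STD_NORMAL.cdf(-t - mean) + STD_NORMAL.cdf(mean - t)


def alt_joint_tail(t1, t2, tau, r, n_grid=401):
    """P(|z1| > t1 and |z2| > t2) for an associated SNP, integrating over
    mu ~ N(0, tau^2) with the trapezoidal rule (z1, z2 independent given mu)."""
    lo, hi = -8.0 * tau, 8.0 * tau
    h = (hi - lo) / (n_grid - 1)
    prior = NormalDist(0.0, tau)
    total = 0.0
    for i in range(n_grid):
        mu = lo + i * h
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * prior.pdf(mu) * two_sided_tail(mu, t1) * two_sided_tail(r * mu, t2)
    return total * h


def fdr_and_power(t1, t2, pi0, tau, r):
    """Fdr of the rule 'declare if |z1| > t1 and |z2| > t2', and its power."""
    p_null = two_sided_tail(0.0, t1) * two_sided_tail(0.0, t2)
    p_alt = alt_joint_tail(t1, t2, tau, r)
    fdr = pi0 * p_null / (pi0 * p_null + (1 - pi0) * p_alt)
    return fdr, p_alt


def joint_thresholds(pi0, tau, r, q=0.05, step=0.25):
    """Most powerful (t1, t2) on a grid subject to Fdr <= q.
    Returns (t1, t2, fdr, power), or None if no pair is feasible."""
    best = None
    t1_grid = [1.0 + step * i for i in range(int(5.0 / step) + 1)]  # 1.0 .. 6.0
    t2_grid = [1.0 + step * j for j in range(int(3.0 / step) + 1)]  # 1.0 .. 4.0
    for t1 in t1_grid:
        for t2 in t2_grid:
            fdr, power = fdr_and_power(t1, t2, pi0, tau, r)
            if fdr <= q and (best is None or power > best[3]):
                best = (t1, t2, fdr, power)
    return best
```

Because the search covers every pair on the grid, its result is at least as powerful as any separately chosen pair on the same grid that also meets the Fdr target, such as t1 = 4.0 with t2 = 2.0. In this toy model, a lower t1 (a less stringent primary level) often pays off when paired with a suitable t2, which matches the suggestion above.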

Finally, we study joint analysis methods that use summary statistics from multiple GWASs, a task traditionally handled by meta-analysis methods. We propose a novel summary-statistics-based joint analysis method based on controlling the joint local false discovery rate (Jlfdr). We prove that our method is the most powerful summary-statistics-based joint analysis method when the Fdr is controlled at a given level. In particular, the Jlfdr-based method achieves higher power than commonly used meta-analysis methods when analyzing heterogeneous data sets from multiple GWASs. Simulation experiments demonstrate the superior power of our method over meta-analysis methods, and our method also discovers more associations than meta-analysis methods in empirical data sets of four phenotypes.
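A joint local false discovery rate for two studies can be sketched as the posterior null probability of a SNP given its pair of z-statistics. The following example computes it under a hypothetical bivariate-normal two-group model with known parameters and then applies a commonly used rule for turning local fdr values into Fdr control: sort ascending and reject the largest set whose average local fdr is at most q. The dissertation's own model and estimation procedure are not described in the abstract. We assume null SNPs have independent N(0, 1) statistics in both studies. Associated SNPs have effects mu1 ~ N(0, tau1^2) and mu2 ~ N(0, tau2^2) with correlation rho, where rho < 1 allows heterogeneous effects. The function names and parameters are illustrative.

```python
import math


def bivariate_normal_pdf(x, y, sx, sy, rho):
    """Density of a zero-mean bivariate normal with s.d. sx, sy and correlation rho."""
    one_m = 1.0 - rho * rho
    quad = (x / sx) ** 2 - 2.0 * rho * (x / sx) * (y / sy) + (y / sy) ** 2
    return math.exp(-quad / (2.0 * one_m)) / (2.0 * math.pi * sx * sy * math.sqrt(one_m))


def jlfdr(z1, z2, pi0, tau1, tau2, rho):
    """Posterior null probability of (z1, z2) under the hypothetical model:
    null: z1, z2 iid N(0, 1); alternative: z_k = mu_k + e_k with
    Var(mu_k) = tau_k^2 and Corr(mu1, mu2) = rho."""
    f0 = bivariate_normal_pdf(z1, z2, 1.0, 1.0, 0.0)
    s1, s2 = math.sqrt(1.0 + tau1 ** 2), math.sqrt(1.0 + tau2 ** 2)
    corr = rho * tau1 * tau2 / (s1 * s2)  # Cov(z1, z2) = rho * tau1 * tau2
    f1 = bivariate_normal_pdf(z1, z2, s1, s2, corr)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)


def reject_by_lfdr(lfdrs, q=0.05):
    """Indices rejected by the running-mean rule: sort local fdr values
    ascending and keep the largest k whose average is <= q."""
    order = sorted(range(len(lfdrs)), key=lambda i: lfdrs[i])
    running, k = 0.0, 0
    for rank, i in enumerate(order, start=1):
        running += lfdrs[i]
        if running / rank <= q:
            k = rank
    return sorted(order[:k])
```

For example, `reject_by_lfdr([0.01, 0.9, 0.02, 0.5], q=0.05)` returns `[0, 2]`. Because the Jlfdr uses the two z-statistics directly rather than a single combined statistic, a SNP with a strong signal in only one study can still receive a small Jlfdr when rho is small. This is one intuition for the gain on heterogeneous data.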

Local Note:

School code: 1223

Subject Term:

Biostatistics.

Added Corporate Author:

Hong Kong University of Science and Technology (Hong Kong).

Electronic Access:

http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:10903340

Available:

Shelf Number	Item Barcode	Shelf Location
XX(696767.1)	696767-1001	Proquest E-Thesis Collection
