Title:
Multiple Testing Procedures Controlling False Discovery Rate with Applications to Genomic Data
Author:
Gauran, Iris Ivy M., author.
ISBN:
9780355990034
Physical Description:
1 electronic resource (160 pages)
General Note:
Source: Dissertation Abstracts International, Volume: 79-10(E), Section: B.
Advisors: Junyong Park. Committee members: Nak-Kyeong Kim; DoHwan Park; Anindya Roy; John Spouge.
Abstract:
In recent mutation studies, analyses based on protein domain positions are gaining popularity over traditional gene-centric approaches since the latter have limitations in considering the functional context that the position of the mutation provides. This presents a large-scale simultaneous inference problem, with hundreds of hypothesis tests to consider at the same time. The overarching objective of this thesis is to propose different multiple testing procedures which can address the problems posed by discrete genomic data. Specifically, we are interested in identifying significant mutation counts while controlling a given level of Type I error via False Discovery Rate (FDR) procedures. One main assumption is that the mutation counts follow a zero-inflated model in order to account for the true zeros in the count model and the excess zeros. The class of models considered is the Zero-inflated Generalized Poisson (ZIGP) distribution.
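As a rough illustration of the ZIGP model named above, the following sketch evaluates its probability mass function. It assumes Consul's parameterisation of the Generalized Poisson distribution with rate `theta > 0` and dispersion `0 <= lam < 1`, mixed with a point mass at zero of weight `pi0`. The abstract does not state which parameterisation the thesis uses, and the function and parameter names here are hypothetical.

```python
import math


def gp_pmf(y, theta, lam):
    """Generalized Poisson pmf (assumed: Consul's form, theta > 0, 0 <= lam < 1).

    P(Y = y) = theta * (theta + lam*y)**(y - 1) * exp(-theta - lam*y) / y!
    Setting lam = 0 recovers the ordinary Poisson distribution.
    """
    if y < 0:
        return 0.0
    # Work on the log scale so large counts do not overflow.
    log_p = (
        math.log(theta)
        + (y - 1) * math.log(theta + lam * y)
        - theta
        - lam * y
        - math.lgamma(y + 1)
    )
    return math.exp(log_p)


def zigp_pmf(y, pi0, theta, lam):
    """Zero-inflated GP pmf: extra mass pi0 at zero, weight (1 - pi0) on the GP part."""
    base = (1.0 - pi0) * gp_pmf(y, theta, lam)
    return pi0 + base if y == 0 else base


def zigp_upper_tail(y, pi0, theta, lam):
    """P(Y >= y) under the ZIGP model, usable as a one-sided p-value for a count."""
    return 1.0 - sum(zigp_pmf(k, pi0, theta, lam) for k in range(y))
```

Under this form, zeros arise both from the point mass (`pi0`, the excess zeros) and from the count component itself (`exp(-theta)`, the true zeros of the count model).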
In the first study, we developed an Empirical Bayes procedure. We assumed that there exists a cut-off value such that counts smaller than this value are generated from the null distribution. We present several data-dependent methods to determine the cut-off value. We also consider a two-stage procedure based on a screening process, so that mutation counts exceeding a certain value are considered significant. Simulated and protein domain data sets are used to illustrate this procedure in estimating the empirical null using a mixture of discrete distributions. Overall, while maintaining control of the FDR, the proposed cut-off method combined with the two-stage testing procedure has superior empirical power.
In the second study, we developed full Bayesian procedures. We addressed the limitations of the Empirical Bayes procedure by proposing methods that can handle both the weakened assumption on the null distribution and the sparsity condition that is apparent among protein domains with a small number of positions. Based on the simulation studies, the full Bayesian methods are able to control the FDR when the Empirical Bayes method fails. We also studied several cases in order to assess whether we need to implement the zero assumption on the null distribution. Results revealed that implementing this key assumption would still yield good results in terms of control of the FDR and high values of the empirical power. In general, simulation results suggest that a smaller number of rejections is preferable. The number of identified hotspots in the real data analysis is consistent with the simulation studies.
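For readers unfamiliar with how Bayesian output is turned into an FDR-controlled rejection set, one common generic rule is sketched below. It assumes each hypothesis already has a local false discovery rate (the posterior probability of being null) and rejects the largest set of smallest values whose average stays at or below the target level. This is a standard technique and not necessarily the rule used in the thesis, which the abstract does not specify; the function name `bayes_fdr_reject` is hypothetical.

```python
def bayes_fdr_reject(lfdr, alpha=0.05):
    """Select hypotheses to reject from local fdr values (assumed given).

    Sort the local fdr values in ascending order and reject the largest
    prefix whose running mean is <= alpha. The running mean estimates the
    FDR of the rejected set under the fitted model.
    Returns the sorted original indices of the rejected hypotheses.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    running_sum = 0.0
    n_reject = 0
    for rank, idx in enumerate(order, start=1):
        running_sum += lfdr[idx]
        if running_sum / rank <= alpha:
            n_reject = rank
    return sorted(order[:n_reject])


# Illustrative values only, not taken from the thesis.
example_lfdr = [0.01, 0.9, 0.03, 0.2, 0.5]
rejected = bayes_fdr_reject(example_lfdr, alpha=0.1)
```

With these illustrative values, the three smallest local fdr values have a mean of 0.08, so their positions are rejected at level 0.1, while adding the fourth would push the mean above the target.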
Notes:
School code: 0434
Available:*
Call Number | Inventory Number | Shelf Location | Location / Status / Due Date |
---|---|---|---|
XX(680696.1) | 680696-1001 | Proquest E-Thesis Collection | Searching... |