TY - JOUR
T1 - Estimating the proportion of true null hypotheses using the pattern of observed p-values
AU - Tong, Tiejun
AU - Feng, Zeny
AU - Hilton, Julia S.
AU - Zhao, Hongyu
N1 - Funding Information: Tiejun Tong’s research was supported by Hong Kong RGC grant HKBU202711 and Hong Kong Baptist University FRG grants FRG2/10-11/020 and FRG2/11-12/110. Zeny Feng’s research was supported by Natural Sciences and Engineering Research Council of Canada individual discovery grant. Hongyu Zhao’s research was supported by NIH grant GM59507 and NSF grant DMS0714817. The authors thank Dr Stan Pounds and Dr Cheng Cheng from St Jude Children’s Research Hospital for providing the data set and helpful comments. The authors also thank the editor, the associate editor, and two reviewers for their constructive comments that have substantially improved the paper.
PY - 2013/9
Y1 - 2013/9
AB - Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 in the literature are motivated from the independence assumption of test statistics, which is often not true in reality. Simulations indicate that most existing estimators in the presence of the dependence among test statistics can be poor, mainly due to the increase of variation in these estimators. In this paper, we propose several data-driven methods for estimating π0 by incorporating the distribution pattern of the observed p-values as a practical approach to address potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate for the proportion of true-null p-values in (λ, 1] over the whole range [0, 1] instead of using the expected proportion at 1-λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve the overall performance.
KW - gene expression data
KW - multiple testing
KW - p-value
KW - proportion of true null hypotheses
UR - http://www.scopus.com/inward/record.url?scp=84883656105&partnerID=8YFLogxK
U2 - 10.1080/02664763.2013.800035
DO - 10.1080/02664763.2013.800035
M3 - Journal article
AN - SCOPUS:84883656105
SN - 0266-4763
VL - 40
SP - 1949
EP - 1964
JO - Journal of Applied Statistics
JF - Journal of Applied Statistics
IS - 9
ER -