Comparison of disease prevalence in two populations in the presence of misclassification

Man Lai TANG, Shi Fang Qiu*, Wai Yin Poon

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

5 Citations (Scopus)


Comparing disease prevalence in two groups is an important topic in medical research. Prevalence rates are obtained by classifying subjects according to whether they have the disease. Subjects can be classified either by high-cost, infallible gold-standard classifiers or by low-cost, fallible classifiers. However, statistical analyses based on data sets with misclassifications lead to biased results. As a compromise between the two classification approaches, partially validated sets are often used. In these sets, all individuals are classified by fallible classifiers, and some of them are also validated by the accurate gold-standard classifiers. In this article, we develop several reliable test procedures and approximate sample size formulas for disease prevalence studies. The procedures are based on the difference between two disease prevalence rates with two independent partially validated series. Empirical studies show two things. First, the Score test produces type I error rates close to the nominal level and is preferred in practice. Second, the sample size formula based on the Score test is also fairly accurate in terms of empirical power and type I error rate, and is hence recommended. A real example from an aplastic anemia study is used to illustrate the proposed methodologies.
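The abstract does not state the test statistics themselves. The Python sketch below illustrates how a partially validated series can still yield an estimate of prevalence. It rests on two assumptions. First, each group has its own sensitivity and specificity. Second, subjects are chosen for validation at random given their fallible result F.

Under these assumptions, the maximum likelihood estimate has a closed form in the reparametrization lam = P(F=1), a = P(D=1 | F=1), b = P(D=1 | F=0), with prevalence p = lam*a + (1-lam)*b. From this estimate the sketch builds a Wald-type test of p1 - p2 = delta0 and a Wald-based per-group sample size. This is a swapped-in Wald approach, not the Score test or the Score-based sample size formula that the article recommends. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from math import ceil, sqrt
from statistics import NormalDist


@dataclass
class PartiallyValidatedSample:
    """Hypothetical count layout for one group.

    Validated subjects have both the gold-standard status D and the
    fallible result F; unvalidated subjects have F only.
    """
    d1f1: int  # validated, D=1, F=1
    d0f1: int  # validated, D=0, F=1
    d1f0: int  # validated, D=1, F=0
    d0f0: int  # validated, D=0, F=0
    u_f1: int  # unvalidated, F=1
    u_f0: int  # unvalidated, F=0


def prevalence_mle(s):
    """Closed-form MLE of p = P(D=1) and its delta-method variance.

    Parametrization (an assumption of this sketch): lam = P(F=1),
    a = P(D=1 | F=1), b = P(D=1 | F=0), so p = lam*a + (1-lam)*b.
    Requires at least one validated subject with F=1 and one with F=0.
    """
    v1 = s.d1f1 + s.d0f1                 # validated subjects with F=1
    v0 = s.d1f0 + s.d0f0                 # validated subjects with F=0
    n = v1 + v0 + s.u_f1 + s.u_f0        # every subject in the group
    lam = (v1 + s.u_f1) / n              # all subjects inform P(F=1)
    a = s.d1f1 / v1                      # only validated subjects inform D | F
    b = s.d1f0 / v0
    p = lam * a + (1 - lam) * b
    # lam_hat, a_hat, b_hat are asymptotically independent because the
    # likelihood factorizes into a part for F and a part for D given F.
    var = ((a - b) ** 2 * lam * (1 - lam) / n
           + lam ** 2 * a * (1 - a) / v1
           + (1 - lam) ** 2 * b * (1 - b) / v0)
    return p, var


def wald_test(s1, s2, delta0=0.0):
    """Two-sided Wald-type test of H0: p1 - p2 = delta0 (not the Score test)."""
    p1, var1 = prevalence_mle(s1)
    p2, var2 = prevalence_mle(s2)
    z = (p1 - p2 - delta0) / sqrt(var1 + var2)  # fails if both variances are 0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1 - p2, z, p_value


def unit_variance(lam, a, b, frac):
    """N * Var(p_hat) when a random fraction `frac` of N subjects is validated,
    so about frac*N*lam validated subjects have F=1 and frac*N*(1-lam) have F=0."""
    return ((a - b) ** 2 * lam * (1 - lam)
            + lam * a * (1 - a) / frac
            + (1 - lam) * b * (1 - b) / frac)


def wald_sample_size(g1, g2, frac, delta0=0.0, alpha=0.05, power=0.8):
    """Per-group size N for the Wald-type test; g1, g2 are (lam, a, b) tuples
    describing each group under the alternative (variance taken there too)."""
    p1 = g1[0] * g1[1] + (1 - g1[0]) * g1[2]
    p2 = g2[0] * g2[1] + (1 - g2[0]) * g2[2]
    effect = p1 - p2 - delta0
    if effect == 0:
        raise ValueError("alternative must differ from delta0")
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    total_var = unit_variance(*g1, frac) + unit_variance(*g2, frac)
    return ceil(z ** 2 * total_var / effect ** 2)
```

For example, take the made-up counts `PartiallyValidatedSample(20, 5, 5, 70, 60, 240)`. These give lam = 0.2125, a = 0.8 and b = 1/15, so `prevalence_mle` returns an estimated prevalence of 0.2225.

As a sanity check on `unit_variance`, consider frac = 1 (every subject validated). The expression then reduces to p(1-p), the ordinary binomial variance. Smaller validation fractions inflate the required sample size.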

Original language: English
Pages (from-to): 786-807
Number of pages: 22
Journal: Biometrical Journal
Issue number: 6
Publication status: Published (Nov 2012)

Scopus Subject Areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

User-Defined Keywords

  • Difference between two disease prevalence rates
  • Partially validated series
  • Sample size
  • Score test

