A modified Hosmer–Lemeshow test for large data sets

Wei Yu, Wangli Xu, Lixing ZHU*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

22 Citations (Scopus)

Abstract

The Hosmer–Lemeshow test is a widely used method for evaluating the goodness of fit of logistic regression models. Like other chi-square tests, however, its power is strongly influenced by the sample size. Paul, Pennell, and Lemeshow (2013) considered using a large number of groups for large data sets to standardize the power, but simulations show that their method performs poorly for some models. In addition, it does not work when the sample size exceeds 25,000. In the present paper, we propose a modified Hosmer–Lemeshow test that is based on estimation and standardization of the distribution parameter of the Hosmer–Lemeshow statistic. We provide a mathematical derivation for obtaining the critical value and power of our test. Simulations show that our method satisfactorily standardizes the power of the Hosmer–Lemeshow test. It is especially recommended for sufficiently large data sets, for which the power is rather stable. A bank marketing data set is also analyzed for comparison with existing methods.
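For orientation, the standard (unmodified) Hosmer–Lemeshow statistic that the abstract refers to can be sketched as follows. This is a minimal illustration and not the modified test proposed in the paper, whose details are not given here. The function name `hosmer_lemeshow_statistic`, the equal-size grouping rule, and the simulated data are assumptions made for the example. It also assumes at least as many observations as groups and group mean probabilities strictly between 0 and 1. For a fitted logistic model, the statistic is conventionally compared with a chi-square distribution with `groups - 2` degrees of freedom.

```python
import random


def hosmer_lemeshow_statistic(y, p, groups=10):
    """Standard Hosmer-Lemeshow statistic.

    y: binary outcomes (0/1); p: fitted probabilities in (0, 1).
    Observations are sorted by p and split into near-equal-sized groups.
    """
    pairs = sorted(zip(p, y))  # order observations by fitted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        # Slice out the g-th group ("deciles of risk" when groups=10).
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(prob for prob, _ in chunk)        # expected events
        observed = sum(outcome for _, outcome in chunk)  # observed events
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat


# Illustrative data only: outcomes drawn from the stated probabilities,
# so the "model" is correctly specified by construction.
random.seed(0)
probs = [random.uniform(0.05, 0.95) for _ in range(1000)]
outcomes = [1 if random.random() < q else 0 for q in probs]
stat = hosmer_lemeshow_statistic(outcomes, probs, groups=10)
print(f"Hosmer-Lemeshow statistic: {stat:.3f}")
```

Because the group sizes in the denominator grow with the sample size, even small departures from the model inflate the statistic in very large samples. This is the sample-size sensitivity of the power that the abstract describes.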

Original language: English
Pages (from-to): 11813-11825
Number of pages: 13
Journal: Communications in Statistics - Theory and Methods
Volume: 46
Issue number: 23
Publication status: Published - 2 Dec 2017

Scopus Subject Areas

  • Statistics and Probability

User-Defined Keywords

  • Hosmer–Lemeshow test
  • large data sets
  • logistic regression
  • test power
