TY - JOUR
T1 - A modified Hosmer–Lemeshow test for large data sets
AU - Yu, Wei
AU - Xu, Wangli
AU - Zhu, Lixing
N1 - Funding Information:
The research described herewith was supported by a grant from the University Grants Council of Hong Kong and a grant by the National Natural Science Foundation of China (No: 11471335).
PY - 2017/12/2
Y1 - 2017/12/2
N2 - The Hosmer–Lemeshow test is a widely used method for evaluating the goodness of fit of logistic regression models. However, like other chi-square tests, its power is strongly influenced by the sample size. Paul, Pennell, and Lemeshow (2013) considered using a large number of groups for large data sets to standardize the power. But simulations show that their method performs poorly for some models. In addition, it does not work when the sample size is larger than 25,000. In the present paper, we propose a modified Hosmer–Lemeshow test that is based on estimation and standardization of the distribution parameter of the Hosmer–Lemeshow statistic. We provide a mathematical derivation for obtaining the critical value and power of our test. Simulations show that our method satisfactorily standardizes the power of the Hosmer–Lemeshow test. It is especially recommended for sufficiently large data sets, as the power is rather stable. A bank marketing data set is also analyzed for comparison with existing methods.
AB - The Hosmer–Lemeshow test is a widely used method for evaluating the goodness of fit of logistic regression models. However, like other chi-square tests, its power is strongly influenced by the sample size. Paul, Pennell, and Lemeshow (2013) considered using a large number of groups for large data sets to standardize the power. But simulations show that their method performs poorly for some models. In addition, it does not work when the sample size is larger than 25,000. In the present paper, we propose a modified Hosmer–Lemeshow test that is based on estimation and standardization of the distribution parameter of the Hosmer–Lemeshow statistic. We provide a mathematical derivation for obtaining the critical value and power of our test. Simulations show that our method satisfactorily standardizes the power of the Hosmer–Lemeshow test. It is especially recommended for sufficiently large data sets, as the power is rather stable. A bank marketing data set is also analyzed for comparison with existing methods.
KW - Hosmer–Lemeshow test
KW - large data sets
KW - logistic regression
KW - test power
UR - http://www.scopus.com/inward/record.url?scp=85028576307&partnerID=8YFLogxK
U2 - 10.1080/03610926.2017.1285922
DO - 10.1080/03610926.2017.1285922
M3 - Journal article
AN - SCOPUS:85028576307
SN - 0361-0926
VL - 46
SP - 11813
EP - 11825
JO - Communications in Statistics - Theory and Methods
JF - Communications in Statistics - Theory and Methods
IS - 23
ER -