TY - JOUR
T1 - An adaptive-to-model distributed hybrid test for conditional independence
AU - Li, Shaomin
AU - Zhu, Xuehu
AU - Zhu, Lixing
N1 - Funding information:
This work was supported by National Natural Science Foundation of China (Grant Nos. 12101056 and 12131006), the National Statistical Science Research Project (Grant No. 2022LY040), National Social Science Foundation of China (Grant No. 21BTJ048), the Talent Fund of Beijing Jiaotong University (Grant No. 2023XKRC008) and the University Grants Council of Hong Kong.
Publisher Copyright:
© Science China Press 2026.
PY - 2026/2/10
Y1 - 2026/2/10
N2 - In this paper, we propose a distributed test for conditional independence. To this end, we develop a distributed groupwise least squares estimation for the groupwise central dimension reduction subspace to identify the dimension of the underlying model structure. The test is an adaptive-to-model hybrid of two simple distributed tests. The dimension identification automatically adapts the test to the underlying model so that it is an omnibus test with a tractable limiting null distribution. It can detect the local alternatives distinct from the null hypothesis at a rate as close to 1/\sqrt{N} as possible, where N is the total sample size. This is the fastest possible rate in hypothesis testing. When the total sample size is fixed, the sensitivity of the test to the alternative hypothesis does not decrease as the number of machines increases. Numerical studies suggest that the power of the hybrid remains high when the number of machines increases, while fixing the total sample size and the computational time decreases rapidly. The test is conducted to examine the suitability of proxies for unobserved factors in the study of returns to schooling, as well as to analyze the conditional independence between PM2.5 and other air pollution and meteorological variables for illustration.
AB - In this paper, we propose a distributed test for conditional independence. To this end, we develop a distributed groupwise least squares estimation for the groupwise central dimension reduction subspace to identify the dimension of the underlying model structure. The test is an adaptive-to-model hybrid of two simple distributed tests. The dimension identification automatically adapts the test to the underlying model so that it is an omnibus test with a tractable limiting null distribution. It can detect the local alternatives distinct from the null hypothesis at a rate as close to 1/\sqrt{N} as possible, where N is the total sample size. This is the fastest possible rate in hypothesis testing. When the total sample size is fixed, the sensitivity of the test to the alternative hypothesis does not decrease as the number of machines increases. Numerical studies suggest that the power of the hybrid remains high when the number of machines increases, while fixing the total sample size and the computational time decreases rapidly. The test is conducted to examine the suitability of proxies for unobserved factors in the study of returns to schooling, as well as to analyze the conditional independence between PM2.5 and other air pollution and meteorological variables for illustration.
KW - distributed inference
KW - groupwise sufficient dimension reduction
KW - massive data sets
UR - https://www.scopus.com/pages/publications/105030305123
U2 - 10.1007/s11425-024-2429-8
DO - 10.1007/s11425-024-2429-8
M3 - Journal article
AN - SCOPUS:105030305123
SN - 1674-7283
JO - Science China Mathematics
JF - Science China Mathematics
ER -