TY - JOUR
T1 - On Estimation of Functional Causal Models: General Results and Application to the Post-Nonlinear Causal Model
AU - Zhang, Kun
AU - Wang, Zhikun
AU - Zhang, Jiji
AU - Schölkopf, Bernhard
N1 - Funding Information:
The research of J. Zhang was supported in part by the Research Grants Council of Hong Kong under General Research Fund LU342213.
Publisher copyright:
© 2015 ACM
PY - 2015/12
Y1 - 2015/12
N2 - Compared to constraint-based causal discovery, causal discovery based on functional causal models is able to identify the whole causal model under appropriate assumptions [Shimizu et al. 2006; Hoyer et al. 2009; Zhang and Hyvärinen 2009b]. Functional causal models represent the effect as a function of the direct causes together with an independent noise term. Examples include the linear non-Gaussian acyclic model (LiNGAM), nonlinear additive noise model, and post-nonlinear (PNL) model. Currently, there are two ways to estimate the parameters in the models: dependence minimization and maximum likelihood. In this article, we show that for any acyclic functional causal model, minimizing the mutual information between the hypothetical cause and the noise term is equivalent to maximizing the data likelihood with a flexible model for the distribution of the noise term. We then focus on estimation of the PNL causal model and propose to estimate it with the warped Gaussian process with the noise modeled by the mixture of Gaussians. As a Bayesian nonparametric approach, it outperforms the previous one based on mutual information minimization with nonlinear functions represented by multilayer perceptrons; we also show that unlike the ordinary regression, estimation results of the PNL causal model are sensitive to the assumption on the noise distribution. Experimental results on both synthetic and real data support our theoretical claims.
KW - Systems and Information Theory
KW - Learning
KW - Probability and Statistics
KW - Causal discovery
KW - functional causal model
KW - post-nonlinear causal model
KW - statistical independence
KW - maximum likelihood
U2 - 10.1145/2700476
DO - 10.1145/2700476
M3 - Journal article
SN - 2157-6904
VL - 7
SP - 1
EP - 22
JO - ACM Transactions on Intelligent Systems and Technology
JF - ACM Transactions on Intelligent Systems and Technology
IS - 2
M1 - 13
ER -