TY - JOUR
T1 - Computing exact permutation p-values for association rules
AU - Wu, Jun
AU - He, Zengyou
AU - Gu, Feiyang
AU - Liu, Xiaoqing
AU - Zhou, Jianyu
AU - Yang, Can
N1 - Funding Information:
This work is partially supported by the Natural Science Foundation of China under grant no. 61572094 and the Fundamental Research Funds for the Central Universities of China (DUT14QY07). Can Yang was supported in part by Grant No. 61501389 from Natural Science Foundation of China, Grant HKBU_22302815 from Hong Kong Research Grant Council, and Grant FRG2/14-15/069 from Hong Kong Baptist University.
PY - 2016/6/10
Y1 - 2016/6/10
AB - Association rule mining is an important task in the field of data mining, and many efficient algorithms have been proposed to address this problem. However, a large portion of the rules reported by these algorithms just satisfy the user-defined constraints purely by accident, and those that are not statistically meaningful should be filtered out through statistical significance testing. In the context of association rule discovery, the permutation-based approach can achieve better performance than other competitive methods, although several drawbacks of this effective approach narrow its usability. In this paper, we provide an analysis of these disadvantages and propose an algorithm called Exact Permutation p-values for Association Rules (EPAR) to calculate the exact p-values of all tested rules. Experiments on different types of data sets demonstrate that EPAR can successfully alleviate the disadvantages and outperform the direct permutation-based method over several performance measures.
KW - Association rule mining
KW - Exact permutation p-value
KW - Permutation testing
KW - Statistical significance testing
UR - http://www.scopus.com/inward/record.url?scp=84964408670&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2016.01.094
DO - 10.1016/j.ins.2016.01.094
M3 - Journal article
AN - SCOPUS:84964408670
SN - 0020-0255
VL - 346-347
SP - 146
EP - 162
JO - Information Sciences
JF - Information Sciences
ER -