TY - JOUR
T1 - An adaptive fusion algorithm for spam detection
AU - Xu, Congfu
AU - Su, Baojun
AU - Cheng, Yunbiao
AU - Pan, Weike
AU - Chen, Li
N1 - Supported by the Natural Science Foundation of China (grants 60970081 and 61272303) and the National Basic Research Program of China (973 Plan, grant 2010CB327903).
PY - 2014/7/1
Y1 - 2014/7/1
AB - Spam detection has become a critical component of online systems such as email services, advertising engines, and social media sites. Here, the authors use email services as an example and present an adaptive fusion algorithm for spam detection (AFSD), a general, content-based approach that can be applied to nonemail spam detection tasks with little additional effort. The proposed algorithm represents an email with n-grams of nontokenized text strings, introduces a link function that converts the prediction scores of the online learners so that they are more comparable, trains the online learners in a mistake-driven manner via thick thresholding to obtain highly competitive online learners, and designs update rules that adaptively integrate the online learners to capture different aspects of spam. The prediction performance of AFSD is studied on five public competition datasets and one industry dataset, with the algorithm achieving significantly better results than several state-of-the-art approaches, including the champion solutions of the corresponding competitions.
KW - adaptive fusion
KW - intelligent systems
KW - spam detection
UR - http://www.scopus.com/inward/record.url?scp=84907602109&partnerID=8YFLogxK
U2 - 10.1109/MIS.2013.54
DO - 10.1109/MIS.2013.54
M3 - Journal article
AN - SCOPUS:84907602109
SN - 1541-1672
VL - 29
SP - 2
EP - 8
JO - IEEE Intelligent Systems
JF - IEEE Intelligent Systems
IS - 4
M1 - 6563073
ER -