Abstract
Blogging has been an emerging media for people to express themselves. However, the presence of spam blogs (also known as splogs) may reduce the value of blogs and blog search engines. Hence, splog detection has recently attracted much attention from research. Most existing works on splog detection identify splogs using their content/link features and target on spam filters protecting blog search engines' index from spam. In this paper, we propose a splog detection framework by monitoring the on-line search results. The novelty of our splog detection is that our detection capitalizes on the results returned by search engines. The proposed method therefore is particularly useful in detecting those splogs that have successfully slipped through the spam filters that are also actively generating spam-posts. More specifically, our method monitors the top-ranked results of a sequence of temporally-ordered queries and detects splogs based on blogs' temporal behavior. The temporal behavior of a blog is maintained in a blog profile. Given blog profiles, splog detecting functions have been proposed and evaluated using real data collected from a popular blog search engine. Our experiments have demonstrated that splogs could be detected with high accuracy. The proposed method can be implemented on top of any existing blog search engine without intrusion to the latter.
Original language | English |
---|---|
Pages (from-to) | 246-262 |
Number of pages | 17 |
Journal | Information Processing and Management |
Volume | 47 |
Issue number | 2 |
DOIs | |
Publication status | Published - Mar 2011 |
Scopus Subject Areas
- Information Systems
- Media Technology
- Computer Science Applications
- Management Science and Operations Research
- Library and Information Sciences
User-Defined Keywords
- Blog profile
- Blog search
- Blog temporal behavior
- Spam blog
- Spam blog detection
- Splog