TY - GEN
T1 - Clustering Facebook for biased context extraction
AU - Franzoni, Valentina
AU - Li, Yuanxi
AU - Mengoni, Paolo
AU - Milani, Alfredo
N1 - Publisher Copyright:
© Springer International Publishing AG 2017.
PY - 2017/7/6
Y1 - 2017/7/6
N2 - Facebook comments and shared posts often convey human biases, which play a pivotal role in information spreading and content consumption, where short pieces of information can be quickly consumed and later ruminated. Such bias is nevertheless at the basis of human-generated content, and being able to extract contexts that represent this bias without amplifying it can be relevant to data mining and artificial intelligence, because bias is what shapes users' opinions through social media. Starting from the observation that a separation into topic clusters (i.e., sub-contexts) spontaneously occurs when content is evaluated by human common sense, especially in particular domains such as politics and technology, this work introduces a process for automated context extraction based on a class of path-based semantic similarity measures which, using third-party knowledge sources such as WordNet and Wikipedia, can create a bag of words of the relevant concepts present in Facebook comments on topic-related posts, thus reflecting the collective knowledge of a community of users. From this bag of words it is easy to create human-readable views, such as word clouds, or machine-readable structured information for further learning or content explanation, e.g. by augmenting it with the time stamps of posts and comments. Experimental evidence obtained in the domain of information security and technology over a sample of 9M3k page users, where previous comments serve as a use case for forthcoming users, shows that a simple clustering on a frequency-based bag of words can identify the main context words in Facebook comments that are identifiable by human common sense. Group similarity measures, which are of great interest in many application domains because they evaluate the similarity of objects in terms of the similarity of their associated sets, can then be calculated on the extracted context words to reflect the collective notion of semantic similarity, providing additional insights on which to reason, e.g. in terms of cognitive factors and behavioral patterns.
AB - Facebook comments and shared posts often convey human biases, which play a pivotal role in information spreading and content consumption, where short pieces of information can be quickly consumed and later ruminated. Such bias is nevertheless at the basis of human-generated content, and being able to extract contexts that represent this bias without amplifying it can be relevant to data mining and artificial intelligence, because bias is what shapes users' opinions through social media. Starting from the observation that a separation into topic clusters (i.e., sub-contexts) spontaneously occurs when content is evaluated by human common sense, especially in particular domains such as politics and technology, this work introduces a process for automated context extraction based on a class of path-based semantic similarity measures which, using third-party knowledge sources such as WordNet and Wikipedia, can create a bag of words of the relevant concepts present in Facebook comments on topic-related posts, thus reflecting the collective knowledge of a community of users. From this bag of words it is easy to create human-readable views, such as word clouds, or machine-readable structured information for further learning or content explanation, e.g. by augmenting it with the time stamps of posts and comments. Experimental evidence obtained in the domain of information security and technology over a sample of 9M3k page users, where previous comments serve as a use case for forthcoming users, shows that a simple clustering on a frequency-based bag of words can identify the main context words in Facebook comments that are identifiable by human common sense. Group similarity measures, which are of great interest in many application domains because they evaluate the similarity of objects in terms of the similarity of their associated sets, can then be calculated on the extracted context words to reflect the collective notion of semantic similarity, providing additional insights on which to reason, e.g. in terms of cognitive factors and behavioral patterns.
KW - Artificial intelligence
KW - Collective knowledge
KW - Data mining
KW - Knowledge discovery
KW - Semantic distance
KW - Word similarity
UR - http://www.scopus.com/inward/record.url?scp=85027149767&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-62392-4_52
DO - 10.1007/978-3-319-62392-4_52
M3 - Conference proceeding
AN - SCOPUS:85027149767
SN - 9783319623917
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 717
EP - 729
BT - 17th International Conference on Computational Science and Its Applications (ICCSA 2017)
A2 - Murgante, Beniamino
A2 - Apduhan, Bernady O.
A2 - Borruso, Giuseppe
A2 - Stankova, Elena
A2 - Gervasi, Osvaldo
A2 - Misra, Sanjay
A2 - Taniar, David
A2 - Rocha, Ana Maria A.C.
A2 - Cuzzocrea, Alfredo
A2 - Torre, Carmelo M.
PB - Springer Verlag
T2 - 17th International Conference on Computational Science and Its Applications, ICCSA 2017
Y2 - 3 July 2017 through 6 July 2017
ER -