TY - JOUR
T1 - A novel machine learning framework for comparison of viral COVID-19-related Sina Weibo and Twitter posts
T2 - Workflow development and content analysis
AU - Chen, Shi
AU - Zhou, Lina
AU - Song, Celine
AU - Xu, Qian
AU - Wang, Ping
AU - Wang, Kanlun
AU - Ge, Yaorong
AU - Janies, Daniel
N1 - Funding Information: SC is supported by the Models of Infectious Disease Agent Study (MIDAS) COVID-19 urgent supplementary grant (MIDASUP2020-5). YS is supported by the Interdisciplinary Research Clusters Matching Scheme (IRCMS/19-20/D04) and the AI and Media Research Lab at Hong Kong Baptist University (SDF17-1013-P01). The authors are grateful for the help from Ms Mengyu Li and Mr Minghao Wang from Hong Kong Baptist University and Mr Evan Lai from St. Mark’s School of Texas for their participation in and contributions to this project. The authors are also grateful for the technical assistance from the School of Data Science, UNC Charlotte, in providing the social media data.
PY - 2021/1/6
Y1 - 2021/1/6
N2 - Background: Social media plays a critical role in health communications, especially during global health emergencies such as the current COVID-19 pandemic. However, there is a lack of a universal analytical framework to extract, quantify, and compare content features in public discourse of emerging health issues on different social media platforms across a broad sociocultural spectrum. Objective: We aimed to develop a novel and universal content feature extraction and analytical framework and contrast how content features differ with sociocultural background in discussions of the emerging COVID-19 global health crisis on major social media platforms. Methods: We sampled the 1000 most shared viral Twitter and Sina Weibo posts regarding COVID-19, developed a comprehensive coding scheme to identify 77 potential features across six major categories (eg, clinical and epidemiological, countermeasures, politics and policy, responses), quantified feature values (0 or 1, indicating whether or not the content feature is mentioned in the post) in each viral post across social media platforms, and performed subsequent comparative analyses. Machine learning dimension reduction and clustering analysis were then applied to harness the power of social media data and provide more unbiased characterization of web-based health communications. Results: There were substantially different distributions, prevalence, and associations of content features in public discourse about the COVID-19 pandemic on the two social media platforms. Weibo users were more likely to focus on the disease itself and health aspects, while Twitter users engaged more about policy, politics, and other societal issues. Conclusions: We extracted a rich set of content features from social media data to accurately characterize public discourse related to COVID-19 in different sociocultural backgrounds. In addition, this universal framework can be adopted to analyze social media discussions of other emerging health issues beyond the COVID-19 pandemic.
KW - Communication
KW - Content analysis
KW - Content feature extraction
KW - COVID-19
KW - Cross-cultural comparison
KW - Framework
KW - Infodemiology
KW - Infoveillance
KW - Machine learning
KW - Sina Weibo
KW - Social media
KW - Twitter
KW - Workflow
UR - http://www.scopus.com/inward/record.url?scp=85099375156&partnerID=8YFLogxK
U2 - 10.2196/24889
DO - 10.2196/24889
M3 - Journal article
C2 - 33326408
AN - SCOPUS:85099375156
SN - 1439-4456
VL - 23
JO - Journal of Medical Internet Research
JF - Journal of Medical Internet Research
IS - 1
M1 - e24889
ER -