Computational models for mining online drug reviews

  • Chao Tang

Student thesis: Master's Thesis


Healthcare social media has emerged in recent years alongside growing attention to people’s health. Online review websites now cover medicines, hospitals, and doctors, and the reviews are abundant. To discover knowledge from these online reviews, several computational models are proposed. Online healthcare review websites face conflicts of interest among healthcare stakeholders. To avoid legal complaints and remain sustainable under such circumstances, we propose a decoupling approach for designing healthcare review websites. Objective components such as medical condition and treatment are retained as the primary parts, as they are generic, impersonal and directly related to the patients themselves. Subjective components, such as comments on doctors or hospitals, are decoupled as secondary parts that hold sensitive and controversial information and are optional for reviewers. The proposed approach offers better flexibility in managing content at different levels of detail and in balancing reviewers’ right of expression with the interests of other stakeholders. To identify patient-reported adverse reactions in drug reviews, we propose a consumer-oriented coding scheme using WordNet synonyms and derivationally related forms. A significant discrepancy in the incidence of adverse reactions is found between online reviews and clinical trials. We propose an adverse reaction report ratio model for integrated interpretation of adverse reactions reported in online reviews versus those from clinical trials. Our estimate of average adverse reactions shows high correlation with the drug acceptability score obtained from a large-scale meta-analysis. To investigate the impact of key adverse reactions from the patients’ perspective, we propose a topic model named Fisher’s Linear Discriminant Analysis Projected Nonnegative Matrix Factorization (FLDA-projected-NMF) for discovering discriminative features and topics with additional class information.
With the satisfaction scores provided in the reviews, discriminative features and topics on satisfaction are discovered, and the polarities of adverse reactions are estimated from the discriminative feature weights. Discriminative features and topics on medication duration and on age group are obtained as well. Our method outperforms other supervised methods when evaluated by topic sentiment score and by topic interpretation measured by entropy. Patient-reported adverse reaction terms are mined from reviews with comment class labels. Some new adverse reactions to depression drugs and statin drugs are also discovered. To further study patients’ behavior, we use structural equation modeling to study the relationship between factors in patients’ treatment experience and patients’ quality of life. In the covariance model, most adverse reactions are found to have small covariance, except nausea, headache and dizziness. In the measurement model, the coefficients of individual adverse reactions on the latent adverse reaction are correlated with the incidence of adverse reactions. In the structural model, we model the relationship among latent adverse reaction, rating score, positive sentiment and negative sentiment. Comparison between the measurement models of rating scores for depression drugs and statin drugs shows that there could be latent factors accounting for the variance of the latent rating, which correlates with the severity of adverse reactions.
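
The abstract does not define the adverse reaction report ratio model. As a rough illustration only, the sketch below assumes one simple reading: the incidence of a reaction among online reviews divided by its incidence among clinical trial participants. The function name `report_ratio` and all counts are hypothetical and are not taken from the thesis.

```python
def report_ratio(review_count, review_total, trial_count, trial_total):
    """Incidence in reviews divided by incidence in trials (assumed definition)."""
    if review_total <= 0 or trial_total <= 0:
        raise ValueError("totals must be positive")
    review_incidence = review_count / review_total
    trial_incidence = trial_count / trial_total
    if trial_incidence == 0:
        # Ratio is undefined when the reaction never appears in the trial data.
        return float("inf") if review_incidence > 0 else float("nan")
    return review_incidence / trial_incidence


# Hypothetical counts for illustration only:
# (reviews mentioning the reaction, total reviews,
#  trial participants reporting it, total trial participants)
counts = {
    "nausea": (120, 1000, 150, 1000),
    "insomnia": (90, 1000, 30, 1000),
}

ratios = {term: report_ratio(*c) for term, c in counts.items()}
for term, ratio in ratios.items():
    print(f"{term}: report ratio = {ratio:.2f}")
```

Under this assumed definition, a ratio above 1 would suggest that a reaction is reported more often in online reviews than in trials, and a ratio below 1 the opposite.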
Date of Award: 16 Aug 2014
Original language: English
Supervisor: Kwok Wai CHEUNG (Supervisor) & Chun Hung Li (Supervisor)

User-Defined Keywords

  • Computer simulation
  • Drugs
  • Testing
