Incorporating causal factors into reinforcement learning for dynamic treatment regimes in HIV

Chao Yu*, Yinzhao Dong, Jiming LIU, Guoqi Ren

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review


Abstract

Background: Reinforcement learning (RL) is a promising technique for solving complex sequential decision-making problems in health care. However, existing studies simply apply naive RL algorithms to discover optimal treatment strategies for a target problem. Such direct applications ignore the rich causal relationships between treatment options and their associated outcomes that are inherent in medical domains.

Methods: This paper investigates how to integrate causal factors into the RL process in order to improve final learning performance and make the learned strategies easier to explain. A causal policy gradient algorithm is proposed and evaluated on dynamic treatment regimes (DTRs) for HIV using a simulated computational model.

Results: Simulations demonstrate the effectiveness of the proposed algorithm for designing more efficient HIV treatment protocols. Different definitions of the causal factors can significantly influence final learning performance, which indicates that human prior knowledge is needed to define suitable causal relationships for a given problem.

Conclusions: More efficient and robust DTRs for HIV can be derived by incorporating causal factors between anti-HIV drug options and the associated treatment outcomes.
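The abstract does not say how the causal factors enter the policy gradient, so the following is only a hypothetical illustration, not the authors' algorithm. It assumes one simple reading: a REINFORCE-style update in which each (state, action) term is scaled by an assumed causal weight c(s, a) expressing prior belief in how strongly that treatment option drives the observed outcome. The toy environment (three states, four options, a per-state "best" option in BEST_ACTION) is invented for illustration and is not the HIV simulation model; the names causal_weight, BEST_ACTION and train are likewise hypothetical.

```python
import numpy as np

N_STATES, N_ACTIONS, HORIZON = 3, 4, 5
# Hypothetical toy environment (NOT the HIV model): in each state one
# treatment option is "best" and earns reward 1; every other option earns 0.
BEST_ACTION = np.array([0, 1, 3])


def softmax(logits):
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()


def run_episode(theta, rng):
    """Roll out one episode under the tabular softmax policy theta."""
    states, actions, rewards = [], [], []
    s = rng.integers(N_STATES)
    for _ in range(HORIZON):
        a = rng.choice(N_ACTIONS, p=softmax(theta[s]))
        states.append(s)
        actions.append(a)
        rewards.append(1.0 if a == BEST_ACTION[s] else 0.0)
        s = rng.integers(N_STATES)  # toy: next state ignores the action
    return states, actions, rewards


def train(causal_weight, episodes=2000, lr=0.05, seed=0):
    """REINFORCE in which each (state, action) term is scaled by a
    hypothetical causal weight c(s, a). All-ones weights give plain
    REINFORCE with a per-timestep baseline."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((N_STATES, N_ACTIONS))
    baseline = np.zeros(HORIZON)  # running mean of reward-to-go per step
    for _ in range(episodes):
        states, actions, rewards = run_episode(theta, rng)
        returns = np.cumsum(rewards[::-1])[::-1]  # reward-to-go G_t
        grad = np.zeros_like(theta)
        for t, (s, a) in enumerate(zip(states, actions)):
            pi = softmax(theta[s])
            grad_log_pi = -pi
            grad_log_pi[a] += 1.0  # gradient of log pi(a|s) w.r.t. theta[s]
            advantage = returns[t] - baseline[t]
            grad[s] += causal_weight[s, a] * advantage * grad_log_pi
        theta += lr * grad  # apply the whole episode's update at once
        baseline += 0.1 * (returns - baseline)
    return np.array([softmax(theta[s]) for s in range(N_STATES)])


if __name__ == "__main__":
    uniform = np.ones((N_STATES, N_ACTIONS))
    prior = uniform.copy()
    prior[:, 0] = 0.5  # hypothetical prior: option 0 weakly linked to outcome
    print(np.round(train(uniform), 2))
    print(np.round(train(prior), 2))
```

With all-ones weights the update is ordinary REINFORCE; a non-uniform prior only changes how strongly each (state, action) pair's observed return moves the policy. This loosely mirrors, under the stated assumptions, the abstract's point that the chosen definition of causal factors affects learning performance.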

Original language: English
Article number: 60
Journal: BMC Medical Informatics and Decision Making
Volume: 19
Publication status: Published, 9 Apr 2019

Scopus Subject Areas

  • Health Policy
  • Health Informatics

User-Defined Keywords

  • Causal factors
  • Dynamic treatment regime
  • HIV
  • Reinforcement learning
