Unsupervised Numerical Reasoning to Extract Phenotypes from Clinical Text by Leveraging External Knowledge

Ashwani Tanwar, Jingqing Zhang, Julia Ive, Vibhor Gupta, Yike Guo*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding > Chapter > peer-review

1 Citation (Scopus)

Abstract

Extracting phenotypes from clinical text has been shown to be useful for a variety of clinical use cases, such as identifying patients with rare diseases. However, reasoning with numerical values remains challenging for phenotyping in clinical text, for example, recognizing that a temperature of 102F represents Fever. Current state-of-the-art phenotyping models are able to detect general phenotypes but perform poorly on phenotypes that require numerical reasoning. We present a novel unsupervised methodology that leverages external knowledge and contextualized word embeddings from ClinicalBERT for numerical reasoning in a variety of phenotypic contexts. Compared against unsupervised benchmarks, it shows a substantial performance improvement, with absolute gains in generalized Recall and F1 scores of up to 79% and 71%, respectively. In the supervised setting, it also surpasses alternative approaches, with absolute gains in generalized Recall and F1 scores of up to 70% and 44%, respectively.

Original language: English
Title of host publication: Studies in Computational Intelligence
Editors: Arash Shaban-Nejad, Martin Michalowski, Simone Bianco
Publisher: Springer Cham
Pages: 11-28
Number of pages: 18
ISBN (Electronic): 9783031147715
ISBN (Print): 9783031147708, 9783031147739
Publication status: Published - 28 Nov 2022

Publication series

Name: Studies in Computational Intelligence
Publisher: Springer Cham
Volume: 1060
ISSN (Print): 1860-949X
ISSN (Electronic): 1860-9503

Scopus Subject Areas

  • Artificial Intelligence

User-Defined Keywords

  • Contextualized word embeddings
  • Deep learning
  • Natural language processing
  • Numerical reasoning
  • Phenotyping
  • Unsupervised learning
