LenC: A redundancy-aware length control framework for extractive summarization

Shuxin Li, Weifeng Su, Jiming Liu

Research output: Chapter in book/report/conference proceeding, Conference contribution, peer-review

Abstract

While extractive summarization is an important approach to text summarization in NLP, redundancy in the generated extractive summaries remains a persistent problem. Previous works usually set the length of the output summary to a fixed value, which may be appropriate for some documents but too long for others. At the same time, although extractive summaries are highly readable because they select sentences directly from the document, they also carry over the unimportant parts within those sentences. Both issues introduce redundancy into extractive summaries. To address this problem, we propose LenC, a length control framework for extractive summarization built as a two-stage pipeline. First, a pretrained BERT-based summarizer selects units smaller than full sentences, namely elementary discourse units (EDUs), so that the insignificant parts of a sentence can be discarded. Then a portable length controller prunes the output summary to an appropriate length; this controller can be attached to any extractive summarizer. Experiments show that the proposed model outperforms state-of-the-art baseline models and successfully reduces redundancy in the extractive summaries.
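The abstract does not specify how the length controller decides where to cut a summary. As a purely illustrative sketch, and not the LenC method, the Python snippet below shows one generic way a portable post-hoc pruner could operate on scored EDUs from any extractive summarizer: units are taken in descending score order, skipped when they overlap heavily with units already kept, and selection stops once scores fall below a threshold; the survivors are returned in document order. The `Unit` class, the `prune` function, the word-overlap measure, and the `min_score`, `max_overlap` and `max_units` parameters are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Unit:
    text: str     # an EDU (or sentence) produced by some extractive summarizer
    score: float  # hypothetical importance score assigned by that summarizer


def overlap(a: str, b: str) -> float:
    """Word overlap normalized by the shorter unit (a simple redundancy proxy)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def prune(units: list[Unit], min_score: float = 0.3,
          max_overlap: float = 0.5, max_units: int | None = None) -> list[Unit]:
    """Hypothetical length controller: keep high-scoring, non-redundant units.

    Not the LenC controller; fixed thresholds stand in for whatever
    decision rule the paper actually uses.
    """
    # Visit units from most to least important.
    order = sorted(range(len(units)), key=lambda i: units[i].score, reverse=True)
    kept: list[int] = []
    for i in order:
        u = units[i]
        if u.score < min_score:
            break  # every remaining unit scores lower, so stop here
        if max_units is not None and len(kept) >= max_units:
            break  # optional hard cap on summary length
        if any(overlap(u.text, units[j].text) > max_overlap for j in kept):
            continue  # redundant with something already selected
        kept.append(i)
    # Restore original document order for readability.
    return [units[i] for i in sorted(kept)]


if __name__ == "__main__":
    units = [
        Unit("the model selects EDUs", 0.9),
        Unit("the model selects EDUs quickly", 0.8),
        Unit("a length controller prunes output", 0.6),
        Unit("minor detail", 0.1),
    ]
    print([u.text for u in prune(units)])
    # prints ['the model selects EDUs', 'a length controller prunes output']
```

Because the pruner only consumes (text, score) pairs, it is independent of the summarizer that produced the scores, which mirrors the abstract's point that the controller can be attached to any extractive summarizer. A learned controller would replace the fixed thresholds used here.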

Original language: English
Title of host publication: 2021 4th International Conference on Pattern Recognition and Artificial Intelligence, PRAI 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-7
Number of pages: 7
ISBN (Electronic): 9781665413220
ISBN (Print): 9781665413237
Publication status: Published - 20 Aug 2021
Event: 4th International Conference on Pattern Recognition and Artificial Intelligence, PRAI 2021 - Virtual, Yibin, China
Duration: 20 Aug 2021 to 22 Aug 2021

Publication series

Name: Proceedings of International Conference on Pattern Recognition and Artificial Intelligence

Conference

Conference: 4th International Conference on Pattern Recognition and Artificial Intelligence, PRAI 2021
Country/Territory: China
City: Virtual, Yibin
Period: 20/08/21 to 22/08/21

Scopus Subject Areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition

User-Defined Keywords

  • Redundant information
  • Single document summarization
