TY - GEN
T1 - LenC
T2 - 4th International Conference on Pattern Recognition and Artificial Intelligence, PRAI 2021
AU - Li, Shuxin
AU - Su, Weifeng
AU - Liu, Jiming
N1 - Funding Information:
This work is supported by the BNU-HKBU United International College research grant.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/8/20
Y1 - 2021/8/20
N2 - While extractive summarization is an important approach to the NLP text summarization task, redundancy in the generated extractive summary is always a problem. Previous works usually set the length of the output summary to a fixed number, which might be appropriate for some documents but too long for others. At the same time, though extractive summarization possesses high readability as it directly selects sentences from the document, the unimportant parts within sentences are also selected. These two scenarios result in redundancy in extractive summaries. To solve this problem, we propose a length control framework for extractive summarization, named LenC, in a two-stage pipeline. We first use a pretrained BERT-based summarizer to select units smaller than the original sentences (i.e., EDUs) to discard the insignificant parts of a sentence. Then a portable length controller is implemented to prune the output summary to an appropriate length, and it can be attached to any extractive summarizer. Experiments show that the proposed model outperforms the state-of-the-art baseline models and successfully reduces the redundancy in extractive summaries.
AB - While extractive summarization is an important approach to the NLP text summarization task, redundancy in the generated extractive summary is always a problem. Previous works usually set the length of the output summary to a fixed number, which might be appropriate for some documents but too long for others. At the same time, though extractive summarization possesses high readability as it directly selects sentences from the document, the unimportant parts within sentences are also selected. These two scenarios result in redundancy in extractive summaries. To solve this problem, we propose a length control framework for extractive summarization, named LenC, in a two-stage pipeline. We first use a pretrained BERT-based summarizer to select units smaller than the original sentences (i.e., EDUs) to discard the insignificant parts of a sentence. Then a portable length controller is implemented to prune the output summary to an appropriate length, and it can be attached to any extractive summarizer. Experiments show that the proposed model outperforms the state-of-the-art baseline models and successfully reduces the redundancy in extractive summaries.
KW - Redundant information
KW - Single document summarization
UR - http://www.scopus.com/inward/record.url?scp=85117897473&partnerID=8YFLogxK
U2 - 10.1109/PRAI53619.2021.9550801
DO - 10.1109/PRAI53619.2021.9550801
M3 - Conference proceeding
AN - SCOPUS:85117897473
SN - 9781665413237
T3 - Proceedings of International Conference on Pattern Recognition and Artificial Intelligence
SP - 1
EP - 7
BT - 2021 4th International Conference on Pattern Recognition and Artificial Intelligence, PRAI 2021
PB - IEEE
Y2 - 20 August 2021 through 22 August 2021
ER -