Exploiting Category-Specific Information for Multi-Document Summarization
Research Area: Natural Language Processing
Year: 2012
Type of Publication: In Proceedings
Keywords: text summarization, csi, guided summarization, tac
Authors:
  • Jun-Ping Ng
  • Praveen Bysani
  • Ziheng Lin
  • Min-Yen Kan
  • Chew-Lin Tan
Abstract:
We show that by making use of information shared by document sets belonging to the same category, we can improve the quality of automatically extracted content in multi-document summaries. This simple property is widely applicable in multi-document summarization tasks, and can be encapsulated by the concept of category-specific importance (CSI). Our experiments show that CSI is a valuable metric to aid sentence selection in extractive summarization tasks. We operationalize the computation of CSI for sentences through the introduction of two new features that can be computed without needing any external knowledge. We also generalize this approach, showing that when manually curated document-to-category mappings are unavailable, performing automatic categorization of document sets also improves summarization performance. We have incorporated these features into a simple, freely available, open-source extractive summarization system, called SWING. In the recent TAC-2011 guided summarization task, SWING outperformed all other participant summarization systems as measured by automated ROUGE measures.