Opinosis Dataset - Topic related review sentences
Dataset Type: Text
Format: Topic oriented opinion sentences for 51 different topics
Domain: hotels, cars, products
How to cite dataset: see "Citing Dataset" below
Citing Dataset
If you use this dataset in your own research, please cite the following paper:
Kavita Ganesan, ChengXiang Zhai, and Jiawei Han. "Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions", Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10), pages 340-348, 2010.
@inproceedings{ganesan2010opinosis,
  title={Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions},
  author={Ganesan, Kavita and Zhai, ChengXiang and Han, Jiawei},
  booktitle={Proceedings of the 23rd International Conference on Computational Linguistics},
  pages={340--348},
  year={2010},
  organization={Association for Computational Linguistics}
}
Description:
This dataset contains sentences extracted from user reviews on a given topic. Example topics are “performance of Toyota Camry” and “sound quality of iPod nano”. In total there are 51 such topics, each with about 100 sentences on average. The reviews were obtained from several sources: Tripadvisor (hotels), Edmunds.com (cars) and Amazon.com (various electronics). This dataset was used in the automatic text summarization project described in [1].
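As a rough illustration of how topic-oriented sentences like these might be read into memory, here is a minimal Python sketch. It assumes a hypothetical layout with one plain-text file per topic and one sentence per line; the directory argument, the `*.txt` pattern, and the function names `load_topics` and `average_sentences_per_topic` are illustrative assumptions, not the layout or scripts shipped with the dataset. Consult the dataset documentation for the actual file structure.

```python
from pathlib import Path
from statistics import mean


def load_topics(topics_dir, pattern="*.txt"):
    """Map each topic name (file stem) to its list of non-empty sentences.

    Assumption: one file per topic, one sentence per line. The real
    dataset layout may differ; adjust `pattern` and parsing as needed.
    """
    topics = {}
    for path in sorted(Path(topics_dir).glob(pattern)):
        # Review text scraped from the web may contain odd bytes,
        # so replace undecodable characters instead of failing.
        text = path.read_text(encoding="utf-8", errors="replace")
        topics[path.stem] = [
            line.strip() for line in text.splitlines() if line.strip()
        ]
    return topics


def average_sentences_per_topic(topics):
    """Mean number of sentences per topic; 0.0 if there are no topics."""
    return mean(len(s) for s in topics.values()) if topics else 0.0
```

Calling `load_topics` on the folder that holds the topic files and passing the result to `average_sentences_per_topic` gives a quick sanity check against the per-topic average stated above.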
The dataset file also includes the gold standard summaries used in the summarization paper [1]. I have also provided some scripts to help with summarization and evaluation tasks using ROUGE. Detailed information about the dataset and the list of scripts is provided in the documentation.
Please send me an email if you have any questions regarding this dataset.
Downloads:
- Opinosis Dataset: contains review/opinion sentences, gold standard summaries, scripts for ROUGE, and documentation
- Opinosis Dataset Documentation
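References
[1] Kavita Ganesan, ChengXiang Zhai, and Jiawei Han. "Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions", Proceedings of the 23rd International Conference on Computational Linguistics (COLING '10), 2010.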