Re-tweeting from a Linguistic Perspective
We release the manually annotated corpus and the tools described in the following paper:
- Aobo Wang, Tao Chen, Min-Yen Kan. 2012. Re-tweeting from a linguistic perspective. In Proceedings of the Workshop on Languages in Social Media (LSM '12). NAACL-HLT, Montreal.
Tweet Corpus
This corpus consists of 860 tweets (520 retweets and 340 non-retweets), each annotated with a Level-2 category by workers on Amazon Mechanical Turk.
Each instance in the corpus contains a tweet ID and the category labels assigned by three different workers.
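Because every tweet carries three independent labels, a user of the corpus will usually want to reduce them to a single label. The sketch below shows one common choice, a simple majority vote that keeps a label only when at least two of the three workers agree. This is an assumption for illustration, not necessarily how the authors aggregated labels, and the record layout and tweet IDs are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by at least two of the three workers, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None


# Hypothetical records: (tweet_id, [worker1, worker2, worker3]).
# The IDs are made up; real entries use the tweet IDs from the corpus.
records = [
    ("tweet-001", ["n", "n", "f"]),
    ("tweet-002", ["a", "c", "j"]),
    ("tweet-003", ["m", "m", "m"]),
]

for tweet_id, labels in records:
    print(tweet_id, majority_label(labels))
```

Returning `None` for three-way disagreements lets the caller decide whether to drop such tweets or adjudicate them separately.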
The annotation used the following scheme:
| Code | Level-1 | Level-2 |
|---|---|---|
| a | Opinion | Abstract |
| c | Opinion | Concrete |
| j | Opinion | Joke |
| m | Update | Myself |
| o | Update | Someone else |
| i | Interaction | |
| f | Fact | |
| d | Deals | |
| n | News | |
| x | Others | |
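Since each code in the scheme names a Level-2 leaf (or a Level-1 category with no subdivision), coarser Level-1 labels can be recovered by a table lookup. The mapping below is transcribed directly from the table; the names `SCHEME` and `level1` are hypothetical helpers, not part of the released tools.

```python
from typing import Dict, Optional, Tuple

# Label code -> (Level-1, Level-2), transcribed from the annotation scheme table.
# Level-2 is None for categories that have no subdivision.
SCHEME: Dict[str, Tuple[str, Optional[str]]] = {
    "a": ("Opinion", "Abstract"),
    "c": ("Opinion", "Concrete"),
    "j": ("Opinion", "Joke"),
    "m": ("Update", "Myself"),
    "o": ("Update", "Someone else"),
    "i": ("Interaction", None),
    "f": ("Fact", None),
    "d": ("Deals", None),
    "n": ("News", None),
    "x": ("Others", None),
}


def level1(code: str) -> str:
    """Collapse a label code to its Level-1 category."""
    return SCHEME[code][0]


print(level1("j"))  # Opinion
print(level1("n"))  # News
```

Collapsing to Level-1 yields seven coarse classes, which can be handy when the finer Opinion and Update distinctions are too sparse to evaluate on their own.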
Classifier
LDA Classifier Demo
Enter your tweets here (separate tweets with a single empty line):
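The demo expects several tweets in one input, separated by a single empty line. A minimal sketch of how such input could be split into individual tweets before classification is shown below; `split_tweets` is a hypothetical helper, not the demo's actual code. As an assumption, it also tolerates several consecutive blank lines and whitespace-only lines, so a slightly malformed paste still parses.

```python
import re
from typing import List


def split_tweets(text: str) -> List[str]:
    """Split demo input into tweets, using blank lines as separators."""
    # A blank line may contain only whitespace; runs of blank lines count as one separator.
    chunks = re.split(r"\n\s*\n", text.strip())
    # Strip surrounding whitespace and drop any empty chunks.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


text = "Great deal on laptops today!\n\nRT @friend: just landed in Montreal\n"
print(split_tweets(text))
```

A single newline inside a tweet is kept, so multi-line tweets are not broken apart.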