Re-tweeting from a Linguistic Perspective

We release the manually annotated corpus and the tools described in this paper:

  • Aobo Wang, Tao Chen, Min-Yen Kan. 2012. Re-tweeting from a linguistic perspective. In Proceedings of the Workshop on Languages in Social Media (LSM '12). NAACL-HLT, Montreal. [.pdf] [slides]

Tweet Corpus

This corpus consists of 860 tweets (520 retweets and 340 non-retweets) that were annotated with Level-2 categories by workers on Amazon's Mechanical Turk.

In the corpus, each instance contains a tweet ID and the categories assigned by three different workers.

This is the scheme that we used in the annotation.

Category  Level-1      Level-2
a         Opinion      Abstract
c         Opinion      Concrete
j         Opinion      Joke
m         Update       Myself
o         Update       Someone else
i         Interaction
f         Fact
d         Deals
n         News
x         Others
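
The scheme above can be sketched as a lookup table in Python. This is only an illustration: the names SCHEME, describe, and majority_label are hypothetical and not part of the released tools, and resolving the three workers' labels by majority vote is an assumption, not necessarily how the corpus labels were aggregated.

```python
from collections import Counter

# Category codes from the annotation scheme: code -> (Level-1, Level-2).
# Level-2 is None for categories that have no sub-category.
SCHEME = {
    "a": ("Opinion", "Abstract"),
    "c": ("Opinion", "Concrete"),
    "j": ("Opinion", "Joke"),
    "m": ("Update", "Myself"),
    "o": ("Update", "Someone else"),
    "i": ("Interaction", None),
    "f": ("Fact", None),
    "d": ("Deals", None),
    "n": ("News", None),
    "x": ("Others", None),
}


def describe(code):
    """Return a readable name for a category code."""
    level1, level2 = SCHEME[code]
    return level1 if level2 is None else f"{level1} / {level2}"


def majority_label(labels):
    """Return the code chosen by at least two workers, else None.

    Assumption: majority voting is used only for illustration here.
    """
    code, count = Counter(labels).most_common(1)[0]
    return code if count >= 2 else None
```

For example, majority_label(["a", "a", "c"]) returns "a", while three different labels return None, leaving the disagreement for the caller to handle.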



Classifier

LDA Classifier Demo

Enter your tweets here (tweets are separated by a single empty line):
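
The input format for the demo, one tweet per block with an empty line between blocks, could be parsed roughly as follows. This is a minimal sketch; split_tweets is a hypothetical helper and not the demo's actual code.

```python
import re


def split_tweets(text):
    """Split input into tweets separated by one or more empty lines."""
    # A separator is a newline, optional whitespace, then another newline.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Drop surrounding whitespace and any blocks that end up empty.
    return [block.strip() for block in blocks if block.strip()]
```

Runs of several empty lines are treated as a single separator, which is a design choice of this sketch so that stray blank lines do not produce empty tweets.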