Image tweets are becoming a prevalent form of social
media, but little is known about their content – textual
and visual – and the relationship between the two modalities.
Our analysis of image tweets shows that while visual
elements certainly play a large role in image–text
relationships, other factors, such as emotional elements,
also shape the relationship. We develop Visual-Emotional
LDA (VELDA), a novel topic model that captures
image–text correlation from multiple perspectives
(namely, visual and emotional).
Experiments on real-world image tweets in both English
and Chinese, as well as on other user-generated content,
show that VELDA significantly outperforms existing
methods on cross-modality image retrieval. Even in
domains where emotion does not directly factor into
image choice, VELDA demonstrates good
generalization ability, achieving higher-fidelity modeling
of such multimedia documents.