The ELCo Dataset: Bridging Emoji and Lexical Composition

A summary of the ELCo project. The ELCo dataset comprises 1,655 annotations of 209 English phrases, 45 adjectives, and 77 attributes. Our corpus study reveals five structures for composing emoji sequences, and we show that metaphorical structures use more diverse emojis. Our new EmoTE task is challenging for all models, but fine-tuning on ELCo helps them learn useful emoji composition skills.

Abstract

Can emojis be composed to convey intricate meanings like English phrases? As a pioneering study, we present the Emoji-Lexical Composition (ELCo) dataset, a new resource that offers parallel annotations of emoji sequences corresponding to English phrases. Our dataset contains 1,655 instances, spanning 209 diverse concepts from tangible ones like “right man” (✔️👨) to abstract ones such as “full attention” (🧐✍️, illustrating a metaphoric composition of a focusing face and writing hand). ELCo enables the analysis of the patterns shared between emoji and lexical composition. Through a corpus study, we discovered that simple strategies like direct representation and reduplication are sufficient for conveying certain concepts, but a richer, metaphorical strategy is essential for expressing more abstract ideas. We further introduce an evaluative task, Emoji-based Textual Entailment (EmoTE), to assess the proficiency of NLP models in comprehending emoji compositions. Our findings reveal the challenge of understanding emoji composition in a zero-shot setting for current models, including ChatGPT. Our analysis indicates that the intricacy of metaphorical compositions contributes to this challenge. Encouragingly, models show marked improvement when fine-tuned on the ELCo dataset, with larger models excelling in deciphering nuanced metaphorical compositions.

Publication
In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Zi Yun Yang
FYP Alumnus (Aug ‘21)

FYP student

Yisong Miao
Doctoral Student (Jan ‘21)

PhD Candidate January 2021 Intake