With the advent of emojis in online text messages, large efforts have been made to build Emoji Mashup Systems (EMSs). An EMS takes two separate emojis as input and generates a new one that combines the two. Existing EMSs leverage only visual information and forgo the rich semantic information in text. In this work, we present the first effort at bridging lexical and emoji understanding. Emoji has become a universal means of communication in social media and the workplace. Emojis are neither rigorous logical symbols nor free-form language, so more effort is needed to help NLP systems handle them.
We study a novel problem: representing a concept by composing a sequence of emojis. It is a challenging problem because the emoji compositions should uncover implicit and non-literal meaning in the concept. We first overcome data scarcity by customizing the Unicode ZWJ dataset and creating our own ELCo dataset (1,663 annotations for 210 adjective-noun compounds). We then benchmark this task under a generation setting and find it challenging even for a state-of-the-art system. Hence, we re-formalize it under a simpler ranking setting in order to evaluate the intrinsic properties of the task. We make the following discoveries: (1) a Pretrained Language Model (PLM) is good at distinguishing the ground truth from irrelevant samples, but weak at distinguishing it from plausible ones; (2) a PLM can be optimized for our task by training on our ELCo-AN dataset; (3) a PLM consistently identifies patterns in emoji compositionality, such as repetition, while it is less sensitive to emoji ordering.
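To make the ranking setting concrete, here is a minimal sketch of ranking candidate emoji compositions for a concept by embedding similarity. The embeddings below are toy hand-crafted vectors for illustration only; in the actual task, a PLM encoder would produce them, and the concept, candidates, and scores are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy, hand-crafted embeddings (a real system would encode these with a PLM).
concept = {"text": "hot dog", "vec": [0.9, 0.1, 0.2]}
candidates = [
    {"emojis": "🌭",   "vec": [0.85, 0.15, 0.25]},  # ground truth
    {"emojis": "🔥🐶", "vec": [0.7, 0.3, 0.1]},     # plausible composition
    {"emojis": "📚✈️", "vec": [0.1, 0.9, 0.8]},     # irrelevant
]

def rank(concept, candidates):
    """Sort candidate emoji compositions by similarity to the concept, best first."""
    return sorted(candidates,
                  key=lambda c: cosine(concept["vec"], c["vec"]),
                  reverse=True)

ranked = rank(concept, candidates)
print([c["emojis"] for c in ranked])  # ground truth first, irrelevant last
```

Note that with toy vectors like these, separating the irrelevant candidate is easy, while the ground-truth and plausible candidates score closely; this mirrors finding (1) above.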
- Group Presentation Update on 9 Sept 2021: link
- FYP Interim Presentation on 17 Nov 2021: link
- FYP Final Presentation on 18 Apr 2022: link