KeYric: Unsupervised Keywords Extraction and Expansion from Music for Coherent Lyrics Generation

Figure from Ma et al. (2025).

Abstract

We address the challenge of enhancing coherence in lyrics generated from symbolic music, particularly for creating singing-based language learning materials. Coherence, defined as the quality of being logical and consistent and forming a unified whole, is crucial for lyrics at multiple levels: word, sentence, and full text. It also involves the lyrics’ musicality, that is, how well they match the style and sentiment of the music. To tackle this, we introduce KeYric, a novel system that leverages keyword skeletons to strengthen both coherence and musicality in lyrics generation. KeYric combines an unsupervised keyword skeleton extractor with a graph-based skeleton expander to produce a style-appropriate keyword skeleton from input music. The framework integrates the skeleton with the input music via a three-layer coherence mechanism, improving lyric coherence by 5% in objective evaluations. Subjective assessments confirm that KeYric-generated lyrics are perceived as 19% more coherent and as more suitable for language learning through singing than those of existing models. Our analyses indicate that integrating genre-relevant elements, such as pitch, into music encoding is crucial, as musical genres significantly affect lyric coherence.

Publication
ACM Trans. Multim. Comput. Commun. Appl.
Min-Yen Kan
Associate Professor

WING lead; interests include Digital Libraries, Information Retrieval and Natural Language Processing.

Ye Wang
Research Collaborator

Tenured Associate Professor in the School of Computing, National University of Singapore