KeYric: Unsupervised Keywords Extraction and Expansion from Music for Coherent Lyrics Generation

Xichu Ma, Varun Sharma, Min-Yen Kan, Wee Sun Lee, Ye Wang

January 2025

Figure from Ma et al. (2025).

Abstract

We address the challenge of enhancing coherence in generated lyrics from symbolic music, particularly for creating singing-based language learning materials. Coherence, defined as the quality of being logical and consistent, forming a unified whole, is crucial for lyrics at multiple levels: word, sentence, and full-text. Additionally, it involves lyrics’ musicality, the matching of style and sentiment of the music. To tackle this, we introduce KeYric, a novel system that leverages keyword skeletons to strengthen both coherence and musicality in lyrics generation. KeYric employs an innovative approach with an unsupervised keyword skeleton extractor and a graph-based skeleton expander, designed to produce a style-appropriate keyword skeleton from input music. This framework integrates the skeleton with the input music via a three-layer coherence mechanism, significantly enhancing lyric coherence by 5% in objective evaluations. Subjective assessments confirm that KeYric-generated lyrics are perceived as 19% more coherent and suitable for language learning through singing compared to existing models. Our analyses indicate that integrating genre-relevant elements, such as pitch, into music encoding is crucial, as musical genres significantly affect lyric coherence.

Type

Journal article

Publication

ACM Trans. Multim. Comput. Commun. Appl.

Min-Yen Kan

Associate Professor

Ye Wang

Research Collaborator