
DATATALES was developed to address the limitations of existing benchmarks in evaluating language models’ proficiency in data narration. Unlike traditional data-to-text tasks that focus on basic information transformation, DATATALES captures the complex analytical reasoning required to transform tabular financial data into coherent, insightful narratives.
The benchmark consists of 4.9k financial market reports paired with comprehensive market data, sourced from diverse professional outlets. These reports reflect real-world data narration challenges: models must not only describe the data but also analyze trends, explore causality, and make predictions using specialized financial terminology. Extensive evaluation of state-of-the-art language models on DATATALES reveals significant challenges in achieving the precision and analytical depth required for effective data narration.
DATATALES was presented at EMNLP 2024, and the dataset is publicly available for further research. Researchers interested in data narration, complex reasoning over tabular data, and financial text generation are encouraged to explore DATATALES and contribute to advancing model capabilities in transforming complex data into accessible narratives.