SkyLadder: Better and Faster Pretraining via Context Window Scheduling

SkyLadder schedules pretraining from shorter to longer context windows.

Abstract

SkyLadder proposes a short-to-long context window scheduling strategy for LLM pretraining. Controlled experiments show that the schedule preserves standard benchmark performance, improves long-context ability, and speeds up training compared with fixed long-context baselines.
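As a rough illustration of what a short-to-long schedule could look like, the sketch below maps a training step to a context window length. The abstract does not specify the schedule's shape or its hyperparameters, so the linear ramp, the function name `context_window_at`, and every default value here are assumptions made for illustration, not the authors' implementation.

```python
def context_window_at(step, total_steps, start_len=32, final_len=8192,
                      ramp_fraction=0.5, multiple=32):
    """Hypothetical linear short-to-long context window schedule.

    All parameter names and defaults are illustrative assumptions:
    the window grows linearly from start_len to final_len over the
    first ramp_fraction of training, then stays at final_len.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    ramp_steps = max(1, int(total_steps * ramp_fraction))
    if step >= ramp_steps:
        # Ramp finished: train at the full target window.
        return final_len
    # Linear interpolation between the short and long window.
    raw = start_len + (final_len - start_len) * step / ramp_steps
    # Round down to a multiple that is convenient for packing sequences.
    rounded = int(raw) // multiple * multiple
    # Clamp so the result never leaves [start_len, final_len].
    return max(start_len, min(final_len, rounded))


if __name__ == "__main__":
    for s in (0, 250, 500, 1000):
        print(s, context_window_at(s, total_steps=1000))
```

In a training loop, the returned length would determine how packed token sequences are chunked (or how attention is masked) at each step, so that early steps see short windows and later steps see the full target length.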

Publication
Advances in Neural Information Processing Systems 38 (NeurIPS 2025)
Tongyao Zhu
IPP Doctoral Student (Jan ’23; SEA)

PhD Candidate January 2023 Intake

Min-Yen Kan
Associate Professor

WING lead; interests include Digital Libraries, Information Retrieval and Natural Language Processing.