Self-Improving and Self-Adapting Agents
🎯 Overview: We study test-time scaling (TTS) methods for self-improving and self-adapting agents, advancing a new paradigm of artificial intelligence in which autonomous systems do not merely act, but evolve reliably through learning from experience, refining behavior at test time, and autonomously modifying their own learning mechanisms.
🧠 Abstract: Although modern foundation models (FMs) such as ChatGPT are extraordinarily capable, they remain fragile: small variations in the input (the "prompt"), such as subtle phrasing changes or unintended noise, can lead to contradictory or ungrounded reasoning. This limits their deployment in high-stakes domains like healthcare, law, and science, where precision, reliability, and interpretability are essential.
In this project, we aim to build the next generation of FM systems that can continually assess and correct their behavior at test time, making them more trustworthy and capable. Our studies span three main aspects:
(1) Understand the Fragility of FMs: How can we interpret model behavior through the lens of prompts, so that FMs can be improved reliably? (NLPromptEval @ ACL'25, FormatBiasEval @ NAACL'25)
(2) Study TTS Methods: With the above understanding, how can we develop TTS methods that enhance models effectively and reliably? (Multi-Expert Prompting @ EMNLP'24, adv-ICL @ ACL'24, Chain-of-Opinion @ COLING'25)
(3) Self-Improving Agents: How can we build powerful TTS-powered agentic systems that critique and refine their own behavior, ultimately enabling autonomous self-improvement? (LongGuide (Self-Adapting Agent) @ ACL'25, VISTA (Self-Improving Agent))
Our findings so far lay the groundwork for building FM agents that reliably improve through interaction, feedback, and structure-aware test-time methods, without relying on additional gradient-based training.
🔥 News: We are expanding our studies in multiple directions. Please review our work above. If you have a strong interest in the topics listed and a solid background in mathematics and programming (research experience is optional), come join us!
