Abstract
Large language model agents trained with reinforcement learning can overfit to brittle, task-specific behaviors. This work frames agent generalization through the Minimum Description Length principle: successful trajectories should be compressible into a small set of reusable abstract skills. It introduces ReuseRL, which extracts a shared skill dictionary from successful trajectories and augments the reinforcement learning objective with a segmentation cost that penalizes idiosyncratic behavior. The paper also proves a PAC-Bayes generalization bound for the compression penalty and evaluates ReuseRL across ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, showing stronger in-distribution and out-of-distribution success than vanilla GRPO and round-length baselines.
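To make the compression framing concrete, the following is a minimal sketch of how a segmentation cost could be computed and subtracted from a task reward. It is not the paper's implementation. The function names (`segmentation_cost`, `shaped_reward`), the bit-cost model (one flag bit plus a uniform code over the dictionary or the action alphabet), the weight `beta`, and the toy skills are all assumptions introduced for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def segmentation_cost(
    trajectory: Sequence[str],
    skills: Dict[str, Tuple[str, ...]],
    alphabet_size: int,
) -> float:
    """Fewest bits needed to encode `trajectory` as skills and literal actions.

    Assumed code lengths (hypothetical, not taken from the paper):
    one flag bit marking "skill" or "literal", plus a uniform code over
    the skill dictionary or over the primitive action alphabet.
    """
    skill_bits = 1.0 + math.log2(max(len(skills), 1))
    literal_bits = 1.0 + math.log2(max(alphabet_size, 1))

    n = len(trajectory)
    # best[i] = cheapest encoding of the first i actions.
    best = [math.inf] * (n + 1)
    best[0] = 0.0
    for i in range(n):
        # Option 1: emit the next action as a literal.
        best[i + 1] = min(best[i + 1], best[i] + literal_bits)
        # Option 2: emit any skill whose body matches at position i.
        for body in skills.values():
            j = i + len(body)
            if body and j <= n and tuple(trajectory[i:j]) == body:
                best[j] = min(best[j], best[i] + skill_bits)
    return best[n]


def shaped_reward(
    task_reward: float,
    trajectory: Sequence[str],
    skills: Dict[str, Tuple[str, ...]],
    alphabet_size: int,
    beta: float = 0.01,
) -> float:
    """Task reward minus a weighted description-length penalty (assumed form)."""
    return task_reward - beta * segmentation_cost(trajectory, skills, alphabet_size)


# Toy dictionary and trajectories, invented for this example.
skills = {"fetch": ("goto", "open", "take"), "place": ("goto", "put")}
reusable = ["goto", "open", "take", "goto", "put"]
idiosyncratic = ["look", "goto", "take", "open", "put"]

print(segmentation_cost(reusable, skills, alphabet_size=8))
print(segmentation_cost(idiosyncratic, skills, alphabet_size=8))
```

Under these assumed code lengths, a trajectory that decomposes into dictionary skills costs fewer bits than one of the same length that must be spelled out action by action, so the shaped reward favors reusable behavior.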
BibTeX
@misc{xu2026skillreusecompressionagentic,
title={Skill Reuse as Compression in Agentic RL},
author={Zhikun Xu and Yu Feng and Jacob Dineen and Taiwei Shi and Jieyu Zhao and Ben Zhou},
year={2026},
eprint={2605.31509},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2605.31509}
}