Taiwei Shi

Skill Reuse as Compression in Agentic RL

arXiv Preprint, 2026

Abstract

Large language model agents trained with reinforcement learning can overfit to brittle, task-specific behaviors. This work frames agent generalization through the Minimum Description Length principle: successful trajectories should be compressible into a small set of reusable abstract skills. It introduces ReuseRL, which extracts a shared skill dictionary from successful trajectories and augments the reinforcement learning objective with a segmentation cost that penalizes idiosyncratic behavior. The paper also proves a PAC-Bayes generalization bound for the compression penalty and evaluates ReuseRL across ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, showing stronger in-distribution and out-of-distribution success than vanilla GRPO and round-length baselines.
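
To make the idea of a segmentation cost concrete, here is a minimal sketch. It is not the paper's implementation; it assumes a trajectory is a sequence of discrete action tokens and that the skill dictionary is a set of action tuples. The names `segmentation_cost` and `shaped_reward`, the two cost constants, and the weight `lam` are all hypothetical. The cost is the cheapest way to cover a trajectory with dictionary skills, falling back to a more expensive literal encoding for any action no skill explains, so idiosyncratic behavior is penalized.

```python
from typing import Iterable, Sequence, Tuple


def segmentation_cost(
    trajectory: Sequence[str],
    skills: Iterable[Tuple[str, ...]],
    skill_cost: float = 1.0,      # assumed cost of naming one dictionary skill
    primitive_cost: float = 3.0,  # assumed cost of spelling out one raw action
) -> float:
    """Cheapest cover of `trajectory` by skills or single raw actions (dynamic programming)."""
    skills = list(skills)
    n = len(trajectory)
    best = [0.0] + [float("inf")] * n  # best[i] = min cost to encode the first i actions
    for end in range(1, n + 1):
        # Fallback: encode the last action literally.
        best[end] = best[end - 1] + primitive_cost
        # Otherwise, end the prefix with any skill that matches its suffix.
        for skill in skills:
            k = len(skill)
            if 0 < k <= end and tuple(trajectory[end - k:end]) == skill:
                best[end] = min(best[end], best[end - k] + skill_cost)
    return best[n]


def shaped_reward(
    task_reward: float,
    trajectory: Sequence[str],
    skills: Iterable[Tuple[str, ...]],
    lam: float = 0.1,  # hypothetical weight on the compression penalty
) -> float:
    """Task reward minus a weighted description-length penalty."""
    return task_reward - lam * segmentation_cost(trajectory, skills)


if __name__ == "__main__":
    skills = {("go", "take"), ("open", "put")}
    reusable = ["go", "take", "open", "put"]
    idiosyncratic = ["go", "take", "look", "open", "put"]
    print(segmentation_cost(reusable, skills))
    print(segmentation_cost(idiosyncratic, skills))
```

In this toy setup, a trajectory built entirely from dictionary skills is cheaper to describe than one containing an action no skill covers, so with equal task reward it receives the higher shaped reward.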

BibTeX

@misc{xu2026skillreusecompressionagentic,
  title={Skill Reuse as Compression in Agentic RL},
  author={Zhikun Xu and Yu Feng and Jacob Dineen and Taiwei Shi and Jieyu Zhao and Ben Zhou},
  year={2026},
  eprint={2605.31509},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2605.31509}
}