Skill Reuse as Compression in Agentic RL

arXiv Preprint, 2026

Zhikun Xu

Yu Feng

Jacob Dineen

Taiwei Shi

Jieyu Zhao

Ben Zhou

Abstract

Large language model agents trained with reinforcement learning can overfit to brittle, task-specific behaviors. This work frames agent generalization through the Minimum Description Length principle: successful trajectories should be compressible into a small set of reusable abstract skills. It introduces ReuseRL, which extracts a shared skill dictionary from successful trajectories and augments the reinforcement learning objective with a segmentation cost that penalizes idiosyncratic behavior. The paper also proves a PAC-Bayes generalization bound for the compression penalty and evaluates ReuseRL across ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, showing stronger in-distribution and out-of-distribution success than vanilla GRPO and round-length baselines.
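
To make the compression idea concrete, here is a minimal sketch of how a segmentation cost could be computed and folded into a reward. It is an illustration under stated assumptions, not the paper's implementation: the function names `segmentation_cost` and `shaped_reward`, the unit costs `skill_cost` and `literal_cost`, and the weight `penalty_weight` are all hypothetical, and trajectories are simplified to sequences of action strings. The cost is the cheapest way to cover a trajectory with dictionary skills, falling back to more expensive single-action literals, so trajectories built from reusable skills score lower.

```python
from typing import Sequence, Tuple


def segmentation_cost(
    trajectory: Sequence[str],
    skills: Sequence[Tuple[str, ...]],
    skill_cost: float = 1.0,    # assumed cost of emitting one dictionary skill
    literal_cost: float = 3.0,  # assumed cost of emitting one raw action
) -> float:
    """Cheapest cover of `trajectory` by skills and literal actions (DP)."""
    n = len(trajectory)
    # best[i] = minimal cost of encoding the first i actions
    best = [0.0] + [float("inf")] * n
    for end in range(1, n + 1):
        # Option 1: encode the last action as an idiosyncratic literal.
        best[end] = best[end - 1] + literal_cost
        # Option 2: the last segment matches a skill from the dictionary.
        for skill in skills:
            k = len(skill)
            if 0 < k <= end and tuple(trajectory[end - k:end]) == tuple(skill):
                best[end] = min(best[end], best[end - k] + skill_cost)
    return best[n]


def shaped_reward(
    task_reward: float,
    trajectory: Sequence[str],
    skills: Sequence[Tuple[str, ...]],
    penalty_weight: float = 0.1,  # assumed trade-off coefficient
) -> float:
    """Task reward minus a weighted compression penalty (hypothetical form)."""
    return task_reward - penalty_weight * segmentation_cost(trajectory, skills)


if __name__ == "__main__":
    skills = [("goto", "take"), ("open",)]
    trajectory = ["goto", "take", "open", "look"]
    print(segmentation_cost(trajectory, skills))
    print(shaped_reward(1.0, trajectory, skills))
```

In this toy setup, a trajectory that reuses dictionary skills pays less than one composed of one-off actions, which is the qualitative behavior the abstract attributes to the segmentation cost.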

BibTeX

@misc{xu2026skillreusecompressionagentic,
  title={Skill Reuse as Compression in Agentic RL},
  author={Zhikun Xu and Yu Feng and Jacob Dineen and Taiwei Shi and Jieyu Zhao and Ben Zhou},
  year={2026},
  eprint={2605.31509},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2605.31509}
}