Taiwei Shi

Archive

A one-stop archive of every post from the Blog and Projects.

2026

Skill Reuse as Compression in Agentic RL (papers)
Machine Consciousness (blog)
Self-Evolving LLM Memory Extraction Across Heterogeneous Tasks (papers)
The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents (papers)
Video-Based Reward Modeling for Computer-Use Agents (papers)
DP-RFT: Learning to Generate Synthetic Text via Differentially Private Reinforcement Fine-Tuning (papers)
Experiential Reinforcement Learning (papers)
One Model, All Roles: Multi-Turn, Multi-Agent Self-Play Reinforcement Learning for Conversational Social Intelligence (papers)

2025

STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models (papers)
The Hallucination Tax of Reinforcement Finetuning (papers)
CoAct-1: Computer-using Agents with Coding as Actions (papers)
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning (papers)
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base (papers)
On the Trustworthiness of Generative Foundation Models (papers)
Detecting and Filtering Unsafe Training Data via Data Attribution (papers)

2024

WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback (papers)
How Susceptible are Large Language Models to Ideological Manipulation? (papers)

2023

Can Language Model Moderators Improve the Health of Online Discourse? (papers)
Safer-Instruct: Aligning Language Models with Automated Preference Data (papers)
CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation (papers)
Positive Reframing Keyboard (projects)
Neural Story Planning (papers)
Investigating AAVE in Question Answering Systems (papers)

© 2026 Taiwei Shi
Design from Fred Hohman

Taiwei Shi is a final-year Ph.D. student at USC, interested in natural language processing and reinforcement learning.