Wenli Xiao

Research

Co-Lead

HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots

ICRA 2025

HOVER is a 1.5M-parameter neural network that controls the body of a humanoid robot. Humans rely on a great deal of subconscious processing to walk, maintain balance, and maneuver their arms and legs into desired positions. We capture this 'subconsciousness' in HOVER, a single model that learns to coordinate the motors of a humanoid robot to support locomotion and manipulation.

Tairan He*, Wenli Xiao*, Toru Lin, Zhengyi Luo, Zhengjia Xu, Zhenyu Jiang, Jan Kautz, Changliu Liu, Guanya Shi, Xiaolong Wang, Linxi "Jim" Fan†, Yuke Zhu†

Core Contributor

CaP-X: Benchmarking and Improving Coding Agents for Robot Manipulation

arXiv 2026

CaP-X introduces CaP-Agent0, a training-free agentic framework that enables off-the-shelf LLMs to perform robotic manipulation via code generation, and CaP-Bench, a comprehensive evaluation suite of 100+ tasks. Frontier models achieve 30%+ zero-shot success on unseen tasks and 18% on perturbed tasks (vs. 0% for VLA models), and a 7B model improves from 20% to 72% in simulation with 84% real-world transfer.

Letian Fu*, Justin Yu*, Karim El-Refai*, Ethan Kou, Haoru Xue, Huang Huang, Wenli Xiao, Guanzhi Wang, Fei-Fei Li, Guanya Shi, Jiajun Wu, Shankar Sastry, Yuke Zhu, Ken Goldberg, Linxi "Jim" Fan

Co-Lead

Self-Improving Vision-Language-Action Models with Data Generation via Residual RL

ICLR 2026

PLD (Probe, Learn, Distill) is a plug-and-play recipe for Vision-Language-Action (VLA) post-training. It is model-agnostic, supporting both autoregressive and diffusion architectures, and can push success rates to 99%.

Wenli Xiao*, Haotian Lin*, Andy Peng, Haoru Xue, Tairan He, Yuqi Xie, Fengyuan Hu, Jimmy Wu, Zhengyi Luo, Linxi "Jim" Fan†, Guanya Shi, Yuke Zhu†
