Research
HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots
HOVER is a 1.5M-parameter neural network that controls the whole body of a humanoid robot. Walking, maintaining balance, and maneuvering our arms and legs into desired positions take a lot of subconscious processing for us humans. We capture this 'subconsciousness' in HOVER, a single model that learns to coordinate the motors of a humanoid robot to support both locomotion and manipulation.
Tairan He*, Wenli Xiao*, Toru Lin, Zhengyi Luo, Zhengjia Xu, Zhenyu Jiang, Jan Kautz, Changliu Liu, Guanya Shi, Xiaolong Wang, Linxi "Jim" Fan†, Yuke Zhu†
CaP-X: Benchmarking and Improving Coding Agents for Robot Manipulation
CaP-X introduces CaP-Agent0, a training-free agentic framework that enables off-the-shelf LLMs to perform robotic manipulation through code generation, and CaP-Bench, a comprehensive evaluation suite of 100+ tasks. Frontier models achieve over 30% zero-shot success on unseen tasks and 18% on perturbed tasks (vs. 0% for VLA models). A 7B model improves from 20% to 72% success in simulation, with 84% real-world transfer.
Letian Fu*, Justin Yu*, Karim El-Refai*, Ethan Kou, Haoru Xue, Huang Huang, Wenli Xiao, Guanzhi Wang, Fei-Fei Li, Guanya Shi, Jiajun Wu, Shankar Sastry, Yuke Zhu, Ken Goldberg, Linxi "Jim" Fan
Self-Improving Vision-Language-Action Models with Data Generation via Residual RL
PLD (Probe, Learn, Distill) is a plug-and-play recipe for post-training Vision-Language-Action (VLA) models. It is model-agnostic, supporting both autoregressive and diffusion architectures, and can push success rates to 99%.
Wenli Xiao*, Haotian Lin*, Andy Peng, Haoru Xue, Tairan He, Yuqi Xie, Fengyuan Hu, Jimmy Wu, Zhengyi Luo, Linxi "Jim" Fan†, Guanya Shi, Yuke Zhu†