|
Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons
Anthony Liang*, Yigit Korkmaz*, Jiahui Zhang, Minyoung Hwang, Abrar Anwar, Sidhant Kaushik, Aditya Shah, Alex S. Huang, Luke Zettlemoyer, Dieter Fox, Yu Xiang, Anqi Li, Andreea Bobu, Abhishek Gupta, Stephen Tu, Erdem Biyik, Jesse Zhang
@ RSS 2026
[arXiv]
[Website]
[Paper]
[Code]
Robometer is a general-purpose video-language dense reward model trained on RBM-1M, a dataset with over one million trajectories spanning 21 robot embodiments. It improves robot learning across online RL, offline RL, model-based RL, failure detection, and data retrieval for imitation learning.
|
Cross Domain Imitation Learning via MPC
(Internship Project)
Jiahui Zhang, Haonan Yu, Wei Xu
[Website]
We introduce CDMPC, an approach for learning new skill combinations from long-horizon skill trajectories. CDMPC enables agents to chain skills from diverse source domains and integrates them with a low-level policy in the target domain.
CDMPC learns to chain short-horizon skills from long-horizon demonstrations that come from diverse source domains and cover various skill combinations. The learned policy adapts to tasks from any source domain and enables the agent to tackle new tasks that require novel skill combinations.
|
Reviewer: IROS 2024
Presidential Scholarship, Beijing University of Technology, 2018
Outstanding Research Achievement Award, Fan Gongxiu Honors College, Beijing University of Technology, 2017
|