DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions

Anonymous Authors

TL;DR: We synthesize HOI motions from sparse demonstrations and refine them into physically feasible dexterous robot actions via task-space residual RL.

Abstract

Learning dexterous manipulation from human-object interaction (HOI) data is a scalable alternative to teleoperation, but HOI demonstrations are sparse and provide only kinematic motion, which is not directly executable under embodiment mismatch and contact-rich dynamics. We present DexSynRefine, a framework with three coupled components: HOI-MMFP, a motion manifold primitives model conditioned on the task and the object's initial state that synthesizes coordinated hand-object trajectories from sparse HOI demonstrations; a task-space residual RL policy that physically grounds the synthesized reference while inheriting its kinematic structure; and a contact-and-dynamics adaptation module that enables sim-to-real transfer from proprioceptive history. Across five dexterous manipulation tasks spanning pick-and-place, tool use, and object reorientation, our task-space residual policy outperforms prior action-representation baselines in simulation and transfers to a real robot on all five tasks, improving over kinematic retargeting by 50–70 percentage points.

HOI-MMFP

HOI-MMFP is a motion manifold primitives model, conditioned on the task and the object's initial state, that synthesizes coordinated hand-object trajectories from sparse HOI demonstrations.

[Video comparison, five examples. Panels in each: HOI-MMFP (Ours), DiT-Full, TC-VAE]

Synthesizing trajectories directly in trajectory space produces jittery motions, whereas our manifold-based synthesis yields smooth, physically plausible HOI trajectories.
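One common way to put a number on this kind of smoothness difference is mean squared jerk, the squared third time derivative of position averaged over the trajectory. The sketch below is illustrative only: the function name, the (T, D) array layout, and the choice of jerk as the metric are assumptions, not the evaluation protocol used in this work.

```python
import numpy as np

def mean_squared_jerk(traj, dt):
    """Mean squared jerk of a trajectory sampled at a fixed time step.

    traj: array of shape (T, D), e.g. T samples of a D-dimensional position.
    dt:   sampling interval in seconds.
    Lower values indicate smoother motion.
    """
    traj = np.asarray(traj, dtype=float)
    # Third-order finite difference along time approximates jerk.
    jerk = np.diff(traj, n=3, axis=0) / dt**3
    # Sum squared jerk over dimensions, then average over time.
    return float(np.mean(np.sum(jerk**2, axis=-1)))

# Hypothetical comparison: a smooth curve versus the same curve with added noise.
t = np.linspace(0.0, 1.0, 50)
dt = t[1] - t[0]
smooth = np.stack([t**2, t], axis=1)
rng = np.random.default_rng(0)
noisy = smooth + 0.01 * rng.standard_normal(smooth.shape)

smooth_score = mean_squared_jerk(smooth, dt)
noisy_score = mean_squared_jerk(noisy, dt)
```

Under this metric the noisy trajectory scores far higher than the smooth one, which matches the visual impression of jitter.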

Task Space Residual Action Learning

Our task-space residual RL policy physically grounds the synthesized reference motion while inheriting its kinematic structure, enabling robust execution under embodiment mismatch and contact-rich dynamics.
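The general idea of a task-space residual can be sketched as follows: the policy outputs a small, bounded correction that is added to the reference target, so the executed command stays close to the synthesized motion. This is a minimal sketch under stated assumptions. The function name, the normalized residual range of [-1, 1], and the scale value are hypothetical, and the actual parameterization in DexSynRefine may differ.

```python
import numpy as np

def apply_residual(reference_target, residual, scale=0.05):
    """Combine a reference task-space target with a bounded policy residual.

    reference_target: task-space target from the synthesized reference motion
                      (e.g. a 3D position in meters).
    residual:         raw policy output; clipped to [-1, 1] (assumed convention).
    scale:            maximum correction per dimension (hypothetical value).
    """
    reference_target = np.asarray(reference_target, dtype=float)
    residual = np.clip(np.asarray(residual, dtype=float), -1.0, 1.0)
    # The command never deviates from the reference by more than `scale`.
    return reference_target + scale * residual

# Hypothetical usage: out-of-range residuals are clipped before scaling.
command = apply_residual([0.10, 0.20, 0.30], [2.0, -2.0, 0.0], scale=0.05)
```

Bounding the residual is one way to keep the reference's kinematic structure intact: a zero residual reproduces the reference exactly, and the policy only has to learn the corrections needed for contact and embodiment differences.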