DexSynRefine iconDexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions

Anonymous Authors

TL;DR: Synthesizing sparse HOI motions and refining them into physically feasible dexterous robot actions via downstream task space residual RL.

DexSynRefine overview figure

Abstract

Learning dexterous manipulation from human-object interaction (HOI) data is a scalable alternative to teleoperation, but HOI demonstrations are sparse and provide only kinematic motion that is not directly executable under embodiment mismatch and contact-rich dynamics. We present DexSynRefine, a framework with three coupled components: HOI-MMFP, a task and object initial state-conditioned motion manifold primitives that synthesizes coordinated hand-object trajectories from sparse HOI demonstrations; a task-space residual RL policy that physically grounds the synthesized reference while inheriting its kinematic structure; and a contact-and-dynamics adaptation module that enables sim-to-real transfer from proprioceptive history. Across five dexterous manipulation tasks spanning pick-and-place, tool use, and object reorientation, our task-space residual policy outperforms prior action-representation baselines in simulations and transfers to a real robot on all five tasks, improving over kinematic retargeting by 50–70 percentage points.

HOI-MMFP

HOI-MMFP is a task and object initial state-conditioned motion manifold primitives that synthesizes coordinated hand-object trajectories from sparse HOI demonstrations.

HOI-MMFP figure

HOI Trajectory Synthesis Results

HOI-MMFP (Ours)

DiT-Full

TC-VAE

HOI-MMFP (Ours)

DiT-Full

TC-VAE

HOI-MMFP (Ours)

DiT-Full

TC-VAE

HOI-MMFP (Ours)

DiT-Full

TC-VAE

HOI-MMFP (Ours)

DiT-Full

TC-VAE

Trajectory Space synthesis comparison
Manifold Space synthesis comparison

Synthesizing trajectories directly in trajectory space produces jittery and jerky motions, while our manifold-based synthesis yields smooth, physically plausible HOI trajectories.

Task Space Residual Action Learning

Our task-space residual RL policy physically grounds the synthesized reference motion while inheriting its kinematic structure, enabling robust execution under embodiment mismatch and contact-rich dynamics.

Task Space Residual Action Learning figure

HOI Trajectory Refinement Results

Task Space Residual RL (Ours)

Kinematic Retargeting

Task Space Residual RL (Ours)

Kinematic Retargeting

Task Space Residual RL (Ours)

Kinematic Retargeting

Task Space Residual RL (Ours)

Kinematic Retargeting

Task Space Residual RL (Ours)

Kinematic Retargeting

Contact Adaptation Visualization

correct binary contact    incorrect binary contact

GT: ground-truth contact    Pr: predicted contact

Real World Deployment Results

Synthesized HOI Trajectory

Book Synthesis Trajectory

Task Space Residual RL (Ours)

Kinematic Retargeting

Synthesized HOI Trajectory

Bowl Synthesis Trajectory

Task Space Residual RL (Ours)

Kinematic Retargeting

Synthesized HOI Trajectory

Hammer Synthesis Trajectory

Task Space Residual RL (Ours)

Kinematic Retargeting

Synthesized HOI Trajectory

Pringles Synthesis Trajectory

Task Space Residual RL (Ours)

Kinematic Retargeting

Synthesized HOI Trajectory

Watering Can Synthesis Trajectory

Task Space Residual RL (Ours)

Kinematic Retargeting

Real World Digital Twin Visualization

Pick Up and Hammer

Synthesized Trajectory

Hammer Synthesized Trajectory

Digital Twin Video

Reorient Pringles Bottle

Synthesized Trajectory

Pringles Synthesized Trajectory

Digital Twin Video

BibTeX

coming soon