Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation

¹Bosch Center for Artificial Intelligence (BCAI), ²Karlsruhe Institute of Technology

Our HyDo approach learns to align unseen objects to target 6D poses using contact-rich, dynamic non-prehensile motions, and it discovers multiple solution paths.


Learning diverse policies for non-prehensile manipulation is essential for improving skill transfer and generalization to out-of-distribution scenarios. In this work, we enhance exploration through a two-fold approach within a hybrid framework that handles both discrete and continuous action spaces. First, we model the continuous motion-parameter policy as a diffusion model; second, we incorporate it into a maximum entropy reinforcement learning framework that unifies the discrete and continuous components. The discrete action, such as contact-point selection, is optimized through Q-function maximization, while the continuous action is produced by the diffusion-based policy. This hybrid approach yields a principled objective in which the maximum entropy term is derived as a lower bound using structured variational inference. We propose the Hybrid Diffusion Policy algorithm (HyDo) and evaluate it on both simulated and zero-shot sim2real tasks. Our results show that HyDo encourages more diverse behavior policies, raising the success rate on a 6D pose alignment task from 53% to 72%.
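To illustrate why a diffusion model suits the continuous branch, the sketch below runs standard DDPM ancestral sampling for a single motion-parameter vector. It is a minimal sketch under stated assumptions, not the HyDo implementation: the learned noise-prediction network is replaced by a hypothetical `mixture_eps` function (the exact noise predictor for data made of two point masses), observation conditioning is omitted, and the schedule in `linear_schedule`, the step count, and the two `modes` are all illustrative choices.

```python
import numpy as np


def linear_schedule(num_steps=50, beta_start=1e-4, beta_end=0.2):
    """Linear DDPM noise schedule: returns betas, alphas and cumulative alpha_bars."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def mixture_eps(x, alpha_bar, modes):
    """Exact noise prediction for data that is an equal mixture of point masses.

    Hypothetical stand-in for a learned eps_theta(x_t, t | observation); each row
    of `modes` plays the role of one distinct motion-parameter solution.
    """
    means = np.sqrt(alpha_bar) * modes                # (K, D) noised mode centres
    var = 1.0 - alpha_bar                             # variance of the forward noise
    logits = -np.sum((x - means) ** 2, axis=1) / (2.0 * var)
    w = np.exp(logits - logits.max())
    w /= w.sum()                                      # posterior responsibility of each mode
    return (w[:, None] * (x - means)).sum(axis=0) / np.sqrt(var)


def sample_motion_params(modes, num_steps=50, rng=None):
    """Ancestral DDPM sampling of one continuous motion-parameter vector."""
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alpha_bars = linear_schedule(num_steps)
    x = rng.standard_normal(modes.shape[1])           # start from x_K ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps = mixture_eps(x, alpha_bars[t], modes)
        # Posterior mean of x_{t-1}: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Add noise with variance beta_t at every step except the last one.
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    modes = np.array([[0.5, -0.5], [-0.5, 0.5]])      # two hypothetical motion solutions
    samples = np.stack([sample_motion_params(modes, rng=rng) for _ in range(200)])
    nearest = np.argmin(((samples[:, None, :] - modes[None]) ** 2).sum(axis=-1), axis=1)
    print("samples per mode:", np.bincount(nearest, minlength=2))
```

Because each call starts from fresh Gaussian noise, repeated samples land in different modes. This is the kind of multimodal behavior a diffusion actor can express, whereas a single unimodal Gaussian policy would have to settle around one mean.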


Hybrid Diffusion pOlicy

Method Figure

An overview of the proposed method. The point cloud observation contains the point locations and per-point features. The goal is represented as a per-point flow of the object points. The actor takes the observation as input and outputs an Actor Map of per-point motion parameters. The Actor Map is concatenated with the per-point critic features to produce the Critic Map of per-point Q-values. Finally, we select the contact location with the highest value in the Critic Map and take the corresponding motion parameters from the Actor Map.
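The goal representation and the final selection step in the caption can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: `goal_flow` treats the goal as a 4x4 relative transform applied to the object points, and the learned critic head is replaced by a hypothetical linear layer (`q_weights`, `q_bias`); the actual network architecture and feature sizes are not specified here, and all names are illustrative.

```python
import numpy as np


def goal_flow(points, target_transform):
    """Per-point goal flow: where each object point should move, minus where it is.

    Assumes `target_transform` is a 4x4 homogeneous transform, expressed in the same
    frame as `points` (N, 3), that maps the current object pose to the goal pose.
    """
    homo = np.hstack([points, np.ones((len(points), 1))])   # (N, 4)
    return (homo @ target_transform.T)[:, :3] - points       # (N, 3)


def select_hybrid_action(actor_map, critic_features, q_weights, q_bias=0.0):
    """Pick the contact point with the highest Q-value and return its motion parameters.

    actor_map:       (N, A) per-point continuous motion parameters (Actor Map)
    critic_features: (N, C) per-point critic features
    q_weights:       (A + C,) hypothetical linear Q-head standing in for the critic network
    """
    critic_input = np.concatenate([actor_map, critic_features], axis=1)  # (N, A + C)
    critic_map = critic_input @ q_weights + q_bias                       # (N,) Critic Map
    idx = int(np.argmax(critic_map))          # discrete action: best contact point
    return idx, actor_map[idx], critic_map    # continuous action: its motion parameters


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.uniform(-0.05, 0.05, size=(128, 3))         # toy object point cloud
    target = np.eye(4)
    target[:3, 3] = [0.10, 0.0, 0.0]                         # goal: shift 10 cm along x
    flow = goal_flow(points, target)                         # every row is about [0.10, 0, 0]
    actor_map = rng.uniform(-1.0, 1.0, size=(128, 4))        # stand-in for the actor output
    critic_features = np.hstack([points, flow])              # stand-in for learned features
    q_weights = rng.standard_normal(4 + 6)
    idx, params, critic_map = select_hybrid_action(actor_map, critic_features, q_weights)
    print("contact point:", idx, "motion parameters:", params)
```

Taking the argmax over the Critic Map makes the contact-point choice a Q-maximizing discrete action, while the motion parameters are read directly from the matching row of the Actor Map, so one forward pass yields both parts of the hybrid action.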



Diversity Evaluation Environment

Diversity Evaluation Environment on the Real Robot


Simulation Results with Interactive Visualizations




  author    = {Huy Le and Miroslav Gabriel and Tai Hoang and Gerhard Neumann and Ngo Anh Vien},
  title     = {Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation},
  journal   = {preprint},
  year      = {2024},