Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation

1Bosch Center for Artificial Intelligence (BCAI)  2Karlsruhe Institute of Technology

Our HyDo approach learns to align unseen objects to 6D target poses using contact-rich, dynamic non-prehensile motions, discovering multiple solution paths.

Abstract

Learning diverse policies for non-prehensile manipulation is essential for improving skill transfer and generalization to out-of-distribution scenarios. In this work, we enhance exploration through a two-fold approach within a hybrid framework that tackles both discrete and continuous action spaces. First, we model the continuous motion-parameter policy as a diffusion model, and second, we embed it in a maximum entropy reinforcement learning framework that unifies the discrete and continuous components. The discrete action space, such as contact point selection, is optimized through Q-function maximization, while the continuous part is guided by the diffusion-based policy. This hybrid approach leads to a principled objective, in which the maximum entropy term is derived as a lower bound using structured variational inference. We propose the Hybrid Diffusion Policy algorithm (HyDo) and evaluate its performance on both simulation and zero-shot sim2real tasks. Our results show that HyDo encourages more diverse behavior policies, leading to a notable improvement in success rate, from 53% to 72%, on a 6D pose alignment task.
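To make the hybrid action space concrete, the following is a minimal, hedged sketch of how the continuous motion parameters could be drawn from a diffusion policy via DDPM-style reverse diffusion. The denoiser network, the conditioning input, the noise schedule, and all names here are illustrative assumptions, not the authors' implementation:

import torch

@torch.no_grad()
def sample_motion_params(denoiser, cond, steps=10, dim=3):
    # Hedged sketch: ancestral DDPM sampling of continuous motion
    # parameters conditioned on state features `cond`. `denoiser`
    # is an assumed noise-prediction network eps(x_t, t, cond).
    betas = torch.linspace(1e-4, 0.02, steps)        # illustrative schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(dim)                             # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(x, t, cond)                   # predict the added noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise      # one reverse step
    return x                                         # sampled motion parameters

Because sampling starts from fresh noise each time, repeated queries yield different, multimodal motion parameters, which is what drives the improved exploration.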

HyDo

Hybrid Diffusion pOlicy


[Figure: Method overview]


An overview of the proposed method. The point cloud observation includes the location of the points and point features. The goal is represented as per-point flow of the object points. The actor takes the observation as input and outputs an Actor Map of per-point motion parameters. The Actor Map is concatenated with the per-point critic features to generate the Critic Map of per-point Q-values. Finally, we choose the best contact location according to the highest value in the Critic Map and find the corresponding motion parameters in the Actor Map.
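The final selection step can be summarized in a few lines. This is an illustrative sketch under assumed tensor shapes (an N x D Actor Map and a length-N Critic Map), not the released code:

import torch

def select_action(actor_map: torch.Tensor, critic_map: torch.Tensor):
    # actor_map:  (N, D) per-point motion parameters (Actor Map)
    # critic_map: (N,)   per-point Q-values          (Critic Map)
    best_point = torch.argmax(critic_map)    # discrete action: contact location
    motion_params = actor_map[best_point]    # continuous action: how to move
    return best_point.item(), motion_params

The discrete argmax decides where to make contact, while the per-point motion parameters (produced by the diffusion-based actor) decide how to push.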

Videos

Real robot setup for capturing point clouds



[Figure: Real robot point-cloud capture setup]

Rubik's Cube


We have created a setup that closely reflects a real-world scenario: scene objects are cluttered around the manipulation object. In this demonstration, we perform a 6D pose alignment task with a Rubik's Cube and two additional scene objects.

Foam Brick


We demonstrate a 6D pose alignment task with a foam brick and five scene objects cluttered around the manipulation object. This setup closely reflects a real-world scenario.

Pushing an Object Toward the Target While Avoiding an Obstacle


This task demonstrates that segmenting out the background does not discard essential information about the environment. Given the same initial position, the robot must push the object to the other side.
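For intuition, such background segmentation can be as simple as cropping the observed point cloud to a workspace box that still contains the obstacle. A hedged sketch; the bounds and function name are illustrative assumptions:

import numpy as np

def segment_workspace(points, lo=(-0.5, -0.5, 0.01), hi=(0.5, 0.5, 0.5)):
    # points: (N, 3) point cloud in the robot base frame.
    # Keep everything inside the workspace box: the table plane and far
    # background are discarded, while obstacles inside the box remain.
    lo, hi = np.asarray(lo), np.asarray(hi)
    mask = np.all((points >= lo) & (points <= hi), axis=1)
    return points[mask]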

Without Obstacle

In the scene without an obstacle, the robot pushes the object directly toward the goal.

With Obstacle

In the presence of an obstacle, the robot pushes the object around it to reach the target.

Diversity Evaluation: Trajectory Visualization


The pushing task is similar to the avoiding task in Jia et al. (2024) and is designed to evaluate the model's ability to capture diverse solutions: there are 24 distinct successful ways to complete the task.

We visualize the trajectory of the manipulated object. HACMan and HyDo (w/o Diff) fail to explore alternative strategies and converge to a single solution. In contrast, diffusion-based methods successfully discover and complete the task using multiple diverse trajectories.
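As a hedged illustration of how such diversity could be quantified (this is not the paper's metric; names and thresholds are assumptions), one can resample each successful object trajectory to a few waypoints, snap them to a coarse grid, and count the distinct resulting paths:

import numpy as np

def count_solution_modes(trajectories, n_waypoints=5, grid=0.1):
    # trajectories: list of (T, 2) arrays of planar object positions
    # from successful episodes. Returns the number of distinct
    # coarse-grained paths, an estimate of the solution modes found.
    modes = set()
    for traj in trajectories:
        idx = np.linspace(0, len(traj) - 1, n_waypoints).astype(int)
        key = tuple(map(tuple, np.round(traj[idx] / grid).astype(int)))
        modes.add(key)
    return len(modes)

A policy that converges to a single strategy scores 1, while a maximally diverse policy on this task could reach up to 24.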

Trajectory Visualization

Diversity Evaluation: Push Alignment

Diversity Evaluation Environment on the Real Robot

Cluttered Environment


We include videos recorded in a cluttered environment, where the manipulated object is surrounded by scene objects. This highlights the robustness of our method.

Simulation Results with Interactive Visualizations


[Interactive visualization: select an object, drag the slider to view different timesteps, and click legend entries on the plot to show/hide elements.]

BibTeX

@article{le2024enhancing,
  author  = {Huy Le and Miroslav Gabriel and Tai Hoang and Gerhard Neumann and Ngo Anh Vien},
  title   = {Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation},
  journal = {preprint},
  year    = {2024},
}