Learning diverse policies for non-prehensile manipulation is essential for improving skill transfer and
generalization to out-of-distribution scenarios. In this work, we enhance exploration through a two-fold
approach within a hybrid framework that tackles both discrete and continuous action spaces. First, we
model the continuous motion parameter policy as a diffusion model, and second, we incorporate this into a
maximum entropy reinforcement learning framework that unifies both the discrete and continuous components.
Discrete actions, such as contact point selection, are optimized through Q-function maximization,
while the continuous motion parameters are guided by the diffusion-based policy. This hybrid approach leads to a
principled objective, where the maximum entropy term is derived as a lower bound using structured
variational inference. We propose the Hybrid Diffusion Policy algorithm (HyDo) and evaluate its
performance in both simulation and zero-shot sim2real tasks. Our results show that HyDo
encourages more diverse behavior policies, leading to a notable improvement in success rate,
from 53% to 72%, on a 6D pose alignment task.
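
To make the division of labor concrete, the sketch below illustrates one way such a hybrid action-selection step could look: a maximum-entropy (soft-max) choice over discrete contact points from Q-values, followed by iterative denoising of the continuous motion parameters conditioned on that choice. This is a minimal sketch under assumed names and dimensions (q_values, denoise_step, NUM_CONTACTS, PARAM_DIM, etc.); the placeholder functions stand in for learned networks and are not the paper's implementation.

```python
# Illustrative sketch of a hybrid (discrete + continuous) action-selection step.
# All names, dimensions, and placeholder functions are assumptions for illustration only.
import numpy as np

NUM_CONTACTS = 16      # size of the discrete contact-point set (assumed)
PARAM_DIM = 6          # dimensionality of the continuous motion parameters (assumed)
DIFFUSION_STEPS = 20   # number of reverse-diffusion (denoising) steps (assumed)
TEMPERATURE = 1.0      # entropy temperature for the soft-max discrete policy

rng = np.random.default_rng(0)

def q_values(state, contact_ids):
    """Stand-in for a learned Q-function scoring each discrete contact point."""
    return rng.normal(size=len(contact_ids))  # placeholder scores

def denoise_step(params, state, contact_id, t):
    """Stand-in for one learned reverse-diffusion step on the motion parameters."""
    noise_scale = t / DIFFUSION_STEPS
    return 0.9 * params + noise_scale * rng.normal(scale=0.1, size=params.shape)

def select_action(state):
    # Discrete part: maximum-entropy (soft-max) policy over Q-values.
    contact_ids = np.arange(NUM_CONTACTS)
    q = q_values(state, contact_ids)
    probs = np.exp(q / TEMPERATURE)
    probs /= probs.sum()
    contact = rng.choice(contact_ids, p=probs)

    # Continuous part: sample motion parameters by iterative denoising,
    # conditioned on the state and the chosen contact point.
    params = rng.normal(size=PARAM_DIM)          # start from Gaussian noise
    for t in reversed(range(DIFFUSION_STEPS)):
        params = denoise_step(params, state, contact, t)
    return contact, params

contact, params = select_action(state=np.zeros(8))
print("contact point:", contact)
print("motion parameters:", np.round(params, 3))
```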