DOPO: Dense Online Preference Optimization for Cross-Dataset Motion Diffusion Adaptation

Media Integration and Communication Center (MICC), University of Florence, Florence, Italy.

Adapting text-conditioned motion diffusion models to new domains typically requires collecting additional motion capture data and retraining from scratch, a process that is expensive, time-consuming, and often impractical. In this paper, we propose DOPO (Dense Online Preference Optimization), a post-training framework that fine-tunes pretrained motion diffusion models using only textual prompts from the target domain, without directly requiring ground-truth motion data or human preference annotations. DOPO builds on two components: (1) TMR-Dense, a timestep-conditioned text-motion retrieval network that evaluates motion-text alignment at arbitrary points along the denoising trajectory, enabling dense preference signals throughout generation; and (2) an online adaptation of Step-by-step Preference Optimization that uses TMR-Dense to automatically construct preference pairs from target-domain prompts, eliminating the need for pre-collected preference datasets. Unlike prior reinforcement learning approaches that rely on sparse terminal rewards, DOPO provides step-aware supervision while avoiding the instabilities associated with policy gradient optimization. We evaluate our approach on cross-dataset adaptation scenarios spanning BABEL, HumanML3D, and MotionX, testing both latent-space and joint-space diffusion architectures. Experimental results indicate consistent improvements over zero-shot baselines and a prior RL-based method across metrics, with training times reduced by approximately an order of magnitude. DOPO preserves performance on source distributions after adaptation, suggesting that online preference-based fine-tuning offers a practical path toward more adaptable motion generation systems.
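
To make the online pair-construction idea concrete, the following is a minimal toy sketch and not the DOPO implementation. Everything in it is an assumption introduced for illustration: motions are short lists of floats, `toy_dense_score` is a hypothetical stand-in for a timestep-conditioned scorer such as TMR-Dense, `toy_denoise_candidates` stands in for sampling several denoising proposals from a diffusion model, continuing from a randomly chosen candidate is one plausible choice, and `preference_loss` is a generic DPO-style logistic loss rather than the exact objective used in the paper.

```python
import math
import random


def toy_dense_score(motion, text_target, t):
    """Hypothetical stand-in for a timestep-conditioned text-motion scorer.

    Higher is better. The score is scaled by the timestep t, so the scorer
    is explicitly told how noisy the sample is expected to be.
    """
    sq_dist = sum((m - g) ** 2 for m, g in zip(motion, text_target))
    return -sq_dist / (1.0 + t)


def toy_denoise_candidates(x_t, k, rng, noise=0.3):
    """Stand-in for drawing k candidate next states from the same x_t."""
    return [[0.9 * v + rng.gauss(0.0, noise) for v in x_t] for _ in range(k)]


def build_step_pairs(text_target, x_T, timesteps, k, rng):
    """Build one (win, lose) pair per denoising step, with no human labels."""
    pairs = []
    x_t = x_T
    for t in timesteps:
        candidates = toy_denoise_candidates(x_t, k, rng)
        # Rank candidates by the step-aware score: lowest first, highest last.
        ranked = sorted(candidates, key=lambda c: toy_dense_score(c, text_target, t))
        lose, win = ranked[0], ranked[-1]
        pairs.append({"t": t, "x_t": x_t, "win": win, "lose": lose})
        # Assumption: continue the trajectory from a randomly chosen candidate.
        x_t = rng.choice(candidates)
    return pairs


def preference_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=1.0):
    """Generic DPO-style loss: -log sigmoid(beta * margin), computed stably."""
    margin = (logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)
    z = beta * margin
    # softplus(-z) written to avoid overflow for large |z|
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


rng = random.Random(0)
pairs = build_step_pairs(
    text_target=[1.0, -1.0],  # hypothetical embedding of a target-domain prompt
    x_T=[0.0, 0.0],           # hypothetical initial noisy state
    timesteps=[3, 2, 1],
    k=4,
    rng=rng,
)
print(len(pairs), "preference pairs, one per denoising step")
```

Each denoising step yields its own preference pair, which is what distinguishes dense, step-aware supervision from a single reward computed on the final sample. In a real system the log-probabilities passed to `preference_loss` would come from the fine-tuned model and a frozen reference model evaluated on the winning and losing transitions.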