World-R1: Enhancing Video Foundation Models with Flow-GRPO and 3D-Aware Rewards

Microsoft Research’s World-R1 Uses Flow-GRPO and 3D-Aware Rewards to Inject Geometric Consistency Into Wan 2.1 Without Architectural Changes

Microsoft Research and Zhejiang University have introduced World-R1, a reinforcement learning framework that aligns video generation with 3D constraints. The system improves geometric consistency in Wan 2.1 through post-training alone, with a reported 10.23 dB PSNR gain for the Small variant.
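For a sense of scale, PSNR is logarithmic in mean squared error, so a gain of about 10 dB corresponds to roughly a tenfold reduction in MSE. The sketch below uses the standard PSNR definition; the function names and the `peak=1.0` convention are illustrative assumptions, not part of the World-R1 evaluation code.

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Standard PSNR in dB between two arrays with values in [0, peak]."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain at a fixed peak."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    ref = np.zeros((8, 8))
    est = np.full((8, 8), 0.1)  # uniform error of 0.1, so MSE = 0.01
    print(f"PSNR: {psnr(ref, est):.2f} dB")
    print(f"MSE ratio for a 10.23 dB gain: {mse_ratio_for_gain(10.23):.2f}")
```

This only illustrates the metric. The reported figures come from comparing rendered views of a 3DGS reconstruction against the generated frames, which is not reproduced here.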

Why This Matters

Current video foundation models like Wan 2.1 often fail to maintain 3D coherence because they fit 2D pixel correlations rather than simulating 3D scenes. The result is spatial warping and texture stretching during camera movement. World-R1 addresses this by eliciting latent geometric knowledge through reinforcement learning rather than through supervised training on expensive 3D assets. The original model architecture and inference efficiency are preserved while the structural inconsistencies are corrected.

Key Insights

World-R1-Large reached a PSNR of 27.67 dB on 3DGS-based reconstruction, a 7.91 dB improvement over the base Wan2.1-T2V-14B model (results reported in 2026).
The framework uses Flow-GRPO-Fast, which adapts Group Relative Policy Optimization (GRPO) to flow-matching diffusion models by injecting SDE noise at random intermediate steps, reducing rollout cost.
A composite 3D reward system employs Depth Anything 3 and Qwen3-VL to score reconstructions from meta-views, penalizing artifacts like floaters or billboard effects that occur off-axis.
Implicit camera conditioning is achieved via noise warping: camera extrinsics are projected into 2D optical flow, which is used to warp the initial latents without adding new parameters or adapters.
Periodic decoupled training is implemented to prevent reward hacking; every 100 steps, 3D rewards are suspended to prioritize aesthetic rewards (HPSv3) and preserve dynamic motion.
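The reward side of the points above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it shows GRPO-style advantage normalization within a group of rollouts, plus a hypothetical schedule that drops the 3D term on every 100th step. The function names, the weights `w_3d` and `w_aes`, and the choice to suspend the 3D reward for exactly one step are assumptions.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same prompt.

    Each rollout's advantage is its reward minus the group mean, divided by
    the group standard deviation, as in GRPO.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def combined_reward(step, reward_3d, reward_aesthetic,
                    decouple_every=100, w_3d=1.0, w_aes=1.0):
    """Hypothetical schedule: suspend the 3D reward on every Nth step.

    Assumption: the suspension lasts a single step; the source only says the
    3D rewards are suspended every 100 steps in favor of aesthetic rewards.
    """
    reward_3d = np.asarray(reward_3d, dtype=np.float64)
    reward_aesthetic = np.asarray(reward_aesthetic, dtype=np.float64)
    if step > 0 and step % decouple_every == 0:
        return w_aes * reward_aesthetic  # aesthetic-only step
    return w_3d * reward_3d + w_aes * reward_aesthetic


if __name__ == "__main__":
    # Made-up scores for a group of four rollouts.
    r3d = [0.2, 0.5, 0.9, 0.4]
    aes = [0.6, 0.6, 0.3, 0.7]
    for step in (99, 100):
        total = combined_reward(step, r3d, aes)
        print(step, group_relative_advantages(total))
```

Normalizing within the group means only the relative ranking of rollouts drives the update, so the two reward terms do not need to share an absolute scale across prompts.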

Practical Applications

Use case: High-fidelity cinematic camera movements (orbiting, pushing in) implemented via noise warping in World-R1-Large. Pitfall: Over-optimizing for 3D reconstruction can produce static scenes in which dynamic elements such as water or fire stop moving to minimize reconstruction error.
Use case: Long-form video generation up to 121 frames maintaining geometric consistency via the World-R1-Large backbone. Pitfall: Relying solely on 3DGS rewards without aesthetic regularization (HPSv3) causes visual quality to collapse under geometric pressure.
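To make the camera-movement use case concrete, the sketch below shows one common way to turn a relative camera pose into a dense 2D optical flow field with a pinhole model. It is an illustration under stated assumptions, not the World-R1 code: the function name, the intrinsics `K`, and the per-pixel `depth` input are hypothetical, and the source does not say how depth is obtained or how the flow is applied to the latents.

```python
import numpy as np


def camera_motion_to_flow(K, R, t, depth):
    """Optical flow (H, W, 2) induced by a relative camera pose.

    K:     3x3 pinhole intrinsics (assumed known).
    R, t:  rotation and translation mapping points from the source camera
           frame to the target camera frame.
    depth: (H, W) per-pixel depth in the source view (assumed given).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                     # back-project: K^-1 p
    pts = rays * depth[..., None]                       # 3D points, source frame
    pts_new = pts @ R.T + t                             # move to target frame
    proj = pts_new @ K.T                                # re-project with K
    uv_new = proj[..., :2] / proj[..., 2:3]             # perspective divide
    return uv_new - pix[..., :2]                        # displacement per pixel


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 32.0],
                  [0.0, 100.0, 32.0],
                  [0.0, 0.0, 1.0]])
    depth = np.full((4, 6), 2.0)            # flat scene two units away
    shift = np.array([0.1, 0.0, 0.0])       # small sideways translation
    flow = camera_motion_to_flow(K, np.eye(3), shift, depth)
    print(flow[0, 0])                       # same displacement at every pixel
```

A flow field of this kind could then drive a warp of the initial noise latents; that warping step, and any handling of occlusions or points behind the camera, is left out of this sketch.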

References:

https://www.marktechpost.com/2026/04/30/microsoft-researchs-world-r1-uses-flow-grpo-and-3d-aware-rewards-to-inject-geometric-consistency-into-wan-2-1-without-architectural-changes/
