GRASP: Robust Gradient-Based Planning for Long-Horizon World Models

Gradient-based Planning for World Models at Longer Horizons

Researchers have introduced GRASP, a gradient-based planner designed to overcome the fragility of long-horizon control in learned world models. At a planning horizon of 60, GRASP achieves a 26.2% success rate, while the standard Cross-Entropy Method (CEM) drops to 7.2%.

Why This Matters

Modern world models act as powerful general-purpose simulators, but long-horizon planning through them is technically fragile. Backpropagating through a long rollout produces an ill-conditioned computation graph in which the conditioning of the rollout Jacobian worsens exponentially with the horizon. High-dimensional latent spaces add an adversarial robustness problem: state-input gradients become brittle, so optimization fails in unseen directions orthogonal to the data manifold.
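The exponential scaling can be seen in a toy case. The sketch below is not from the GRASP work: it assumes a hypothetical two-dimensional linear model whose per-step Jacobian is diagonal, stretching one direction by 1.1 and shrinking the other by 0.9 (both values are made up for illustration). The Jacobian of a T-step rollout is then the T-th power of that matrix, and its condition number is (1.1/0.9)^T.

```python
def rollout_jacobian_diag(eigenvalues, horizon):
    """Diagonal of A**horizon for a diagonal linear model s_{t+1} = A s_t."""
    return [lam ** horizon for lam in eigenvalues]


def condition_number(diagonal):
    """Ratio of largest to smallest magnitude on the diagonal."""
    magnitudes = [abs(d) for d in diagonal]
    return max(magnitudes) / min(magnitudes)


# Assumed per-step stretch and shrink factors (illustrative only).
eigenvalues = [1.1, 0.9]

for horizon in (10, 30, 60):
    kappa = condition_number(rollout_jacobian_diag(eigenvalues, horizon))
    print(f"T={horizon:>2}  condition number ~ {kappa:.3g}")
```

Even with per-step factors this close to 1, the conditioning at horizon 60 is several orders of magnitude worse than at horizon 10, which is the regime where a single gradient step size stops working for all directions at once.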

Key Insights

Backpropagation through time (BPTT) in world models leads to exploding or vanishing gradients, because the conditioning of the rollout Jacobian worsens exponentially with the horizon T.
Learned models inherit the adversarial robustness issues identified by Szegedy et al. (2014) and Goodfellow et al. (2015): they can have high Lipschitz constants in directions normal to the data manifold.
Lifting dynamics into virtual states via collocation allows optimization to occur in parallel across time, providing a speed-up compared to serial rollout objectives.
Injecting Gaussian noise into virtual state iterates, rather than action iterates, enables effective exploration between basins in lifted optimization spaces.
Reshaping gradients by stopping brittle state-input signals while maintaining action-input signals prevents the optimizer from ‘hacking’ the model via adversarial state examples.
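The last three insights can be combined in a small sketch. It is a toy illustration under stated assumptions, not the GRASP implementation: `dynamics` is a hypothetical one-dimensional stand-in for a learned world model, and the learning rate, noise schedule, and quadratic goal cost are all made up for the example. The virtual states s_1..s_T are optimized alongside the actions, each dynamics residual depends only on its own (s_t, a_t, s_{t+1}) triple so the residuals could be evaluated in parallel, decaying Gaussian noise is added to the state iterates only, and the gradient path through the state input of the model is dropped by hand, which plays the role of a stop-gradient here.

```python
import random


def dynamics(s, a):
    # Hypothetical stand-in for a learned world model (1-D, linear).
    return 0.9 * s + a


def d_dynamics_da(s, a):
    # Partial derivative of the toy dynamics with respect to the action.
    return 1.0


def plan_lifted(s0, goal, horizon=5, steps=600, lr=0.1,
                noise0=0.1, decay=0.98, seed=0):
    rng = random.Random(seed)
    states = [s0] * horizon     # virtual states s_1..s_T
    actions = [0.0] * horizon   # actions a_0..a_{T-1}
    for k in range(steps):
        prev = [s0] + states[:-1]  # s_0..s_{T-1}, the model's state inputs
        # Each residual touches only one time step, so this list could be
        # computed in parallel across time instead of by a serial rollout.
        res = [states[t] - dynamics(prev[t], actions[t])
               for t in range(horizon)]
        sigma = noise0 * decay ** k  # assumed decaying noise schedule
        new_states, new_actions = [], []
        for t in range(horizon):
            # Gradient w.r.t. the virtual state as the *target* of step t.
            g_state = 2.0 * res[t]
            if t == horizon - 1:
                g_state += 2.0 * (states[t] - goal)  # terminal goal cost
            # Stopped path: the term -2 * res[t + 1] * d(dynamics)/ds, which
            # would flow through the model's state input, is left out.
            g_action = -2.0 * res[t] * d_dynamics_da(prev[t], actions[t])
            # Noise goes into the state iterate, not the action iterate.
            new_states.append(states[t] - lr * g_state + rng.gauss(0.0, sigma))
            new_actions.append(actions[t] - lr * g_action)
        states, actions = new_states, new_actions
    return states, actions


def rollout(s0, actions):
    # Serial rollout, used only to check the plan against the true dynamics.
    s = s0
    for a in actions:
        s = dynamics(s, a)
    return s


states, actions = plan_lifted(s0=0.0, goal=1.0)
print("final state under real rollout:", rollout(0.0, actions))
```

In this toy setting the only fixed points of the update have zero residuals and a terminal state at the goal, so the returned actions reach the goal when rolled out serially. A real world model is nonlinear and high-dimensional, so no such guarantee carries over; the sketch only shows where the stop-gradient and the state noise sit in the update.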

Practical Applications

Long-horizon robotic manipulation: GRASP maintains a 10.4% success rate at horizon H=80 in Push-T tasks where competing methods like LatCo fail entirely.
High-speed trajectory optimization: Lifted state optimization allows for a median time to success of 15.2s at H=50, more than 6x faster than CEM’s 96.2s.
Pitfall: Using standard state-input gradients in deep world models often results in ‘sticky’ optimization, where the optimizer tricks the model into dynamics that look feasible to the model but are unphysical.
