GRASP: Robust Gradient-Based Planning for Long-Horizon World Models

These articles are AI-generated summaries. Please check the original sources for full details.

Gradient-based Planning for World Models at Longer Horizons

Researchers have introduced GRASP, a gradient-based planner designed to make long-horizon control in learned world models less fragile. At a planning horizon of 60, GRASP reaches a 26.2% success rate, while the standard Cross-Entropy Method (CEM) drops to 7.2%.

Why This Matters

Modern world models act as powerful general-purpose simulators, but planning through them over long horizons is fragile. Unrolling the model yields an ill-conditioned computation graph in which Jacobian conditioning scales exponentially with the horizon. High-dimensional latent spaces add an adversarial-robustness problem: gradients with respect to state inputs become brittle, so optimization fails along unseen directions orthogonal to the data manifold.
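
The exponential scaling can be seen in a toy case. The sketch below assumes a hypothetical linear system with one slightly expanding and one slightly contracting direction (eigenvalues 1.1 and 0.9, chosen only for illustration, not taken from the GRASP work); the function names are likewise made up. For such a system the Jacobian of the final state with respect to the initial state is diagonal, and its condition number is (1.1/0.9)^T.

```python
# Toy illustration only: a hypothetical linear system s_{t+1} = A s_t with
# A = diag(1.1, 0.9). These eigenvalues are made up, not from the GRASP paper.

def rollout_jacobian_diag(eigs, horizon):
    """Diagonal of d s_T / d s_0 after `horizon` steps of s_{t+1} = diag(eigs) s_t."""
    return [lam ** horizon for lam in eigs]


def condition_number(diag):
    """Ratio of the largest to the smallest magnitude on the diagonal."""
    mags = [abs(d) for d in diag]
    return max(mags) / min(mags)


if __name__ == "__main__":
    eigs = (1.1, 0.9)
    for horizon in (1, 10, 30, 60):
        kappa = condition_number(rollout_jacobian_diag(eigs, horizon))
        print(f"T={horizon:3d}  condition number ~ {kappa:.3g}")
```

A learned nonlinear model will not have a diagonal Jacobian, but the same product-of-Jacobians structure is what makes gradients through long rollouts explode along some directions and vanish along others.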

Key Insights

  • Backpropagation through time (BPTT) in world models leads to exploding or vanishing gradients, because Jacobian conditioning scales exponentially with the horizon T.
  • The adversarial-robustness problem identified by Szegedy et al. (2014) and Goodfellow et al. (2015) applies here too: models can have high Lipschitz constants in directions normal to the data manifold.
  • Lifting the dynamics into virtual states via collocation lets optimization run in parallel across time steps, which gives a speed-up over objectives that require a serial rollout.
  • Injecting Gaussian noise into virtual state iterates, rather than action iterates, enables effective exploration between basins in lifted optimization spaces.
  • Reshaping gradients by stopping the brittle state-input signals while keeping the action-input signals prevents the optimizer from ‘hacking’ the model via adversarial state examples.
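
The last three points can be sketched together on a toy problem. This is a minimal illustration under strong assumptions, not the GRASP implementation: the "world model" is a hypothetical scalar linear map f(s, u) = A*s + B*u, gradients are derived by hand instead of by autodiff, and every name and hyperparameter (plan_lifted, sigma0, goal_weight, and so on) is invented for the example. It shows where each idea sits in the loop: per-step residuals that depend only on the current iterate, Gaussian noise added to the virtual states only, and a flag that drops the gradient flowing through the model's state input.

```python
import random

A, B = 1.05, 1.0  # hypothetical toy dynamics, a stand-in for a learned model


def f(s, u):
    """One model step: next state from current state and action."""
    return A * s + B * u


def rollout(s0, actions):
    """Serial rollout, used here only to check a finished plan."""
    s = s0
    for u in actions:
        s = f(s, u)
    return s


def plan_lifted(s0, goal, horizon=20, steps=600, noise_steps=300, lr=0.1,
                sigma0=0.05, goal_weight=1.0, stop_state_grad=True, seed=0):
    """Minimise sum_t (z[t+1] - f(z[t], u[t]))^2 + goal_weight * (z[T] - goal)^2."""
    rng = random.Random(seed)
    u = [0.0] * horizon
    z = [s0] * (horizon + 1)  # z[0] is the fixed start; z[1:] are virtual states
    for k in range(steps):
        # Each residual uses only (z[t], u[t], z[t+1]) from the current iterate,
        # so this comprehension could be evaluated in parallel over t.
        r = [z[t + 1] - f(z[t], u[t]) for t in range(horizon)]
        grad_u = [-2.0 * B * r[t] for t in range(horizon)]  # action-input signal, kept
        grad_z = [0.0] * (horizon + 1)
        for t in range(horizon):
            grad_z[t + 1] += 2.0 * r[t]  # z[t+1] acting as the target of step t
            if not stop_state_grad and t > 0:
                grad_z[t] += -2.0 * A * r[t]  # state-input signal through the model
        grad_z[horizon] += 2.0 * goal_weight * (z[horizon] - goal)
        # Noise is linearly annealed to zero and applied to states, not actions.
        sigma = sigma0 * max(0.0, 1.0 - k / noise_steps)
        for t in range(horizon):
            u[t] -= lr * grad_u[t]
            z[t + 1] += -lr * grad_z[t + 1] + sigma * rng.gauss(0.0, 1.0)
    return u, z


if __name__ == "__main__":
    actions, states = plan_lifted(s0=0.0, goal=1.0)
    print("final state from a real rollout:", rollout(0.0, actions))
```

In a linear toy like this there are no adversarial state directions to exploit, so the flag only marks where the state-input gradient would be cut; with an autodiff framework the same effect would typically come from a stop-gradient or detach operation on the state fed into the model.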

Practical Applications

  • Long-horizon robotic manipulation: GRASP maintains a 10.4% success rate at horizon H=80 in Push-T tasks where competing methods like LatCo fail entirely.
  • High-speed trajectory optimization: Lifted state optimization reaches a median time to success of 15.2s at H=50, more than 6x faster than CEM’s 96.2s.
  • Pitfall: Using standard state-input gradients in deep world models often results in ‘sticky’ optimization, where the optimizer settles on trajectories that the model treats as feasible but that are physically unrealistic.
