Meet OAT: The New Action Tokenizer Bringing LLM-Style Scaling and Flexible, Anytime Inference to the Robotics World
These articles are AI-generated summaries. Please check the original sources for full details.
Ordered Action Tokenization (OAT) for Robotics
Researchers from Harvard University and Stanford University introduced OAT to make autoregressive models practical for robot control. OAT is designed to satisfy three properties at once: high compression, total decodability, and causal ordering. It uses a transformer encoder with register tokens, trained with nested dropout, to tokenize continuous robot movements efficiently and reliably.
Why This Matters
Robot actions are continuous, which makes them hard to turn into discrete tokens. Earlier strategies each have a serious flaw: binning produces very long token sequences, FAST can produce token sequences that cannot be decoded, and learned latent tokenizers impose no specific order on their tokens. OAT addresses these limitations: every possible token sequence maps to a valid movement, and the ordered tokens allow flexible “anytime” inference.
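To make the sequence-length problem concrete, the sketch below applies naive per-dimension binning to one action chunk and counts the tokens it produces. The chunk size (16 timesteps of a 7-dimensional action), the 256 bins, and the [-1, 1] action range are illustrative assumptions, not values from the OAT work; only the 8-token budget comes from the article.

```python
# Naive per-dimension binning: one discrete token per action dimension per timestep.
# HORIZON, ACTION_DIM, NUM_BINS and the [-1, 1] range are illustrative assumptions.
HORIZON, ACTION_DIM, NUM_BINS = 16, 7, 256
LOW, HIGH = -1.0, 1.0
OAT_TOKENS = 8  # token count quoted in the article


def bin_encode(value: float) -> int:
    """Map a continuous value to a bin index in [0, NUM_BINS - 1]."""
    clipped = min(max(value, LOW), HIGH)
    index = int((clipped - LOW) / (HIGH - LOW) * NUM_BINS)
    return min(index, NUM_BINS - 1)  # value == HIGH would otherwise overflow


def bin_decode(index: int) -> float:
    """Map a bin index back to the centre of its bin."""
    width = (HIGH - LOW) / NUM_BINS
    return LOW + (index + 0.5) * width


def tokenize_chunk(chunk: list[list[float]]) -> list[int]:
    """Flatten a (horizon x action_dim) chunk into one token per scalar."""
    return [bin_encode(v) for step in chunk for v in step]


chunk = [[0.0] * ACTION_DIM for _ in range(HORIZON)]
tokens = tokenize_chunk(chunk)
print(len(tokens), "binning tokens vs", OAT_TOKENS, "OAT tokens")
```

Under these assumed sizes the binned sequence grows with both the horizon and the number of action dimensions, while the OAT budget stays fixed at 8 tokens per chunk.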
Key Insights
- OAT outperforms the industry-standard Diffusion Policy (DP) and previous tokenizers, achieving a 52.3% aggregate success rate across 20+ tasks in 4 major simulation benchmarks.
- Nested dropout forces the model to put the most important information in the earliest tokens: the first tokens capture global motion, and later tokens refine the details.
- Prefix-based detokenization enables a smooth trade-off between computation cost and action fidelity, allowing for coarse actions with just 1 or 2 tokens and fine actions with all 8 tokens.
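The nested-dropout idea in the list above can be sketched as a masking step applied during training: sample a prefix length, keep that many leading tokens, and zero out the rest, so that a reconstruction loss has to work from the prefix alone. The uniform sampling of the prefix length, the function names, and the toy token vectors are assumptions for illustration; the article does not say how OAT samples or masks.

```python
import random


def sample_prefix_length(num_tokens: int, rng: random.Random) -> int:
    """Pick how many leading tokens survive (uniform here, an assumption)."""
    return rng.randint(1, num_tokens)  # inclusive on both ends


def nested_dropout(tokens: list[list[float]], keep: int) -> list[list[float]]:
    """Keep the first `keep` token vectors and zero out every later one."""
    return [
        list(tok) if i < keep else [0.0] * len(tok)
        for i, tok in enumerate(tokens)
    ]


rng = random.Random(0)
# Eight toy token vectors of width 4 (8 matches the token count in the article).
tokens = [[float(i + 1)] * 4 for i in range(8)]
keep = sample_prefix_length(len(tokens), rng)
masked = nested_dropout(tokens, keep)
print(keep, [tok[0] for tok in masked])
```

Because the first token survives every sample while later tokens are often dropped, training with such a mask pushes the most broadly useful information toward the front of the sequence.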
Working Example
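# Illustrative sketch of OAT-style tokenization and prefix-based detokenization.
# Not the authors' implementation: layer sizes, the linear decoder, and the
# omission of the discrete quantization step are simplifying assumptions.
import torch
import torch.nn as nn


class OAT(nn.Module):
    def __init__(self, action_dim, horizon, d_model=64, num_registers=8):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.num_registers = num_registers
        self.embed = nn.Linear(action_dim, d_model)
        # Learned register tokens appended to the embedded action sequence
        self.register_tokens = nn.Parameter(0.02 * torch.randn(num_registers, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.transformer_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(num_registers * d_model, horizon * action_dim)

    def forward(self, actions):
        # actions: (batch, horizon, action_dim) -> tokens: (batch, num_registers, d_model)
        x = self.embed(actions)
        registers = self.register_tokens.expand(x.size(0), -1, -1)
        out = self.transformer_encoder(torch.cat([x, registers], dim=1))
        return out[:, -self.num_registers:]  # register outputs act as the tokens

    def detokenize(self, tokens, num_prefix=None):
        # Prefix-based detokenization: keep the first num_prefix tokens, zero the rest
        k = self.num_registers if num_prefix is None else num_prefix
        keep = (torch.arange(self.num_registers, device=tokens.device) < k).view(1, -1, 1)
        flat = (tokens * keep).flatten(start_dim=1)
        return self.decoder(flat).view(-1, self.horizon, self.action_dim)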
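As a rough analogy for the coarse-to-fine behavior of prefix decoding, the sketch below uses a discrete cosine transform in place of OAT's learned tokenizer: coefficients are ordered from low to high frequency, and decoding from a longer prefix can only lower the reconstruction error. This is a swapped-in classical technique chosen for illustration; it is not how OAT computes its tokens, and the toy trajectory is made up.

```python
import math


def dct(signal: list[float]) -> list[float]:
    """Orthonormal DCT-II: coefficient k captures frequency k."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        total = sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                    for i, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs


def decode_prefix(coeffs: list[float], num_prefix: int) -> list[float]:
    """Reconstruct the signal from the first `num_prefix` coefficients only."""
    n = len(coeffs)
    out = []
    for i in range(n):
        value = 0.0
        for k in range(min(num_prefix, n)):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            value += scale * coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k)
        out.append(value)
    return out


def rms_error(a: list[float], b: list[float]) -> float:
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Toy 1-D trajectory with 8 samples, so a full prefix has 8 "tokens".
trajectory = [0.0, 0.1, 0.3, 0.6, 0.8, 0.9, 0.95, 1.0]
coeffs = dct(trajectory)
errors = [rms_error(trajectory, decode_prefix(coeffs, k)) for k in range(1, 9)]
for k, err in enumerate(errors, start=1):
    print(f"prefix={k} rms_error={err:.4f}")
```

A policy with a tight compute budget could stop after one or two such tokens and accept the coarser result, which is the trade-off that “anytime” inference refers to.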
Practical Applications
- Use Case: OAT can be used in robotics applications such as pick-and-place, cup stacking, and other tasks that require flexible and efficient tokenization of continuous movements.
- Pitfall: A common anti-pattern is to use a fixed-length tokenizer, which can hurt performance and reliability. OAT’s flexible “anytime” inference avoids this by allowing actions to be decoded from a prefix of the tokens.