Meet OAT: The New Action Tokenizer Bringing LLM-Style Scaling and Flexible, Anytime Inference to the Robotics World

Ordered Action Tokenization (OAT) for Robotics

Ordered Action Tokenization (OAT), introduced by researchers from Harvard University and Stanford University, is an action tokenizer for applying autoregressive models to robotics. It aims for three properties at once: high compression, total decodability, and causal ordering. OAT tokenizes continuous robot movements with a transformer encoder that uses register tokens, and it is trained with nested dropout so that the resulting tokens are ordered by importance.

Why This Matters

Continuous robot actions are hard to turn into discrete tokens, and each earlier strategy has a serious flaw. Binning produces very long sequences, FAST can produce token sequences that cannot be decoded into a valid action, and learned latent tokenizers impose no particular order on their tokens. OAT addresses all three limitations: every possible token sequence maps to a valid movement, and the ordered tokens allow flexible “anytime” inference.
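To make the sequence-length problem with binning concrete, here is a minimal sketch of per-dimension binning. The chunk length, action dimension, and bin count are hypothetical values chosen for illustration, not figures from the OAT work; the 8-token budget is the one quoted in this article, and `bin_actions` is a hypothetical helper name.

```python
# Hypothetical sizes, chosen only for illustration (not taken from the OAT work).
HORIZON = 32      # timesteps in one action chunk
ACTION_DIM = 7    # e.g. a 7-DoF arm command
NUM_BINS = 256    # discrete bins per action dimension
OAT_TOKENS = 8    # token budget quoted in this article


def bin_actions(chunk, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Per-dimension binning: one token per (timestep, dimension) pair."""
    tokens = []
    for step in chunk:
        for value in step:
            clipped = min(max(value, low), high)
            # Map [low, high] onto integer bin ids 0..num_bins-1 (round to nearest).
            idx = int((clipped - low) / (high - low) * (num_bins - 1) + 0.5)
            tokens.append(idx)
    return tokens


chunk = [[0.0] * ACTION_DIM for _ in range(HORIZON)]
binned = bin_actions(chunk)
print(len(binned))                # 224 tokens for a single chunk
print(len(binned) // OAT_TOKENS)  # 28 times the 8-token budget
```

Under these assumed sizes, a single chunk costs 224 tokens with binning, which is the kind of sequence growth that makes autoregressive prediction slow.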

Key Insights

OAT outperforms the industry-standard Diffusion Policy (DP) and previous tokenizers, achieving a 52.3% aggregate success rate across 20+ tasks in 4 major simulation benchmarks.
Nested dropout forces the model to place the most important information in the earliest tokens: the first tokens capture global motion, and later tokens refine the details.
Prefix-based detokenization enables a smooth trade-off between computation cost and action fidelity, allowing for coarse actions with just 1 or 2 tokens and fine actions with all 8 tokens.
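The nested dropout idea above can be sketched in a few lines. This is a toy under stated assumptions, not the authors' training code: the prefix length is sampled uniformly (the actual schedule may differ), and `nested_dropout_mask`, `apply_mask`, and `keep_probability` are hypothetical helper names.

```python
import random

NUM_TOKENS = 8  # token budget quoted in this article


def nested_dropout_mask(num_tokens, rng):
    """Sample a prefix length k and keep only the first k token slots."""
    k = rng.randint(1, num_tokens)  # inclusive on both ends
    return [1] * k + [0] * (num_tokens - k)


def apply_mask(latents, mask):
    """Zero out every latent slot that the mask drops."""
    return [[x * m for x in slot] for slot, m in zip(latents, mask)]


def keep_probability(slot_index, num_tokens):
    """Chance that a 0-based slot survives when k is uniform on 1..num_tokens."""
    return (num_tokens - slot_index) / num_tokens


rng = random.Random(0)
latents = [[float(i + 1)] * 2 for i in range(NUM_TOKENS)]  # toy 2-d latents
mask = nested_dropout_mask(NUM_TOKENS, rng)
masked = apply_mask(latents, mask)

# The first slot is always kept; the last slot survives only 1 time in 8.
print([keep_probability(i, NUM_TOKENS) for i in range(NUM_TOKENS)])
```

Because the decoder must reconstruct the action from whatever prefix survives, the slots that are kept most often end up carrying the information that matters most, which is what makes decoding from 1 or 2 tokens meaningful.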

Working Example

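# Illustrative sketch of OAT-style tokenization and prefix-based detokenization.
# Layer sizes and the linear decoder are assumptions, not the authors' implementation;
# quantizing the register outputs into discrete token ids is omitted for brevity.
import torch
import torch.nn as nn

class OAT(nn.Module):
    def __init__(self, action_dim, horizon, num_registers=8, d_model=64):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.embed = nn.Linear(action_dim, d_model)
        # Learnable register tokens that summarize the whole action chunk
        self.register_tokens = nn.Parameter(torch.randn(num_registers, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.transformer_encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(num_registers * d_model, horizon * action_dim)

    def forward(self, actions):
        # actions: (batch, horizon, action_dim) -> latents: (batch, num_registers, d_model)
        batch = actions.shape[0]
        registers = self.register_tokens.unsqueeze(0).expand(batch, -1, -1)
        hidden = self.transformer_encoder(torch.cat([self.embed(actions), registers], dim=1))
        latents = hidden[:, -registers.shape[1]:]  # read out the register positions only
        if self.training:
            # Nested dropout: keep a random-length prefix so early registers carry the most information
            k = int(torch.randint(1, latents.shape[1] + 1, (1,)))
            latents = torch.cat([latents[:, :k], torch.zeros_like(latents[:, k:])], dim=1)
        return latents

    def detokenize(self, tokens, num_prefix=None):
        # Prefix-based detokenization: decode from the first num_prefix tokens, zero-filling the rest
        k = tokens.shape[1] if num_prefix is None else num_prefix
        padded = torch.cat([tokens[:, :k], torch.zeros_like(tokens[:, k:])], dim=1)
        return self.decoder(padded.flatten(1)).view(-1, self.horizon, self.action_dim)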
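The “anytime” behaviour described in this article, where a short prefix gives a coarse action and the full sequence gives a fine one, can be illustrated with a deliberately simple stand-in. The sketch below is not OAT's learned tokenizer: it swaps in a plain binary expansion of one scalar in [0, 1), chosen because it shares two of the properties discussed here (every token sequence decodes to a valid value, and earlier tokens matter more). `encode` and `decode_prefix` are hypothetical names.

```python
def encode(x, num_tokens=8):
    """Binary expansion of x in [0, 1), most significant bit first."""
    tokens = []
    lo, hi = 0.0, 1.0
    for _ in range(num_tokens):
        mid = (lo + hi) / 2
        if x >= mid:
            tokens.append(1)
            lo = mid
        else:
            tokens.append(0)
            hi = mid
    return tokens


def decode_prefix(tokens):
    """Decode any prefix (even an empty one) to the midpoint of its interval."""
    lo, hi = 0.0, 1.0
    for t in tokens:
        mid = (lo + hi) / 2
        if t == 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


target = 0.3
code = encode(target)
# Reconstruction error when decoding from the first k tokens, k = 1..8.
errors = [abs(decode_prefix(code[:k]) - target) for k in range(1, len(code) + 1)]

print(decode_prefix(code[:1]))  # 0.25 (coarse, one token)
print(decode_prefix(code))      # 0.298828125 (fine, all eight tokens)
```

In this toy, the error after k tokens is bounded by 2^-(k+1), so a caller can stop early when compute is tight and still get a valid, if coarse, result. A learned tokenizer has no such closed-form bound, but the trade-off it exposes has the same shape.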

Practical Applications

Use Case: OAT can be used in robotics applications such as pick-and-place, cup stacking, and other tasks that require flexible and efficient tokenization of continuous movements.
Pitfall: A common anti-pattern is to rely on a tokenizer that can only decode its complete, fixed-length sequence, which can lead to poor performance and reliability issues. OAT’s flexible “anytime” inference avoids this by decoding from any prefix of the token sequence.
