Andrej Karpathy Open-Sources 'Autoresearch': A 630-Line Tool for Autonomous ML Experiments

Andrej Karpathy has released autoresearch, a minimalist Python framework designed for autonomous machine learning experimentation. The tool condenses the nanochat LLM training core into a single-file repository of approximately 630 lines of code.

Why This Matters

In traditional machine learning development, researchers spend significant time manually tuning hyperparameters and testing architectural tweaks. Autoresearch shifts this burden to AI agents, which autonomously propose, implement, and validate changes, with each training run capped at a fixed 5 minutes. Moving from manual tuning to agent engineering speeds up iteration and can surface non-obvious optimizations that outperform human-configured models.
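The propose, validate, and keep-if-better cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in the autoresearch repository: `propose_change`, `train_and_validate`, and `run_experiments` are hypothetical names, the "agent" is a random perturbation, and the training run is replaced by a synthetic score where lower is better.

```python
import math
import random


def propose_change(config, rng):
    """Hypothetical stand-in for the agent: halve or double one setting."""
    candidate = dict(config)  # copy, so the current best config is untouched
    key = rng.choice(sorted(candidate))
    candidate[key] = candidate[key] * rng.choice([0.5, 2.0])
    return candidate


def train_and_validate(config):
    """Hypothetical stand-in for one fixed-budget training run.

    Returns a synthetic validation score (lower is better) that is smallest
    at lr=3e-4 and width=512. A real run would train a model and measure it.
    """
    lr_penalty = abs(math.log10(config["lr"] / 3e-4))
    width_penalty = 0.1 * abs(math.log2(config["width"] / 512))
    return 1.0 + lr_penalty + width_penalty


def run_experiments(config, rounds, seed=0):
    """Keep a proposed change only if the validation score strictly improves."""
    rng = random.Random(seed)
    best_score = train_and_validate(config)
    history = [best_score]
    for _ in range(rounds):
        candidate = propose_change(config, rng)
        score = train_and_validate(candidate)
        if score < best_score:  # accept improvements, discard everything else
            config, best_score = candidate, score
        history.append(best_score)
    return config, best_score, history


if __name__ == "__main__":
    start = {"lr": 3e-3, "width": 128}
    final_config, final_score, history = run_experiments(start, rounds=20)
    print(final_config, round(final_score, 3))
```

Because a candidate is accepted only on a strict improvement, the best score recorded in `history` never gets worse from one round to the next, which is the property that keeps regressions out of the kept code.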

Key Insights

The framework uses a 630-line codebase to ensure the entire script fits within modern LLM context windows, reducing code generation errors (Karpathy, 2026).
Validation is strictly enforced via bits-per-byte (BPB), a metric where lower is better; the agent commits a code change only if the validation BPB improves (Karpathy, 2026).
Training runs are restricted to fixed 5-minute intervals on a single NVIDIA GPU to maintain high-velocity experimentation loops (Karpathy, 2026).
Shopify’s Tobi Lutke reported a 19% improvement in validation scores using the tool to optimize a smaller model architecture (Lutke, 2026).
Autonomous optimizations discovered by the agent in small-scale runs were successfully integrated back into the larger nanochat production framework (Karpathy, 2026).
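Two of the constraints listed above, the BPB metric and the fixed time budget, lend themselves to a short sketch. The BPB function follows the standard definition (total negative log-likelihood converted from nats to bits, divided by the number of bytes in the evaluated text); the source does not show how autoresearch itself computes the score, so treat this as an assumption. `bits_per_byte` and `train_for` are hypothetical helper names.

```python
import math
import time


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood in nats to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return total_nll_nats / (math.log(2) * total_bytes)


def train_for(budget_seconds, step_fn, clock=time.monotonic):
    """Call step_fn repeatedly until the wall-clock budget is spent.

    Returns the number of completed steps. The clock is injectable so the
    loop can be exercised in tests without actually waiting.
    """
    deadline = clock() + budget_seconds
    steps = 0
    while clock() < deadline:
        step_fn()
        steps += 1
    return steps


if __name__ == "__main__":
    # A 5-minute run would pass budget_seconds=300; a tiny budget is used here.
    steps = train_for(0.05, lambda: time.sleep(0.01))
    print("completed steps:", steps)
```

As a sanity check on the metric, a model that assigns equal probability to all 256 byte values incurs ln(256) nats per byte, which is exactly 8 bits per byte; anything a trained model achieves should come in below that.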

Practical Applications

Use Case: Shopify CEO Tobi Lutke used the tool to optimize a smaller model architecture, resulting in a 19% improvement that outperformed a larger, manually configured model.
Pitfall: Relying on agents without strict validation metrics like BPB can lead to committing regressions; the system mitigates this by only merging code that improves performance.
Use Case: ML researchers can use the tool to automate the search for optimal neural network architectures and optimizers by simply updating a Markdown instruction file.
Pitfall: Extending the codebase beyond the LLM’s context window undermines the agent’s holistic understanding of the script; keeping it to roughly 630 lines is what prevents this.

References:

https://www.marktechpost.com/2026/03/08/andrej-karpathy-open-sources-autoresearch-a-630-line-python-tool-letting-ai-agents-run-autonomous-ml-experiments-on-single-gpus/
