AI-Driven Development: Moving Beyond Vibe Coding to Agentic Engineering

These articles are AI-generated summaries. Please check the original sources for full details.

The orchestration mindset

Andrew Stellman developed Octobatch, a production-grade batch orchestrator for Monte Carlo simulations. The system comprises 21,000 lines of Python and nearly 1,000 automated tests, all generated by AI.

Why This Matters

There is a critical gap between knowing AI tools in theory and being able, in practice, to maintain architectural coherence across thousands of lines of generated code. Fully autonomous agents can produce massive outputs: in one Anthropic experiment, 16 Claude instances spent $20,000 to build a 100,000-line C compiler that still required human intervention to fix bugs. True reliability, however, requires an ‘orchestration mindset’ in which humans own the architecture and verification.

Key Insights

  • The ‘Cognitive Shortcut Paradox’ indicates that developers who already know what good software looks like are the most effective at driving AI coding tools (Stellman, O’Reilly Radar).
  • LLM Batch APIs (released by OpenAI, Anthropic, and Google between April 2024 and July 2025) cost 50% less than real-time APIs and perform better at scale; they treat LLMs as processing infrastructure rather than chatbots.
  • AI exhibits a generative bias toward adding code rather than deleting it; experienced developers must override this instinct to prevent unnecessary complexity in the codebase.
  • Agentic engineering requires specific roles: one LLM for architecture planning, another for execution, a coding agent for implementation, and a human for vision and verification.
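
The 50% batch discount mentioned above is easy to put into numbers. The following is a minimal sketch of the arithmetic only: the function names, the token volume, and the per-million-token price are hypothetical placeholders, not real provider pricing.

```python
def realtime_cost(total_tokens: int, price_per_million: float) -> float:
    """Cost of a workload at the real-time (non-batch) rate."""
    return total_tokens / 1_000_000 * price_per_million


def batch_cost(total_tokens: int, price_per_million: float,
               discount: float = 0.50) -> float:
    """Cost of the same workload with the batch discount applied.

    The 0.50 default reflects the 50% reduction cited in the article.
    """
    return realtime_cost(total_tokens, price_per_million) * (1.0 - discount)


if __name__ == "__main__":
    # Hypothetical workload and price, chosen only for illustration.
    tokens = 200_000_000
    price = 3.00  # assumed currency units per million tokens
    print(f"real-time: {realtime_cost(tokens, price):.2f}")
    print(f"batch:     {batch_cost(tokens, price):.2f}")
```

Because the discount is a flat fraction, the absolute saving grows linearly with volume, which is why it matters most for workloads that run thousands of iterations.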

Practical Applications

  • [Octobatch / Monte Carlo Simulations] Use case: Running thousands of iterations with seeded randomness for reproducibility. Pitfall: Re-seeding the RNG at every iteration creates correlation bias, leading to incorrect statistical results (e.g., sailors falling into the water 77.5% of the time versus the expected 50%).
  • [Multi-LLM Coordination] Use case: Using one model (Gemini) to validate the output or identify hallucinations produced by another (Claude). Pitfall: Relying on a single LLM’s estimate of complexity; models may overestimate implementation time due to lack of full architectural context.
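
The re-seeding pitfall from the Monte Carlo item can be illustrated with a deliberately extreme toy case. This is a sketch, not Octobatch's code: the summary does not describe the actual bug, so the fair-coin "sailor" model and both function names are assumptions made for illustration. It contrasts seeding one generator per run with re-creating the generator from the same seed on every iteration.

```python
import random


def run_trials(n_trials: int, seed: int) -> list[bool]:
    """Seed once, then draw every trial from the same generator."""
    rng = random.Random(seed)  # one generator for the whole run
    # Toy model (an assumption): each trial is a fair coin flip,
    # with True meaning the sailor fell into the water.
    return [rng.random() < 0.5 for _ in range(n_trials)]


def run_trials_reseeded(n_trials: int, seed: int) -> list[bool]:
    """Anti-pattern: re-create the generator from the same seed each trial."""
    outcomes = []
    for _ in range(n_trials):
        rng = random.Random(seed)  # restarts the random stream every time
        outcomes.append(rng.random() < 0.5)  # so this is always the same draw
    return outcomes


if __name__ == "__main__":
    good = run_trials(10_000, seed=42)
    bad = run_trials_reseeded(10_000, seed=42)
    print(f"seeded once: {sum(good) / len(good):.3f} fell in")
    print(f"re-seeded:   {sum(bad) / len(bad):.3f} fell in")
```

Seeding once keeps the run reproducible while leaving the trials independent, so the observed rate lands near the expected 50%. In the re-seeded version every trial repeats the same draw, so the rate collapses to exactly 0% or 100%. Real bugs of this kind are usually subtler (for example, seeds derived from a loop counter), but the remedy is the same: create the generator once per run and pass it through.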
