Why AI Detection Tools Fail: Vibe-Check Scores 0/100 on AI-Generated Codebase
These articles are AI-generated summaries. Please check the original sources for full details.
I Built a Vibe-Check Tool — Then Ran It on an AI-Built Codebase and It Scored 0/100
Lakshmi Sravya Vedantham developed vibe-check to identify AI-authored code by detecting patterns like over-commenting and placeholder naming. When tested on a 30,000-line full-stack application, the tool returned a 0/100 ‘Mostly Human’ score despite the codebase being approximately 50% AI-generated.
Why This Matters
Detection tools often rely on ‘style markers of careless AI usage,’ such as generic variable names or hallucinated imports, which modern models easily avoid when given deep domain context. As AI output moves from sloppy boilerplate to expert-level code with uniform docstrings and comprehensive error handling, this ‘distribution shift’ makes AI code look better than average human code. Surface-level heuristics stop working, which leaves an accuracy gap in the security and auditing tools that depend on them.
Key Insights
- Fact: The vibe-check tool returned a 0/100 score on a repository containing 30,000 lines across React and FastAPI (2026).
- Concept: ‘Consistency Scoring’ measures variance in style; absolute consistency at scale is a stronger AI signal than specific keywords like ‘helper’ or ‘manager’.
- Tool: commit-prophet, a CLI tool built entirely by an AI agent in one session, scored only 2/100 on standard detection metrics.
- Concept: The ‘Vocabulary Specificity Index’ reveals that AI given domain context produces more precise terminology than junior developers, defeating generic naming detectors.
- Fact: Approximately 70% of the tested codebase (TypeScript/JavaScript) was invisible to the detector because the detector only analyzes Python.
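The ‘Consistency Scoring’ idea above can be sketched as a coefficient of variation over docstring lengths. This is a minimal illustration, not vibe-check's actual implementation: the function names `docstring_lengths` and `coefficient_of_variation` and the two sample sources are assumptions made for this example. A value near zero means the docstrings are suspiciously uniform.

```python
import ast
import statistics


def docstring_lengths(source: str) -> list[int]:
    """Return the docstring length (in characters) of each documented function or class."""
    tree = ast.parse(source)
    lengths = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node)
            if doc:  # skip undocumented definitions
                lengths.append(len(doc))
    return lengths


def coefficient_of_variation(values: list[int]) -> float:
    """Population standard deviation divided by the mean; 0.0 means perfectly uniform."""
    if len(values) < 2:
        return float("nan")  # not enough samples to measure spread
    mean = statistics.fmean(values)
    if mean == 0:
        return float("nan")
    return statistics.pstdev(values) / mean


# Hypothetical sample: every docstring has the same shape and length.
UNIFORM = '''
def load(path):
    """Load the records from disk."""
    return path

def save(path):
    """Save the records onto disk."""
    return path

def drop(path):
    """Drop the records from disk."""
    return path
'''

# Hypothetical sample: terse, verbose, and missing docstrings mixed together.
VARIED = '''
def load(path):
    """Load."""
    return path

def save(path):
    """Save the records to disk, creating parent directories when they are missing."""
    return path

def drop(path):
    return path
'''

if __name__ == "__main__":
    print("uniform CV:", coefficient_of_variation(docstring_lengths(UNIFORM)))
    print("varied CV:", coefficient_of_variation(docstring_lengths(VARIED)))
```

A real audit would need many more samples per file and a threshold calibrated against known human-written code; low variance alone does not prove AI authorship.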
Working Examples
Example of domain-specific variable names that defeat generic AI detectors.
confidence_weighted_score = weighted_avg(model_outputs, confidence_weights)
normalized_feature_vector = standardize(raw_features, per_channel=True)
inter_class_variance = between_class / within_class
calibrated_threshold = baseline_mean + (2.5 * baseline_std)
rolling_accuracy = ema(correct_predictions, window=50)
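To see why names like these slip past a lexical detector, here is a rough sketch of a keyword-based naming check. The token list `GENERIC_TOKENS` and the function `generic_name_ratio` are hypothetical and are not taken from vibe-check; they only illustrate the general approach of flagging identifiers that contain generic words.

```python
import ast

# Hypothetical list of "generic" name fragments; a real detector's list may differ.
GENERIC_TOKENS = {
    "data", "helper", "manager", "temp", "tmp",
    "result", "process", "handler", "util", "foo", "bar",
}


def assigned_names(source: str) -> list[str]:
    """Collect every variable name that appears as an assignment target."""
    tree = ast.parse(source)
    return [
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    ]


def generic_name_ratio(source: str) -> float:
    """Fraction of assigned names containing at least one generic token."""
    names = assigned_names(source)
    if not names:
        return 0.0
    flagged = [n for n in names if GENERIC_TOKENS & set(n.lower().split("_"))]
    return len(flagged) / len(names)


# The domain-specific assignments from the example above (parsed, never executed).
DOMAIN_SPECIFIC = """
confidence_weighted_score = weighted_avg(model_outputs, confidence_weights)
normalized_feature_vector = standardize(raw_features, per_channel=True)
inter_class_variance = between_class / within_class
calibrated_threshold = baseline_mean + (2.5 * baseline_std)
rolling_accuracy = ema(correct_predictions, window=50)
"""

# Hypothetical "careless" code of the kind such a detector is tuned for.
GENERIC = """
data = process_data(raw)
result = helper(data)
temp = result
"""

if __name__ == "__main__":
    print("domain-specific ratio:", generic_name_ratio(DOMAIN_SPECIFIC))
    print("generic ratio:", generic_name_ratio(GENERIC))
```

Under this sketch the domain-specific block produces no flagged names at all, while the generic block is flagged entirely, which mirrors the failure mode the article describes.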
A textbook AI signature: perfectly organized, multi-line import blocks added in a single session.
import { Switch, Route } from "wouter";
import { QueryClientProvider } from "@tanstack/react-query";
import Dashboard from "@/pages/dashboard";
import Analytics from "@/pages/analytics";
import Settings from "@/pages/settings";
// ... 20 more page imports
Practical Applications
- Use case: Git history analysis; identifying AI generation by monitoring for ‘burst’ commits that add thousands of lines of documented code with zero fix-up cycles.
- Pitfall: Lexical naming detectors; relying on keywords like ‘process_data’ fails when AI uses terms like ‘inter_class_variance’ derived from technical literature.
- Use case: Structural uniformity auditing; measuring the coefficient of variation for docstring length and test-to-source ratios to find suspiciously perfect coverage.
- Pitfall: Single-language scanning; ignoring polyglot components in a stack leads to total invisibility of AI-generated frontends or middleware.
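The Git history use case above could be approximated with a heuristic like the following. It is a sketch under stated assumptions: the `Commit` record, the `FIXUP_PREFIXES` list, and the `min_lines` and `lookahead` thresholds are all hypothetical, and the sample history is invented for illustration. In practice the per-commit line counts could be gathered from `git log --numstat`.

```python
from dataclasses import dataclass

# Hypothetical prefixes that suggest a fix-up commit; matching is deliberately rough.
FIXUP_PREFIXES = ("fix", "typo", "oops", "revert", "wip")


@dataclass
class Commit:
    sha: str
    lines_added: int
    message: str


def is_fixup(commit: Commit) -> bool:
    """True if any word in the commit message starts with a fix-up prefix."""
    return any(word.startswith(FIXUP_PREFIXES) for word in commit.message.lower().split())


def burst_commits(history: list[Commit], min_lines: int = 1000, lookahead: int = 2) -> list[Commit]:
    """Return large commits that are not followed by a fix-up within `lookahead` commits.

    `history` must be ordered oldest first. Both thresholds are illustrative defaults.
    """
    flagged = []
    for index, commit in enumerate(history):
        if commit.lines_added < min_lines:
            continue
        followers = history[index + 1 : index + 1 + lookahead]
        if not any(is_fixup(follower) for follower in followers):
            flagged.append(commit)
    return flagged


# Invented history, oldest first.
HISTORY = [
    Commit("a1", 4200, "Add analytics dashboard with tests"),
    Commit("b2", 35, "Update README"),
    Commit("c3", 1800, "Add export pipeline"),
    Commit("d4", 12, "Fix off-by-one in export pagination"),
]

if __name__ == "__main__":
    for commit in burst_commits(HISTORY):
        print(commit.sha, commit.lines_added, commit.message)
```

A burst with no fix-up cycle is only a weak signal on its own; squash merges and vendored code produce the same shape, so a heuristic like this is best combined with the structural checks listed above.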