Automating Hidden JSON API Discovery for Robust Web Scraping

I Built a Script That Finds Hidden APIs on Any Website (Here’s the Code)

Alex Spinov developed a Node.js discovery script to identify internal JSON endpoints used by modern websites. By targeting common paths like /api/v1 and /_next/data, the tool bypasses the fragility of CSS selectors and the overhead of headless browsers.

Why This Matters

Traditional web scraping depends on DOM structures that frequently break during site redesigns, and scrapers built this way are easily blocked by anti-bot systems. Internal JSON APIs are a more stable, structured data source and remove the need for complex HTML parsing. In the case the author reports, the switch cut a scraper from 120 lines to 15 and made it roughly 10x faster.

Key Insights

87.5% reduction in code size (120 lines to 15 lines) by switching from DOM parsing to JSON APIs
10x performance gain achieved by bypassing headless browser requirements for data extraction
Discovery of hidden endpoints like /_next/data and /wp-json/wp/v2/posts for structured content access
Reading robots.txt to find API paths that are disallowed for crawlers but remain publicly reachable
Framework-specific patterns such as appending .json to URLs for automatic JSON responses
Network tab analysis in DevTools to identify XHR/Fetch calls used by the frontend
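The robots.txt idea above can be sketched as a small parser. This is a minimal illustration, not the author's code: the function name extractApiHints and the keyword list are assumptions chosen for the example, and the parser only looks at Allow and Disallow lines.

```javascript
// Hypothetical helper: collect Allow/Disallow paths from robots.txt text
// and keep the ones that look like API routes.
// The default keyword list is an assumption; tune it per site.
function extractApiHints(robotsTxt, keywords = ["api", "json", "graphql", "feed"]) {
  const paths = new Set();
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    // Drop trailing comments, then surrounding whitespace.
    const line = rawLine.split("#")[0].trim();
    const match = /^(allow|disallow)\s*:\s*(\S+)/i.exec(line);
    if (!match) continue; // not a rule line, or a rule with an empty path
    const path = match[2];
    if (keywords.some(k => path.toLowerCase().includes(k))) {
      paths.add(path);
    }
  }
  return [...paths];
}
```

The returned paths are only candidates. Each one still has to be requested to see whether it actually serves JSON.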

Working Examples

Node.js script to probe common API paths and return content type and status.

// Requires Node.js 18+ for the built-in fetch API (no imports needed).
async function findAPIs(domain) {
  const commonPaths = [
    "/api/v1",
    "/api/v2",
    "/api/graphql",
    "/_next/data",
    "/wp-json/wp/v2/posts",
    "/feed.json",
    "/sitemap.xml",
    "/.well-known/openid-configuration",
    "/robots.txt",
    "/manifest.json"
  ];
  const results = [];
  // Probe paths one at a time to keep the load on the target site low.
  for (const path of commonPaths) {
    try {
      const res = await fetch(`https://${domain}${path}`);
      // Record only successful (2xx) responses.
      if (res.ok) {
        const contentType = res.headers.get("content-type") || "";
        results.push({
          path,
          status: res.status,
          // Keep the media type and drop parameters such as charset.
          type: contentType.split(";")[0],
          size: res.headers.get("content-length") || "unknown"
        });
      }
    } catch (e) {
      // Skip paths whose request fails (DNS error, refused connection, etc.)
    }
  }
  return results;
}

// Usage
findAPIs("dev.to").then(apis => {
  console.log(`Found ${apis.length} endpoints:\n`);
  apis.forEach(api => {
    console.log(`  ${api.path} → ${api.type} (${api.size} bytes)`);
  });
});
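Two small helpers could extend the probe along the lines of the Key Insights: one builds the ".json"-suffixed variant of a page URL, the other checks whether a Content-Type header actually denotes JSON (the path list also includes XML and plain-text files). Both function names are hypothetical, and whether a given site answers the ".json" variant depends on its framework.

```javascript
// Hypothetical helper: return pageUrl with ".json" appended to its path,
// or null when there is no path to extend or it already ends in ".json".
function withJsonSuffix(pageUrl) {
  const url = new URL(pageUrl);
  // Drop trailing slashes so "/posts/" becomes "/posts.json", not "/posts/.json".
  const trimmed = url.pathname.replace(/\/+$/, "");
  if (trimmed === "" || trimmed.endsWith(".json")) return null;
  url.pathname = `${trimmed}.json`;
  return url.toString(); // query string and fragment are preserved
}

// Hypothetical helper: true for "application/json" and "+json" media types.
function isJsonContentType(contentType) {
  const type = (contentType || "").split(";")[0].trim().toLowerCase();
  return type === "application/json" || type.endsWith("+json");
}
```

A caller might request withJsonSuffix(url) for each page of interest and keep only responses where isJsonContentType(res.headers.get("content-type")) is true.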

Practical Applications

Data Aggregation: Use /api/articles on dev.to or /pypi/{pkg}/json on pypi.org for clean, structured metadata without HTML overhead.
Package Management: Query registry.npmjs.org for complete package metadata.
Pitfall: Relying on CSS selectors leads to breakage during frontend updates; use the internal XHR/Fetch endpoints for long-term stability.
Pitfall: Ignoring robots.txt means missing the paths it lists, which often point to publicly reachable API endpoints.
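As a rough sketch of the package metadata use case, the helpers below build the JSON endpoint URL for a package and fetch it. The names metadataUrl and fetchMetadata are hypothetical, the pypi.org host is filled in as an assumption (the text above gives only the path), and Node.js 18+ is assumed for the built-in fetch.

```javascript
// Hypothetical helper: build the JSON metadata URL for a package.
function metadataUrl(source, name) {
  switch (source) {
    case "pypi":
      // Path pattern from the text: /pypi/{pkg}/json
      return `https://pypi.org/pypi/${encodeURIComponent(name)}/json`;
    case "npm":
      // Scoped names such as "@scope/pkg" keep the "@" but encode the slash.
      return `https://registry.npmjs.org/${name.replace("/", "%2F")}`;
    default:
      throw new Error(`unknown source: ${source}`);
  }
}

// Fetch and parse the metadata; rejects on any non-2xx response.
async function fetchMetadata(source, name) {
  const res = await fetch(metadataUrl(source, name), {
    headers: { accept: "application/json" }
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${name}`);
  return res.json();
}

// Example call (performs a network request):
// fetchMetadata("pypi", "requests").then(meta => console.log(meta.info.version));
```

Because the response is already structured, no HTML parsing or selector maintenance is involved.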

References:

https://dev.to/0012303/i-built-a-script-that-finds-hidden-apis-on-any-website-heres-the-code-2e5j
