Optimizing Cloud Economics: Why AWS Service Billing Fails Feature-Level Attribution

These articles are AI-generated summaries. Please check the original sources for full details.

Your AWS bill is lying to you — it shows services, not features

Arpit Gupta’s AWS bill surged to $180,000 in a single month, yet existing observability tools like Datadog and CloudWatch could not identify the cause. While infrastructure metrics were granular by service, the team lacked any data on which specific product feature drove a $34,000 jump in compute costs.

Why This Matters

Modern observability tools excel at infrastructure questions but are blind to product-level unit economics. When requests carry no record of which feature triggered them, a single Lambda function or ECS task that serves several features (for example, onboarding imports and manual uploads) reports one combined cost. The expensive feature stays invisible, and the leak cannot be fixed without feature-level attribution.

Key Insights

  • AWS Cost Explorer API enables daily cost grouping by custom tags, but requires rigorous tagging discipline to provide actionable feature-level data.
  • Propagating an X-Feature-Context header from the client through gateways like Kong allows downstream services to log infrastructure consumption by product action.
  • Onboarding document imports consumed 61% of Textract spend while generating only 12% of active users, a clear sign of broken unit economics.
  • AI-assisted tagging consumed 35% of the total AI budget ($6,841 in 14 days) by utilizing GPT-4o for tasks that GPT-4o-mini could handle at a fraction of the cost.
  • Token budgets per feature act as the AI equivalent of a memory leak detector, preventing silent cost spikes when input sizes exceed expected ranges.
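
The token-budget idea in the last bullet can be sketched as a small check that runs before each model call. This is a minimal sketch under stated assumptions, not the article's implementation: the feature names, budget values, and the `check_token_budget` helper are hypothetical, and the input token count is assumed to come from whatever tokenizer the model client provides.

```python
from dataclasses import dataclass

# Hypothetical per-feature input-token budgets; real values would be derived
# from the input sizes each feature is expected to produce.
TOKEN_BUDGETS = {
    "auto_tagging": 2_000,
    "document_summary": 8_000,
}
DEFAULT_BUDGET = 1_000  # assumed fallback for features without an explicit budget


@dataclass
class BudgetResult:
    feature: str
    tokens: int
    budget: int

    @property
    def over_budget(self) -> bool:
        return self.tokens > self.budget


def check_token_budget(feature: str, input_tokens: int) -> BudgetResult:
    """Compare a request's input token count against its feature budget."""
    budget = TOKEN_BUDGETS.get(feature, DEFAULT_BUDGET)
    return BudgetResult(feature=feature, tokens=input_tokens, budget=budget)


result = check_token_budget("auto_tagging", 5_400)
if result.over_budget:
    # A caller could log, alert, truncate the input, or reject the request here.
    print(f"{result.feature}: {result.tokens} tokens exceeds budget {result.budget}")
```

Returning a result object instead of raising leaves the policy (warn, truncate, or block) to the caller, which may suit a rollout where budgets are still being tuned.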

Working Examples

Querying AWS Cost Explorer API for tag-based cost attribution.

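import boto3

# Cost Explorer client; assumes AWS credentials are already configured.
ce = boto3.client('ce')

def get_service_costs_by_tag(start_date: str, end_date: str, tag_key: str):
    # Dates are 'YYYY-MM-DD' strings; the End date is exclusive.
    response = ce.get_cost_and_usage(
        TimePeriod={'Start': start_date, 'End': end_date},
        Granularity='DAILY',
        # Keep only costs that carry the tag. Cost Explorer has no 'PRESENT'
        # match option, so negate 'ABSENT' instead.
        Filter={'Not': {'Tags': {'Key': tag_key, 'MatchOptions': ['ABSENT']}}},
        GroupBy=[
            {'Type': 'TAG', 'Key': tag_key},
            {'Type': 'DIMENSION', 'Key': 'SERVICE'},
        ],
        Metrics=['UnblendedCost'],
    )
    return response['ResultsByTime']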
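
The daily results returned by the function above still need to be rolled up before they answer a per-feature question. The following is a sketch of one way to do that; the helper name `total_cost_by_tag_value` and the sample data are made up for illustration, and the code assumes each group's `Keys` list follows the `GroupBy` order with the tag returned as `<tag_key>$<tag_value>`.

```python
from collections import defaultdict
from decimal import Decimal


def total_cost_by_tag_value(results_by_time):
    """Sum UnblendedCost per tag value across all days and services."""
    totals = defaultdict(Decimal)
    for day in results_by_time:
        for group in day.get("Groups", []):
            # Assumed shape: Keys is [tag entry, service name], and the tag
            # entry looks like "<tag_key>$<tag_value>".
            tag_part = group["Keys"][0]
            tag_value = tag_part.split("$", 1)[1] if "$" in tag_part else tag_part
            amount = Decimal(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[tag_value or "(untagged)"] += amount
    return dict(totals)


# Made-up sample in the shape of ResultsByTime, for illustration only.
sample = [
    {
        "TimePeriod": {"Start": "2024-01-01", "End": "2024-01-02"},
        "Groups": [
            {"Keys": ["Feature$onboarding-import", "Amazon Textract"],
             "Metrics": {"UnblendedCost": {"Amount": "10.50", "Unit": "USD"}}},
            {"Keys": ["Feature$manual-upload", "Amazon Textract"],
             "Metrics": {"UnblendedCost": {"Amount": "2.25", "Unit": "USD"}}},
        ],
    },
    {
        "TimePeriod": {"Start": "2024-01-02", "End": "2024-01-03"},
        "Groups": [
            {"Keys": ["Feature$onboarding-import", "AWS Lambda"],
             "Metrics": {"UnblendedCost": {"Amount": "1.50", "Unit": "USD"}}},
        ],
    },
]

feature_totals = total_cost_by_tag_value(sample)
print(feature_totals)
```

`Decimal` is used because the API returns amounts as strings, and summing them as floats can accumulate rounding error over many days.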

Propagating feature context via request headers in the frontend client.

const apiFetch = (path, options = {}) => {
  // Attach the feature the user is currently in to every API request.
  const featureContext = getCurrentFeatureContext();
  return fetch(`${API_BASE}${path}`, {
    ...options,
    headers: {
      ...options.headers,
      'X-Feature-Context': featureContext,
    },
  });
};

Python decorator for binding feature context to structured log output.

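import functools

import structlog

def with_cost_context(func):
    @functools.wraps(func)
    async def wrapper(request, *args, **kwargs):
        # Fall back to 'unknown' so untagged traffic stays visible in the logs.
        feature_ctx = request.headers.get('X-Feature-Context', 'unknown')
        # Every log line emitted inside this block carries feature_context.
        with structlog.contextvars.bound_contextvars(feature_context=feature_ctx):
            return await func(request, *args, **kwargs)
    return wrapper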
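
To show the mechanism without depending on structlog, here is a standard-library sketch of the same idea built directly on `contextvars`. It is a swapped-in illustration rather than the decorator above: `FakeRequest`, `record_usage`, and `USAGE_LOG` are hypothetical stand-ins for a real request object and a structured log sink.

```python
import asyncio
import contextvars
import functools
from dataclasses import dataclass, field

# Holds the feature that triggered the current request; "unknown" by default.
feature_context = contextvars.ContextVar("feature_context", default="unknown")

USAGE_LOG = []  # hypothetical in-memory sink standing in for structured logs


def record_usage(resource: str, units: int) -> None:
    """Record resource consumption tagged with the current feature context."""
    USAGE_LOG.append(
        {"feature_context": feature_context.get(), "resource": resource, "units": units}
    )


def with_cost_context(func):
    @functools.wraps(func)
    async def wrapper(request, *args, **kwargs):
        token = feature_context.set(request.headers.get("X-Feature-Context", "unknown"))
        try:
            return await func(request, *args, **kwargs)
        finally:
            feature_context.reset(token)  # do not leak context across requests
    return wrapper


@dataclass
class FakeRequest:
    headers: dict = field(default_factory=dict)


@with_cost_context
async def process_document(request):
    # Code deep in the call stack can record usage without being passed the header.
    record_usage("textract_pages", 12)
    return "ok"


asyncio.run(process_document(FakeRequest({"X-Feature-Context": "onboarding-import"})))
asyncio.run(process_document(FakeRequest()))
print(USAGE_LOG)
```

The handler never passes the feature name down explicitly; any code it calls reads the context variable, which is the property that makes the header approach workable in a large codebase.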

Practical Applications

  • Use Case: Document processing systems. Instrument services to distinguish between high-cost integration syncs and low-cost manual uploads to adjust enterprise pricing tiers. Pitfall: Aggregating all processing costs into a single service bucket hides unprofitable customer behaviors.
  • Use Case: AI Model Selection. Implement internal AI proxies that log token counts and costs per feature to identify where GPT-4o-mini can replace GPT-4o. Pitfall: Running expensive models for background tasks like auto-tagging can consume 35% of a budget without user-facing benefits.
  • Use Case: Infrastructure Right-sizing. Tag AWS resources by ‘Feature’ in Terraform to identify over-provisioned legacy instances. Pitfall: Service-level tagging reveals only total cost, so it misses individual components that were scaled up for a past traffic event and never scaled back down.
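
The internal AI proxy described in the model-selection use case could keep a per-feature cost ledger along the following lines. This is only a sketch: the `CostLedger` class, the model names, and the prices are placeholders invented for illustration, not real provider rates, and token counts are assumed to come from the model API's usage data.

```python
from collections import defaultdict

# Placeholder prices in USD per 1,000 tokens. These are NOT real prices;
# substitute the provider's current rates for the models actually in use.
PRICE_PER_1K_TOKENS = {
    "large-model": {"input": 0.0100, "output": 0.0300},
    "small-model": {"input": 0.0010, "output": 0.0020},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call, given token counts and the price table."""
    price = PRICE_PER_1K_TOKENS[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


class CostLedger:
    """Accumulates model spend keyed by (feature, model)."""

    def __init__(self):
        self._totals = defaultdict(float)

    def record(self, feature, model, input_tokens, output_tokens):
        cost = call_cost(model, input_tokens, output_tokens)
        self._totals[(feature, model)] += cost
        return cost

    def by_feature(self):
        totals = defaultdict(float)
        for (feature, _model), cost in self._totals.items():
            totals[feature] += cost
        return dict(totals)


ledger = CostLedger()
ledger.record("auto_tagging", "large-model", input_tokens=3000, output_tokens=500)
ledger.record("chat_assistant", "large-model", input_tokens=1000, output_tokens=1000)

# What the same auto-tagging call would cost on the cheaper model.
alternative = call_cost("small-model", 3000, 500)
print(ledger.by_feature(), alternative)
```

Comparing a feature's recorded spend with the hypothetical cost on a cheaper model gives a quick estimate of what a downgrade would save, before any quality evaluation is run.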
