The Case of the 40-Second Logins: Debugging an ALB Gone Wrong

These articles are AI-generated summaries. Please check the original sources for full details.

An otherwise smooth EKS migration turned into a three-hour debugging marathon when users began seeing 20–40 second login delays. The root cause: one of the subnets attached to the ALB was a private subnet.

Why This Matters

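The usual mental model is that a load balancer distributes traffic evenly and reliably. This incident showed how an ALB node placed in a private subnet (one with no route to an Internet Gateway) can cause intermittent latency that affects only some users: clients whose DNS lookup returns that node's IP likely hang until they fall back to a healthy IP, while everyone else sees normal response times. The cost was hours of debugging and the risk of user churn from inconsistent performance.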

Key Insights

  • “40-second login delays, 2025”: A direct metric from the incident.
  • “ALB subnet misconfiguration caused intermittent hangs”: A private subnet with no route to an Internet Gateway left the ALB node in that subnet unable to serve external traffic.
  • “curl --resolve used for per-IP testing”: A critical tool for isolating faulty ALB nodes.

Working Example

# Test a single ALB IP: --resolve pins <API_DOMAIN>:443 to that IP, bypassing DNS
curl -s -o /dev/null \
  --resolve "<API_DOMAIN>:443:<ALB_IP_1>" \
  -w "Connect:%{time_connect} SSL:%{time_appconnect} StartTransfer:%{time_starttransfer} Total:%{time_total}\n" \
  "https://<API_DOMAIN>/validatetoken"

#!/usr/bin/env bash
# conncheck.sh: automated per-IP testing, 10 requests against each ALB IP
DOMAIN="<API_DOMAIN>"
IPS=("<ALB_IP_1>" "<ALB_IP_2>")
for ip in "${IPS[@]}"; do
  echo "--- Testing $ip ---"
  for i in {1..10}; do
    # Print per-phase timings so a slow connect or TLS handshake stands out
    curl -s -o /dev/null --resolve "$DOMAIN:443:$ip" \
      -w "Run:$i Connect:%{time_connect} SSL:%{time_appconnect} StartTransfer:%{time_starttransfer} Total:%{time_total}\n" \
      "https://$DOMAIN/validatetoken"
  done
done
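
The IP list in conncheck.sh is hardcoded, but the addresses behind an ALB's DNS name can change over time. As a minimal sketch (not the author's script), the variant below resolves the current A records with dig and flags any IP whose total request time exceeds a threshold. The function names is_slow, only_ipv4, and check_domain, the file name slowcheck.sh, and the 5-second default threshold are assumptions made for illustration; the /validatetoken path is taken from the example above.

```shell
#!/usr/bin/env bash
# slowcheck.sh (hypothetical name): resolve a domain and time each IP separately.

# is_slow TOTAL THRESHOLD: succeeds when TOTAL seconds is greater than THRESHOLD.
# awk is used because the shell cannot compare fractional numbers natively.
is_slow() {
  awk -v t="$1" -v max="$2" 'BEGIN { exit !(t + 0 > max + 0) }'
}

# only_ipv4: keeps stdin lines that look like IPv4 addresses
# (dig +short may also print CNAME targets, which are dropped here).
only_ipv4() {
  grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'
}

# check_domain DOMAIN THRESHOLD: one timed request per resolved IP.
check_domain() {
  domain=$1
  threshold=$2
  for ip in $(dig +short "$domain" | only_ipv4); do
    total=$(curl -s -o /dev/null --resolve "$domain:443:$ip" \
      -w '%{time_total}' "https://$domain/validatetoken")
    if is_slow "$total" "$threshold"; then
      echo "SLOW $ip ${total}s"
    else
      echo "ok   $ip ${total}s"
    fi
  done
}

# Only run when a domain is given; with no arguments the file just defines functions.
if [ "$#" -ge 1 ]; then
  check_domain "$1" "${2:-5}"
fi
```

Invoked as ./slowcheck.sh <API_DOMAIN> 5, it prints one line per resolved IP. A single request per IP can miss intermittent failures, so repeating the run, as conncheck.sh does, remains worthwhile.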

Practical Applications

  • Use Case: EKS + ALB migration with subnet misconfiguration → intermittent API latency.
  • Pitfall: Assuming a reused ALB is healthy without testing each AZ/IP → undetected subnet mismatches.
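
Because the root cause here was an ALB subnet with no path to an Internet Gateway, the per-IP curl test can be complemented by inspecting the route tables directly. The sketch below assumes the AWS CLI is installed and configured with read access; the function names and the load balancer name passed as the first argument are hypothetical. It only looks at explicitly associated route tables, so a subnet that falls back to the VPC main route table is reported for manual review rather than judged.

```shell
#!/usr/bin/env bash
# Sketch: report which of an ALB's subnets have a route to an Internet Gateway.

# has_igw_route: reads gateway IDs (whitespace-separated) on stdin and
# succeeds if any of them is an Internet Gateway ID (prefix "igw-").
has_igw_route() {
  grep -Eq '(^|[[:space:]])igw-'
}

# check_alb_subnets ALB_NAME: ALB_NAME is a placeholder for your load balancer's name.
check_alb_subnets() {
  alb_name=$1
  # Subnets the ALB is attached to, one per Availability Zone.
  subnets=$(aws elbv2 describe-load-balancers --names "$alb_name" \
    --query 'LoadBalancers[0].AvailabilityZones[].SubnetId' --output text)
  for subnet in $subnets; do
    # Gateway targets of every route in the table explicitly associated with the subnet.
    gateways=$(aws ec2 describe-route-tables \
      --filters "Name=association.subnet-id,Values=$subnet" \
      --query 'RouteTables[].Routes[].GatewayId' --output text)
    if [ -z "$gateways" ]; then
      echo "$subnet: no explicit association (uses the VPC main route table; check it separately)"
    elif printf '%s\n' "$gateways" | has_igw_route; then
      echo "$subnet: route to an Internet Gateway found"
    else
      echo "$subnet: NO Internet Gateway route (private subnet)"
    fi
  done
}

# Only run when a load balancer name is given; otherwise just define the functions.
if [ "$#" -ge 1 ]; then
  check_alb_subnets "$1"
fi
```

For an internet-facing ALB, any subnet reported without an Internet Gateway route is a candidate for the kind of intermittent hang described in this incident.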
