Auth at Scale: Key Rotation Without Downtime, JWKS Endpoints, and Cache Invalidation

Chapter 37 of 45

Cryptographic keys have lifespans. A signing key used for JWTs will eventually need to be rotated: because of a suspected compromise, because of compliance requirements (PCI DSS, for example, requires keys to be replaced at the end of a defined cryptoperiod, which many organizations set to one year), or because the longer a key stays in use, the greater its cumulative exposure to compromise.

The challenge is not generating a new key. The challenge is rotating from the old key to the new key without invalidating every token, session, and client in the system.

The Problem

The authorization server signs JWTs with private key A. Every resource server validates JWTs using public key A (fetched from the JWKS endpoint). You rotate to key B:

  1. The authorization server starts signing with key B.
  2. Resource servers still have key A cached.
  3. Every new token signed with key B is rejected by every resource server until their JWKS cache refreshes.

If the JWKS cache TTL is 24 hours, you have a window of up to 24 hours in which new tokens are intermittently rejected, depending on when each resource server last refreshed its cache. Users experience random 401 errors. API integrations break. The on-call team gets paged.
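The failure window can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model of any particular JWKS client: the three-server fleet, their refresh times, and the function names are hypothetical, and each server is assumed to refresh its cache exactly once per TTL. The naive rotation happens at hour 0, when key A is replaced by key B in a single step.

```python
import math

def naive_jwks(hour):
    """Keys published in the JWKS at a given hour, for a naive rotation at hour 0."""
    return {"A"} if hour < 0 else {"B"}

def cached_keys(now, first_refresh, ttl, jwks_at):
    """Keys a resource server holds at `now`, assuming it refreshes every `ttl` hours."""
    refreshes_done = math.floor((now - first_refresh) / ttl)
    latest_refresh = first_refresh + refreshes_done * ttl
    return jwks_at(latest_refresh)

TTL_HOURS = 24
# Hypothetical fleet: the hour of each server's last refresh before the rotation.
LAST_REFRESH = [-20, -10, -1]

def servers_rejecting(kid, now):
    """Count servers whose cached JWKS does not contain `kid` at hour `now`."""
    return sum(
        kid not in cached_keys(now, start, TTL_HOURS, naive_jwks)
        for start in LAST_REFRESH
    )

for hour in (1, 5, 15, 23):
    count = servers_rejecting("B", hour)
    print(f"hour {hour:>2}: {count} of 3 servers reject tokens signed with key B")
```

In this toy fleet the number of rejecting servers falls from three to zero as each cache expires, so whether a given request succeeds depends on which server happens to handle it.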

The Rotation Protocol

Key rotation must be a multi-phase process:

| Phase | Duration | Authorization server | Resource servers |
| --- | --- | --- | --- |
| 1. Introduce | Hours to days | Publish key B in JWKS alongside key A. Continue signing with key A. | Fetch updated JWKS, now trust both A and B. |
| 2. Activate | Instant | Switch signing to key B. Both keys remain in JWKS. | Accept tokens signed by A (still valid) and B (new tokens). |
| 3. Retire | After max token TTL | Remove key A from JWKS. | No unexpired tokens signed with A remain. Key A is no longer trusted. |

The critical insight: phase 1 must last at least as long as the longest JWKS cache TTL across all resource servers. If any resource server caches JWKS for 24 hours, phase 1 must last at least 24 hours.
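The phase ordering and the timing rule above can be sketched in a few lines. This is an illustrative sketch, not a prescribed implementation: the names `PHASES`, `check_phases`, and `rotation_schedule` are hypothetical, the key IDs are placeholders, and the computed times are the earliest safe ones under the assumption of no clock skew and accurately known cache TTLs.

```python
from datetime import datetime, timedelta

# Published JWKS and active signing key for each phase
# (key IDs "A" and "B" are placeholders).
PHASES = [
    ("introduce", {"A", "B"}, "A"),
    ("activate", {"A", "B"}, "B"),
    ("retire", {"B"}, "B"),
]

def check_phases(phases, initial_jwks):
    """Verify that each signing key was already published in the previous phase."""
    previous_jwks = initial_jwks
    for name, jwks, signing_key in phases:
        if signing_key not in previous_jwks:
            raise ValueError(f"{name}: signing key {signing_key} not yet published")
        previous_jwks = jwks

def rotation_schedule(introduce_at, jwks_cache_ttls, max_token_ttl):
    """Earliest safe start times for phases 2 and 3, ignoring clock skew."""
    # Phase 1 lasts at least as long as the longest JWKS cache TTL.
    activate_at = introduce_at + max(jwks_cache_ttls)
    # The last token signed with the old key expires max_token_ttl after activation.
    retire_at = activate_at + max_token_ttl
    return activate_at, retire_at

check_phases(PHASES, initial_jwks={"A"})  # passes silently

activate_at, retire_at = rotation_schedule(
    introduce_at=datetime(2024, 1, 1, 0, 0),
    jwks_cache_ttls=[timedelta(hours=1), timedelta(hours=24)],
    max_token_ttl=timedelta(hours=8),
)
print(activate_at)  # 2024-01-02 00:00:00
print(retire_at)    # 2024-01-02 08:00:00
```

Passing the naive one-step rotation, `[("activate", {"B"}, "B")]`, to `check_phases` raises a `ValueError`, because key B becomes the signing key before any resource server has had a chance to fetch it.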

What This Chapter Covers

Section 1: JWKS cache invalidation strategies. How resource servers detect key changes, how to reduce the gap between key publication and key trust, and how to handle the edge case of emergency rotation (suspected key compromise where you cannot wait for caches to refresh).

Section 2: Automated key rotation. Building a rotation schedule that requires no human intervention, storing key history for audit, and integrating rotation with the deployment pipeline.