containerd at Scale: 5 Day-2 Failure Patterns for High-Density Kubernetes

These articles are AI-generated summaries. Please check the original sources for full details.

containerd in Production: 5 Day-2 Failure Patterns at High Pod Density

Production Kubernetes nodes running 400–1,000 containers often fail quietly: containerd-shim processes can consume up to 10 GB of memory that no reservation accounts for. The Linux OOM killer then terminates containers that Kubernetes never asked to kill.

Why This Matters

Kubernetes resource accounting tracks container cgroup limits but does not count the per-container containerd-shim process, which handles PID tracking and log relaying. A node can therefore sit under its container memory budget while the runtime layer consumes enough unreserved memory to trigger system-level OOM events, which makes standard dashboards misleading. Operating at this density means budgeting for the memory footprint of the whole execution chain, not only for the documented container limits.

Key Insights

  • At 800 containers, containerd-shim overhead reaches approximately 10.0 GB of resident memory (NTCTech, 2026).
  • Mixed cgroup v1/v2 nodes cause resource accounting corruption when containerd semantics do not match the host kernel version (NTCTech, 2026).
  • OverlayFS snapshot counts exceeding 2,000–3,000 cause measurable image pull degradation on long-running nodes (NTCTech, 2026).
  • The containerd gRPC socket processes requests serially, so it becomes a bottleneck in high-churn environments (NTCTech, 2026).
  • The crun runtime is used by high-density clusters to reduce per-shim memory overhead by 30-40% compared to runc (NTCTech, 2026).
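The 800-container figure above implies a rough per-shim cost of about 12.8 MB. The sketch below extrapolates that to other densities. It assumes overhead scales linearly with container count and treats 10.0 GB as 10240 MB; both are assumptions, since the source gives only one data point. The function name `estimate_shim_mb` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: estimate total containerd-shim RSS (in MB) for a
# given container count.
# Assumption: linear scaling from the article's single data point,
# about 10.0 GB (taken here as 10240 MB) at 800 containers.
estimate_shim_mb() {
    containers=$1
    # Integer arithmetic: 10240 MB * containers / 800 containers
    echo $(( 10240 * containers / 800 ))
}

estimate_shim_mb 400     # prints 5120
estimate_shim_mb 1000    # prints 12800
```

Comparing the estimate with the measured total from the node is a quick way to see whether a given cluster matches the article's numbers at all.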

Working Examples

Count shim processes and total memory consumption on a node

ps -eo rss=,args= | awk '/[c]ontainerd-shim/ {sum += $1; n++} END {printf "Total shim RSS: %.1f MB, Count: %d\n", sum/1024, n}'

Verify cgroup version consistency between the host, containerd, and kubelet

stat -fc %T /sys/fs/cgroup/
containerd config dump | grep -i cgroup
systemctl show kubelet | grep -i cgroup
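The first command prints a filesystem type, not a version number. A minimal sketch of how that output could be interpreted: `cgroup2fs` indicates cgroup v2 and `tmpfs` indicates cgroup v1. The function name `cgroup_version` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type reported by
# `stat -fc %T /sys/fs/cgroup/` to a cgroup version label.
cgroup_version() {
    case "$1" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy
        tmpfs)     echo "v1" ;;       # legacy per-controller mounts
        *)         echo "unknown" ;;  # anything else needs manual inspection
    esac
}

# On a real node: cgroup_version "$(stat -fc %T /sys/fs/cgroup/)"
cgroup_version cgroup2fs    # prints v2
```

The result can then be checked by hand against the cgroup settings that the containerd and kubelet commands report.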

Check for inode exhaustion and snapshot accumulation in OverlayFS

df -i /var/lib/containerd/
find /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/ -mindepth 1 -maxdepth 1 -type d | wc -l
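The snapshot count can be compared against the 2,000–3,000 range quoted in Key Insights. A minimal sketch: the function name `classify_snapshots` and the labels `ok`, `elevated`, and `high` are hypothetical, and only the two thresholds come from the article.

```shell
#!/bin/sh
# Hypothetical helper: classify an OverlayFS snapshot count against the
# 2,000-3,000 degradation range cited in the article.
classify_snapshots() {
    count=$1
    if [ "$count" -gt 3000 ]; then
        echo "high"        # above the upper bound of the cited range
    elif [ "$count" -gt 2000 ]; then
        echo "elevated"    # inside the cited range
    else
        echo "ok"
    fi
}

# On a real node, pass in the output of the find | wc -l command above.
classify_snapshots 2500    # prints elevated
```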

Configure containerd Garbage Collection parameters to manage snapshot drift

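[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter = "overlayfs"
# Garbage collection is tuned on the GC scheduler plugin; values shown are the documented defaults.
[plugins."io.containerd.gc.v1.scheduler"]
pause_threshold = 0.02
deletion_threshold = 0
mutation_threshold = 100
schedule_delay = "0s"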

Practical Applications

  • High-density Kubernetes nodes: Reserve 2-4 GB of memory via --system-reserved to account for shim overhead. Pitfall: Tracking only container cgroups leads to system-level OOM kills on compliant containers.
  • CI/CD job runners: Isolate ephemeral workloads to dedicated node pools to prevent gRPC socket saturation. Pitfall: Rapid pod cycling causes scheduling delays despite low CPU and memory pressure.
  • Long-running clusters: Implement node rotation every 60-90 days or scheduled ‘crictl rmi --prune’ via DaemonSets. Pitfall: Ignoring image churn leads to inode exhaustion and ‘No space left on device’ errors.
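For the long-running cluster case, a scheduled prune could be gated on inode usage rather than run unconditionally. The sketch below assumes `df -Pi` output in the usual POSIX column layout. The 80% threshold and the function names `inode_use_pct` and `should_prune` are illustrative assumptions, not values from the source.

```shell
#!/bin/sh
# Hypothetical helper: read `df -Pi <path>` output on stdin and print the
# IUse% value of the first filesystem, without the percent sign.
inode_use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed when usage is at or above the threshold.
# The default threshold of 80 is an assumption for illustration only.
should_prune() {
    pct=$1
    threshold=${2:-80}
    [ "$pct" -ge "$threshold" ]
}

# Possible use on a node (left commented out so the sketch has no side effects):
# pct=$(df -Pi /var/lib/containerd/ | inode_use_pct)
# if should_prune "$pct"; then crictl rmi --prune; fi
```

Some filesystems report `-` instead of a percentage in the IUse% column, so a real job would need to handle that case before comparing.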
