PGArchive: Zero-Knowledge Database Backups with Verified Restores
These articles are AI-generated summaries. Please check the original sources for full details.
I built a database backup tool where even I can’t read your backups
Developer Kalees launched PGArchive to remove the security risk of proxying backups through a third party. A local agent encrypts each backup with AES-256-GCM on the user’s own server, so the service provider never sees the backup files.
Why This Matters
Standard SaaS backup tools typically route data through their own infrastructure before forwarding it to the user’s storage bucket, which creates a window in which the provider can access sensitive data. PGArchive implements a zero-knowledge model: bytes move directly from the user’s server to the user’s own S3 or R2 storage, so the user keeps full control of the data. The trade-off is that backups are permanently unrecoverable if the encryption key is lost.
Key Insights
- AES-256-GCM encryption is executed locally on the user’s server using keys that never leave the local environment (Kalees, 2026).
- Automated verification runs a full pg_restore inside a Docker container, confirming that a backup actually restores rather than only checking its file size.
- The local agent polls outbound every 30 seconds, so it works behind NAT and firewalls without any open inbound ports.
- Automatic database version detection ensures the agent invokes the matching pg_dump or mysqldump binary to prevent version mismatch errors.
- The control plane at pgarchive.com is restricted to job scheduling and success/failure monitoring, with zero access to the backup contents or storage credentials.
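The agent behavior described in the list above (outbound polling, a version-matched dump binary, and status-only reporting) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PGArchive's actual code: the callables `fetch_job`, `detect_version`, `run_backup`, and `report_status`, the shape of the job dictionary, and the Debian/Ubuntu-style binary path are all hypothetical. Only the 30-second interval comes from the article.

```python
import re
import time
from typing import Callable, Optional

POLL_INTERVAL_SECONDS = 30  # polling interval stated in the article


def parse_major_version(version_string: str) -> str:
    """Return the major version from e.g. 'PostgreSQL 15.4 on x86_64-pc-linux-gnu'.

    Assumes PostgreSQL 10 or later, where the major version is the first number.
    """
    match = re.search(r"\d+", version_string)
    if match is None:
        raise ValueError(f"no version number in {version_string!r}")
    return match.group(0)


def pg_dump_path(major: str) -> str:
    # Assumption: Debian/Ubuntu packaging, which installs per-version
    # binaries under /usr/lib/postgresql/<major>/bin.
    return f"/usr/lib/postgresql/{major}/bin/pg_dump"


def poll_once(
    fetch_job: Callable[[], Optional[dict]],
    detect_version: Callable[[], str],
    run_backup: Callable[[str, dict], None],
    report_status: Callable[[dict], None],
) -> Optional[str]:
    """Run one polling cycle; return the reported status, or None when idle."""
    job = fetch_job()  # outbound request only, so no inbound port is needed
    if job is None:
        return None
    try:
        binary = pg_dump_path(parse_major_version(detect_version()))
        run_backup(binary, job)  # dump, encrypt, and upload all happen locally
        status = "success"
    except Exception:
        status = "failure"
    # Only job metadata goes back to the control plane: no data, keys, or credentials.
    report_status({"job_id": job["id"], "status": status})
    return status


def run_agent(fetch_job, detect_version, run_backup, report_status,
              max_cycles: Optional[int] = None, sleep=time.sleep) -> None:
    """Poll repeatedly; max_cycles and sleep are injectable for testing."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        poll_once(fetch_job, detect_version, run_backup, report_status)
        cycles += 1
        sleep(POLL_INTERVAL_SECONDS)
```

Passing the network and backup steps in as callables keeps the sketch independent of any particular HTTP client or storage SDK, and makes the status-only reporting boundary explicit in one place.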
Practical Applications
- Use Case: Securing VPS or homelab databases where data must be sent directly to Cloudflare R2. Pitfall: Losing the local encryption key makes all historical backups unrecoverable by any party.
- Use Case: Implementing automated restore testing for Postgres databases. Pitfall: Cron-based backups with no restore verification can fail silently, and the failure often surfaces only during a critical recovery event.
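A restore test of the kind described above could look something like the sketch below. The article states only that a full pg_restore runs inside a Docker container. The container name, the throwaway password, the mount path, the official `postgres` image, and the assumption of a custom-format dump (`pg_dump -Fc`) are all illustrative choices, and decryption of the backup is assumed to have happened already.

```python
import os
import subprocess
import time


def restore_check_commands(dump_path: str, pg_major: str,
                           container: str = "restore-check"):
    """Build the docker commands for a throwaway restore test."""
    dump = os.path.abspath(dump_path)  # bind mounts need an absolute host path
    start = ["docker", "run", "--rm", "-d", "--name", container,
             "-e", "POSTGRES_PASSWORD=throwaway",
             "-v", f"{dump}:/backup/db.dump:ro",
             f"postgres:{pg_major}"]
    # Checking over TCP avoids matching the image's temporary init-time
    # server, which listens on the Unix socket only.
    ready = ["docker", "exec", container,
             "pg_isready", "-h", "127.0.0.1", "-U", "postgres"]
    # --exit-on-error makes pg_restore stop and return non-zero on the first error.
    restore = ["docker", "exec", container, "pg_restore",
               "-U", "postgres", "-d", "postgres",
               "--no-owner", "--exit-on-error", "/backup/db.dump"]
    stop = ["docker", "stop", container]
    return start, ready, restore, stop


def verify_restore(dump_path: str, pg_major: str, run=subprocess.run,
                   attempts: int = 30, wait=time.sleep) -> bool:
    """Return True only if the dump restores cleanly into a fresh server."""
    start, ready, restore, stop = restore_check_commands(dump_path, pg_major)
    run(start, check=True)
    try:
        for _ in range(attempts):
            if run(ready).returncode == 0:
                break
            wait(1)
        else:
            return False  # server never became ready
        return run(restore).returncode == 0
    finally:
        run(stop)  # --rm removes the container once it stops
```

A call such as `verify_restore("/var/backups/db.dump", "15")` (a hypothetical path) would then return a single pass/fail result. A fuller check might also run a few sanity queries against the restored database.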