Auditing Schema Drift in Existing Systems
Auditing Schema Drift in Existing Systems
You inherited a system that has been running ddl-auto=update for two years. The schema is drifted. You need to quantify the damage and build a migration path without breaking production.
The Lie
The entity model is the source of truth for your schema.
The Reality
After months of ddl-auto=update, neither the entity model nor the database is the source of truth. The entity model describes what Hibernate thinks the schema should be. The database contains what the schema actually is, including ghost columns, manually added indexes, constraints that Hibernate never knew about, and type mismatches that accumulated silently.
You need to compare both and reconcile them into a single, versioned migration history.
The Evidence
Here is a real-world example of drift after 18 months of ddl-auto=update:
-- What Hibernate thinks the table looks like (from entity mapping):
-- CREATE TABLE orders (
-- id BIGINT PRIMARY KEY,
-- customer_id BIGINT NOT NULL,
-- status VARCHAR(255),
-- total_amount DECIMAL(19,2),
-- created_at TIMESTAMP
-- )
-- What the actual database contains:
\d orders
-- Table "public.orders"
-- Column | Type | Nullable | Default
-- ------------------+------------------------+----------+-------------
-- id | bigint | not null | nextval(...)
-- customer_id | bigint | not null |
-- status | character varying(255) | yes |
-- total_amount | numeric(19,2) | yes |
-- created_at | timestamp | yes |
-- old_status | character varying(50) | yes | -- ghost
-- customer_name | character varying(255) | yes | -- ghost
-- discount_pct | double precision | yes | -- ghost
-- legacy_ref | character varying(100) | yes | -- ghost
--
-- Indexes:
-- orders_pkey PRIMARY KEY (id)
-- idx_orders_customer_manual btree (customer_id) -- manually added, not in entity
--
-- Foreign-keys:
-- fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers(id)
-- -- added manually by DBA, not in entity mapping
Four ghost columns. One manually added index. One manually added foreign key. Hibernate does not know about any of these.
The Fix
Step 1: Detect Drift Programmatically
Use Hibernate’s SchemaValidator or Spring’s ddl-auto=validate to find mismatches between entities and schema:
// BETTER: Programmatic drift detection
@Component
public class SchemaDriftAuditor implements CommandLineRunner {
@Autowired
private EntityManagerFactory emf;
@Override
public void run(String... args) {
SessionFactory sf = emf.unwrap(SessionFactory.class);
Metadata metadata = ((SessionFactoryImplementor) sf)
.getJdbcServices()
.getBootstrapJdbcConnectionAccess()
// ... access metadata from boot model
// Alternative: use SchemaManagementTool
ServiceRegistry serviceRegistry = ((SessionFactoryImplementor) sf)
.getServiceRegistry();
SchemaManagementTool tool = serviceRegistry
.getService(SchemaManagementTool.class);
// This will throw if schema doesn't match entities
// Catch and log instead of failing
try {
tool.getSchemaValidator(/* tool options */);
} catch (SchemaManagementException e) {
log.error("Schema drift detected: {}", e.getMessage());
}
}
}
A more practical approach uses direct SQL to find ghost columns:
// BETTER: Query information_schema to find columns not mapped in entities
@Component
public class GhostColumnDetector {
@Autowired
private JdbcTemplate jdbcTemplate;
public List<String> findGhostColumns(String tableName,
Set<String> entityColumns) {
List<String> dbColumns = jdbcTemplate.queryForList(
"""
SELECT column_name FROM information_schema.columns
WHERE table_name = ? AND table_schema = 'public'
""",
String.class, tableName);
return dbColumns.stream()
.filter(col -> !entityColumns.contains(col))
.toList();
}
}
// Usage:
// Set<String> mapped = Set.of("id", "customer_id", "status",
// "total_amount", "created_at");
// List<String> ghosts = detector.findGhostColumns("orders", mapped);
// ghosts = ["old_status", "customer_name", "discount_pct", "legacy_ref"]
Step 2: Generate a Baseline Migration
Once you have audited the drift, generate a baseline migration that matches the current production schema exactly:
# Using Flyway
flyway baseline -baselineVersion=0 -baselineDescription="Existing schema"
# Or dump the current schema as your V1 migration
pg_dump --schema-only --no-owner mydb > V1__baseline.sql
Step 3: Reconcile in Phases
Do not try to fix all drift in one migration. Plan phases:
-- V2__remove_ghost_columns.sql
-- Phase 1: Remove clearly abandoned columns
-- Verify no application or report reads these columns first
ALTER TABLE orders DROP COLUMN IF EXISTS old_status;
ALTER TABLE orders DROP COLUMN IF EXISTS legacy_ref;
-- Phase 2: Formalize manually added structures
-- These were added by a DBA and are actually useful
-- Now they're tracked in migration history
-- (idx_orders_customer_manual already exists, just documenting it)
-- If it didn't exist: CREATE INDEX idx_orders_customer ON orders (customer_id);
-- V3__add_missing_constraints.sql
-- Phase 3: Add constraints that Hibernate never generated
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
ALTER TABLE orders ADD CONSTRAINT chk_orders_status
CHECK (status IN ('PENDING', 'CONFIRMED', 'SHIPPED', 'DELIVERED', 'CANCELLED'));
ALTER TABLE orders ALTER COLUMN created_at SET NOT NULL;
ALTER TABLE orders ALTER COLUMN created_at SET DEFAULT NOW();
Step 4: Switch to validate
# After baseline and reconciliation migrations are applied
spring:
jpa:
hibernate:
ddl-auto: validate
flyway:
enabled: true
baseline-on-migrate: true
From this point forward, every schema change goes through a Flyway migration. Hibernate validates on startup that entities match the schema. If they diverge, the application fails to start, which is the correct behavior.
The Cost Model
Schema drift auditing takes 1-3 days for a medium-sized application (20-50 entities). The reconciliation migration plan takes another 2-5 days, depending on how many ghost columns carry data that downstream systems depend on.
The cost of not doing this: every month of continued ddl-auto=update adds more ghost columns, widens the gap between entity model and database, and makes the eventual reconciliation harder. After two years, you are looking at a week-long project. After four years, it becomes a multi-sprint effort with real risk of data loss.
The right time to switch was when you went to production. The second best time is now.