Collection Caching and the Invalidation Avalanche
The L2 entity cache is useful. The L2 collection cache is a trap for most use cases.
The Lie
Cache the collection and Hibernate serves child entities from memory.
The Reality
The collection cache stores a list of child entity IDs, keyed by the parent entity’s ID. When you access a cached collection, Hibernate reads the ID list from the collection cache, then looks up each child entity in the L2 entity cache. If a child entity has been evicted from the L2 cache, Hibernate queries the database for it individually.
Adding or removing a single child entity invalidates the entire collection cache entry for that parent. Not just the added/removed entry. The entire list.
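// Order.java
import jakarta.persistence.*; // javax.persistence.* on Hibernate 5 and earlier
import org.hibernate.annotations.CacheConcurrencyStrategy;
import java.util.ArrayList;
import java.util.List;
@Entity
@Table(name = "orders")
@Cacheable
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Order {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    // Collection cache entry for this field: the list of OrderItem ids, keyed by this order's id
    @OneToMany(mappedBy = "order")
    @org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    private List<OrderItem> items = new ArrayList<>();
    public List<OrderItem> getItems() { return items; }
}
// OrderItem.java (same persistence imports; List and ArrayList not needed)
@Entity
@Table(name = "order_items")
@Cacheable
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class OrderItem {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String productName;
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "order_id")
    private Order order;
    public void setProductName(String productName) { this.productName = productName; }
    public void setOrder(Order order) { this.order = order; }
}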
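To make the lookup path concrete, here is a toy, in-memory model of the two-level lookup described above. It is a hypothetical sketch, not Hibernate's implementation: plain HashMaps stand in for the collection cache, the entity cache, and the database, and `dbQueries` counts simulated SQL statements. A collection miss loads all children in one query and fills both caches; a collection hit resolves each ID through the entity cache and falls back to one query per evicted child; adding a child drops the parent's whole ID list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of collection-cache lookups (hypothetical; NOT Hibernate's implementation). */
class CollectionCacheModel {
    // "Database": parent id -> child ids, and child id -> product name
    final Map<Long, List<Long>> dbChildIds = new HashMap<>();
    final Map<Long, String> dbChildren = new HashMap<>();
    // Collection cache: parent id -> list of child ids (ids only, no child state)
    final Map<Long, List<Long>> collectionCache = new HashMap<>();
    // Entity cache: child id -> cached child state
    final Map<Long, String> entityCache = new HashMap<>();
    int dbQueries = 0; // simulated SQL statements

    List<String> loadItems(long parentId) {
        List<Long> ids = collectionCache.get(parentId);
        List<String> items = new ArrayList<>();
        if (ids == null) {
            // Collection cache miss: one query loads all child rows and fills both caches
            dbQueries++;
            ids = new ArrayList<>(dbChildIds.getOrDefault(parentId, List.of()));
            for (Long id : ids) {
                String item = dbChildren.get(id);
                entityCache.put(id, item);
                items.add(item);
            }
            collectionCache.put(parentId, ids);
            return items;
        }
        // Collection cache hit: resolve each id through the entity cache
        for (Long id : ids) {
            String item = entityCache.get(id);
            if (item == null) {
                dbQueries++; // evicted child: fetched individually
                item = dbChildren.get(id);
                entityCache.put(id, item);
            }
            items.add(item);
        }
        return items;
    }

    void addItem(long parentId, long childId, String name) {
        dbChildIds.computeIfAbsent(parentId, k -> new ArrayList<>()).add(childId);
        dbChildren.put(childId, name);
        collectionCache.remove(parentId); // the WHOLE id list for this parent is dropped
    }
}
```

Calling `loadItems(1L)` twice costs one simulated query; evicting one child from `entityCache` costs one more; `addItem` on the same parent forces the next `loadItems` to reload the full list.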
The Evidence
// Load order and items: populates both the entity cache and the collection cache
Order order = entityManager.find(Order.class, 1L);
order.getItems().size(); // triggers collection load
// Collection cache now contains (simplified):
// Key: Order#1.items
// Value: [ItemId:1, ItemId:2, ItemId:3, ItemId:4, ItemId:5]
entityManager.clear();
// Second access: served from cache
Order order2 = entityManager.find(Order.class, 1L);
order2.getItems().size(); // No SQL: collection cache hit, entity cache hit for each item
// Now add a new item in a separate transaction
OrderItem newItem = new OrderItem();
newItem.setProductName("Widget");
newItem.setOrder(order2);
// Keep both sides in sync: Hibernate evicts the collection entry when the collection is modified.
// Setting only the @ManyToOne side can leave a stale entry unless
// hibernate.cache.auto_evict_collection_cache=true is configured.
order2.getItems().add(newItem);
entityManager.persist(newItem);
// When that transaction commits, the ENTIRE collection cache entry for Order#1.items is invalidated
// The next access must reload the full collection from the database
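With Hibernate statistics (requires hibernate.generate_statistics=true):
var sessionFactory = entityManager.getEntityManagerFactory().unwrap(org.hibernate.SessionFactory.class);
var stats = sessionFactory.getStatistics();
// These counters are global: they aggregate every second-level cache region
long hits = stats.getSecondLevelCacheHitCount();
long misses = stats.getSecondLevelCacheMissCount();
long puts = stats.getSecondLevelCachePutCount();
long lookups = hits + misses;
double hitRate = lookups == 0 ? 0.0 : (double) hits / lookups; // avoid 0/0 before any lookup
// When a collection with frequent child additions dominates the workload: hitRate can fall below 0.1
// For a stable collection: hitRate can exceed 0.95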
The Fix
For collections that change frequently, do not cache them. Query for children explicitly.
// BETTER: Skip collection caching, query explicitly
@Entity
@Table(name = "orders")
@Cacheable
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Order {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    // No @Cache on the collection
    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
    private List<OrderItem> items = new ArrayList<>();
}
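// For read-heavy access, use a cached query (needs hibernate.cache.use_query_cache=true) or an
// application-level cache. Caveat: any write to order_items invalidates all cached query results for it.
import jakarta.persistence.QueryHint;
import java.util.List;
import org.springframework.data.jpa.repository.*;
import org.springframework.data.repository.query.Param;
// Spring Data JPA repositories are interfaces; Spring generates the implementation
public interface OrderItemRepository extends JpaRepository<OrderItem, Long> {
    @QueryHints(@QueryHint(name = "org.hibernate.cacheable", value = "true"))
    @Query("SELECT i FROM OrderItem i WHERE i.order.id = :orderId ORDER BY i.id")
    List<OrderItem> findByOrderId(@Param("orderId") Long orderId);
}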
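An application-level cache can do what the collection cache cannot: update the cached list in place when a child is added, instead of discarding it. The sketch below is a hypothetical, single-JVM illustration: `OrderItemsCache`, `onItemAdded`, and the loader function (standing in for a call such as `findByOrderId`) are invented names, and it assumes new items get increasing IDs so that appending preserves the `ORDER BY i.id` order. It ignores transaction rollback and multi-node invalidation; in a real system `onItemAdded` should run only after the inserting transaction commits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

/** Hypothetical per-order cache of item names; the loader stands in for a repository query. */
class OrderItemsCache {
    private final ConcurrentHashMap<Long, List<String>> cache = new ConcurrentHashMap<>();
    private final LongFunction<List<String>> loader;

    OrderItemsCache(LongFunction<List<String>> loader) {
        this.loader = loader;
    }

    List<String> get(long orderId) {
        // Load once per order; later reads are served from memory as an immutable copy
        return cache.computeIfAbsent(orderId, id -> List.copyOf(loader.apply(id)));
    }

    void onItemAdded(long orderId, String item) {
        // Append to the cached list instead of discarding it; no-op if the order is not cached
        cache.computeIfPresent(orderId, (id, items) -> {
            List<String> updated = new ArrayList<>(items);
            updated.add(item);
            return List.copyOf(updated);
        });
    }
}
```

The design choice is the opposite of Hibernate's collection cache: a write costs a list copy in memory rather than a full reload from the database on the next read.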
For collections that are stable (children rarely added or removed), the collection cache works well. Categories on a product, tags on an article, and roles on a user all change infrequently and benefit from caching.
The Cost Model
Scenario: 10,000 orders, average 8 items per order, 10 item additions per minute across all orders.
With collection cache: Each addition invalidates one collection entry. 10 invalidations per minute, affecting 10 out of 10,000 collections. If those collections are accessed 100 times before the next addition, hit rate is ~99%. This works.
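With collection cache, 1,000 additions per minute: 1,000 invalidations per minute. Spread evenly over 10,000 orders, that is still only one invalidation per collection every 10 minutes; the damage comes when additions concentrate on the orders that are actively being read. If, say, those additions land on 100 active orders (10 per order per minute, one every 6 seconds) and each of those collections is read once every 5 seconds, most reads arrive just after an invalidation. Hit rate drops below 50%. The cache is now causing harm: a cache lookup (miss), then a database query, then a cache put, which is slower than just querying the database directly.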
Without collection cache: One database query per collection access. No invalidation overhead, no memory usage, predictable latency.
The break-even point: measure the read-to-write ratio per collection, not per table. As a rule of thumb, if a collection is read more than about 100 times for every write to it, the collection cache helps. Below that, it adds overhead without much benefit.
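The arithmetic behind these scenarios can be checked with a simple steady-state model: each child addition invalidates the entry, the first read after it misses, and the reads in between hit. This is a sketch under stated assumptions (reads and writes evenly interleaved, no difference in cost between a hit and a miss), and the 100-order hot set below is a hypothetical. It shows why the per-collection write rate matters: spread evenly over 10,000 orders, even 1,000 additions per minute leave each collection about 120 reads per write.

```java
/** Simple model: every write invalidates the entry, and the first read after it misses. */
class CollectionHitRate {
    /**
     * @param readsPerMinute  reads of ONE collection per minute
     * @param writesPerMinute child additions/removals to that same collection per minute
     * @return expected steady-state hit rate, assuming evenly interleaved reads and writes
     */
    static double hitRate(double readsPerMinute, double writesPerMinute) {
        if (readsPerMinute <= 0) return 0.0;
        // Each write causes at most one miss; if writes outpace reads, every read misses
        double misses = Math.min(readsPerMinute, writesPerMinute);
        return (readsPerMinute - misses) / readsPerMinute;
    }

    public static void main(String[] args) {
        // 100 reads per write: the first scenario
        System.out.printf("100:1     %.3f%n", hitRate(100, 1));
        // 1,000 additions/min spread evenly over 10,000 orders, each read every 5 s (12 reads/min)
        System.out.printf("uniform   %.3f%n", hitRate(12, 1000.0 / 10_000));
        // Same 1,000 additions/min concentrated on 100 active orders (hypothetical hot set)
        System.out.printf("hot set   %.3f%n", hitRate(12, 1000.0 / 100));
    }
}
```

Under this model the uniform case stays near 99%, while the hot set falls to about 17%, consistent with the "below 50%" figure.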