the lies your orm tells you

Collection Caching and the Invalidation Avalanche

Chapter 9 of 30

The L2 entity cache is useful. The L2 collection cache is a trap for most use cases.

The Lie

Cache the collection and Hibernate serves child entities from memory.

The Reality

The collection cache stores a list of child entity IDs, keyed by the parent entity’s ID. When you access a cached collection, Hibernate reads the ID list from the collection cache, then looks up each child entity in the L2 entity cache. If a child entity has been evicted from the L2 cache, Hibernate queries the database for it individually.

Adding or removing a single child entity invalidates the entire collection cache entry for that parent. Not just the added/removed entry. The entire list.
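To make the two-level lookup and the whole-entry eviction concrete, here is a toy in-memory model in plain Java. It is a sketch under simplifying assumptions, not Hibernate's implementation. `CollectionCacheModel`, its maps and the `sqlQueries` counter are hypothetical names. A collection miss is modeled as one query that also repopulates the entity cache, roughly as a collection load that selects the full child rows would.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the collection cache's two-level lookup. Not Hibernate code:
// every name here is hypothetical and chosen for illustration.
final class CollectionCacheModel {
    // "Database" tables: item rows, and which item IDs belong to which order
    final Map<Long, String> dbItems = new HashMap<>();
    final Map<Long, List<Long>> dbItemIdsByOrder = new HashMap<>();

    // Collection cache: orderId -> list of child IDs only (no child state)
    final Map<Long, List<Long>> collectionCache = new HashMap<>();
    // Entity cache: itemId -> child state (here just the product name)
    final Map<Long, String> entityCache = new HashMap<>();

    int sqlQueries = 0; // counts simulated round trips to the database

    List<String> loadItems(long orderId) {
        List<Long> ids = collectionCache.get(orderId);
        if (ids == null) {
            // Collection cache miss: one query reloads the whole collection.
            // Simplification: the loaded rows also repopulate the entity cache.
            sqlQueries++;
            ids = new ArrayList<>(dbItemIdsByOrder.getOrDefault(orderId, List.of()));
            collectionCache.put(orderId, ids);
            for (Long id : ids) {
                entityCache.put(id, dbItems.get(id));
            }
        }
        List<String> items = new ArrayList<>();
        for (Long id : ids) {
            String item = entityCache.get(id);
            if (item == null) {
                // Child evicted from the entity cache: fetched on its own, one query each
                sqlQueries++;
                item = dbItems.get(id);
                entityCache.put(id, item);
            }
            items.add(item);
        }
        return items;
    }

    void addItem(long orderId, long itemId, String productName) {
        dbItems.put(itemId, productName);
        dbItemIdsByOrder.computeIfAbsent(orderId, k -> new ArrayList<>()).add(itemId);
        // The parent's whole ID list is evicted, not patched with the new ID
        collectionCache.remove(orderId);
    }
}
```

Evicting one child from `entityCache` and calling `loadItems` again costs one extra query for that child alone. Calling `addItem` drops the parent's entire ID list, so the next `loadItems` reloads every child, even though only one was added.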

@Entity
@Table(name = "orders")
@Cacheable
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Order {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @OneToMany(mappedBy = "order")
    @org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    private List<OrderItem> items = new ArrayList<>();

    // getters and setters omitted
}

@Entity
@Table(name = "order_items")
@Cacheable
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class OrderItem {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String productName;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "order_id")
    private Order order;

    // getters and setters omitted
}

The Evidence

// Load order and items - populates both entity and collection cache
Order order = entityManager.find(Order.class, 1L);
order.getItems().size(); // triggers collection load

// Collection cache now contains:
// Key: Order#1.items
// Value: [ItemId:1, ItemId:2, ItemId:3, ItemId:4, ItemId:5]

entityManager.clear();

// Second access - served from cache
Order order2 = entityManager.find(Order.class, 1L);
order2.getItems().size(); // No SQL - collection cache hit, entity cache hit for each item

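// Now add a new item in a separate transaction
entityManager.getTransaction().begin();
OrderItem newItem = new OrderItem();
newItem.setProductName("Widget");
newItem.setOrder(order2);
// Update the inverse side too: setting only newItem.order leaves the cached ID list
// stale unless hibernate.cache.auto_evict_collection_cache is enabled
order2.getItems().add(newItem);
entityManager.persist(newItem);
entityManager.getTransaction().commit();
// On commit, the ENTIRE collection cache entry for Order#1.items is evicted
// Next access must reload the full collection from the database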

With Hibernate statistics:

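// Requires hibernate.generate_statistics=true
var stats = sessionFactory.getStatistics();
// Aggregate counters: these sum over every L2 region, entities and collections alike
long hits = stats.getSecondLevelCacheHitCount();
long misses = stats.getSecondLevelCacheMissCount();
long puts = stats.getSecondLevelCachePutCount();
// For one collection role: stats.getCollectionStatistics(Order.class.getName() + ".items")
// exposes getCacheHitCount() and getCacheMissCount()

double hitRate = (hits + misses) == 0 ? 0.0 : (double) hits / (hits + misses);
// For a collection with frequent child additions: hitRate < 0.1
// For a stable collection: hitRate > 0.95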

The Fix

For collections that change frequently, do not cache them. Query for children explicitly.

// BETTER: Skip collection caching, query explicitly
@Entity
@Table(name = "orders")
@Cacheable
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Order {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // No @Cache on the collection
    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
    private List<OrderItem> items = new ArrayList<>();
}

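// For read-heavy access, query explicitly or add an application-level cache.
// A Hibernate query cache fits poorly here: any write to order_items invalidates
// every cached query result that reads from that table.
@Repository
public interface OrderItemRepository extends JpaRepository<OrderItem, Long> {

    @Query("SELECT i FROM OrderItem i WHERE i.order.id = :orderId ORDER BY i.id")
    List<OrderItem> findByOrderId(@Param("orderId") Long orderId);
}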

For collections that are stable (items rarely added or removed), the collection cache works well. Categories on a product, tags on an article, and roles on a user all change infrequently and benefit from caching.

The Cost Model

Scenario: 10,000 orders, average 8 items per order, 10 item additions per minute across all orders.

With collection cache, 10 additions per minute: each addition invalidates one collection entry, so 10 of the 10,000 collections are invalidated per minute. If each of those collections is read 100 times before its next addition, only the first read after the invalidation misses, and the hit rate is about 99%. This works.

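With collection cache, 1,000 additions per minute: the outcome depends on where those additions land. Spread evenly over all 10,000 orders, each collection is invalidated only about once every 10 minutes, and a collection read every 5 seconds still hits about 99% of the time. Concentrate the additions on a hot set, say 100 active orders each receiving about 10 additions per minute, and a collection read once every 5 seconds (12 reads per minute) is invalidated before most reads. The hit rate drops below 50%, and the cache is causing harm: every miss pays for a cache lookup, then the database query anyway, which is slower than just querying the database directly.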
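What matters in both scenarios is how many reads a single collection sees between two writes to it, not the global write rate. The sketch below simulates one collection with perfectly regular read and write intervals. `HitRateSim` and its parameters are hypothetical. The simulation assumes that a read misses whenever any write landed since the previous read, and that a miss repopulates the entry.

```java
final class HitRateSim {
    /**
     * Hit rate for a single cached collection under perfectly regular traffic.
     * Reads happen every readEverySec seconds starting at t = 0; writes happen every
     * writeEverySec seconds starting at writeOffsetSec. A read misses if it is the
     * first read or if any write landed since the previous read (entry evicted);
     * a miss reloads the entry, so later reads hit until the next write.
     */
    static double hitRate(double readEverySec, double writeEverySec,
                          double writeOffsetSec, double durationSec) {
        long hits = 0;
        long misses = 0;
        boolean cached = false;
        double nextWrite = writeOffsetSec;
        for (long i = 0; i * readEverySec < durationSec; i++) {
            double t = i * readEverySec;
            // Apply every write at or before this read: each one evicts the entry
            while (nextWrite <= t) {
                cached = false;
                nextWrite += writeEverySec;
            }
            if (cached) {
                hits++;
            } else {
                misses++;       // lookup fails, collection reloaded from the database
                cached = true;  // the reload repopulates the collection cache
            }
        }
        return (hits + misses) == 0 ? 0.0 : (double) hits / (hits + misses);
    }

    public static void main(String[] args) {
        // 100 reads between writes (the 10-additions-per-minute case)
        System.out.printf("stable: %.3f%n", hitRate(1, 100, 0.5, 10_000));
        // 1,000 additions/min spread over 10,000 orders: one write per order every 600 s
        System.out.printf("spread: %.3f%n", hitRate(5, 600, 0.5, 6_000));
        // Additions concentrated on a hot order: one write every 6 s, reads every 5 s
        System.out.printf("hot:    %.3f%n", hitRate(5, 6, 0.5, 600));
    }
}
```

Under these assumptions, the stable and evenly spread cases both come out at about 99%. The hot order, written every 6 seconds and read every 5, comes out at about 16%. The damage in the 1,000-additions scenario therefore comes from writes concentrated on the collections that are being read.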

Without collection cache: one database query per collection access. No invalidation overhead, no cache memory, predictable latency.

A rule of thumb: if a collection is read more than about 100 times for every write to it, the collection cache helps. Well below that ratio, each write evicts the entry before enough reads amortize the reload, and the cache adds overhead without benefit.
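A rough way to connect the read/write ratio to the hit rates above is to assume that every write to a collection is followed by at least one read. Under that assumption each write costs exactly one miss, so with r reads per write the per-collection hit rate is approximately:

```latex
h \approx 1 - \frac{1}{r}, \qquad r = 100 \;\Rightarrow\; h \approx 0.99, \qquad r = 2 \;\Rightarrow\; h \approx 0.5
```

This covers only the hit-rate side. Where the cache actually pays off also depends on how expensive a miss (cache lookup, database query, cache put) is compared with a hit in your setup.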