The Cartesian Product Trap in Multi-Collection Fetch Joins
The Cartesian Product Trap in Multi-Collection Fetch Joins
You fixed your N+1 with a JOIN FETCH. Good. Then you added a second collection to the entity and JOIN FETCH’d that too. Now your query returns orders * items * shipments rows. An order with 10 items and 3 shipments produces 30 rows for that single order. A thousand orders with similar distributions produce 30,000 rows. You traded N+1 for a Cartesian explosion.
The Lie
JOIN FETCH solves the N+1 problem. Use it on all your associations.
The Reality
JOIN FETCH works correctly for a single collection. The moment you JOIN FETCH two or more collections from the same root entity, the SQL join produces a cross product between those collections. Hibernate deduplicates the parent entity in memory, but the database still transmits every combination over the wire.
Worse: Hibernate 6 will throw a MultipleBagFetchException if both collections are List types. This is Hibernate protecting you from the Cartesian product, but the error message does not explain why. Developers often “fix” it by switching to Set, which suppresses the exception but does not suppress the Cartesian join.
The Evidence
@Entity
@Table(name = "orders")
public class Order {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
private List<OrderItem> items = new ArrayList<>();
@OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
private List<Shipment> shipments = new ArrayList<>();
}
// BAD: This throws MultipleBagFetchException
@Query("SELECT o FROM Order o JOIN FETCH o.items JOIN FETCH o.shipments")
List<Order> findAllWithEverything();
// org.hibernate.loader.MultipleBagFetchException:
// cannot simultaneously fetch multiple bags
// BAD: "Fixing" by switching to Set suppresses the error but not the problem
@Entity
@Table(name = "orders")
public class Order {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
private Set<OrderItem> items = new HashSet<>();
@OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
private Set<Shipment> shipments = new HashSet<>();
}
@Query("SELECT o FROM Order o JOIN FETCH o.items JOIN FETCH o.shipments")
List<Order> findAllWithEverything();
// Generated SQL:
// select o1_0.id, i1_0.order_id, i1_0.id, i1_0.product_name, i1_0.quantity,
// s1_0.order_id, s1_0.id, s1_0.carrier, s1_0.tracking_number
// from orders o1_0
// join order_items i1_0 on o1_0.id = i1_0.order_id
// join shipments s1_0 on o1_0.id = s1_0.order_id
For an order with 10 items and 3 shipments, this query returns 30 rows. Hibernate deduplicates them into one Order object, but the database, the network, and the JDBC driver all processed 30 rows. With 1,000 orders, that is 30,000 rows instead of the 1,000 + 10,000 + 3,000 = 14,000 rows that separate queries would produce.
The Fix
Fetch the collections in separate queries. This is counterintuitive because “more queries” sounds worse. It is not.
// BETTER: Two separate queries, no Cartesian product
@Service
public class OrderService {
@Autowired
private EntityManager em;
@Transactional(readOnly = true)
public List<Order> findAllWithEverything() {
// Query 1: fetch orders with items
List<Order> orders = em.createQuery(
"SELECT DISTINCT o FROM Order o JOIN FETCH o.items",
Order.class
).getResultList();
// Query 2: fetch shipments for the same orders
// Hibernate's persistence context ensures these attach to the same Order objects
em.createQuery(
"SELECT DISTINCT o FROM Order o JOIN FETCH o.shipments WHERE o IN :orders",
Order.class
).setParameter("orders", orders)
.getResultList();
return orders;
}
}
// Generated SQL:
// Query 1:
// select distinct o1_0.id, o1_0.customer_name, i1_0.order_id, i1_0.id, i1_0.product_name, i1_0.quantity
// from orders o1_0
// join order_items i1_0 on o1_0.id = i1_0.order_id
//
// Query 2:
// select distinct o1_0.id, o1_0.customer_name, s1_0.order_id, s1_0.id, s1_0.carrier, s1_0.tracking_number
// from orders o1_0
// join shipments s1_0 on o1_0.id = s1_0.order_id
// where o1_0.id in (?, ?, ?, ...)
Two queries, no Cartesian product. The persistence context (first-level cache) ensures that the second query’s results are attached to the same Order instances loaded by the first query. You do not need to merge them manually.
An alternative for Hibernate-specific code:
// BETTER: @Fetch(FetchMode.SUBSELECT) for automatic subselect loading
@Entity
@Table(name = "orders")
public class Order {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
private Set<OrderItem> items = new HashSet<>();
@OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
@Fetch(FetchMode.SUBSELECT)
private Set<Shipment> shipments = new HashSet<>();
}
// When shipments is accessed, Hibernate generates:
// select s1_0.order_id, s1_0.id, s1_0.carrier, s1_0.tracking_number
// from shipments s1_0
// where s1_0.order_id in (select o1_0.id from orders o1_0)
SUBSELECT replays the original parent query as a subquery in the IN clause. Elegant, but the subquery can be expensive if the parent query was complex.
The Cost Model
| Scenario (1,000 orders) | Rows transferred | Queries |
|---|---|---|
| JOIN FETCH both | 30,000 | 1 |
| Separate queries | 14,000 | 2 |
| @BatchSize(25) on both | 14,000 | 82 |
| SUBSELECT on second | 14,000 | 2 |
At 1,000 orders with 10 items and 3 shipments each: the Cartesian approach transfers 2.14x more data over the wire. The separate query approach uses one additional round trip, which at 1ms latency costs 1ms. The data savings at 30,000 vs 14,000 rows are far more significant.
At 100,000 orders: the Cartesian product produces 3 million rows. The separate query approach produces 1.4 million rows in two queries. At this scale, the Cartesian approach may also cause the database to spill to disk during the join, degrading query performance beyond just data transfer.
The rule: JOIN FETCH exactly one collection. For additional collections, use separate queries or SUBSELECT fetch mode.