Projection Queries: Selecting Less, Getting More
Projection Queries: Selecting Less, Getting More
The Lie
findAll() and findById() return entities. Entities have all the data you need. That is the point of an ORM.
The Reality
Every Spring Data repository method that returns an entity type executes SELECT *. Your Product entity has 25 columns, including a description TEXT field averaging 4KB. Your product listing page shows the name, price, and category. You load 50 products. That is 50 × 4KB = 200KB of description text transferred from the database, deserialized by JDBC, hydrated by Hibernate, stored in the persistence context (twice, if not read-only), serialized to JSON (where you ignore it), and then garbage collected.
For a single request, this is negligible. At 100 requests/second, it is 20MB/second of wasted database I/O, wasted heap, and wasted GC pressure.
Hibernate tracks every loaded entity in the persistence context. Each entity gets a snapshot for dirty checking (unless readOnly). Loading 25 columns instead of 3 means the persistence context is 8x larger than it needs to be.
The left panel shows what a full entity load allocates for 50 products: 225 KB transferred over the wire, 450 KB of heap (entity instances plus dirty-check snapshots), with 22 of 25 columns unused. The right panel shows the DTO projection for the same 50 rows: 5 KB transferred, 5 KB heap, 3 of 3 columns used. At 100 requests/second, the full entity path allocates 22 MB/s of heap that the GC must collect. The DTO path allocates 0.5 MB/s.
The Evidence
// BAD: Loading full entities for a listing page
@RestController
public class ProductController {
@GetMapping("/products")
public List<ProductListItem> listProducts(Pageable pageable) {
Page<Product> products = productRepository.findAll(pageable);
// Every Product is fully hydrated: 25 columns, including
// description (TEXT), specifications (JSONB), audit fields
// We use 3 fields. We loaded 25.
return products.stream()
.map(p -> new ProductListItem(p.getId(), p.getName(),
p.getPrice()))
.toList();
}
}
// Generated SQL:
// select p.id, p.name, p.price, p.category, p.description,
// p.specifications, p.image_url, p.created_at, p.updated_at,
// p.created_by, p.sku, p.weight, p.dimensions, p.stock_count,
// p.min_order_qty, p.max_order_qty, p.supplier_id, p.brand,
// p.rating, p.review_count, p.is_active, p.tax_category,
// p.shipping_class, p.meta_title, p.meta_description
// from products p
// order by p.id
// limit 50 offset 0
The query transfers all 25 columns. Hibernate hydrates all 25 fields. The persistence context stores snapshots of all 25 fields. The controller uses 3 fields.
The Fix
DTO Projection with Constructor Expression
The most efficient approach. Hibernate executes a targeted SELECT and constructs your DTO directly, bypassing entity hydration and persistence context storage entirely.
// BETTER: DTO projection
public record ProductListItem(Long id, String name, BigDecimal price) {}
public interface ProductRepository extends JpaRepository<Product, Long> {
@Query("""
SELECT new com.example.dto.ProductListItem(p.id, p.name, p.price)
FROM Product p
WHERE p.isActive = true
ORDER BY p.name
""")
Page<ProductListItem> findActiveProductList(Pageable pageable);
}
// Generated SQL:
// select p.id, p.name, p.price
// from products p
// where p.is_active = true
// order by p.name
// limit 50 offset 0
The DTO is not managed by the persistence context. No snapshot. No dirty checking. No entity lifecycle. The JDBC result set contains only 3 columns instead of 25. The record is constructed directly from the result set.
Interface-Based Projection
Spring Data supports interface projections. Fewer lines of code, but with a hidden cost.
// ALTERNATIVE: Interface projection
public interface ProductListView {
Long getId();
String getName();
BigDecimal getPrice();
}
public interface ProductRepository extends JpaRepository<Product, Long> {
Page<ProductListView> findByIsActiveTrue(Pageable pageable);
}
// Generated SQL: Same efficient SELECT with only needed columns
// select p.id, p.name, p.price
// from products p
// where p.is_active = true
// order by p.name
// limit 50 offset 0
The interface projection generates the same efficient SQL. The catch: Spring Data creates a JDK dynamic proxy for each result row. Proxy creation involves reflection, which adds CPU overhead per row. For 50 rows, immeasurable. For 50,000 rows in a batch report, the proxy overhead becomes visible.
Tuple Queries
For ad-hoc projections where creating a DTO class feels excessive:
// ALTERNATIVE: Tuple query
@Query("""
SELECT p.id as id, p.name as name, p.price as price
FROM Product p
WHERE p.category = :category
""")
List<Tuple> findByCategoryTuple(@Param("category") String category);
// Usage:
List<Tuple> results = repository.findByCategoryTuple("ELECTRONICS");
for (Tuple tuple : results) {
Long id = tuple.get("id", Long.class);
String name = tuple.get("name", String.class);
BigDecimal price = tuple.get("price", BigDecimal.class);
}
Tuple queries avoid proxy overhead and DTO class creation. The trade-off is type-unsafe access. A typo in tuple.get("naem", String.class) fails at runtime, not compile time.
The Cost Model
Measurements on PostgreSQL 16, 50-row result set, Product entity with 25 columns (including one TEXT averaging 4KB):
| Approach | Data Transferred | Heap per Result Set | Persistence Context | Proxy Overhead |
|---|---|---|---|---|
Full entity (findAll) | ~225 KB | ~450 KB (entity + snapshot) | Yes | None |
| Full entity (readOnly) | ~225 KB | ~225 KB (entity only) | Yes (read-only) | None |
DTO projection (new) | ~5 KB | ~5 KB | No | None |
| Interface projection | ~5 KB | ~8 KB | No | ~3 KB (proxies) |
| Tuple query | ~5 KB | ~6 KB | No | None |
The DTO projection transfers and allocates 45x less data than the full entity approach. At 100 requests/second, that is the difference between 22.5 MB/s and 0.5 MB/s of heap allocation, which directly affects GC pause frequency and duration.
Use full entity loading when you need to modify the entity and persist changes. Use DTO projections for everything else. The “everything else” category is larger than most codebases acknowledge. Read-heavy applications (which is most applications) should use projections for 80%+ of their queries.