Building Production-Ready Semantic Search: Implementing the Service Layer with Java and pgvector

The Service Layer: Where Separate Components Become a System

Ozioma Ochin introduces a production-style service layer for a semantic search API that coordinates JPA persistence with an OpenAI embedding client. The system uses a ‘save-first’ state transition model so that each document's record and metadata remain recoverable even if the external API call fails.

Why This Matters

In production systems, components often work in isolation but fail at the boundaries; the service layer acts as the source of truth for these transitions. By decoupling the API contract from the database schema using DTOs and interfaces, engineers can evolve search logic and metadata filters without breaking client integrations. Failure to manage these boundaries leads to silent data loss and inconsistent search states that are very hard to debug.

Key Insights

The ‘save-first, embed-second’ pattern prevents silent failures by recording a PENDING state in PostgreSQL before calling external APIs like OpenAI.
In PostgreSQL, a SELECT-list alias such as the computed cosine distance cannot be referenced in a WHERE clause at the same query level, so filtering on a score threshold requires a subquery.
Regex-based validation (^[a-zA-Z0-9_-]{1,64}$) for metadata keys is a critical security control to prevent SQL injection in dynamic JSONB path expressions.
A Document Lifecycle model (PENDING, READY, FAILED) also helps performance: a composite index on (status, created_at DESC) can serve queries that filter by status and sort by recency.
Global exception handling via @RestControllerAdvice ensures consistent API error shapes, mapping technical failures to standard HTTP status codes like 404 and 400.
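
The metadata key check described above can be sketched in plain Java. This is a minimal illustration, not the article's implementation: the class name MetadataKeyValidator and its methods are hypothetical, and only the regular expression itself comes from the text.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks metadata keys before they are placed in a JSONB path expression. */
final class MetadataKeyValidator {

    // Pattern quoted in the article: letters, digits, underscore, hyphen; 1 to 64 characters.
    private static final Pattern SAFE_KEY = Pattern.compile("^[a-zA-Z0-9_-]{1,64}$");

    private MetadataKeyValidator() {}

    /** Returns true only when the whole key matches the allow-list pattern. */
    static boolean isValid(String key) {
        // matches() requires the entire input to match, so a trailing newline is rejected too.
        return key != null && SAFE_KEY.matcher(key).matches();
    }

    /** Returns the key unchanged, or throws so the caller can map the failure to an HTTP 400. */
    static String requireValid(String key) {
        if (!isValid(key)) {
            throw new IllegalArgumentException("Invalid metadata key");
        }
        return key;
    }
}
```

A key such as category passes, while anything containing quotes, spaces, or more than 64 characters is rejected before any SQL is assembled.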

Working Examples

The service layer contract that decouples controllers from implementation details.

public interface DocumentService {
  CreateDocumentResponse create(CreateDocumentRequest request);
  DocumentResponse getById(Long id);
  SearchResponse search(SearchRequest request);
}

Transactional document creation ensuring persistence before external embedding calls.

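@Override
@Transactional
public CreateDocumentResponse create(CreateDocumentRequest request) {
  // Step 1: record the document as PENDING before any external call.
  Document saved = saveAsPending(request);
  // Step 2: call the embedding API and store the resulting vector.
  embedAndPersist(
    saved.getId(),
    saved.getTitle(),
    saved.getContent()
  );
  // READY is reported on the assumption that embedAndPersist throws on failure;
  // if it instead records FAILED and returns, the persisted status should be returned here.
  return new CreateDocumentResponse(
    saved.getId(),
    DocumentStatus.READY
  );
}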
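
To make the PENDING, READY, and FAILED transitions concrete, here is a framework-free sketch of the save-first flow. Everything in it is an assumption made for illustration: the class SaveFirstSketch, the in-memory map standing in for PostgreSQL, and the Function standing in for the embedding client. In a Spring application, note that if an embedding exception propagates out of a single method-level transaction, the PENDING insert would roll back with it, so the initial save would need to commit independently, or the failure would need to be caught and recorded as this sketch does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical, in-memory illustration of the save-first lifecycle. */
final class SaveFirstSketch {

    enum Status { PENDING, READY, FAILED }

    /** Stands in for the JPA entity. */
    static final class Doc {
        final long id;
        final String content;
        Status status = Status.PENDING;
        float[] embedding;

        Doc(long id, String content) {
            this.id = id;
            this.content = content;
        }
    }

    private final Map<Long, Doc> store = new HashMap<>();  // stands in for the documents table
    private final Function<String, float[]> embedder;      // stands in for the embedding client
    private long nextId = 1;

    SaveFirstSketch(Function<String, float[]> embedder) {
        this.embedder = embedder;
    }

    Doc create(String content) {
        // Step 1: persist the document as PENDING before any external call.
        Doc doc = new Doc(nextId++, content);
        store.put(doc.id, doc);

        // Step 2: call the external service and record the outcome either way.
        try {
            doc.embedding = embedder.apply(content);
            doc.status = Status.READY;
        } catch (RuntimeException e) {
            // The record survives with an explicit FAILED state instead of vanishing.
            doc.status = Status.FAILED;
        }
        return doc;
    }

    Doc get(long id) {
        return store.get(id);
    }
}
```

With a failing embedder, the document can still be looked up by id and shows FAILED, which is the recoverability property the pattern is meant to provide.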

Optimized pgvector search query using subqueries to handle cosine distance filtering.

SELECT * FROM (
  SELECT id, title, content, metadata,
         (embedding <=> ?::vector) AS cosine_distance
  FROM documents
  WHERE status = 'READY'
    AND embedding IS NOT NULL
    AND (metadata->>'category') = ?
) AS sub
WHERE (((1.0 - cosine_distance) + 1.0) / 2.0) >= ?
ORDER BY cosine_distance ASC
LIMIT ? OFFSET ?;
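
The score expression in the outer WHERE clause maps a cosine distance in the range 0 to 2 onto a score in the range 0 to 1. The sketch below mirrors that arithmetic in Java and derives the equivalent distance bound for a given minimum score. The class and method names are hypothetical and do not appear in the article.

```java
/** Hypothetical helper mirroring the SQL score expression ((1 - d) + 1) / 2. */
final class CosineScore {

    private CosineScore() {}

    /** Converts a cosine distance (0 = identical direction, 2 = opposite) to a 0..1 score. */
    static double fromDistance(double cosineDistance) {
        double similarity = 1.0 - cosineDistance;  // cosine similarity in -1..1
        return (similarity + 1.0) / 2.0;           // rescaled to 0..1
    }

    /** Inverse mapping: the largest distance that still satisfies score >= minScore. */
    static double maxDistanceFor(double minScore) {
        // score = 1 - d / 2, so score >= s is equivalent to d <= 2 * (1 - s).
        return 2.0 * (1.0 - minScore);
    }
}
```

For example, a minimum score of 0.75 corresponds to a cosine distance of at most 0.5.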

Practical Applications

Use Case: Implementing a semantic search pipeline where documents are searchable only after vector generation completes to prevent stale results.
Pitfall: Embedding first and then saving; if the process fails, the document is lost without a trace, making debugging impossible.
Use Case: Dynamic metadata filtering in pgvector using QueryBuilder to append JSONB path expressions based on user input.
Pitfall: Direct string concatenation of user-provided metadata keys in SQL queries, leading to SQL injection vulnerabilities.
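
The last use case and pitfall can be illustrated together: keys are checked against an allow-list pattern before being placed in the SQL text, and values are only ever bound as parameters. This is a sketch under stated assumptions. MetadataFilterBuilder is a hypothetical stand-in for the article's QueryBuilder, whose actual API is not shown in the source, and the base query is simplified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical builder that appends JSONB equality filters to a base query. */
final class MetadataFilterBuilder {

    // Same allow-list pattern the article gives for metadata keys.
    private static final Pattern SAFE_KEY = Pattern.compile("^[a-zA-Z0-9_-]{1,64}$");

    private final StringBuilder sql =
            new StringBuilder("SELECT id, title FROM documents WHERE status = 'READY'");
    private final List<Object> params = new ArrayList<>();

    /** Adds one "(metadata->>'key') = ?" clause per filter entry. */
    MetadataFilterBuilder where(Map<String, String> filters) {
        // TreeMap gives a deterministic clause order, which keeps SQL and parameters aligned.
        for (Map.Entry<String, String> entry : new TreeMap<>(filters).entrySet()) {
            String key = entry.getKey();
            if (!SAFE_KEY.matcher(key).matches()) {
                throw new IllegalArgumentException("Invalid metadata key");
            }
            // The key is embedded only after validation; the value is always a bind parameter.
            sql.append(" AND (metadata->>'").append(key).append("') = ?");
            params.add(entry.getValue());
        }
        return this;
    }

    String sql() {
        return sql.toString();
    }

    List<Object> params() {
        return params;
    }
}
```

The resulting SQL string and parameter list could then be handed to a PreparedStatement or a similar parameterized API, so user-supplied values never become part of the query text.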

References:

https://dev.to/oozioma/the-service-layer-where-separate-components-become-a-system-4oeh
