When OpenSearch Is the Wrong Answer
When OpenSearch Is the Wrong Answer
The documentation platform runs on OpenSearch. Every search feature is built on OpenSearch. When the product manager asks for a “related documents” feature based on shared tags, the developer writes a terms aggregation with a sub-query. The implementation works but requires scanning the entire index for every “related documents” request. PostgreSQL’s JOIN on a document_tags table would return the same result in 2ms using an existing B-tree index.
When OpenSearch Is the Right Answer
OpenSearch excels when the workload has these characteristics:
- Full-text search with relevance ranking. No relational database matches OpenSearch’s analysis pipeline, BM25 scoring, and query DSL for text relevance.
- High query volume on denormalized data. OpenSearch’s inverted index and segment-based architecture serve reads efficiently at high concurrency.
- Faceted search with aggregations. Terms, range, and histogram aggregations on keyword fields are OpenSearch’s native strength.
- Schema flexibility. Adding fields to an OpenSearch mapping does not require downtime or migration (for additive changes).
- Horizontal scalability. Sharding distributes data and query load across nodes linearly.
When OpenSearch Is Not the Right Answer
Relational Queries
// FRAGILE: Simulating a JOIN in OpenSearch
// "Find all documents that share at least 2 tags with document X"
// Step 1: Fetch document X's tags
var docX = client.get(g -> g.index("docs-v1").id("doc-x"), DocPage.class);
List<String> tags = docX.source().tags();
// Step 2: Search for documents with matching tags
var response = client.search(s -> s
.index("docs-v1")
.query(q -> q.bool(b -> {
for (String tag : tags) {
b.should(sh -> sh.term(t -> t.field("tags").value(tag)));
}
b.minimumShouldMatch("2");
return b;
}))
.size(10),
DocPage.class
);
// This scans the inverted index for every tag value and intersects.
// PostgreSQL: SELECT * FROM documents d JOIN document_tags dt ...
// GROUP BY d.id HAVING COUNT(*) >= 2 → uses index, sub-millisecond.
If the primary access pattern is relational (JOINs, foreign key traversals, aggregations across related entities), PostgreSQL with proper indexing will be faster, simpler, and cheaper than OpenSearch.
Small Data Sets
For collections under 100,000 documents, PostgreSQL’s tsvector full-text search provides adequate relevance with zero operational overhead:
-- PostgreSQL full-text search: no additional infrastructure
ALTER TABLE documents ADD COLUMN search_vector tsvector;
CREATE INDEX idx_search ON documents USING GIN(search_vector);
UPDATE documents SET search_vector =
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'B');
SELECT title, ts_rank(search_vector, query) AS rank
FROM documents, plainto_tsquery('english', 'connection timeout') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;
This query executes in under 5ms on 100,000 documents. Adding OpenSearch for this workload adds operational complexity (cluster management, monitoring, synchronization) without meaningful search quality improvement.
Pure Vector Search
For workloads that are exclusively vector similarity search (recommendation engines, image search, embedding-based retrieval), purpose-built vector databases (Milvus, Weaviate, Pinecone) offer:
- Higher recall at lower latency for pure ANN queries
- More efficient vector index implementations (IVF-PQ, ScaNN)
- Built-in support for vector index management (automatic rebalancing, incremental indexing)
OpenSearch’s kNN implementation is adequate when vector search is one capability among many (hybrid search, aggregations, filtering). When vector search is the only capability, the overhead of OpenSearch’s general-purpose architecture adds cost without benefit.
Transactions and Consistency
OpenSearch is eventually consistent. A document indexed with refresh=false is not searchable for up to 1 second (the default refresh interval). For use cases requiring read-after-write consistency (e.g., “user saves a document and immediately sees it in search results”), OpenSearch requires refresh=true or refresh=wait_for, both of which degrade indexing throughput.
If the application requires ACID transactions, foreign key constraints, or strong consistency guarantees, those features belong in the relational database. OpenSearch can serve as a search view of the relational data via CDC (Chapter 18), but the relational database remains the system of record.
The Cost of Operational Complexity
| Component | OpenSearch Cluster | PostgreSQL (existing) |
|---|---|---|
| Nodes | 7 (3 manager, 4 data) | 0 additional |
| Monthly cost | $3,500 | $0 incremental |
| Monitoring | Dedicated dashboards | Existing DB monitoring |
| Backup | Snapshot repository | Existing DB backup |
| Schema migration | Reindex (hours) | ALTER TABLE (seconds) |
| Expertise required | Search engineering | General DBA |
For the documentation platform with 5 million documents, 50 tenants, and complex search requirements (multi-field relevance, faceted navigation, hybrid search), OpenSearch is the correct choice. The workload characteristics—full-text search with relevance, high query concurrency, aggregations—match OpenSearch’s strengths.
For a team adding a simple “search our blog posts” feature to a 500-post blog backed by PostgreSQL, adding OpenSearch is over-engineering. PostgreSQL’s tsvector with a GIN index handles this workload with zero additional infrastructure.
The Decision Rule
Evaluate the workload, not the technology. If the primary access pattern is full-text search with relevance ranking across more than 100,000 documents, OpenSearch is likely the right choice. If the primary access pattern is relational queries, transactional writes, or simple keyword matching on a small dataset, the existing relational database is likely sufficient.
Start with PostgreSQL’s full-text search. When search quality requirements exceed what tsvector can provide (multi-field boosting, synonym expansion, semantic search, complex analysis pipelines), migrate to OpenSearch with CDC synchronization. This progression avoids premature infrastructure complexity.
Never use OpenSearch as a primary data store. It is a search engine, not a database. The relational database is the system of record. OpenSearch is a derived, optimized view for search queries.