Search is a first-class feature in most applications. Databases are poor at full-text search — dedicated search engines like Elasticsearch handle it at scale with features like relevance ranking, faceting, and typo tolerance.
```sql
SELECT * FROM products WHERE description LIKE '%wireless headphones%';
```
Problems:
- The leading `%` wildcard can't use a B-Tree index, so every query is a full table scan.
- No typo tolerance: a search for "headphons" returns nothing.

Elasticsearch: the industry standard. Distributed, scalable, feature-rich.

Key features: relevance ranking, faceting, typo tolerance, and horizontal scaling.
Simpler, faster to set up, great typo tolerance. Better for smaller datasets and developer-friendly use cases.
Managed search-as-a-service. Excellent developer experience, very fast. Higher cost than self-hosted.
The inverted index is the core data structure. Instead of storing documents and scanning them, the engine maintains a mapping from each term to the documents that contain it:
"wireless" → [doc 3, doc 7, doc 15]
"headphones" → [doc 1, doc 7, doc 22]
"audio" → [doc 1, doc 3, doc 8]
To find documents matching "wireless headphones", intersect the two posting lists: {3, 7, 15} ∩ {1, 7, 22} = {7}. Incredibly fast: no document scanning required.
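A minimal in-memory sketch of the idea, assuming whitespace tokenization (all names here are illustrative, not any engine's API):

```python
from collections import defaultdict

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each token to the set of document IDs containing it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    # AND semantics: intersect the posting lists of every query token.
    postings = [index.get(tok, set()) for tok in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {
    1: "audio headphones",
    3: "wireless audio speaker",
    7: "wireless headphones",
}
print(search(build_index(docs), "wireless headphones"))  # → {7}
```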
Before indexing, text runs through an analysis pipeline: tokenization (splitting into terms), lowercasing, stop-word removal, and stemming.
The same pipeline runs on queries, so searches match the indexed tokens.
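A toy version of such a pipeline; the stop list and the crude suffix-stripping "stemmer" are stand-ins for what a real analyzer (e.g. a Porter/Snowball stemmer) would do:

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "or", "of"}  # illustrative stop list

def analyze(text: str) -> list[str]:
    # Tokenize on alphanumeric runs and lowercase.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Crude stemming: strip a trailing plural "s" (but not "ss").
    return [t[:-1] if t.endswith("s") and not t.endswith("ss") and len(t) > 3 else t
            for t in tokens]

print(analyze("The Wireless Headphones"))  # → ['wireless', 'headphone']
```

Because the same `analyze` function runs on both documents and queries, "Headphone" and "headphones" normalize to the same token.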
Not all matches are equal. Relevance scoring determines which documents appear first, based on factors like term frequency (how often the term appears in the document), inverse document frequency (how rare the term is across the corpus), and field length. Elasticsearch uses BM25 by default.
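A TF-IDF sketch to make the idea concrete (BM25 refines this with term-frequency saturation and length normalization; the corpus and smoothing constants here are illustrative):

```python
import math

def score(query: list[str], doc: list[str], corpus: list[list[str]]) -> float:
    # Sum TF-IDF contributions for each query term.
    n = len(corpus)
    total = 0.0
    for term in query:
        tf = doc.count(term) / len(doc)            # how often the term appears here
        df = sum(1 for d in corpus if term in d)   # how many docs contain it
        idf = math.log((n + 1) / (df + 1)) + 1     # rarer terms weigh more
        total += tf * idf
    return total

corpus = [
    ["wireless", "headphones"],   # doc 0: matches both terms
    ["wireless", "speaker"],      # doc 1: matches one
    ["headphones", "case"],       # doc 2: matches one
]
query = ["wireless", "headphones"]
ranked = sorted(range(len(corpus)), key=lambda i: -score(query, corpus[i], corpus))
print(ranked[0])  # → 0
```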
Search engines are secondary indexes — your source of truth is still your database. You must keep them in sync.
Dual writes: the application writes to the database and the search engine in the same request.
Problem: What if the search engine write fails? You have data in DB but not in search.
Publish a "data changed" event to a message queue → Consumer reads the event and updates the search index.
DB Write → Kafka Event → Search Indexer → Elasticsearch
Benefits: Decoupled, retryable, search index can rebuild from event history.
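The flow above, simulated with an in-memory queue standing in for the Kafka topic (the event shape and function names are illustrative):

```python
import queue

events: "queue.Queue[dict]" = queue.Queue()  # stands in for the Kafka topic
search_index: dict[int, dict] = {}           # stands in for Elasticsearch

def on_db_write(product_id: int, doc: dict) -> None:
    # After committing to the DB, publish a change event instead of
    # writing to the search engine directly.
    events.put({"id": product_id, "doc": doc})

def run_indexer() -> None:
    # The consumer applies events to the index; with a real broker,
    # a failed batch can simply be re-read and retried.
    while not events.empty():
        event = events.get()
        search_index[event["id"]] = event["doc"]

on_db_write(7, {"name": "wireless headphones"})
run_indexer()
print(search_index[7]["name"])  # → wireless headphones
```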
Change data capture (CDC): Debezium or similar tools stream database changes (read from the write-ahead log) to Kafka, and an index consumer updates Elasticsearch. Zero application code changes.
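For a sense of scale, registering a Debezium Postgres connector is roughly a config payload like the following (hostnames, credentials, and table names are placeholders, and exact property names vary by Debezium version):

```json
{
  "name": "products-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "replicator",
    "database.password": "…",
    "database.dbname": "shop",
    "table.include.list": "public.products",
    "topic.prefix": "shop"
  }
}
```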
```
User → API Gateway → Search Service → Elasticsearch Cluster
                                               ↑
                              Indexing Pipeline (Kafka Consumer)
                                               ↑
                                   Kafka (change events)
                                               ↑
                              Primary Database (PostgreSQL)
```