The Semantic Leap
Keyword search finds documents containing specific words. Semantic search finds documents about specific concepts, regardless of vocabulary.
"How do I manage context?" might match documents about "context window management," "prompt engineering," "working memory," and "state handling"--even if none contain the exact query terms. The search understands meaning, not just strings.
This transformation is enabled by embeddings: dense vector representations that capture semantic content. Documents close in embedding space are close in meaning.
Implementing semantic search well requires understanding embeddings, configuring vector indices, tuning retrieval, and combining with traditional search.
Embedding Fundamentals
Embeddings map text to vectors. The mapping preserves semantic relationships: similar texts produce similar vectors.
How they work: Neural networks (transformers) process text through attention layers, producing a fixed-dimensional vector that represents the content's meaning.
What they capture: Semantic similarity, topic relatedness, conceptual proximity. "King" and "queen" are close. "Dog" and "cat" are close. "Dog" and "algorithm" are distant.
What they miss: Exact phrasing, negation (sometimes), fine-grained logical relationships. "The dog chased the cat" and "The cat chased the dog" might have similar embeddings despite opposite meanings.
Dimensionality trade-offs: More dimensions capture more nuance but cost more to store and compute. 384 to 1536 dimensions cover most use cases well.
Embeddings are lossy compression. A 500-word document becomes a 1536-dimensional vector. Information is lost. The goal is preserving the information most useful for retrieval.
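The "close in embedding space" notion is usually cosine similarity. A toy sketch with made-up 3-dimensional vectors (real models produce hundreds or thousands of dimensions; the numbers here are illustrative, not real embeddings):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, ~0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models produce 384-1536 dims).
dog = [0.9, 0.1, 0.0]
cat = [0.8, 0.2, 0.1]
algorithm = [0.0, 0.1, 0.9]

print(cosine_similarity(dog, cat) > cosine_similarity(dog, algorithm))  # True
```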
pgvector Implementation
PostgreSQL's pgvector extension brings vector operations into the database:
Storage: Define vector columns with a fixed dimension, e.g. embedding vector(1536).
Similarity search: Query by cosine distance, Euclidean distance, or inner product. ORDER BY embedding <=> query_embedding LIMIT 10 returns the ten nearest neighbors by cosine distance.
Indexing: HNSW or IVFFlat indices enable approximate nearest neighbor search at scale.
Integration: Vector search in the same transaction as other queries. Join with metadata, filter by attributes, combine with full-text search.
Setup is straightforward:
CREATE EXTENSION vector;
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536)
);
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
The elegance is the simplicity. No separate vector database. No synchronization pipeline. Standard PostgreSQL tooling works.
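Querying from application code is a single statement. A sketch in Python, where the query construction is shown but execution is left to a real connection (e.g. via psycopg); pgvector accepts vectors as '[x,y,...]' text literals:

```python
# Sketch of a similarity query against the documents table above.
# Running it needs a live PostgreSQL connection; here we only build
# the statement and the vector literal.

def to_pgvector_literal(embedding: list[float]) -> str:
    """Format a Python list as a pgvector '[x,y,...]' input literal."""
    return "[" + ",".join(str(x) for x in embedding) + "]"

# <=> is pgvector's cosine-distance operator: smaller means more similar,
# so an ascending ORDER BY returns the closest documents first.
SIMILARITY_QUERY = """
SELECT id, content, embedding <=> %(query)s::vector AS distance
FROM documents
ORDER BY distance
LIMIT %(limit)s
"""

# With psycopg, roughly:
#   cur.execute(SIMILARITY_QUERY,
#               {"query": to_pgvector_literal(query_embedding), "limit": 10})

print(to_pgvector_literal([0.1, 0.2, 0.3]))  # [0.1,0.2,0.3]
```

Parameterizing the vector as a literal keeps the statement safe from injection and works with any PostgreSQL driver.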
Index Selection: HNSW vs IVFFlat
Two main index types for vector search in pgvector:
IVFFlat: Inverted file index with flat (uncompressed) vector storage. Clusters vectors and searches only the clusters nearest the query.
- Faster to build
- Uses less memory
- Slightly lower recall at same speed
HNSW: Hierarchical Navigable Small World graphs. Multi-layer graph structure for efficient approximate search.
- Slower to build
- Uses more memory
- Higher recall at same speed
- Better for query-heavy workloads
For most knowledge management systems, HNSW is the better choice. The build time and memory overhead are acceptable. The query quality improvement matters more.
Configuration parameters (m for connections, ef_construction for build quality, ef_search for query effort) allow tuning the speed/quality trade-off.
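These knobs map directly onto the index DDL and a session setting. A sketch, carried as SQL strings; m = 16 and ef_construction = 64 are pgvector's defaults, and the ef_search value is illustrative (the right numbers depend on corpus size and recall targets):

```python
# Index build parameters go in the CREATE INDEX statement; ef_search is
# a per-session, query-time setting. Values here are illustrative.

BUILD_INDEX = """
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
"""

# Higher ef_search scans more of the graph per query: better recall,
# higher latency. pgvector's default is 40.
TUNE_SESSION = "SET hnsw.ef_search = 100;"
```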
Chunking Strategies
Documents often exceed ideal embedding size. Chunking splits them:
Fixed-size chunks: Split at N tokens regardless of content. Simple but can break mid-sentence or mid-concept.
Semantic chunks: Split at natural boundaries--paragraphs, sections, topic shifts. Preserves coherence but varies in size.
Overlapping chunks: Include overlapping content between consecutive chunks. Reduces boundary artifacts.
Hierarchical chunking: Multiple levels--document summary, section summaries, individual paragraphs. Retrieve at different granularities.
Document + chunk: Embed both the full document and chunks within it. Document embedding for broad matching, chunk embeddings for precise localization.
The right strategy depends on your content. Long-form articles benefit from semantic chunking. Code benefits from function-level chunking. Conversations benefit from turn-level chunking.
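The simplest of the strategies above, fixed-size chunks with overlap, can be sketched in a few lines; splitting on whitespace is a rough stand-in for real tokenization (production systems would use the embedding model's own tokenizer):

```python
# Minimal fixed-size chunker with overlap. "Tokens" here are whitespace-
# separated words, a crude proxy for model tokens.

def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    words = text.split()
    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window reached the end; stop
    return chunks

parts = chunk("one two three four five six seven eight", size=4, overlap=2)
print(parts)
# ['one two three four', 'three four five six', 'five six seven eight']
```

The overlap means boundary sentences appear in two chunks, so a query matching content near a split point still retrieves a coherent window.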
Hybrid Search Implementation
Combining semantic and keyword search produces better results than either alone:
Parallel retrieval: Run vector search and full-text search simultaneously. Combine results.
Reciprocal Rank Fusion:
RRF_score(d) = sum over retrieval methods of 1 / (k + rank_of_d_in_that_method)
Where k is typically 60. Higher RRF scores indicate documents ranked highly by multiple methods.
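A direct implementation of RRF, assuming each retrieval method returns document ids in rank order:

```python
# Reciprocal Rank Fusion over ranked result lists of document ids.
# k=60 is the conventional constant.

def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

semantic = ["a", "b", "c"]   # ids from vector search, best first
keyword = ["a", "c", "d"]    # ids from full-text search, best first
fused = rrf([semantic, keyword])
print(max(fused, key=fused.get))  # a
```

Documents ranked highly by both methods ("a" here) accumulate the largest scores, without any need to normalize the methods' raw, incomparable score scales.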
Score combination: Normalize scores from each method, combine with weights.
combined_score = alpha * semantic_score + (1 - alpha) * keyword_score
Alpha around 0.7 works well for most cases--weight semantic search more heavily but let keyword search boost exact matches.
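A sketch of that combination, with min-max normalization so the two methods' score scales are comparable before blending; the input scores and alpha are illustrative:

```python
# Min-max normalize each method's scores to [0, 1], then blend with alpha.

def normalize(scores: dict[str, float]) -> dict[str, float]:
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
    return {d: (s - lo) / span for d, s in scores.items()}

def combine(semantic: dict[str, float], keyword: dict[str, float],
            alpha: float = 0.7) -> dict[str, float]:
    sem, kw = normalize(semantic), normalize(keyword)
    docs = set(sem) | set(kw)  # a document may appear in only one list
    return {d: alpha * sem.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0)
            for d in docs}

scores = combine({"a": 0.9, "b": 0.5}, {"b": 12.0, "c": 3.0})
```

Normalization matters because cosine similarities and full-text rank scores live on entirely different scales; blending raw values would silently let one method dominate.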
Conditional fusion: Analyze the query first. Factual queries weight keyword search higher. Conceptual queries weight semantic search higher.
Hybrid search is particularly valuable when your vocabulary includes jargon, proper nouns, or domain-specific terms that semantic models might not capture well.
Relevance Tuning
Raw retrieval isn't enough. Relevance tuning improves result quality:
Query expansion: Augment queries with synonyms, related terms, or rephrased versions. Increases recall.
Metadata filtering: Restrict results by date, source, type. A query about "recent changes" should weight recent documents.
Personalization: Weight results by user history, preferences, or context.
Re-ranking: Use a more sophisticated model to reorder initial results. Cross-encoders score query-document pairs more accurately than embedding similarity.
Diversity: Ensure results cover different aspects of a topic rather than clustering on one interpretation.
Each technique adds complexity and latency. Apply them strategically based on what matters for your use case.
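For the diversity step, one standard technique (not named above, but a common instance of the idea) is Maximal Marginal Relevance: iteratively pick the result that balances relevance to the query against similarity to results already chosen. A sketch with a toy similarity function:

```python
# Maximal Marginal Relevance: greedy selection trading off relevance
# (query_scores) against redundancy (pairwise_sim to already-chosen docs).

def mmr(query_scores: dict[str, float], pairwise_sim,
        select: int, lam: float = 0.5) -> list[str]:
    remaining = dict(query_scores)
    chosen: list[str] = []
    while remaining and len(chosen) < select:
        def mmr_score(d: str) -> float:
            redundancy = max((pairwise_sim(d, c) for c in chosen), default=0.0)
            return lam * remaining[d] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        chosen.append(best)
        del remaining[best]
    return chosen

# Toy similarity: "a1" and "a2" are near-duplicates of each other.
def sim(x: str, y: str) -> float:
    return 0.95 if {x, y} == {"a1", "a2"} else 0.1

print(mmr({"a1": 0.9, "a2": 0.85, "b": 0.6}, sim, 2))  # ['a1', 'b']
```

Even though "a2" scores higher against the query than "b", its redundancy with the already-selected "a1" pushes it below the more distinct result.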
Handling Edge Cases
Semantic search fails predictably in certain situations:
Out-of-vocabulary terms: New jargon, typos, unusual proper nouns. Keyword fallback helps.
Negation: "Not about Python" might still retrieve Python documents. Query analysis can detect and handle negation.
Specificity mismatch: Query about "Python 3.11 datetime parsing" retrieves general Python documents. More specific chunking and metadata filtering help.
Ambiguity: "Apple" the company or the fruit? Context from the broader session can disambiguate.
Empty results: The knowledge base doesn't contain relevant content. Graceful handling rather than hallucinated results.
Building robust systems means handling these cases explicitly rather than hoping they don't occur.
Evaluation and Iteration
Semantic search quality requires measurement:
Retrieval metrics: Recall@k, Precision@k, Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain (NDCG).
End-to-end metrics: Does retrieved content lead to correct answers? User satisfaction with results?
A/B testing: Compare retrieval configurations on real queries.
Error analysis: Sample failed retrievals. Why didn't the right documents surface?
Embedding quality checks: Do similar documents have similar embeddings? Do dissimilar documents have different embeddings?
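The retrieval metrics above can be computed directly from judged queries. A sketch of Recall@k and MRR, where each query carries a set of known-relevant document ids and a ranked retrieval list (the toy data is illustrative):

```python
# Recall@k: fraction of relevant documents that appear in the top k.
def recall_at_k(relevant: set[str], retrieved: list[str], k: int) -> float:
    return len(relevant & set(retrieved[:k])) / len(relevant)

# MRR: mean over queries of 1 / rank of the first relevant hit
# (0 for a query whose results contain no relevant document).
def mrr(queries: list[tuple[set[str], list[str]]]) -> float:
    total = 0.0
    for relevant, retrieved in queries:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)

queries = [({"d1"}, ["d3", "d1", "d4"]),   # first hit at rank 2 -> 0.5
           ({"d2"}, ["d2", "d5"])]         # first hit at rank 1 -> 1.0
print(mrr(queries))  # 0.75
```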
Iteration based on measurement beats intuition-based tuning. What feels like it should work often doesn't.