B1

Graph RAG Architecture

Part of Knowledge Graph Architecture

Graph RAG extends retrieval-augmented generation beyond flat document search by combining vector similarity with graph traversal, producing richer context through relationship-aware retrieval.

Beyond Flat Document Search

Standard retrieval-augmented generation treats documents as independent chunks. Embed them, store them in a vector database, retrieve the most similar chunks for any query.

This works for simple lookups but misses something crucial: relationships. Documents don't exist in isolation. Concepts connect. Ideas reference each other. Understanding grows through connections, not just individual facts.

Graph RAG extends retrieval to include structure. Not just "what documents are similar to this query" but "what documents are connected to relevant concepts" and "what paths through knowledge lead here."

The architecture is more complex than flat search. The results are substantially better for knowledge-intensive tasks.


The Core Architecture

A Graph RAG system has several components:

Document store: The actual content--notes, documents, code, transcripts. Stored as searchable text.

Vector index: Embeddings of documents or chunks. Enables semantic similarity search.

Knowledge graph: Entities and relationships extracted from documents. Nodes are concepts; edges are connections.

Query processor: Takes user queries, retrieves relevant content, and synthesizes responses.

Ingestion pipeline: Processes new documents, generates embeddings, extracts entities and relationships.

The integration is where value emerges. Vector search finds semantically similar content. Graph traversal finds structurally connected content. Combining both produces richer retrieval than either alone.
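A minimal sketch of that combination, assuming the caller already has vector-search results, a per-chunk entity index, and a simple adjacency map--the data shapes are illustrative, not a specific library's API:

    # Sketch of relationship-aware retrieval: vector-search results expanded
    # through the knowledge graph. A pure function over plain dicts.
    def expand_with_graph(similar_chunks, chunk_entities, graph, chunks_by_entity, hops=1):
        """similar_chunks: chunk ids ordered by vector similarity.
        chunk_entities: chunk id -> entities mentioned in that chunk.
        graph: entity -> directly connected entities.
        chunks_by_entity: entity -> chunk ids that mention it."""
        frontier = {e for c in similar_chunks for e in chunk_entities.get(c, [])}
        seen = set(frontier)
        for _ in range(hops):
            frontier = {n for e in frontier for n in graph.get(e, [])} - seen
            seen |= frontier
        expanded = [c for e in sorted(seen) for c in chunks_by_entity.get(e, [])]
        # Keep the similarity ordering, then append graph-expanded chunks.
        return list(dict.fromkeys(list(similar_chunks) + expanded))

Vector search supplies the entry points; the graph supplies what those entry points are connected to.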


PostgreSQL as Foundation

While specialized graph databases exist, PostgreSQL with extensions provides a capable foundation:

pgvector extension: Enables vector storage and similarity search directly in PostgreSQL. No separate vector database required.

Native JSON support: Store flexible metadata alongside structured data.

Full-text search: Built-in text search with ranking and relevance scoring.

Graph queries with recursive CTEs: Common Table Expressions handle graph traversal without a dedicated graph database.

ACID transactions: Reliable consistency for knowledge that matters.

The advantage: one database for everything. No synchronization between separate systems. No operational complexity of multiple data stores. Standard SQL tooling works.

This stack handles millions of documents. It only becomes inadequate at scales that most personal or small-team knowledge systems never reach.
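A minimal sketch of this single-database layout, with the table names, embedding dimension, and traversal depth chosen purely for illustration:

    # Illustrative PostgreSQL + pgvector layout: documents, entities, and
    # relationships in one database. Not a prescribed schema.
    SCHEMA = """
    CREATE EXTENSION IF NOT EXISTS vector;

    CREATE TABLE documents (
        id        bigserial PRIMARY KEY,
        content   text NOT NULL,
        metadata  jsonb DEFAULT '{}'::jsonb,
        embedding vector(1536)
    );

    CREATE TABLE entities (
        id   bigserial PRIMARY KEY,
        name text UNIQUE NOT NULL
    );

    CREATE TABLE relationships (
        source_id bigint REFERENCES entities(id),
        target_id bigint REFERENCES entities(id),
        relation  text NOT NULL
    );
    """

    # Graph traversal with a recursive CTE: entities reachable from a seed
    # entity within three hops, no dedicated graph database required.
    # %(seed)s is a query parameter.
    TRAVERSE = """
    WITH RECURSIVE reachable AS (
        SELECT target_id AS id, 1 AS depth
        FROM relationships WHERE source_id = %(seed)s
        UNION
        SELECT r.target_id, reachable.depth + 1
        FROM relationships r
        JOIN reachable ON r.source_id = reachable.id
        WHERE reachable.depth < 3
    )
    SELECT DISTINCT e.name FROM reachable JOIN entities e ON e.id = reachable.id;
    """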


Entity and Relationship Extraction

The graph layer requires extracting structure from unstructured content:

Entity recognition: Identify notable things--people, concepts, technologies, projects. This can be rule-based (regex for patterns), ML-based (named entity recognition models), or LLM-based (ask a model to extract entities).

Relationship extraction: Identify connections between entities. "X depends on Y." "A is an instance of B." "C contradicts D."

Coreference resolution: Recognize when different mentions refer to the same entity--"Claude," "the model," and "it" all pointing to the same thing.

Schema alignment: Map extracted entities to a consistent ontology. Is this "Python" the same as "python programming" and "Python 3.11"?

Extraction isn't perfect, and there is a precision-recall trade-off. More aggressive extraction finds more connections but includes more noise. Conservative extraction misses connections but maintains accuracy.

The right balance depends on your use case. Personal knowledge management tolerates some noise. Production systems serving users need higher precision.
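A deliberately crude, rule-based sketch of the extraction step--capitalized phrases as entity candidates, one regex pattern for relationships, and a small alias map for schema alignment. Real pipelines would use NER models or an LLM pass; the patterns and aliases here are illustrative only:

    # Rule-based extraction sketch: high recall on capitalized phrases,
    # one hand-written relationship pattern, and simple alias resolution.
    import re

    ALIASES = {"python programming": "Python", "python 3.11": "Python"}

    def extract_entities(text: str) -> set[str]:
        # Capitalized words and phrases as a crude proxy for named entities.
        candidates = re.findall(r"\b[A-Z][a-zA-Z0-9.]+(?:\s+[A-Z][a-zA-Z0-9.]+)*", text)
        return {ALIASES.get(c.lower(), c) for c in candidates}

    def extract_relationships(text: str) -> set[tuple[str, str, str]]:
        # One illustrative pattern: "X depends on Y".
        pairs = re.findall(r"([A-Z]\w+) depends on ([A-Z]\w+)", text)
        return {(a, "depends_on", b) for a, b in pairs}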


Hybrid Search Strategy

Neither vector search nor graph traversal alone is sufficient. Hybrid search combines them:

Vector retrieval: Find documents semantically similar to the query. Good for "documents about X" queries.

Keyword retrieval: Find documents containing specific terms. Good for precise matching when you know the vocabulary.

Graph expansion: Given retrieved documents, follow links to related content. Find what's connected, not just what's similar.

Reranking: Score combined results and return the most relevant.

The fusion strategy matters:

Reciprocal rank fusion (RRF): Combine rankings by summing reciprocal ranks. Simple and effective.

Score interpolation: Weight and combine similarity scores from different retrieval methods.

Two-stage retrieval: Broad first stage retrieves candidates; focused second stage reranks.

On knowledge-intensive tasks, hybrid search with graph expansion typically outperforms pure vector search by a significant margin. The exact improvement depends on how well your graph captures the relevant structure.
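Reciprocal rank fusion is simple enough to sketch in a few lines. k = 60 is a commonly used default, and the document ids below are placeholders:

    # Reciprocal rank fusion: each retriever contributes 1 / (k + rank) per
    # document, and documents are reranked by the summed score.
    def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
        """rankings: one ordered list of document ids per retrieval method."""
        scores: dict[str, float] = {}
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    # Example: fuse vector, keyword, and graph-expansion result lists.
    fused = rrf([
        ["doc3", "doc1", "doc7"],   # vector similarity order
        ["doc1", "doc9", "doc3"],   # keyword match order
        ["doc7", "doc3", "doc4"],   # graph expansion order
    ])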


Query Processing Pipeline

When a query arrives, the system processes it through stages:

Query understanding: What type of question is this? Factual lookup? Synthesis across sources? Exploration of a topic?

Retrieval planning: Which retrieval methods apply? How many documents to retrieve? What depth of graph traversal?

Retrieval execution: Run vector search, keyword search, graph queries. Combine results.

Context assembly: Select which retrieved content to include in the prompt. Manage context budget.

Generation: Produce the response using retrieved context.

Citation linking: Connect claims in the response to source documents.

Each stage can be tuned independently. Query understanding can be a classifier or an LLM call. Retrieval planning can be rule-based or learned. Context assembly involves the trade-offs discussed in Cluster A.
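As one example of a tunable stage, context assembly can be as simple as greedy packing under a token budget. The sketch below uses a rough characters-per-token estimate rather than a real tokenizer:

    # Context-assembly sketch: pack the highest-ranked chunks into a fixed
    # token budget, skipping anything that doesn't fit.
    def assemble_context(ranked_chunks: list[str], budget_tokens: int = 4000) -> str:
        selected, used = [], 0
        for chunk in ranked_chunks:
            cost = max(1, len(chunk) // 4)   # crude ~4 characters-per-token estimate
            if used + cost > budget_tokens:
                continue                      # skip; a smaller chunk may still fit
            selected.append(chunk)
            used += cost
        return "\n\n".join(selected)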


Embedding Selection and Configuration

Embeddings are fundamental to semantic search. Choices matter:

Model selection: General-purpose models (OpenAI embeddings, sentence-transformers) versus domain-specific fine-tuned models.

Dimensionality: Higher dimensions capture more nuance but cost more to store and search. 384 to 1536 dimensions are common.

Chunking strategy: How to split documents for embedding. Fixed size, semantic boundaries, hierarchical (document + section + paragraph).

Normalization: Whether to normalize vectors to unit length, so cosine similarity reduces to a dot product. Affects both storage and search.

For most knowledge management applications, general-purpose embeddings work well. Domain-specific fine-tuning helps when your vocabulary diverges significantly from common English.

Chunking deserves attention. Chunks too small lose context. Chunks too large dilute relevance. Overlapping chunks with parent document references often work better than simple fixed-size splitting.
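A sketch of that approach--fixed-size windows with overlap and a parent-document reference on every chunk. The sizes are illustrative defaults, not recommendations:

    # Overlapping fixed-size chunking with a parent reference on each chunk,
    # so retrieval can pull surrounding document context when needed.
    def chunk_document(doc_id: str, text: str, size: int = 800, overlap: int = 200):
        chunks = []
        step = size - overlap
        for start in range(0, max(len(text), 1), step):
            piece = text[start:start + size]
            if piece.strip():
                chunks.append({
                    "doc_id": doc_id,   # parent document reference
                    "start": start,     # character offset within the parent
                    "text": piece,
                })
            if start + size >= len(text):
                break
        return chunks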


Maintenance and Evolution

A knowledge graph isn't a static artifact. It evolves as knowledge grows:

Incremental updates: New documents add nodes and edges without reprocessing everything.

Entity merging: When you realize two entities are the same, merge them without breaking links.

Relationship updates: Connections change. What was accurate becomes outdated, and updates need to propagate through the graph.

Quality maintenance: Periodic review of entity quality, relationship accuracy, embedding drift.

Schema evolution: Your ontology improves over time. Migration without breaking existing queries.

Good architecture makes evolution cheap. Bad architecture makes every change expensive. Design for change from the start.
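Entity merging is a good example of a change that should stay cheap. Against the illustrative schema sketched earlier, it can be three statements that repoint edges and drop the duplicate (deduplicating the resulting edges is omitted for brevity):

    # Entity-merge sketch: run all three statements inside one transaction.
    # %(keep)s is the canonical entity id, %(drop)s the duplicate to remove.
    MERGE_ENTITY = [
        "UPDATE relationships SET source_id = %(keep)s WHERE source_id = %(drop)s",
        "UPDATE relationships SET target_id = %(keep)s WHERE target_id = %(drop)s",
        "DELETE FROM entities WHERE id = %(drop)s",
    ]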

B4 (Incremental Indexing) and B6 (Graph Evolution Strategies) cover these topics in depth.


Performance Considerations

Knowledge graphs can become slow without attention to performance:

Index design: Vector indices (HNSW, IVF), text indices (GIN, GiST), graph indices for traversal.

Query optimization: EXPLAIN ANALYZE reveals bottlenecks. Optimize the slow queries.

Caching: Frequently accessed embeddings, common query results, hot paths in the graph.

Batch processing: Ingestion in batches rather than one document at a time.

Connection pooling: Database connections are expensive. Reuse them.

The PostgreSQL ecosystem provides mature tooling for performance analysis and optimization. The work isn't glamorous, but it determines whether your system stays responsive as knowledge grows.
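As a sketch against the illustrative schema from earlier, the relevant index definitions might look like this (the HNSW index type requires a reasonably recent pgvector release):

    # Illustrative index definitions for the tables sketched above.
    INDEXES = """
    -- Approximate nearest-neighbor index for embeddings (pgvector HNSW).
    CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

    -- Full-text search index over document content.
    CREATE INDEX ON documents USING gin (to_tsvector('english', content));

    -- B-tree indexes so recursive traversals resolve edges quickly.
    CREATE INDEX ON relationships (source_id);
    CREATE INDEX ON relationships (target_id);
    """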