Queries as Interface
Every interaction with a knowledge graph starts with a query. The query determines what gets retrieved. What gets retrieved determines what the AI agent or human user sees. The quality of queries determines the quality of knowledge access.
Query patterns aren't just database optimization. They're interface design. Getting them right means the difference between a knowledge system that frustrates and one that illuminates.
Common Query Types
Knowledge graphs support several fundamental query patterns:
Semantic similarity: "Find documents similar to this query/document."
- Vector search against embeddings
- Returns ranked list by similarity score
Keyword match: "Find documents containing these specific terms."
- Full-text search with ranking
- Good for precise terminology, proper nouns, code identifiers
Entity lookup: "What do we know about X?"
- Find the entity node
- Retrieve connected documents and relationships
- Aggregate information from multiple sources
Relationship query: "How is X related to Y?"
- Path finding between entities
- Relationship type filtering
- Useful for understanding connections
Temporal query: "What's changed recently?" "What was true at time T?"
- Filter by creation/modification date
- Version history queries
- Trend analysis
Aggregation query: "How many documents about X?" "What topics cluster together?"
- Statistical queries over the graph
- Community detection
- Coverage analysis
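To make the patterns concrete, here is a minimal sketch of entity lookup and relationship queries in Python against PostgreSQL. The schema (entities, documents, document_entities, entity_relationships) and column names are illustrative assumptions, not a prescribed design.
```python
# Hypothetical schema: entities(id, name), documents(id, title, content),
# document_entities(document_id, entity_id),
# entity_relationships(source_id, target_id, rel_type).

def entity_lookup(conn, entity_name):
    """'What do we know about X?': find the entity node, then its connected documents."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT d.id, d.title, d.content
            FROM entities e
            JOIN document_entities de ON de.entity_id = e.id
            JOIN documents d ON d.id = de.document_id
            WHERE e.name = %s
            """,
            (entity_name,),
        )
        return cur.fetchall()

def relationship_paths(conn, source_name, target_name, max_hops=3):
    """'How is X related to Y?': bounded-depth path finding with a recursive CTE."""
    with conn.cursor() as cur:
        cur.execute(
            """
            WITH RECURSIVE paths AS (
                SELECT r.target_id, ARRAY[r.source_id, r.target_id] AS path, 1 AS hops
                FROM entity_relationships r
                WHERE r.source_id = (SELECT id FROM entities WHERE name = %s)
                UNION ALL
                SELECT r.target_id, p.path || r.target_id, p.hops + 1
                FROM paths p
                JOIN entity_relationships r ON r.source_id = p.target_id
                WHERE p.hops < %s AND NOT r.target_id = ANY(p.path)
            )
            SELECT path, hops
            FROM paths
            WHERE target_id = (SELECT id FROM entities WHERE name = %s)
            ORDER BY hops
            LIMIT 5
            """,
            (source_name, max_hops, target_name),
        )
        return cur.fetchall()
```
In a dedicated graph database the relationship query collapses to a variable-length MATCH; in PostgreSQL the recursive CTE plays that role.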
Different use cases emphasize different patterns. A retrieval-augmented generation system leans on semantic similarity. An entity-centric assistant emphasizes entity lookup. A research tool needs relationship queries.
Hybrid Query Construction
Most effective queries combine multiple patterns:
Semantic + keyword: Find documents semantically similar to the query AND containing specific terms. Captures both conceptual relevance and precise matches.
```sql
SELECT *
FROM (
    SELECT d.*,
           1 - (d.embedding <=> query_embedding) AS semantic_score,
           ts_rank(to_tsvector(d.content), query) AS keyword_score
    FROM documents d
    WHERE d.embedding <=> query_embedding < 0.3
      AND to_tsvector(d.content) @@ query
) scored
ORDER BY 0.7 * semantic_score + 0.3 * keyword_score DESC
LIMIT 10;
```
Here query_embedding is the bound query vector and query the bound tsquery. The <=> operator returns a distance, so it is converted to a similarity before being blended with the keyword rank, and the subquery lets the weighted ORDER BY reference both scores by name.
Entity + semantic: Find documents about a specific entity that are also semantically relevant to the current question.
Temporal + semantic: Find recent documents similar to the query. Useful when recency matters.
Graph expansion + similarity: Start with semantically similar documents, then expand to graph neighbors for broader context.
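A sketch of that last pattern, assuming pgvector plus a document_links table; the table names are illustrative and the query embedding is passed as a pgvector literal string.
```python
def retrieve_with_expansion(conn, query_embedding, k=5, neighbors_per_doc=2):
    """query_embedding: pgvector literal string such as '[0.12, -0.03, ...]'."""
    with conn.cursor() as cur:
        # Step 1: seed set from semantic similarity.
        cur.execute(
            """
            SELECT id, content, embedding <=> %s::vector AS distance
            FROM documents
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (query_embedding, query_embedding, k),
        )
        seeds = cur.fetchall()
        seed_ids = [row[0] for row in seeds]

        # Step 2: expand to one-hop graph neighbors of the seed documents.
        cur.execute(
            """
            SELECT DISTINCT d.id, d.content
            FROM document_links l
            JOIN documents d ON d.id = l.target_id
            WHERE l.source_id = ANY(%s)
              AND d.id != ALL(%s)
            LIMIT %s
            """,
            (seed_ids, seed_ids, k * neighbors_per_doc),
        )
        neighbors = cur.fetchall()
    return seeds, neighbors
```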
The art is knowing when to combine and with what weights.
Query Optimization Techniques
Fast queries make knowledge systems usable. Slow queries make them frustrating.
Index usage: Ensure queries use appropriate indices. EXPLAIN ANALYZE reveals whether vector indices and text indices are actually being used.
Filter before rank: Apply cheap filters (date range, source type) before expensive operations (vector similarity, full-text ranking).
Limit early: Retrieve only what you need. LIMIT clauses prevent over-fetching.
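The first three points often land in a single query shape, sketched below; created_at and source_type are assumed metadata columns.
```python
# Run EXPLAIN ANALYZE on this query to confirm the indices are actually used.
FILTERED_SEMANTIC_SEARCH = """
    SELECT id, title, embedding <=> %(query_embedding)s::vector AS distance
    FROM documents
    WHERE created_at >= %(since)s          -- cheap filter
      AND source_type = %(source_type)s    -- cheap filter
    ORDER BY embedding <=> %(query_embedding)s::vector  -- expensive ranking last
    LIMIT %(k)s                            -- retrieve only what you need
"""

def filtered_search(conn, query_embedding, since, source_type, k=10):
    """query_embedding: pgvector literal string; since: datetime; source_type: str."""
    with conn.cursor() as cur:
        cur.execute(FILTERED_SEMANTIC_SEARCH, {
            "query_embedding": query_embedding,
            "since": since,
            "source_type": source_type,
            "k": k,
        })
        return cur.fetchall()
```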
Batch queries: If you need results for multiple queries, batch them. One round-trip beats many.
Precompute common queries: If certain queries run frequently, cache results or materialize views.
Connection pooling: Database connections are expensive. Reuse them.
Query parameterization: Use prepared statements. Avoids repeated parsing and enables query plan caching.
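A minimal pooling-plus-parameterization sketch with psycopg2; the DSN and pool sizes are placeholders.
```python
from psycopg2 import pool

# Placeholder DSN and pool sizes; tune for your deployment.
db_pool = pool.SimpleConnectionPool(
    minconn=2,
    maxconn=10,
    dsn="postgresql://user:password@localhost:5432/knowledge_graph",
)

def run_query(sql, params):
    conn = db_pool.getconn()              # reuse an existing connection
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)      # parameterized: no string interpolation, no injection
            return cur.fetchall()
    finally:
        db_pool.putconn(conn)             # hand the connection back to the pool
```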
Retrieval Strategy Design
Beyond individual query optimization, overall retrieval strategy affects quality:
Top-k vs threshold: Retrieve the top k most relevant documents, or retrieve all documents above a relevance threshold? Top-k gives consistent volume; threshold gives consistent quality.
Breadth vs depth: Retrieve many moderately relevant documents or fewer highly relevant ones? Depends on downstream use: synthesis wants breadth, factual lookup wants depth.
Diversification: Ensure retrieved documents cover different aspects of the query rather than redundantly covering one aspect.
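One common way to diversify is maximal-marginal-relevance-style reranking; the sketch below assumes each candidate carries its relevance score and embedding.
```python
import numpy as np

def _cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def diversify(candidates, k=5, trade_off=0.7):
    """candidates: list of (doc, relevance_score, embedding) tuples."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            _doc, relevance, emb = item
            # Penalize similarity to anything already selected.
            redundancy = max((_cosine(emb, chosen[2]) for chosen in selected), default=0.0)
            return trade_off * relevance - (1 - trade_off) * redundancy
        best_idx = max(range(len(remaining)), key=lambda i: mmr_score(remaining[i]))
        selected.append(remaining.pop(best_idx))
    return [doc for doc, _, _ in selected]
```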
Source balancing: If documents come from multiple sources, ensure balanced representation. Don't let one verbose source dominate.
Fallback strategies: When primary retrieval returns nothing useful, what's the backup? Broader search? Different method? Graceful "I don't know"?
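A fallback cascade can be a few plain conditionals; semantic_search and keyword_search below are assumed helpers returning lists of dicts with a score field, not any particular library's API.
```python
# semantic_search / keyword_search: assumed helpers returning lists of dicts
# that include a "score" key alongside the document fields.

def retrieve_with_fallback(query, query_embedding, min_score=0.75, k=10):
    # Primary: semantic search, filtered to a quality threshold.
    results = [r for r in semantic_search(query_embedding, k=k) if r["score"] >= min_score]
    if results:
        return results

    # Fallback 1: keyword search catches precise terms the embedding may miss.
    results = keyword_search(query, k=k)
    if results:
        return results

    # Fallback 2: broaden the semantic search by relaxing the threshold.
    results = [r for r in semantic_search(query_embedding, k=k) if r["score"] >= min_score - 0.15]
    if results:
        return results

    # Graceful "I don't know": return nothing rather than padding with noise.
    return []
```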
Context Assembly
Retrieved documents feed into context for AI agents. Assembly choices matter:
Document ordering: Earlier positions in context may get more attention (though "lost in the middle" is real). Put most relevant content first and last.
Summarization vs full content: For long documents, summarize or truncate. Full content preserves detail but consumes budget.
Metadata inclusion: Include source, date, type metadata. Helps the agent understand and cite sources.
Relationship inclusion: When documents are connected by graph relationships, note the connections. "Document A links to Document B because..."
Budget management: Track token usage. Stop adding documents when budget is reached, prioritizing by relevance.
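A sketch of budget-aware assembly; the four-characters-per-token estimate is a stand-in for a real tokenizer, and the metadata fields are assumptions about what retrieval returns.
```python
def assemble_context(ranked_docs, token_budget=4000):
    """ranked_docs: dicts with 'title', 'source', 'date', 'content', sorted by relevance."""
    parts, used = [], 0
    for doc in ranked_docs:
        header = f"[{doc['title']} | {doc['source']} | {doc['date']}]"
        block = f"{header}\n{doc['content']}\n"
        cost = len(block) // 4          # crude token estimate; swap in a real tokenizer
        if used + cost > token_budget:
            break                       # budget reached: stop adding documents
        parts.append(block)
        used += cost
    return "\n".join(parts)
```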
Context assembly is where retrieval meets generation. Poor assembly wastes good retrieval.
Query Templates
Common queries deserve templates:
Question answering: Semantic search for the question, retrieve top-k, assemble context, generate answer with citations.
Entity summary: Entity lookup, retrieve all connected documents, aggregate facts, synthesize summary.
Comparison: Parallel retrieval for two entities/topics, identify differences and similarities.
Timeline construction: Temporal query filtered by entity or topic, order by date, assemble narrative.
Gap analysis: Query for a topic, identify what's well-covered vs sparse, suggest areas for expansion.
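As a sketch, the question-answering template is just the earlier pieces composed; embed_query and generate_answer stand in for whatever embedding model and LLM client the system uses.
```python
def answer_question(question):
    query_embedding = embed_query(question)                   # hypothetical embedding call
    docs = retrieve_with_fallback(question, query_embedding)  # retrieval sketch from above
    if not docs:
        return "I don't know: the knowledge graph has nothing relevant."
    context = assemble_context(docs)                           # assembly sketch from above
    prompt = (
        "Answer the question using only the context below, "
        "and cite the bracketed source headers you rely on.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate_answer(prompt)                             # hypothetical LLM call
```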
Templates standardize quality and make the system easier to extend. New use cases combine existing templates.
Monitoring and Debugging
When queries underperform, you need visibility:
Query logging: Record queries, results, latency, user feedback. Essential for debugging and optimization.
Performance metrics: p50, p95, p99 latency. Query volume over time. Error rates.
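Query logging and the latency percentiles can be prototyped in a few lines; the in-memory log below is purely illustrative, and a production system would write to a persistent store.
```python
import time
from statistics import quantiles

query_log = []

def logged_query(run_query, sql, params):
    """Wrap a query runner (like run_query above) to record latency, volume, and errors."""
    start = time.perf_counter()
    rows, error = [], None
    try:
        rows = run_query(sql, params)
    except Exception as exc:            # record failures as well as successes
        error = str(exc)
        raise
    finally:
        query_log.append({
            "sql": sql,
            "latency_ms": (time.perf_counter() - start) * 1000,
            "result_count": len(rows),
            "error": error,
        })
    return rows

def latency_percentiles():
    latencies = [entry["latency_ms"] for entry in query_log]
    cuts = quantiles(latencies, n=100)  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```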
Result quality tracking: User feedback, click-through rates, downstream task success rates.
A/B testing infrastructure: Compare query variations on real traffic.
Explain plans: Database EXPLAIN output shows why queries are slow.
Relevance debugging: For queries with poor results, examine what was retrieved and why expected results didn't surface.
Without measurement, optimization is guesswork.
Performance Boundaries
Know the limits of your system:
Query latency targets: What's acceptable? Real-time chat needs sub-second. Batch processing tolerates minutes.
Throughput limits: How many concurrent queries can the system handle? Where's the bottleneck?
Index size limits: How large can indices grow before performance degrades?
Memory constraints: Vector indices are memory-intensive. What's available?
Understanding boundaries helps you stay within them and plan for growth.
Scaling Strategies
When performance limits approach:
Read replicas: Distribute query load across multiple database replicas.
Caching: Cache frequent queries and expensive computations.
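A minimal sketch of query-result caching with a time-to-live, keyed on the query text and parameters; the TTL is a placeholder.
```python
import time

_cache = {}  # (sql, params) -> (expires_at, rows)

def cached_query(run_query, sql, params, ttl_seconds=300):
    key = (sql, params)                 # params must be hashable, e.g. a tuple
    hit = _cache.get(key)
    if hit and hit[0] > time.time():
        return hit[1]                   # fresh cache hit: skip the database entirely
    rows = run_query(sql, params)
    _cache[key] = (time.time() + ttl_seconds, rows)
    return rows
```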
Sharding: Partition data across multiple databases.
Approximate methods: Accept slightly lower precision for faster results.
Precomputation: Calculate results ahead of time for known query patterns.
Scale when you need to, not before. Premature optimization wastes effort. But know the scaling path so growth doesn't catch you off guard.