LeetLLM

© 2026 LeetLLM. All rights reserved.


GraphRAG & Knowledge Graphs

Learn how GraphRAG uses entity graphs, hierarchical community reports, and embeddings to retrieve evidence for relationship-heavy and corpus-level questions.


Advanced RAG made retrieval adaptive: rewrite the query, generate hypothetical evidence, critique retrieved context, and retry when evidence is weak. GraphRAG changes a different layer. It indexes relationship structure and summaries alongside text so a system can retrieve connected evidence or corpus-level themes, at additional extraction and query cost. This chapter explains when that tradeoff complements search and when it adds needless complexity.

Imagine running an online store with 10,000 customer support tickets. A shopper asks: "Why did my order take twelve days when the website promised five?" A simple retrieval system can find the ticket about that delay. But when an analyst asks, "What are the top three reasons our deliveries miss the promised window?", a small result set may represent only a few incidents. On its own, it cannot cover recurring themes across the corpus.

Microsoft's GraphRAG architecture addresses that query class by building a graph-based index of entities and relationships, pregenerating hierarchical community reports, and embedding text artifacts for retrieval [1][2][3]. In the original paper's global sensemaking evaluation, GraphRAG improved answer comprehensiveness and diversity over its vector RAG baseline [1]. Its index also supplies structured artifacts for entity-focused local context construction. A broader roadmap by Pan et al. places this in the wider effort to combine LLMs with structured knowledge graphs [4].

[Diagram: document chunks feed a graph layer of entities, typed edges, and community reports, then basic, local, and global GraphRAG query paths.]
GraphRAG keeps text retrieval, but adds graph artifacts and report layers for relationship-heavy and corpus-level queries.

The mental model is simple: vector RAG is a filing cabinet; GraphRAG is a case board. Vector search finds chunks that look like the question. A graph index records connections and summary layers that a query method can use when evidence spans multiple documents. Standard GraphRAG local search builds ranked mixed context from graph and text artifacts; a product that needs explicit path traversal must implement and evaluate that behavior.

Why small top-k retrieval can be insufficient

The global query problem

Consider a corpus of 10,000 e-commerce support tickets. We'll use this as our running example throughout the article.

Local queries (vector search handles well)

For a specific question about a single issue, vector search excels:

"What's the return policy for opened electronics?" Retrieval path: top-5 similar tickets, then answer from local context. This works because the needed evidence is concentrated in a few chunks.

The question is self-contained. A few chunks about returns and electronics contain everything the system needs.

Global queries (small top-k retrieval can miss coverage)

For a question that requires synthesizing across the entire corpus, vector search struggles:

"What are the top 3 recurring reasons orders miss the delivery promise?" Failure mode: no single chunk contains a cross-corpus summary, and a top-5 result set is unlikely to represent themes across 10,000 tickets. The resulting answer can be incomplete or misleading.

Why similarity-only retrieval struggles

Vector search retrieves the most similar chunks, not necessarily the most representative ones. When answering a global query, no single chunk may contain the complete answer. One way to create broader retrieval units is to add:

  1. Entity extraction across the entire corpus
  2. Relationship modeling between entities
  3. Community detection to find natural clusters of related information
  4. Hierarchical summarization at different levels of abstraction

| Feature | Vector Search (Standard RAG) | GraphRAG |
| --- | --- | --- |
| Available artifacts | Ranked text chunks | Text units, entities, relationships, community reports |
| Natural starting point | Specific evidence lookup ("What is X?") | Entity-focused or thematic analysis ("What are the trends?") |
| Context construction | Similarity-ranked chunks | Ranked graph/text context or report map-reduce |
| Indexing work | Embedding and optional sparse index | Additional extraction, clustering, reports, and embeddings |
| Query work | Retrieval plus generation | Depends on search mode; global search adds map-reduce calls |

A concrete multi-hop example

Before we define terms, let's see the difference in action. Here's a tiny support corpus:

Ticket 1842: "Order #9021 shipped via FastCarrier. The package sat at the Memphis hub for five days because of a winter storm." Ticket 2031: "FastCarrier's Memphis hub uses outdated sort scanners. Replacement parts are back-ordered." Ticket 3155: "Order #9021 arrived twelve days late. Customer wants a refund on shipping fees."

Now ask:

"Why did Order #9021 arrive late, and is this likely to happen again?"

Illustrative vector-only result: Suppose retrieval finds Ticket 3155 (mentions "Order #9021" and "late") and Ticket 1842 (mentions "Order #9021" and "Memphis hub"), but not Ticket 2031 because it doesn't mention the order number. The answer can identify the weather delay yet miss the scanner problem.
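The miss described above can be reproduced with a toy retriever. The sketch below uses token overlap as a crude stand-in for embedding similarity, which is an assumption: a real dense retriever may rank differently. The names `tokens`, `overlap_score`, and `top_k` are illustrative helpers, not part of any library.

```python
import re

# The three tickets from the running example.
TICKETS = {
    "ticket-1842": "Order #9021 shipped via FastCarrier. The package sat at the "
                   "Memphis hub for five days because of a winter storm.",
    "ticket-2031": "FastCarrier's Memphis hub uses outdated sort scanners. "
                   "Replacement parts are back-ordered.",
    "ticket-3155": "Order #9021 arrived twelve days late. Customer wants a "
                   "refund on shipping fees.",
}

def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query: str, document: str) -> int:
    """Count shared tokens (assumption: a rough proxy for similarity)."""
    return len(tokens(query) & tokens(document))

def top_k(query: str, k: int) -> list[tuple[str, int]]:
    """Return the k highest-scoring ticket ids with their scores."""
    scored = [(tid, overlap_score(query, text)) for tid, text in TICKETS.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

query = "Why did Order #9021 arrive late, and is this likely to happen again?"
results = top_k(query, k=2)
print("top-2:", results)
print("ticket-2031 score:", overlap_score(query, TICKETS["ticket-2031"]))
```

Ticket 2031 shares no tokens with the query, so a small result set built only on surface similarity never sees the scanner evidence.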

Graph-enriched context: Suppose extraction produced these entities and relationships, with source-chunk provenance:

  • (Order #9021) --[shipped_via]--> (FastCarrier)
  • (Order #9021) --[delayed_at]--> (Memphis Hub)
  • (Memphis Hub) --[operated_by]--> (FastCarrier)
  • (Memphis Hub) --[affected_by]--> (Winter Storm)
  • (Memphis Hub) --[has_issue]--> (Outdated Scanners)
  • (Outdated Scanners) --[status]--> (Back-ordered Parts)

An implementation that supports evidence-backed traversal could start at Order #9021, follow delayed_at to Memphis Hub, and retrieve evidence about both Winter Storm and Outdated Scanners. It may then answer: "Order #9021 was delayed at the Memphis hub. The source material mentions a winter storm and also an unresolved scanner issue." It should not predict recurring delays unless source evidence supports that claim.

This is relationship-heavy retrieval: useful evidence is connected through entities that weren't all present in the query. Microsoft's standard local-search dataflow ranks graph and text artifacts into a context window; it isn't a promise that every answer executes a deterministic path. If explicit multi-hop paths matter, preserve edge provenance and test the traversal policy.

require-provenance-for-graph-paths.py
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    source_chunks: tuple[str, ...]

def cited_path(edges: list[Edge]) -> tuple[bool, list[str]]:
    # Reject the whole path if any edge lacks source-chunk provenance.
    if any(not edge.source_chunks for edge in edges):
        return False, []
    citations = sorted({chunk for edge in edges for chunk in edge.source_chunks})
    return True, citations

path = [
    Edge("Order #9021", "delayed_at", "Memphis Hub", ("ticket-1842",)),
    Edge("Memphis Hub", "has_issue", "Outdated Scanners", ("ticket-2031",)),
]
unsupported_path = path + [
    Edge("Outdated Scanners", "will_cause", "Future Delay", ()),
]

print("supported path:", cited_path(path))
print("unsupported prediction:", cited_path(unsupported_path))
Output
supported path: (True, ['ticket-1842', 'ticket-2031'])
unsupported prediction: (False, [])
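If a product does need explicit multi-hop paths, the traversal policy can be tested in isolation. The following is a minimal sketch, assuming edges are stored as `(source, relation, target, source_chunks)` tuples and that a bounded breadth-first walk over outgoing edges is an acceptable policy. `walk_with_provenance` and `max_hops` are hypothetical names, not GraphRAG APIs.

```python
from collections import deque

# Hypothetical edge records: (source, relation, target, source_chunks).
EDGES = [
    ("Order #9021", "delayed_at", "Memphis Hub", ("ticket-1842",)),
    ("Memphis Hub", "affected_by", "Winter Storm", ("ticket-1842",)),
    ("Memphis Hub", "has_issue", "Outdated Scanners", ("ticket-2031",)),
    ("Outdated Scanners", "will_cause", "Future Delay", ()),  # no evidence
]

def walk_with_provenance(start: str, max_hops: int) -> tuple[list[str], list[str]]:
    """Breadth-first walk that only follows edges backed by source chunks."""
    reached: list[str] = []
    citations: set[str] = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted for this branch
        for source, _relation, target, chunks in EDGES:
            if source != node or target in seen or not chunks:
                continue  # skip other nodes, revisits, and unsupported edges
            seen.add(target)
            reached.append(target)
            citations.update(chunks)
            queue.append((target, depth + 1))
    return reached, sorted(citations)

reached, citations = walk_with_provenance("Order #9021", max_hops=3)
print("reached:", reached)
print("citations:", citations)
```

The unsupported `will_cause` edge is never followed, so "Future Delay" stays out of the context even though the hop budget would allow it.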

What is a knowledge graph?

A knowledge graph represents information as a network of entities (nodes) and relationships (edges). Unlike a flat vector database that stores text chunks as isolated embeddings, a knowledge graph preserves the structure of how facts connect.

The fundamental unit is the triple: (Subject) --[Predicate]--> (Object). The edge direction should reflect the meaning of the relation itself, not some generic left-to-right convention.

From our running example:

  • (Order #9021) --[shipped_via]--> (FastCarrier)
  • (Memphis Hub) --[has_issue]--> (Outdated Scanners)
  • (Outdated Scanners) --[status]--> (Back-ordered Parts)

In production, you're usually working with a property graph rather than bare triples. That means nodes and edges carry metadata like type, description, source_chunk_ids, timestamps, or confidence scores. Those extra fields matter for filtering, provenance, and ranking during retrieval.
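As a rough illustration of why those fields matter, the sketch below stores edge metadata in plain dictionaries and filters on it before retrieval. The field names `confidence` and `source_chunk_ids` follow this article's examples; the 0.75 threshold, the `located_in` edge, and the `usable_edges` helper are assumptions made for illustration.

```python
from typing import TypedDict

class EdgeRecord(TypedDict):
    source: str
    type: str
    target: str
    confidence: float
    source_chunk_ids: list[int]

EDGES: list[EdgeRecord] = [
    {"source": "Order #9021", "type": "shipped_via", "target": "FastCarrier",
     "confidence": 1.0, "source_chunk_ids": [0]},
    {"source": "Memphis Hub", "type": "has_issue", "target": "Outdated Scanners",
     "confidence": 0.8, "source_chunk_ids": [1]},
    {"source": "Outdated Scanners", "type": "status", "target": "Back-ordered Parts",
     "confidence": 0.7, "source_chunk_ids": [1]},
    # Hypothetical edge with no provenance, included to show the filter.
    {"source": "Memphis Hub", "type": "located_in", "target": "Tennessee",
     "confidence": 0.9, "source_chunk_ids": []},
]

def usable_edges(edges: list[EdgeRecord], min_confidence: float) -> list[EdgeRecord]:
    """Keep edges that are confident enough and cite at least one chunk."""
    return [
        edge for edge in edges
        if edge["confidence"] >= min_confidence and edge["source_chunk_ids"]
    ]

kept = usable_edges(EDGES, min_confidence=0.75)
for edge in kept:
    print(edge["source"], f"--[{edge['type']}]-->", edge["target"])
```

Bare triples could not express either filter: the low-confidence `status` edge and the uncited `located_in` edge are dropped only because the metadata is attached to the edge.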

This graph structure mirrors how operations teams reason through associations. When a query mentions "Memphis hub," a retrieval system can use typed relationships and provenance to distinguish carrier delays, scanner outages, return-center capacity, and inventory transfers instead of relying on city-name similarity alone.

The GraphRAG pipeline

Phase 1: Graph construction (indexing)

The indexing phase transforms raw document chunks into a structured graph index. In Microsoft's GraphRAG docs, the standard pipeline extracts entities and relationships, optionally extracts claims, builds hierarchical communities, generates community reports, and embeds the resulting artifacts for downstream retrieval [2][3]:

[Diagram: GraphRAG indexing turns documents into chunks, entities and relationships, graph communities, and community reports, plus embedded artifacts for query-time retrieval.]
Standard GraphRAG pays substantial cost at indexing time. Documents become chunks, chunks become graph structure, and resulting artifacts are embedded for retrieval.

Step 1: Entity & relationship extraction

[Diagram: one support-ticket chunk turns into typed entities and typed relationships, with provenance preserved for later retrieval and debugging.]
The extraction stage converts isolated support-ticket text into nodes and edges. The missing link in vector search becomes explicit graph structure.

The first step is to pass each document chunk to an extraction system that identifies key entities and the relationships between them. In standard GraphRAG, this is LLM-driven. In FastGraphRAG, some of this work is replaced with cheaper NLP-based heuristics to cut indexing cost [5]. A simple extraction prompt looks like this:

step-1-entity-and-relationship-extraction.py
# Fill the placeholder with EXTRACTION_PROMPT.replace("{chunk_text}", chunk).
# str.format() would fail on the literal JSON braces in the template.
EXTRACTION_PROMPT = """
Extract all entities and relationships from the following text.

Entities should include: people, organizations, products, concepts, locations.
Relationships should include: works_at, uses, depends_on, causes, relates_to.

Text: {chunk_text}

Output as JSON:
{
  "entities": [
    {"name": "...", "type": "...", "description": "..."}
  ],
  "relationships": [
    {
      "source": "...",
      "target": "...",
      "type": "...",
      "description": "...",
      "strength": 1.0
    }
  ]
}
"""

In production, don't rely on raw string prompting for JSON. Use structured-output tooling such as Instructor (Python) or Zod schemas (TypeScript) with function calling or tool-use APIs to enforce schema adherence. This reduces syntax and schema errors when processing thousands of chunks.
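Whatever tooling enforces the schema, the downstream contract is the same: reject anything that does not match before it reaches the graph. The sketch below is a standard-library stand-in for that validation step, not the Instructor or Zod API. `parse_extraction`, `require_keys`, and the required-key sets are assumptions derived from the JSON shape in the prompt above.

```python
import json

# Required keys mirror the JSON shape requested by the extraction prompt.
ENTITY_KEYS = {"name", "type", "description"}
RELATIONSHIP_KEYS = {"source", "target", "type", "description", "strength"}

def require_keys(item: object, required: set[str], label: str) -> None:
    """Raise ValueError unless item is an object holding every required key."""
    if not isinstance(item, dict):
        raise ValueError(f"{label} must be a JSON object")
    missing = required - item.keys()
    if missing:
        raise ValueError(f"{label} missing keys: {sorted(missing)}")

def parse_extraction(raw: str) -> dict:
    """Parse model output and fail loudly when the schema is not met."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    require_keys(data, {"entities", "relationships"}, "response")
    for entity in data["entities"]:
        require_keys(entity, ENTITY_KEYS, "entity")
    for relationship in data["relationships"]:
        require_keys(relationship, RELATIONSHIP_KEYS, "relationship")
    return data

good = ('{"entities": [{"name": "Memphis Hub", "type": "location", '
        '"description": "Sorting hub"}], "relationships": []}')
bad = '{"entities": [{"name": "Memphis Hub"}], "relationships": []}'

print("accepted entities:", len(parse_extraction(good)["entities"]))
try:
    parse_extraction(bad)
except ValueError as error:
    print("rejected:", error)
```

A rejected chunk can then be retried or logged instead of silently adding a half-formed entity to the index.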

The runnable version below uses a deterministic extractor so you can test the graph contract without calling an LLM. In production, replace extract_from_chunk() with a structured-output model call and keep the same Entity and Relationship objects.

step-1-entity-and-relationship-extraction-2.py
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    type: str
    description: str
    source_chunk_ids: list[int] = field(default_factory=list)

@dataclass
class Relationship:
    source: str
    target: str
    type: str
    description: str
    confidence: float
    source_chunk_ids: list[int] = field(default_factory=list)

def canonicalize(name: str) -> str:
    # Normalize case, periods, and spaces so name variants share one key.
    return name.lower().replace(".", "").replace(" ", "")

def extract_from_chunk(chunk_id: int, chunk: str) -> tuple[list[Entity], list[Relationship]]:
    entities: list[Entity] = []
    relationships: list[Relationship] = []

    if "Order #9021" in chunk:
        entities.append(Entity("Order #9021", "Order", "Delayed customer order", [chunk_id]))
    if "FastCarrier" in chunk or "Fast Carrier" in chunk:
        entities.append(Entity("FastCarrier", "Carrier", "Shipping carrier", [chunk_id]))
    if "Memphis hub" in chunk or "Memphis Hub" in chunk:
        entities.append(Entity("Memphis Hub", "Warehouse", "Regional sorting hub", [chunk_id]))
    if "winter storm" in chunk:
        entities.append(Entity("Winter Storm", "Event", "Weather disruption", [chunk_id]))
    if "outdated sort scanners" in chunk:
        entities.append(Entity("Outdated Scanners", "EquipmentIssue", "Aging sort scanners", [chunk_id]))
    if "back-ordered" in chunk:
        entities.append(Entity("Back-ordered Parts", "SupplyIssue", "Replacement parts unavailable", [chunk_id]))

    # Emit a relationship only when both endpoints were found in this chunk.
    names = {entity.name for entity in entities}
    if {"Order #9021", "FastCarrier"} <= names:
        relationships.append(
            Relationship("Order #9021", "FastCarrier", "shipped_via", "Carrier for order", 1.0, [chunk_id])
        )
    if {"Memphis Hub", "FastCarrier"} <= names:
        relationships.append(
            Relationship("Memphis Hub", "FastCarrier", "operated_by", "Carrier operates hub", 0.8, [chunk_id])
        )
    if {"Order #9021", "Memphis Hub"} <= names:
        relationships.append(
            Relationship("Order #9021", "Memphis Hub", "delayed_at", "Delay location", 1.0, [chunk_id])
        )
    if {"Memphis Hub", "Winter Storm"} <= names:
        relationships.append(
            Relationship("Memphis Hub", "Winter Storm", "affected_by", "Weather caused hub delay", 0.9, [chunk_id])
        )
    if {"Memphis Hub", "Outdated Scanners"} <= names:
        relationships.append(
            Relationship("Memphis Hub", "Outdated Scanners", "has_issue", "Scanner outage risk", 0.8, [chunk_id])
        )
    if {"Outdated Scanners", "Back-ordered Parts"} <= names:
        relationships.append(
            Relationship("Outdated Scanners", "Back-ordered Parts", "status", "Replacement parts unavailable", 0.7, [chunk_id])
        )

    return entities, relationships

def deduplicate_entities(entities: list[Entity]) -> list[Entity]:
    # Keep the first entity per canonical key and accumulate provenance on it.
    merged: dict[str, Entity] = {}
    for entity in entities:
        key = canonicalize(entity.name)
        if key not in merged:
            merged[key] = entity
            continue
        merged[key].source_chunk_ids.extend(entity.source_chunk_ids)
    return list(merged.values())

def extract_graph_elements(chunks: list[str]) -> tuple[list[Entity], list[Relationship]]:
    all_entities: list[Entity] = []
    all_relationships: list[Relationship] = []
    entity_name_by_key: dict[str, str] = {}

    for chunk_id, chunk in enumerate(chunks):
        entities, relationships = extract_from_chunk(chunk_id, chunk)
        all_entities.extend(entities)
        all_relationships.extend(relationships)

    deduped_entities = deduplicate_entities(all_entities)
    for entity in deduped_entities:
        entity_name_by_key[canonicalize(entity.name)] = entity.name

    # Point every relationship endpoint at the surviving entity name.
    resolved_relationships = [
        Relationship(
            source=entity_name_by_key[canonicalize(rel.source)],
            target=entity_name_by_key[canonicalize(rel.target)],
            type=rel.type,
            description=rel.description,
            confidence=rel.confidence,
            source_chunk_ids=rel.source_chunk_ids,
        )
        for rel in all_relationships
    ]

    return deduped_entities, resolved_relationships

chunks = [
    "Order #9021 shipped via FastCarrier. The package sat at Memphis hub for five days because of a winter storm.",
    "Fast Carrier's Memphis hub uses outdated sort scanners. Replacement parts are back-ordered.",
]

entities, relationships = extract_graph_elements(chunks)
entity_names = {entity.name for entity in entities}
relationship_types = {rel.type for rel in relationships}

print("entities:", ", ".join(sorted(entity_names)))
print("relationships:", ", ".join(sorted(relationship_types)))
Output
entities: Back-ordered Parts, FastCarrier, Memphis Hub, Order #9021, Outdated Scanners, Winter Storm
relationships: affected_by, delayed_at, has_issue, operated_by, shipped_via, status
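In the run above, both chunks mention the carrier and the hub, so `operated_by` is emitted twice and the relationship list still holds two copies of the same typed edge. One possible policy, sketched below as an assumption and not as GraphRAG's behavior, merges edges that share `(source, type, target)`, unions their provenance, and keeps the highest confidence. `merge_relationships` is a hypothetical helper.

```python
from dataclasses import dataclass, field

@dataclass
class Relationship:
    source: str
    target: str
    type: str
    description: str
    confidence: float
    source_chunk_ids: list[int] = field(default_factory=list)

def merge_relationships(relationships: list[Relationship]) -> list[Relationship]:
    """Merge duplicate typed edges, keeping max confidence and all provenance."""
    merged: dict[tuple[str, str, str], Relationship] = {}
    for rel in relationships:
        key = (rel.source, rel.type, rel.target)
        if key not in merged:
            # Copy so the caller's objects are not mutated.
            merged[key] = Relationship(
                rel.source, rel.target, rel.type, rel.description,
                rel.confidence, list(rel.source_chunk_ids),
            )
            continue
        kept = merged[key]
        kept.confidence = max(kept.confidence, rel.confidence)
        kept.source_chunk_ids = sorted(
            set(kept.source_chunk_ids) | set(rel.source_chunk_ids)
        )
    return list(merged.values())

duplicates = [
    Relationship("Memphis Hub", "FastCarrier", "operated_by", "Carrier operates hub", 0.8, [0]),
    Relationship("Memphis Hub", "FastCarrier", "operated_by", "Carrier operates hub", 0.8, [1]),
    Relationship("Order #9021", "FastCarrier", "shipped_via", "Carrier for order", 1.0, [0]),
]
merged = merge_relationships(duplicates)

for rel in merged:
    print(rel.source, rel.type, rel.target, rel.source_chunk_ids)
```

Keeping every contributing chunk id on the merged edge preserves the provenance that later citation checks depend on.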

Step 2: Build the knowledge graph

Once the entities and relationships are extracted and deduplicated, we can construct the graph data structure. Mapping isolated facts from individual chunks into an interconnected network gives later retrieval policies access to relationships that weren't explicit in any single chunk.

This function takes the lists of entities and relationships as input. Using a graph processing library like NetworkX, it adds each entity as a node and each typed relationship as a directed edge with provenance and weighting metadata:

step-2-build-the-knowledge-graph.py
from dataclasses import dataclass, field

import networkx as nx

@dataclass
class Entity:
    name: str
    type: str
    description: str
    source_chunk_ids: list[int] = field(default_factory=list)

@dataclass
class Relationship:
    source: str
    target: str
    type: str
    description: str
    confidence: float
    source_chunk_ids: list[int] = field(default_factory=list)

def build_knowledge_graph(
    entities: list[Entity],
    relationships: list[Relationship]
) -> nx.MultiDiGraph:
    """Construct a directed property graph from extracted elements."""
    G = nx.MultiDiGraph()

    for entity in entities:
        G.add_node(entity.name,
                   type=entity.type,
                   description=entity.description,
                   source_chunk_ids=entity.source_chunk_ids)

    # The relation type doubles as the edge key, so one node pair can
    # carry several differently typed edges.
    for rel in relationships:
        G.add_edge(rel.source, rel.target,
                   key=rel.type,
                   type=rel.type,
                   description=rel.description,
                   weight=rel.confidence,
                   source_chunk_ids=rel.source_chunk_ids)

    return G

entities = [
    Entity("Order #9021", "Order", "Delayed customer order", [0]),
    Entity("FastCarrier", "Carrier", "Shipping carrier", [0]),
    Entity("Memphis Hub", "Warehouse", "Regional sorting hub", [0, 1]),
]
relationships = [
    Relationship("Order #9021", "FastCarrier", "shipped_via", "Carrier for order", 1.0, [0]),
    Relationship("Order #9021", "Memphis Hub", "delayed_at", "Delay location", 1.0, [0]),
]

graph = build_knowledge_graph(entities, relationships)

print("nodes:", sorted(graph.nodes))
print("edges:", sorted((src, rel_type, dst) for src, dst, rel_type in graph.edges(keys=True)))
print("Memphis Hub node type:", graph.nodes["Memphis Hub"]["type"])
print("shipped_via weight:", graph["Order #9021"]["FastCarrier"]["shipped_via"]["weight"])
Output
nodes: ['FastCarrier', 'Memphis Hub', 'Order #9021']
edges: [('Order #9021', 'delayed_at', 'Memphis Hub'), ('Order #9021', 'shipped_via', 'FastCarrier')]
Memphis Hub node type: Warehouse
shipped_via weight: 1.0

Step 3: Community detection (Leiden algorithm)

After the graph is built, we need to find natural clusters of related entities. Think of it like grouping the support tickets into themes: shipping problems, refund problems, product defects. The difference is that the algorithm decides the groups based on how densely entities connect to each other, not on keyword matching.

The Leiden algorithm identifies these clusters. It's preferred over the classic Louvain method because Louvain can produce badly connected or even internally disconnected communities (nodes assigned to the same group with no path between them inside that group). Traag et al. show that Leiden adds a refinement phase and guarantees connected communities [6]. That removes a structural defect before report generation, although it can't correct bad entity extraction or unsupported edges. Microsoft's GraphRAG pipeline applies hierarchical Leiden recursively until communities hit a size threshold, which is how it gets both coarse and fine-grained views of the same corpus [3].

The algorithm optimizes modularity ($Q$), a measure of community structure quality in the extracted graph. In plain terms, modularity asks: "Are there more edges inside each group than we'd expect by chance?" A high $Q$ indicates strong clustering in that graph; it doesn't prove that extraction captured the real-world topic correctly.

For an unweighted, undirected graph, the formula is:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

Where $A_{ij}$ is 1 if nodes $i$ and $j$ are directly connected and 0 otherwise; $k_i$ and $k_j$ are the degrees of those nodes (the number of edges each one touches); $m$ is the total number of edges in the graph; and $\delta(c_i, c_j)$ is 1 if the two nodes are in the same community and 0 otherwise. Higher modularity indicates stronger community structure.
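To make the formula concrete, the sketch below evaluates $Q$ directly from the double sum for a small unweighted, undirected graph: two triangles joined by one bridge edge. The graph and the `modularity` helper are illustrative assumptions; production code would normally rely on a graph library.

```python
from itertools import product

def modularity(edges: list[tuple[str, str]], community: dict[str, int]) -> float:
    """Evaluate Q for an unweighted, undirected graph via the double sum."""
    nodes = sorted(community)
    m = len(edges)  # total number of edges
    adjacency = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    total = 0.0
    for i, j in product(nodes, repeat=2):
        if community[i] != community[j]:
            continue  # delta(c_i, c_j) = 0, so the pair contributes nothing
        a_ij = 1.0 if (i, j) in adjacency else 0.0
        total += a_ij - degree[i] * degree[j] / (2 * m)
    return total / (2 * m)

# Two triangles (a, b, c) and (d, e, f) joined by the bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

print(round(modularity(edges, two_groups), 4))    # 0.3571, i.e. 5/14
print(abs(modularity(edges, one_group)) < 1e-9)   # True: one big group scores zero
```

Splitting along the bridge scores $5/14$, while lumping every node into one community scores zero, which matches the "more edges inside each group than chance" reading.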

The runnable example below computes one weighted Leiden partition with python-igraph. Standard GraphRAG recursively applies community detection to build a hierarchy; keeping this example at one level makes the clustering contract visible and keeps it fast enough to run locally.

step-3-community-detection-leiden-algorithm.py
import igraph as ig

def detect_communities(
    weighted_edges: list[tuple[str, str, float]],
) -> list[list[str]]:
    """Compute one Leiden partition of an undirected weighted entity graph."""
    node_names = sorted(
        {node for source, target, _ in weighted_edges for node in (source, target)}
    )
    graph = ig.Graph()
    graph.add_vertices(node_names)
    graph.add_edges([(source, target) for source, target, _ in weighted_edges])
    graph.es["weight"] = [weight for _, _, weight in weighted_edges]
    partition = graph.community_leiden(
        objective_function="modularity",
        weights="weight",
    )
    # Map vertex indices back to names and sort for stable output.
    groups = [
        sorted(graph.vs[index]["name"] for index in community)
        for community in partition
    ]
    return sorted(groups, key=lambda group: group[0])

weighted_edges = [
    ("Login", "Login Timeout", 1.0),
    ("Login", "iOS", 1.0),
    ("Crash", "Crash Loop", 1.0),
    ("Payment", "Card Failure", 1.0),
    ("Refund", "Auto Refund", 1.0),
]
groups = detect_communities(weighted_edges)

print("community_count:", len(groups))
for group in groups:
    print("-", ", ".join(group))
Output
community_count: 4
- Auto Refund, Refund
- Card Failure, Payment
- Crash, Crash Loop
- Login, Login Timeout, iOS

The following diagram visualizes how entities are grouped into hierarchical communities, from specific concepts at the bottom to broad themes at the top:

[Diagram: small entity communities roll up into broader product and billing reports, which GraphRAG can use for dataset-level questions.]
Leiden communities create a hierarchy. Low-level entity clusters roll up into broader reports that GraphRAG can search for dataset-level questions.
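The roll-up can be pictured as a recursive split. The sketch below is a simplification under stated assumptions: `partition` is a pluggable stand-in for a Leiden call, `max_size` is a hypothetical size threshold, level 0 is taken to be the broadest group, and the fixed `SPLITS` lookup exists only so the example runs without a graph library.

```python
from collections.abc import Callable

Partition = Callable[[list[str]], list[list[str]]]

def build_hierarchy(
    nodes: list[str],
    partition: Partition,
    max_size: int,
    level: int = 0,
) -> list[tuple[int, list[str]]]:
    """Return (level, members) pairs, splitting any group above max_size."""
    communities = [(level, sorted(nodes))]
    if len(nodes) <= max_size:
        return communities  # small enough, stop recursing
    children = partition(nodes)
    if len(children) <= 1:
        return communities  # the partitioner could not split this group
    for child in children:
        communities.extend(build_hierarchy(child, partition, max_size, level + 1))
    return communities

# Hypothetical precomputed split standing in for a community-detection call.
SPLITS = {
    frozenset({"Login", "Login Timeout", "iOS", "Payment", "Card Failure"}): [
        ["Login", "Login Timeout", "iOS"],
        ["Payment", "Card Failure"],
    ],
}

def lookup_partition(nodes: list[str]) -> list[list[str]]:
    """Return the stored split, or the group unchanged if none is known."""
    return SPLITS.get(frozenset(nodes), [nodes])

hierarchy = build_hierarchy(
    ["Login", "Login Timeout", "iOS", "Payment", "Card Failure"],
    lookup_partition,
    max_size=3,
)
for level, members in hierarchy:
    print(level, members)
```

Each `(level, members)` pair is a candidate for its own report, which is how one corpus ends up with both coarse and fine-grained summaries.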

Step 4: Community summarization

This function takes the entities, their descriptions, and their relationships within a community as input. It passes these to a summarizer, returning a community report that can later be used for global search [1][3]:

step-4-community-summarization.py
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class Entity:
    name: str
    description: str

@dataclass(frozen=True)
class Relationship:
    source: str
    type: str
    target: str

class CommunitySummarizer(Protocol):
    def summarize(self, prompt: str) -> str: ...

class FakeCommunitySummarizer:
    # Returns a canned report regardless of the prompt, so the example runs offline.
    def summarize(self, prompt: str) -> str:
        return (
            "Memphis Hub delay cluster: FastCarrier orders are affected by storms "
            "and scanner reliability issues."
        )

def summarize_community(
    community_entities: list[Entity],
    community_relationships: list[Relationship],
    level: int,
    summarizer: CommunitySummarizer,
) -> str:
    entity_lines = "\n".join(
        f"- {entity.name}: {entity.description}" for entity in community_entities
    )
    relationship_lines = "\n".join(
        f"- {rel.source} {rel.type} {rel.target}" for rel in community_relationships
    )
    prompt = (
        f"Community level: {level}\n"
        f"Entities:\n{entity_lines}\n"
        f"Relationships:\n{relationship_lines}\n"
        "Summarize the main operational pattern."
    )
    return summarizer.summarize(prompt)

summary = summarize_community(
    [Entity("Memphis Hub", "Regional sorting hub")],
    [Relationship("Memphis Hub", "has_issue", "Outdated Scanners")],
    level=0,
    summarizer=FakeCommunitySummarizer(),
)

print(summary)
Output
Memphis Hub delay cluster: FastCarrier orders are affected by storms and scanner reliability issues.
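Broader levels of the hierarchy can be summarized from child reports instead of raw entities. The sketch below assumes that roll-up strategy and a summarizer with the same `summarize(prompt)` shape as above. `summarize_parent` is a hypothetical helper, `CountingSummarizer` is a deterministic fake so the example runs offline, and the refund report text is invented for illustration.

```python
from typing import Protocol

class CommunitySummarizer(Protocol):
    def summarize(self, prompt: str) -> str: ...

class CountingSummarizer:
    """Deterministic fake: reports how many child reports the prompt held."""
    def summarize(self, prompt: str) -> str:
        count = sum(1 for line in prompt.splitlines() if line.startswith("- "))
        return f"Parent report synthesized from {count} child reports."

def summarize_parent(
    child_reports: list[str],
    level: int,
    summarizer: CommunitySummarizer,
) -> str:
    """Build one broader report from the reports of its child communities."""
    report_lines = "\n".join(f"- {report}" for report in child_reports)
    prompt = (
        f"Community level: {level}\n"
        f"Child reports:\n{report_lines}\n"
        "Summarize the themes shared across these reports."
    )
    return summarizer.summarize(prompt)

parent = summarize_parent(
    [
        "Memphis Hub delay cluster: storms and scanner reliability issues.",
        # Hypothetical second child report, not taken from the corpus.
        "Refund cluster: shipping-fee refunds requested after late deliveries.",
    ],
    level=1,
    summarizer=CountingSummarizer(),
)

print(parent)
```

Feeding reports rather than raw entities keeps the parent prompt short, at the cost of inheriting any errors already present in the child reports.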

Phase 2: Query processing (runtime)

GraphRAG's query engine exposes Basic Search, Local Search, Global Search, and DRIFT Search. Local and global search form the core mental model here: local search builds context for entity-centric questions, and global search synthesizes over community reports. Basic Search is a text-focused baseline, and DRIFT combines community-level entry points with local exploration [7][8][9].

Local search (specific questions)

Local search isn't just "do NER on the query and grab neighboring nodes." In Microsoft's GraphRAG docs, local search maps the query into semantically related entities, then prioritizes a mixed context from connected entities, relationships, community reports, linked text units, and optionally covariates if claim extraction is enabled [7].

The runnable example below demonstrates a bounded neighborhood expansion for intuition. It's a custom graph-walk sketch, not an exact reproduction of the GraphRAG context builder or its ranking policy:

local-search-specific-questions.py
```python
import networkx as nx

class EntityStore:
    def __init__(self, entity_names: list[str]) -> None:
        self.entity_names = entity_names

    def similarity_search(self, query: str, k: int = 10) -> list[str]:
        # Toy stand-in for embedding search: match entity names that appear in the query.
        query_lower = query.lower()
        matches = [
            entity for entity in self.entity_names if entity.lower() in query_lower
        ]
        return matches[:k]

def expand_entity_neighborhood(
    graph: nx.MultiDiGraph, entity_names: list[str], hops: int = 1
) -> nx.MultiDiGraph:
    nodes: set[str] = set(entity_names)
    frontier: set[str] = set(entity_names)
    for _ in range(hops):
        next_frontier: set[str] = set()
        for node in frontier:
            if node not in graph:
                # A matched name may have no node in the graph; skip it instead of raising.
                continue
            next_frontier.update(graph.successors(node))
            next_frontier.update(graph.predecessors(node))
        nodes.update(next_frontier)
        frontier = next_frontier
    return graph.subgraph(nodes).copy()

def render_local_context(
    entities: list[str],
    relationships: list[tuple[str, str, dict]],
    text_units: dict[str, list[str]],
    community_reports: dict[str, list[str]],
) -> str:
    return "\n".join(
        [
            f"Entities: {entities}",
            f"Relationships: {[(src, data['type'], dst) for src, dst, data in relationships]}",
            f"Text units: {text_units}",
            f"Reports: {community_reports}",
        ]
    )

def local_search(
    query: str,
    entity_store: EntityStore,
    graph: nx.MultiDiGraph,
    text_units: dict[str, list[str]],
    community_reports: dict[str, list[str]],
) -> str:
    mapped_entities = entity_store.similarity_search(query, k=10)
    neighborhood = expand_entity_neighborhood(graph, mapped_entities, hops=2)
    neighborhood_entities = sorted(neighborhood.nodes)
    context = render_local_context(
        entities=neighborhood_entities,
        relationships=list(neighborhood.edges(data=True)),
        text_units={entity: text_units.get(entity, []) for entity in neighborhood_entities},
        community_reports={
            entity: community_reports.get(entity, []) for entity in neighborhood_entities
        },
    )
    return f"Answer using local GraphRAG context:\n{context}"

graph = nx.MultiDiGraph()
graph.add_edge("Order #9021", "Memphis Hub", type="delayed_at")
graph.add_edge("Memphis Hub", "Outdated Scanners", type="has_issue")

answer = local_search(
    "Why was Order #9021 late?",
    EntityStore(["Order #9021", "Memphis Hub", "Outdated Scanners"]),
    graph,
    text_units={"Order #9021": ["Order arrived twelve days late."]},
    community_reports={"Memphis Hub": ["Hub has weather and scanner risks."]},
)

print(answer)
```
Output
```
Answer using local GraphRAG context:
Entities: ['Memphis Hub', 'Order #9021', 'Outdated Scanners']
Relationships: [('Order #9021', 'delayed_at', 'Memphis Hub'), ('Memphis Hub', 'has_issue', 'Outdated Scanners')]
Text units: {'Memphis Hub': [], 'Order #9021': ['Order arrived twelve days late.'], 'Outdated Scanners': []}
Reports: {'Memphis Hub': ['Hub has weather and scanner risks.'], 'Order #9021': [], 'Outdated Scanners': []}
```

The actual context-builder problem includes a token budget: relevant entities, relationships, text units, and reports compete for room in one generation request. A production route needs ranking and provenance, not unbounded expansion.

pack-ranked-local-context-under-a-budget.py
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    source: str
    score: float
    tokens: int
    citation: str

def pack_context(candidates: list[Candidate], token_budget: int) -> list[Candidate]:
    selected: list[Candidate] = []
    used = 0
    for candidate in sorted(candidates, key=lambda item: item.score, reverse=True):
        if used + candidate.tokens <= token_budget:
            selected.append(candidate)
            used += candidate.tokens
    return selected

candidates = [
    Candidate("relationship: delayed_at", 0.98, 30, "ticket-1842"),
    Candidate("text: scanner issue", 0.91, 55, "ticket-2031"),
    Candidate("report: carrier overview", 0.62, 80, "community-7"),
]
selected = pack_context(candidates, token_budget=90)

print("selected:", [item.source for item in selected])
print("citations:", [item.citation for item in selected])
print("tokens:", sum(item.tokens for item in selected))
```
Output
```
selected: ['relationship: delayed_at', 'text: scanner issue']
citations: ['ticket-1842', 'ticket-2031']
tokens: 85
```

Global search (broad questions)

For queries that require cross-corpus understanding, GraphRAG uses a map-reduce pattern over community reports from a selected hierarchy level. As the global-search docs describe it, community reports are batched into chunks, the map step produces rated intermediate points from each batch, and the reduce step aggregates the highest-rated points into the final answer [1][8]:

global-search-broad-questions.py
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RatedPoint:
    text: str
    rating: int

def batch_reports(reports: list[str], batch_size: int = 2) -> list[list[str]]:
    return [reports[index:index + batch_size] for index in range(0, len(reports), batch_size)]

def map_report_batch(query: str, report_batch: list[str]) -> list[RatedPoint]:
    points: list[RatedPoint] = []
    for report in report_batch:
        report_lower = report.lower()
        if "late" in report_lower or "delay" in report_lower:
            points.append(RatedPoint(report, rating=9))
        elif "refund" in report_lower:
            points.append(RatedPoint(report, rating=6))
    return points

def select_top_points(points: list[RatedPoint], top_k: int) -> list[RatedPoint]:
    return sorted(points, key=lambda point: point.rating, reverse=True)[:top_k]

def global_search(
    query: str,
    reports_by_level: dict[int, list[str]],
    level: int,
) -> str:
    """Answer dataset-level questions with map-reduce over community reports."""
    batches = batch_reports(reports_by_level[level], batch_size=2)
    mapped_points = [
        point
        for batch in batches
        for point in map_report_batch(query, batch)
    ]
    top_points = select_top_points(mapped_points, top_k=3)
    bullets = "\n".join(f"- {point.text}" for point in top_points)
    return f"Top recurring themes for '{query}':\n{bullets}"

reports_by_level = {
    1: [
        "Late deliveries cluster around carrier weather delays.",
        "Warehouse backlogs delay fragile item packing.",
        "Refund complaints cluster around unclear label expiration.",
    ]
}

answer = global_search(
    "What are the main reasons orders miss the promised window?",
    reports_by_level,
    level=1,
)

print(answer)
```
Output
```
Top recurring themes for 'What are the main reasons orders miss the promised window?':
- Late deliveries cluster around carrier weather delays.
- Warehouse backlogs delay fragile item packing.
- Refund complaints cluster around unclear label expiration.
```

Hybrid graph-vector architecture

A useful production design doesn't treat vector search and GraphRAG as mutually exclusive. The GraphRAG stack already mixes graph structure with embeddings during both indexing and query-time context building [2][7]. One hybrid architecture looks like this:

[Figure: A diagram showing a query router sending fact lookups to text units, entity questions to local graph neighborhoods, and corpus-trend questions to community reports before answer synthesis.]
Evaluate text retrieval for direct questions and graph/report context for questions that need connected or corpus-level evidence.

A router is a policy to evaluate, not a guarantee. Measure supported-answer accuracy, latency, and spend for each query class. In a workload dominated by direct evidence lookups, a text route may cover most requests; an analyst-heavy workload may justify more graph/report queries.

release-graph-routes-from-evaluation.py
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RouteResult:
    route: str
    query_class: str
    supported_accuracy: float
    p95_ms: int

def release_route(
    results: list[RouteResult],
    query_class: str,
    minimum_accuracy: float,
    maximum_p95_ms: int,
) -> str:
    eligible = [
        result for result in results
        if result.query_class == query_class
        and result.supported_accuracy >= minimum_accuracy
        and result.p95_ms <= maximum_p95_ms
    ]
    return max(
        eligible,
        key=lambda result: (result.supported_accuracy, -result.p95_ms),
    ).route

results = [
    RouteResult("basic", "fact_lookup", 0.96, 95),
    RouteResult("local", "fact_lookup", 0.96, 240),
    RouteResult("basic", "corpus_trend", 0.61, 92),
    RouteResult("global", "corpus_trend", 0.91, 580),
]

print("fact route:", release_route(results, "fact_lookup", 0.90, 200))
print("trend route:", release_route(results, "corpus_trend", 0.90, 700))
```
Output
```
fact route: basic
trend route: global
```

One subtle but important point: GraphRAG the technique doesn't require a dedicated graph database. Microsoft's reference implementation writes structured output tables to disk and builds query context from those artifacts directly. A graph database like Neo4j or Neptune becomes useful when you need custom traversals, shared KG infrastructure, or analyst-facing graph queries outside the stock pipeline [10].

Query expansion and context bridging

Beyond simple routing, the knowledge graph layer can actively enrich vector results.

Query expansion uses entities found in initial vector results to expand the search query, finding semantically related but textually distinct content.
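
A minimal sketch of that expansion step, assuming a hypothetical `chunk_entities` mapping from chunk ID to the entity names extracted from it. The function name, the mapping, and the sample data are illustrative and are not part of GraphRAG's API:

```python
def expand_query(
    query: str,
    initial_hits: list[str],
    chunk_entities: dict[str, list[str]],
    max_terms: int = 3,
) -> str:
    """Append entities from first-pass hits that the query doesn't already mention."""
    query_lower = query.lower()
    extra_terms: list[str] = []
    for chunk_id in initial_hits:
        for entity in chunk_entities.get(chunk_id, []):
            if entity.lower() not in query_lower and entity not in extra_terms:
                extra_terms.append(entity)
    return " ".join([query, *extra_terms[:max_terms]])


# Hypothetical entity annotations for two chunks returned by the first vector pass.
chunk_entities = {
    "chunk-1": ["Order #9021", "FastCarrier"],
    "chunk-2": ["FastCarrier", "Memphis Hub"],
}
expanded = expand_query("Why was Order #9021 late?", ["chunk-1", "chunk-2"], chunk_entities)
print(expanded)  # Why was Order #9021 late? FastCarrier Memphis Hub
```

The expanded string would then go through a second retrieval pass; capping `max_terms` keeps a noisy first pass from flooding the query.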

Context bridging is a custom retrieval option when two retrieved chunks don't directly connect. If edges retain supporting chunks, a traversal can identify intermediate entities for retrieval and citation. For example, if one chunk mentions "Order #9021 shipped via FastCarrier" and another mentions "FastCarrier's Memphis hub has outdated scanners," a policy can inspect Order #9021 -> FastCarrier -> Memphis Hub -> Outdated Scanners, then fetch source text before making a claim about the order.
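
One way to sketch that bridge, assuming each edge keeps the IDs of the chunks that support it. The `EDGE_CHUNKS` table, the chunk IDs, and both helper names are hypothetical; the search is a plain breadth-first search rather than anything GraphRAG-specific:

```python
from collections import deque

# Hypothetical edges: (source, target) -> IDs of the chunks that support the edge.
EDGE_CHUNKS: dict[tuple[str, str], list[str]] = {
    ("Order #9021", "FastCarrier"): ["chunk-12"],
    ("FastCarrier", "Memphis Hub"): ["chunk-40"],
    ("Memphis Hub", "Outdated Scanners"): ["chunk-40"],
}


def find_bridge(start: str, goal: str) -> list[str]:
    """Breadth-first search for the shortest directed path; empty list if none exists."""
    adjacency: dict[str, list[str]] = {}
    for source, target in EDGE_CHUNKS:
        adjacency.setdefault(source, []).append(target)
    queue: deque[list[str]] = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in adjacency.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return []


def supporting_chunks(path: list[str]) -> list[str]:
    """Collect the chunk IDs behind each edge on the path, in order, without repeats."""
    chunk_ids: list[str] = []
    for source, target in zip(path, path[1:]):
        for chunk_id in EDGE_CHUNKS[(source, target)]:
            if chunk_id not in chunk_ids:
                chunk_ids.append(chunk_id)
    return chunk_ids


bridge = find_bridge("Order #9021", "Outdated Scanners")
print(bridge)
print(supporting_chunks(bridge))  # ['chunk-12', 'chunk-40']
```

The path only nominates evidence. The chunks it points to still have to be fetched and read before the answer asserts anything about the order.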

Result ranking can boost text results with short, supported paths to query entities. Treat this as a ranker feature to evaluate, not proof that a nearby node supports the answer.
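
As an illustration of such a ranker feature, the sketch below adds a bonus that shrinks with path length. The formula, the `weight` value, and the sample scores are assumptions chosen for the example, not a recommended setting:

```python
from typing import Optional


def boosted_score(similarity: float, hops: Optional[int], weight: float = 0.1) -> float:
    """Add a small bonus that shrinks as the graph path to a query entity gets longer."""
    if hops is None:  # no supported path to any query entity
        return similarity
    return similarity + weight / (1 + hops)


# (chunk ID, vector similarity, hops to the nearest query entity): illustrative values.
hits = [
    ("chunk-a", 0.80, None),
    ("chunk-b", 0.78, 1),
    ("chunk-c", 0.75, 3),
]
ranked = sorted(hits, key=lambda hit: boosted_score(hit[1], hit[2]), reverse=True)
print([chunk_id for chunk_id, _, _ in ranked])  # ['chunk-b', 'chunk-a', 'chunk-c']
```

Here a one-hop path lifts `chunk-b` above a slightly more similar chunk with no path. Whether that reordering helps is something to measure on labeled queries.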

Performance considerations

Indexing cost

The most significant barrier to adopting standard GraphRAG is the upfront indexing cost. Compared with vector-only RAG, its standard pipeline adds LLM-heavy graph extraction, summarization, community report generation, and multiple embedding passes [1][3]. Whether the resulting query quality justifies that work is an evaluation question.

Mitigation strategies

To make this viable at scale, teams need to optimize the ingestion pipeline. Microsoft explicitly recommends starting with fast, inexpensive models while you learn the system, and their docs estimate that graph extraction (entity and relationship extraction plus their summarization) is roughly 75% of standard indexing cost [11][5]. If your use case is mostly global summarization, FastGraphRAG can reduce cost further: it replaces LLM-based entity extraction with NLP noun-phrase extraction (using libraries like NLTK or spaCy) and defines relationships by entity co-occurrence within a text unit. The graph is noisier and less reusable outside GraphRAG, but indexing is much cheaper [5].
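
The co-occurrence idea can be sketched without any NLP dependency by assuming the noun phrases have already been extracted per text unit. The phrase lists and the `cooccurrence_edges` helper below are hypothetical and only illustrate the relationship-building step, not GraphRAG's own implementation:

```python
from collections import Counter
from itertools import combinations

# Hypothetical noun phrases, already extracted per text unit by an NLP library.
text_unit_phrases = {
    "unit-1": ["memphis hub", "outdated scanners", "fastcarrier"],
    "unit-2": ["fastcarrier", "memphis hub"],
    "unit-3": ["refund policy", "label expiration"],
}


def cooccurrence_edges(phrases_by_unit: dict[str, list[str]]) -> Counter:
    """Count how often each unordered pair of phrases shares a text unit."""
    edges: Counter = Counter()
    for phrases in phrases_by_unit.values():
        # Sorting gives every pair one canonical (left, right) key.
        for left, right in combinations(sorted(set(phrases)), 2):
            edges[(left, right)] += 1
    return edges


edges = cooccurrence_edges(text_unit_phrases)
print(edges[("fastcarrier", "memphis hub")])  # 2
print(len(edges))  # 4
```

The edges carry only a count, not a typed relationship, which is why this kind of graph is cheaper to build but noisier than an LLM-extracted one.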

LazyGraphRAG in practice

Microsoft Research introduced LazyGraphRAG in November 2024 [12]. It targets a central cost concern with standard GraphRAG: paying for an LLM-driven index before knowing how often graph-assisted queries will run.

LazyGraphRAG defers LLM use to query time. Its index uses NLP noun-phrase extraction and graph statistics for community structure, without entity summaries or precomputed community reports. In Microsoft's reported experiment, its indexing cost matched the vector RAG setup and was approximately 0.1% of full GraphRAG indexing cost [12]. Treat that as a benchmark result, not a universal constant for every corpus and deployment. At query time it blends vector similarity with community structure and exposes a relevance test budget that trades query cost for quality. Microsoft reports that, at evaluated budget levels, LazyGraphRAG was competitive with or better than the tested alternatives on local and global query criteria [12].

The practical takeaway is that precomputed community reports aren't the only candidate for global sensemaking. If the corpus changes frequently or global queries are rare, benchmark a lazy approach against standard GraphRAG. If reusable community reports are a product output, the standard index provides artifacts the lazy path intentionally omits.

Use observed quality and workload volume to compare indexing strategies rather than choosing from architecture labels:

choose-index-strategy-from-workload.py
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strategy:
    name: str
    index_cost: float
    query_cost: float
    supported_accuracy: float

def choose_strategy(
    strategies: list[Strategy], query_count: int, accuracy_floor: float
) -> tuple[str, float]:
    passing = [
        strategy for strategy in strategies
        if strategy.supported_accuracy >= accuracy_floor
    ]
    winner = min(
        passing,
        key=lambda strategy: strategy.index_cost + query_count * strategy.query_cost,
    )
    total_cost = winner.index_cost + query_count * winner.query_cost
    return winner.name, total_cost

# Illustrative measured values from one product evaluation, not vendor benchmarks.
strategies = [
    Strategy("lazy", index_cost=1.0, query_cost=1.7, supported_accuracy=0.91),
    Strategy("precomputed_reports", index_cost=200.0, query_cost=0.8, supported_accuracy=0.93),
]

print("few queries:", choose_strategy(strategies, query_count=10, accuracy_floor=0.90))
print("many queries:", choose_strategy(strategies, query_count=500, accuracy_floor=0.90))
```
Output
```
few queries: ('lazy', 18.0)
many queries: ('precomputed_reports', 600.0)
```

Query cost

While indexing represents the bulk of the computational expense, runtime query costs can also be higher than standard vector search. The cost depends heavily on the query strategy used:

  • Local search: Usually moderate. You still have to build a mixed context from entity embeddings, graph neighborhoods, text units, and community reports, but the response path is much narrower than global search [7].
  • Global search: Potentially expensive. Cost grows with the number of community-report batches you need to map over and the hierarchy level you choose [8].

Mitigation strategies

To manage these runtime costs, standard GraphRAG precomputes community reports at multiple hierarchical levels. At query time, evaluate whether a coarse level answers the question adequately before paying for more detailed reports. Lower levels tend to yield more thorough responses, but they can also increase report volume and LLM work [8].
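
A coarse-to-fine check might look like the sketch below. It assumes a numbering where level 0 is the coarsest, a hypothetical `answer_at_level` callable that returns an answer plus an adequacy score from whatever evaluation you trust, and made-up report counts per level:

```python
from typing import Callable

# Hypothetical report counts: level 0 is the coarsest, higher levels are more detailed.
REPORTS_PER_LEVEL = {0: 4, 1: 18, 2: 75}


def escalate_levels(
    answer_at_level: Callable[[int], tuple[str, float]],
    adequacy_floor: float,
) -> tuple[int, str, int]:
    """Try coarse levels first; return (level, answer, reports processed so far)."""
    reports_processed = 0
    answer = ""
    level = 0
    for level in sorted(REPORTS_PER_LEVEL):
        answer, adequacy = answer_at_level(level)
        reports_processed += REPORTS_PER_LEVEL[level]
        if adequacy >= adequacy_floor:
            break
    return level, answer, reports_processed


def fake_answer(level: int) -> tuple[str, float]:
    # Stand-in for a global-search call plus an adequacy judgment.
    return f"answer from level {level}", [0.55, 0.82, 0.90][level]


print(escalate_levels(fake_answer, adequacy_floor=0.80))  # (1, 'answer from level 1', 22)
```

Escalation pays for the coarse attempts as well, so it only saves work when coarse levels are adequate often enough; the `reports_processed` counter makes that visible.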

Graph maintenance

Knowledge graphs aren't static; they must evolve as the underlying document corpus changes. Keeping the graph in sync with a live, mutating dataset introduces significant engineering complexity. Current GraphRAG releases expose explicit update flows with standard-update and fast-update methods, and the output tables include fields used for incremental update merges [13][10].

  • Additions: When new documents arrive, the system must extract entities and relationships, then merge them into the existing graph. Significant new connections may shift community structure, requiring reclustering and regeneration of affected reports.
  • Deletions: Removing a document isn't as simple as deleting a row in a database. The system must trace and remove nodes or edges supported solely by that document. If removed evidence changes graph structure, communities and reports may need recomputation.
  • Strategy: Even with update support, a deployment may prefer scheduled refreshes for structural artifacts. Entity merges, community boundaries, and report summaries can shift when new documents arrive, so online reclustering is harder to operate than plain vector re-indexing.
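
The deletion case can be sketched with per-edge provenance. The `edge_sources` structure, the document IDs, and `remove_document` are hypothetical and stand in for whatever provenance fields your index stores:

```python
# Hypothetical provenance: each edge maps to the set of document IDs that support it.
edge_sources: dict[tuple[str, str], set[str]] = {
    ("Order #9021", "Memphis Hub"): {"doc-1"},
    ("Memphis Hub", "Outdated Scanners"): {"doc-1", "doc-2"},
    ("Memphis Hub", "Weather Delays"): {"doc-3"},
}


def remove_document(
    edge_sources: dict[tuple[str, str], set[str]], doc_id: str
) -> tuple[dict[tuple[str, str], set[str]], set[str]]:
    """Drop doc_id from every edge; return surviving edges and orphaned nodes."""
    surviving: dict[tuple[str, str], set[str]] = {}
    for edge, sources in edge_sources.items():
        remaining = sources - {doc_id}
        if remaining:  # keep edges that still have at least one supporting document
            surviving[edge] = remaining
    nodes_before = {node for edge in edge_sources for node in edge}
    nodes_after = {node for edge in surviving for node in edge}
    return surviving, nodes_before - nodes_after


surviving, orphaned = remove_document(edge_sources, "doc-1")
print(sorted(surviving))
print(orphaned)  # {'Order #9021'}
```

Any community that contained a removed edge or an orphaned node is a candidate for reclustering and report regeneration.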

When to use GraphRAG

Because of the high indexing and maintenance costs, GraphRAG shouldn't be treated as a default replacement for all retrieval tasks; it's a specialized tool for complex analytical workloads. Evaluating the query profile of your application is essential before committing to a knowledge graph architecture. The following decision matrix highlights where each approach shines:

| Scenario | Standard RAG (Vector) | GraphRAG |
| --- | --- | --- |
| Fact lookup ("What is the refund policy?") | Strong baseline with low retrieval work | Additional structure may not pay off |
| Local context ("Summarize this specific ticket") | Strong baseline | Useful when entity-linked context matters |
| Global summary ("What are top 3 trends?") | Small top-k may underrepresent themes | Designed to synthesize community reports |
| Relationship-heavy question ("How does X affect Y via Z?") | Needs expansion or linked evidence | Can retrieve graph-linked evidence; test path behavior separately |
| Dataset size | Lower marginal indexing cost | Costly when graph extraction and reporting run over large corpora |

Use this framework to decide:

| Start with standard RAG (vector search) | Consider GraphRAG |
| --- | --- |
| Simple fact lookup ("What's the refund policy?") | Global summarization ("What are our top 3 customer complaints?") |
| Single-document questions | Cross-document reasoning ("How does Project X impact Team Y's roadmap?") |
| Prototyping or small datasets | Large, stable corpora with complex relationships |
| Real-time, cost-sensitive applications | Analytical workloads where query cost is acceptable |
| Frequently changing data with direct lookups | Repeated structural analysis or reusable reports |

The pragmatic path: Start with vector search. It's cheaper, faster, and easier to maintain. Only add the graph layer when you hit specific limitations:

  1. Users ask questions that require synthesizing information from many documents
  2. You need to trace influence chains (who reports to whom, what depends on what)
  3. Global sensemaking queries are a core use case, not an edge case
  4. The corpus is large enough to justify the indexing investment

Mastery check

By the end of this lesson, you should be able to explain GraphRAG as an engineering tradeoff, not as a buzzword.

Key concepts

  • GraphRAG adds entities, typed relationships, and community reports on top of chunk retrieval.
  • Local search answers entity-centric questions with mixed graph and text context.
  • Global search answers corpus-level questions with map-reduce over community reports.
  • Hierarchical Leiden guarantees connected communities before report generation, which Louvain does not; extraction quality still determines whether reports are useful.
  • LazyGraphRAG shifts most LLM cost from indexing time to query time.

Evaluation rubric

  • Foundational: Explain why small top-k similarity retrieval can underrepresent global sensemaking queries.
  • Intermediate: Describe the GraphRAG indexing pipeline: entity extraction, relationship extraction, entity and relationship summarization, community detection, community reports, and retrieval-time embeddings.
  • Advanced: Explain why hierarchical Leiden matters for community detection and report generation.
  • Advanced: Compare Local Search, which builds mixed entity/text/report context, with Global Search, which map-reduces over community reports.
  • Advanced: Analyze cost and latency tradeoffs: expensive indexing buys query capabilities that plain vector search doesn't have.
  • Advanced: Design a hybrid graph-vector architecture where simple lookups stay cheap and structural questions get graph context.
  • Advanced: Explain how to compare LazyGraphRAG with a precomputed-report pipeline using measured quality, query volume, and index/query cost.

Common pitfalls

When engineers first encounter knowledge graphs and GraphRAG, they often bring assumptions from traditional vector search or long-context LLMs. Here are the most common traps, what they look like in practice, and how to avoid them.

Mistake 1: "Use a bigger context window"

Symptom: You stuff 100,000 tokens into a long-context model and ask for a summary. The output misses key themes or contradicts itself.

Cause: A 1M-token context window doesn't automatically solve global sensemaking. You still have to decide what to include, very large prompts are expensive to run, and models can still underuse information buried in the middle of long contexts [14].

Fix: Benchmark selective retrieval and structured indexing against long-context prompting. Standard GraphRAG is one candidate when repeated global questions justify precomputed reports; it isn't required for every long document.

Mistake 2: Treating GraphRAG as a replacement for vector search

Symptom: You replace your entire vector index with a knowledge graph. Simple lookups become slow and expensive.

Cause: GraphRAG is complementary, not substitutive. Local search uses entity-description embeddings and linked text units as part of context construction. Global search adds report synthesis for questions that require coverage across many documents.

Fix: Use a hybrid router. Route simple fact lookups to vector search, entity questions to local search, and global summaries to global search.
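
A rule-based router is the simplest starting point. The cue list and `route_query` below are illustrative assumptions; a production router could just as well be a trained classifier or an LLM call, and either way it should be evaluated per query class:

```python
# Hypothetical phrases that suggest a corpus-level question.
GLOBAL_CUES = ("trend", "theme", "overall", "main reasons", "most common")


def route_query(query: str, known_entities: list[str]) -> str:
    """Pick a retrieval route with simple, inspectable rules."""
    query_lower = query.lower()
    if any(cue in query_lower for cue in GLOBAL_CUES):
        return "global"
    if any(entity.lower() in query_lower for entity in known_entities):
        return "local"
    return "basic"


entities = ["Order #9021", "Memphis Hub"]
routes = [
    route_query(query, entities)
    for query in [
        "What is the refund policy?",
        "Why was Order #9021 late?",
        "What are the main reasons orders miss the promised window?",
    ]
]
print(routes)  # ['basic', 'local', 'global']
```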

Mistake 3: Ignoring entity resolution

Symptom: Your graph has three separate nodes for "FastCarrier," "Fast Carrier," and "fastcarrier.com." Queries that should traverse through the carrier fail because the path is broken.

Cause: Entity extraction from unstructured text is inherently noisy. Skipping deduplication leaves near-duplicate nodes that fragment the graph.

Fix: Implement a three-stage resolution pipeline: (1) normalize names (lowercase, strip punctuation), (2) use embeddings or string distance to find candidate duplicates, (3) use a lightweight LLM verification step to confirm whether two names refer to the same real-world entity before merging.
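
The three stages can be sketched with the standard library. String similarity from `difflib` stands in for stage 2 (embeddings would slot into the same place), and `stub_verify` is a placeholder for the LLM check, so the threshold and the always-yes verifier are assumptions for illustration only:

```python
import string
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable


def normalize(name: str) -> str:
    """Stage 1: lowercase and strip punctuation."""
    return name.lower().translate(str.maketrans("", "", string.punctuation)).strip()


def candidate_pairs(names: list[str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Stage 2: propose likely duplicates by string similarity."""
    pairs = []
    for left, right in combinations(names, 2):
        ratio = SequenceMatcher(None, normalize(left), normalize(right)).ratio()
        if ratio >= threshold:
            pairs.append((left, right))
    return pairs


def resolve(names: list[str], verify: Callable[[str, str], bool]) -> dict[str, str]:
    """Stage 3: merge only verified pairs; map every name to a canonical name."""
    canonical = {name: name for name in names}
    for left, right in candidate_pairs(names):
        if verify(left, right):
            keep, drop = canonical[left], canonical[right]
            for name in names:
                if canonical[name] == drop:
                    canonical[name] = keep
    return canonical


def stub_verify(left: str, right: str) -> bool:
    # Placeholder for an LLM yes/no check on whether both names are the same entity.
    return True


names = ["FastCarrier", "Fast Carrier", "fastcarrier.com", "Memphis Hub"]
resolved = resolve(names, stub_verify)
print(resolved)
```

With the stub verifier, the three carrier spellings collapse to `FastCarrier` and `Memphis Hub` stays separate. Keeping verification as a separate callable makes it easy to log rejected merges and audit them later.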

Mistake 4: Extracting every noun as a node

Symptom: The graph balloons to millions of nodes, most of them useless. Common words like "customer," "order," and "email" become nodes with thousands of meaningless edges.

Cause: Without a schema, the extractor treats every noun as an entity.

Fix: Define an ontology (a typed schema) before extraction. Restrict nodes to domain-relevant types like Order, Carrier, Warehouse, ProductDefect, and RefundPolicy. Filter low-information entities in a post-processing step.
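
A small sketch of that filter, using the node types named above. The `ExtractedEntity` shape and the `GENERIC_NAMES` stoplist are hypothetical; a real pipeline would apply the same check to whatever record its extractor emits:

```python
from dataclasses import dataclass

ALLOWED_TYPES = {"Order", "Carrier", "Warehouse", "ProductDefect", "RefundPolicy"}
# Hypothetical stoplist of low-information names for the post-processing pass.
GENERIC_NAMES = {"customer", "order", "email"}


@dataclass(frozen=True)
class ExtractedEntity:
    name: str
    entity_type: str


def filter_entities(entities: list[ExtractedEntity]) -> list[ExtractedEntity]:
    """Keep entities whose type is in the ontology and whose name isn't generic."""
    return [
        entity
        for entity in entities
        if entity.entity_type in ALLOWED_TYPES
        and entity.name.lower() not in GENERIC_NAMES
    ]


extracted = [
    ExtractedEntity("Order #9021", "Order"),
    ExtractedEntity("order", "Order"),
    ExtractedEntity("FastCarrier", "Carrier"),
    ExtractedEntity("email", "Concept"),
]
kept = filter_entities(extracted)
print([entity.name for entity in kept])  # ['Order #9021', 'FastCarrier']
```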

Mistake 5: Underestimating indexing cost

Symptom: The prototype works on 200 documents, then the first full-corpus run burns budget on extraction, summarization, and embedding passes.

Cause: Standard GraphRAG does much more than embed chunks. It asks models to extract entities, extract relationships, summarize repeated descriptions, generate community reports, and embed several downstream artifacts.

Fix: Measure indexing cost before committing to the architecture. Start with a small representative corpus, use inexpensive models while tuning prompts, consider FastGraphRAG for global-summary-heavy workloads, and set a refresh cadence instead of pretending every graph update is free.

Mistake 6: Ignoring graph maintenance

Symptom: Search results become stale or paths break after documents are added, deleted, or corrected.

Cause: A graph index has structure. New evidence can merge entities, change edge weights, move community boundaries, and invalidate old community reports.

Fix: Treat graph refresh as a product requirement. Track provenance with text_unit_ids, run incremental update flows where they are good enough, and schedule full rebuilds when entity resolution or community boundaries drift.

Try it yourself

Here's a short exercise to check your understanding. Work through it on paper or in a text editor before reading the solution sketch.

The corpus

Doc A: "Warehouse B handles returns for the West Coast. Warehouse B recently switched to AutoSort robots." Doc B: "AutoSort robots sometimes mislabel fragile items." Doc C: "Order #7712 (fragile glassware) was routed to Warehouse B, arrived broken, and received an approved refund."

The question: "Why did Order #7712 arrive broken, and what systemic issue might cause similar problems?"

Step 1: List the entities and relationships you'd extract from each document.

Step 2: Draw (or list) the triples that connect Order #7712 to the systemic issue.

Step 3: Explain why standard vector RAG might miss the systemic issue.

Solution sketch

Entities: Warehouse B, AutoSort robots, Order #7712, fragile glassware, refund.

Key triples

  • (Order #7712) --[contains]--> (fragile glassware)
  • (Order #7712) --[routed_to]--> (Warehouse B)
  • (Warehouse B) --[uses]--> (AutoSort robots)
  • (AutoSort robots) --[problem]--> (mislabels fragile items)
  • (Order #7712) --[result]--> (arrived broken)

Multi-hop path: Order #7712 --[routed_to]--> Warehouse B --[uses]--> AutoSort robots --[problem]--> mislabels fragile items. The contains -> fragile glassware edge explains why that robot failure is relevant to this order.

Why similarity retrieval may miss it: No single document contains both "Order #7712" and "mislabels fragile items." A search for "Why did Order #7712 arrive broken?" may retrieve Doc C while missing Docs A and B because neither mentions the order. If extraction is accurate, a graph makes the path from order to warehouse to robot issue explicit and retrievable with provenance.
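
To check the path mechanically, the triples from the solution sketch can be loaded into a list and searched breadth-first. This is a plain-Python illustration with a hypothetical `find_path` helper, not a GraphRAG call:

```python
from collections import deque

# Triples from the solution sketch.
TRIPLES = [
    ("Order #7712", "contains", "fragile glassware"),
    ("Order #7712", "routed_to", "Warehouse B"),
    ("Warehouse B", "uses", "AutoSort robots"),
    ("AutoSort robots", "problem", "mislabels fragile items"),
    ("Order #7712", "result", "arrived broken"),
]


def find_path(start: str, goal: str) -> list[tuple[str, str, str]]:
    """Breadth-first search over directed triples; returns the triples on the path."""
    queue: deque[tuple[str, list[tuple[str, str, str]]]] = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for source, relation, target in TRIPLES:
            if source == node and target not in visited:
                visited.add(target)
                queue.append((target, path + [(source, relation, target)]))
    return []


hops = find_path("Order #7712", "mislabels fragile items")
for source, relation, target in hops:
    print(f"{source} --[{relation}]--> {target}")
```

The search returns the three hops `routed_to`, `uses`, and `problem`, which is the multi-hop path listed in the solution.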

Key takeaways

  1. GraphRAG addresses a global query problem that small top-k vector retrieval can handle poorly. Standard GraphRAG adds structure and precomputed summaries for corpus-level synthesis.

  2. The pipeline is more than graph extraction. Standard GraphRAG extracts entities and relationships, can optionally extract claims, builds hierarchical Leiden communities, generates community reports, and embeds multiple artifacts for retrieval.

  3. Local and global search are the core mental model. Local search (entity-centric + vector) handles entity-specific questions like "Why was Order #9021 late?" Global search (map-reduce over community summaries) handles broad questions like "What are the main themes?" DRIFT adds a hybrid middle ground, and Basic Search remains the plain vector fallback.

  4. Indexing is expensive, and query cost depends on the search mode. Local search is usually manageable. Global search can get expensive if you have to map over many community-report batches.

  5. Hybrid routing is an evaluation choice. Test simple questions on text retrieval and structural questions on graph/report context, then release routes that meet support, cost, and latency targets.

  6. Start with vector search, add graphs when needed. GraphRAG shines for complex, interconnected datasets and global sensemaking. For simple fact retrieval, vector search remains faster and more cost-effective.

  7. The expensive standard index isn't the only option. LazyGraphRAG (2024) moves LLM work to query time; Microsoft reported vector-RAG-like indexing cost in its comparison. Benchmark lazy and precomputed-report approaches on your query mix rather than assuming that ratio transfers unchanged.

Next Step
Continue to RAG Security & Access Control

There, you'll learn how row-level security, document ACLs, and per-user filtering in vector stores keep RAG systems from leaking confidential data. That topic builds directly on the hybrid retrieval ideas here: once you're routing queries between vector and graph layers, you need to make sure each user only sees the entities and chunks they're authorized to access.

References

[1] From Local to Global: A Graph RAG Approach to Query-Focused Summarization. Edge, D., et al. · 2024 · arXiv preprint

[2] GraphRAG Indexing Overview. Microsoft GraphRAG Documentation · 2025

[3] GraphRAG Default Dataflow. Microsoft GraphRAG Documentation · 2025

[4] Unifying Large Language Models and Knowledge Graphs: A Roadmap. Pan, S., et al. · 2024 · arXiv preprint

[5] GraphRAG Indexing Methods. Microsoft GraphRAG Documentation · 2025

[6] From Louvain to Leiden: guaranteeing well-connected communities. Traag, V.A., Waltman, L., & van Eck, N.J. · 2019 · Scientific Reports

[7] GraphRAG Local Search. Microsoft GraphRAG Documentation · 2025

[8] GraphRAG Global Search. Microsoft GraphRAG Documentation · 2025

[9] GraphRAG DRIFT Search. Microsoft GraphRAG Documentation · 2025

[10] GraphRAG Outputs. Microsoft GraphRAG Documentation · 2025

[11] GraphRAG Getting Started. Microsoft GraphRAG Documentation · 2025

[12] LazyGraphRAG: Setting a New Standard for Quality and Cost. Larson, J. & Truitt, S. (Microsoft Research) · 2024

[13] GraphRAG CLI Reference. Microsoft GraphRAG Documentation · 2025

[14] Lost in the Middle: How Language Models Use Long Contexts. Liu, N.F., et al. · 2023 · TACL 2023