Learn how GraphRAG uses entity graphs, hierarchical community reports, and embeddings to retrieve evidence for relationship-heavy and corpus-level questions.
Advanced RAG made retrieval adaptive: rewrite the query, generate hypothetical evidence, critique retrieved context, and retry when evidence is weak. GraphRAG changes a different layer. It indexes relationship structure and summaries alongside text so a system can retrieve connected evidence or corpus-level themes, at additional extraction and query cost. This chapter explains when that tradeoff complements search and when it adds needless complexity.
Imagine running an online store with 50,000 customer support tickets. A shopper asks: "Why did my order take twelve days when the website promised five?" A simple retrieval system can find the ticket about that delay. But when an analyst asks: "What are the top three reasons our deliveries miss the promised window?" a small result set may represent only a few incidents. It doesn't itself provide coverage over recurring themes across the corpus.
Microsoft's GraphRAG architecture addresses that query class by building a graph-based index of entities and relationships, pregenerating hierarchical community reports, and embedding text artifacts for retrieval [1][2][3]. In the original paper's global sensemaking evaluation, GraphRAG improved answer comprehensiveness and diversity over its vector RAG baseline [1]. Its index also supplies structured artifacts for entity-focused local context construction. A broader roadmap by Pan et al. places this in the wider effort to combine LLMs with structured knowledge graphs [4].
The mental model is simple: vector RAG is a filing cabinet; GraphRAG is a case board. Vector search finds chunks that look like the question. A graph index records connections and summary layers that a query method can use when evidence spans multiple documents. Standard GraphRAG local search builds ranked mixed context from graph and text artifacts; a product that needs explicit path traversal must implement and evaluate that behavior.
Consider a corpus of 10,000 e-commerce support tickets. We'll use this as our running example throughout the article.
For a specific question about a single issue, vector search excels:
"What's the return policy for opened electronics?" Retrieval path: top-5 similar tickets, then answer from local context. This works because the needed evidence is concentrated in a few chunks.
The question is self-contained. A few chunks about returns and electronics contain everything the system needs.
For a question that requires synthesizing across the entire corpus, vector search struggles:
"What are the top 3 recurring reasons orders miss the delivery promise?" Failure mode: no single chunk contains a cross-corpus summary, and a top-5 result set is unlikely to represent themes across 10,000 tickets. The resulting answer can be incomplete or misleading.
Vector search retrieves the most similar chunks, not necessarily the most representative ones. When answering a global query, no single chunk may contain the complete answer. One way to create broader retrieval units is to add a graph index and community reports on top of the text chunks. The table below compares the two approaches:
| Feature | Vector Search (Standard RAG) | GraphRAG |
|---|---|---|
| Available artifacts | Ranked text chunks | Text units, entities, relationships, community reports |
| Natural starting point | Specific evidence lookup ("What is X?") | Entity-focused or thematic analysis ("What are the trends?") |
| Context construction | Similarity-ranked chunks | Ranked graph/text context or report map-reduce |
| Indexing work | Embedding and optional sparse index | Additional extraction, clustering, reports, and embeddings |
| Query work | Retrieval plus generation | Depends on search mode; global search adds map-reduce calls |
Before we define terms, let's see the difference in action. Here's a tiny support corpus:
Ticket 1842: "Order #9021 shipped via FastCarrier. The package sat at the Memphis hub for five days because of a winter storm."
Ticket 2031: "FastCarrier's Memphis hub uses outdated sort scanners. Replacement parts are back-ordered."
Ticket 3155: "Order #9021 arrived twelve days late. Customer wants a refund on shipping fees."
Now ask:
"Why did Order #9021 arrive late, and is this likely to happen again?"
Illustrative vector-only result: Suppose retrieval finds Ticket 3155 (mentions "Order #9021" and "late") and Ticket 1842 (mentions "Order #9021" and "Memphis hub"), but not Ticket 2031 because it doesn't mention the order number. The answer can identify the weather delay yet miss the scanner problem.
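The miss described above can be reproduced with a toy retriever. The sketch below is an assumption-laden stand-in: it scores tickets by shared tokens instead of embeddings, and the `tokens` and `top_k_by_overlap` helpers are hypothetical names invented for this illustration. A real embedding model may rank differently, but the failure shape is the same: Ticket 2031 shares no surface signal with the query.

```python
import re

TICKETS = {
    "ticket-1842": "Order #9021 shipped via FastCarrier. The package sat at the "
                   "Memphis hub for five days because of a winter storm.",
    "ticket-2031": "FastCarrier's Memphis hub uses outdated sort scanners. "
                   "Replacement parts are back-ordered.",
    "ticket-3155": "Order #9021 arrived twelve days late. Customer wants a "
                   "refund on shipping fees.",
}


def tokens(text: str) -> set[str]:
    # Lowercase word tokens; '#' is kept so "#9021" survives as one token.
    return set(re.findall(r"[a-z0-9#]+", text.lower()))


def top_k_by_overlap(query: str, tickets: dict[str, str], k: int) -> list[str]:
    # Token overlap is a crude proxy for similarity search (assumption).
    query_tokens = tokens(query)
    scored = [
        (len(query_tokens & tokens(text)), ticket_id)
        for ticket_id, text in tickets.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [ticket_id for score, ticket_id in scored[:k] if score > 0]


query = "Why did Order #9021 arrive late, and is this likely to happen again?"
retrieved = top_k_by_overlap(query, TICKETS, k=2)

print("retrieved:", retrieved)
print("missed:", sorted(set(TICKETS) - set(retrieved)))
```

Ticket 2031 scores zero because it never mentions the order number or the word "late", even though it holds the scanner evidence.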
Graph-enriched context: Suppose extraction produced these entities and relationships, with source-chunk provenance:
```text
(Order #9021) --[shipped_via]--> (FastCarrier)
(Order #9021) --[delayed_at]--> (Memphis Hub)
(Memphis Hub) --[operated_by]--> (FastCarrier)
(Memphis Hub) --[affected_by]--> (Winter Storm)
(Memphis Hub) --[has_issue]--> (Outdated Scanners)
(Outdated Scanners) --[status]--> (Back-ordered Parts)
```

An implementation that supports evidence-backed traversal could start at Order #9021, follow delayed_at to Memphis Hub, and retrieve evidence about both Winter Storm and Outdated Scanners. It may then answer: "Order #9021 was delayed at the Memphis hub. The source material mentions a winter storm and also an unresolved scanner issue." It should not predict recurring delays unless source evidence supports that claim.
This is relationship-heavy retrieval: useful evidence is connected through entities that weren't all present in the query. Microsoft's standard local-search dataflow ranks graph and text artifacts into a context window; it isn't a promise that every answer executes a deterministic path. If explicit multi-hop paths matter, preserve edge provenance and test the traversal policy.
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    source_chunks: tuple[str, ...]


def cited_path(edges: list[Edge]) -> tuple[bool, list[str]]:
    # Reject the whole path if any edge lacks supporting source chunks.
    if any(not edge.source_chunks for edge in edges):
        return False, []
    citations = sorted({chunk for edge in edges for chunk in edge.source_chunks})
    return True, citations


path = [
    Edge("Order #9021", "delayed_at", "Memphis Hub", ("ticket-1842",)),
    Edge("Memphis Hub", "has_issue", "Outdated Scanners", ("ticket-2031",)),
]
unsupported_path = path + [
    Edge("Outdated Scanners", "will_cause", "Future Delay", ()),
]

print("supported path:", cited_path(path))
print("unsupported prediction:", cited_path(unsupported_path))
```

```text
supported path: (True, ['ticket-1842', 'ticket-2031'])
unsupported prediction: (False, [])
```

A knowledge graph represents information as a network of entities (nodes) and relationships (edges). Unlike a flat vector database that stores text chunks as isolated embeddings, a knowledge graph preserves the structure of how facts connect.
The fundamental unit is the triple: (Subject) - [Predicate] -> (Object). The edge direction should reflect the meaning of the relation itself, not some generic left-to-right convention.
From our running example:
```text
(Order #9021) --[shipped_via]--> (FastCarrier)
(Memphis Hub) --[has_issue]--> (Outdated Scanners)
(Outdated Scanners) --[status]--> (Back-ordered Parts)
```

In production, you're usually working with a property graph rather than bare triples. That means nodes and edges carry metadata like type, description, source_chunk_ids, timestamps, or confidence scores. Those extra fields matter for filtering, provenance, and ranking during retrieval.
This graph structure mirrors how operations teams reason through associations. When a query mentions "Memphis hub," a retrieval system can use typed relationships and provenance to distinguish carrier delays, scanner outages, return-center capacity, and inventory transfers instead of relying on city-name similarity alone.
The indexing phase transforms raw document chunks into a structured graph index. In Microsoft's GraphRAG docs, the standard pipeline extracts entities and relationships, optionally extracts claims, builds hierarchical communities, generates community reports, and embeds the resulting artifacts for downstream retrieval [2][3]:
The first step is to pass each document chunk to an extraction system that identifies key entities and the relationships between them. In standard GraphRAG, this is LLM-driven. In FastGraphRAG, some of this work is replaced with cheaper NLP-based heuristics to cut indexing cost [5]. A simple extraction prompt looks like this:
```python
# The template contains literal JSON braces, so fill it with
# EXTRACTION_PROMPT.replace("{chunk_text}", chunk) rather than str.format().
EXTRACTION_PROMPT = """
Extract all entities and relationships from the following text.

Entities should include: people, organizations, products, concepts, locations.
Relationships should include: works_at, uses, depends_on, causes, relates_to.

Text: {chunk_text}

Output as JSON:
{
  "entities": [
    {"name": "...", "type": "...", "description": "..."}
  ],
  "relationships": [
    {
      "source": "...",
      "target": "...",
      "type": "...",
      "description": "...",
      "strength": 1.0
    }
  ]
}
"""
```

In production, don't rely on raw string prompting for JSON. Use structured output frameworks like Instructor (Python) or Zod (TypeScript) with function calling or tool-use APIs to enforce schema adherence. This prevents syntax errors when processing thousands of chunks.
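To make the schema-adherence point concrete, here is a minimal standard-library sketch of the checks a structured-output layer would otherwise perform. It is not how Instructor or Zod work internally. The `validate_extraction` function and its error messages are hypothetical, and the required keys simply mirror the JSON shape in the prompt above.

```python
import json

ENTITY_KEYS = {"name", "type", "description"}
RELATIONSHIP_KEYS = {"source", "target", "type", "description", "strength"}


def validate_extraction(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the payload is usable."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level value must be a JSON object"]

    entities = payload.get("entities")
    relationships = payload.get("relationships")
    if not isinstance(entities, list) or not isinstance(relationships, list):
        return ["'entities' and 'relationships' must both be lists"]

    errors: list[str] = []
    names: set[str] = set()
    for index, entity in enumerate(entities):
        if not isinstance(entity, dict) or not ENTITY_KEYS <= set(entity):
            errors.append(f"entity {index}: missing required keys")
        elif not isinstance(entity["name"], str):
            errors.append(f"entity {index}: name must be a string")
        else:
            names.add(entity["name"])

    for index, rel in enumerate(relationships):
        if not isinstance(rel, dict) or not RELATIONSHIP_KEYS <= set(rel):
            errors.append(f"relationship {index}: missing required keys")
            continue
        # Every edge endpoint must refer to an entity from the same payload.
        if rel["source"] not in names or rel["target"] not in names:
            errors.append(f"relationship {index}: endpoint not in entity list")
        if not isinstance(rel["strength"], (int, float)):
            errors.append(f"relationship {index}: strength must be a number")
    return errors


good = json.dumps({
    "entities": [
        {"name": "Memphis Hub", "type": "Warehouse", "description": "Sorting hub"},
        {"name": "Outdated Scanners", "type": "EquipmentIssue", "description": "Aging scanners"},
    ],
    "relationships": [
        {"source": "Memphis Hub", "target": "Outdated Scanners", "type": "has_issue",
         "description": "Scanner issue at hub", "strength": 0.8},
    ],
})
dangling = good.replace('"target": "Outdated Scanners"', '"target": "Unknown Node"')

print("good:", validate_extraction(good))
print("dangling:", validate_extraction(dangling))
print("truncated:", validate_extraction(good[:40]))
```

Rejecting a payload at this boundary is cheaper than discovering a dangling edge after it has been merged into the graph.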
The runnable version below uses a deterministic extractor so you can test the graph contract without calling an LLM. In production, replace extract_from_chunk() with a structured-output model call and keep the same Entity and Relationship objects.
```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    type: str
    description: str
    source_chunk_ids: list[int] = field(default_factory=list)


@dataclass
class Relationship:
    source: str
    target: str
    type: str
    description: str
    confidence: float
    source_chunk_ids: list[int] = field(default_factory=list)


def canonicalize(name: str) -> str:
    # Normalize case, periods, and spaces so "Fast Carrier" matches "FastCarrier".
    return name.lower().replace(".", "").replace(" ", "")


def extract_from_chunk(chunk_id: int, chunk: str) -> tuple[list[Entity], list[Relationship]]:
    entities: list[Entity] = []
    relationships: list[Relationship] = []

    # Deterministic keyword rules stand in for an LLM extraction call.
    if "Order #9021" in chunk:
        entities.append(Entity("Order #9021", "Order", "Delayed customer order", [chunk_id]))
    if "FastCarrier" in chunk or "Fast Carrier" in chunk:
        entities.append(Entity("FastCarrier", "Carrier", "Shipping carrier", [chunk_id]))
    if "Memphis hub" in chunk or "Memphis Hub" in chunk:
        entities.append(Entity("Memphis Hub", "Warehouse", "Regional sorting hub", [chunk_id]))
    if "winter storm" in chunk:
        entities.append(Entity("Winter Storm", "Event", "Weather disruption", [chunk_id]))
    if "outdated sort scanners" in chunk:
        entities.append(Entity("Outdated Scanners", "EquipmentIssue", "Aging sort scanners", [chunk_id]))
    if "back-ordered" in chunk:
        entities.append(Entity("Back-ordered Parts", "SupplyIssue", "Replacement parts unavailable", [chunk_id]))

    # Emit a relationship only when both endpoints appear in the same chunk.
    names = {entity.name for entity in entities}
    if {"Order #9021", "FastCarrier"} <= names:
        relationships.append(
            Relationship("Order #9021", "FastCarrier", "shipped_via", "Carrier for order", 1.0, [chunk_id])
        )
    if {"Memphis Hub", "FastCarrier"} <= names:
        relationships.append(
            Relationship("Memphis Hub", "FastCarrier", "operated_by", "Carrier operates hub", 0.8, [chunk_id])
        )
    if {"Order #9021", "Memphis Hub"} <= names:
        relationships.append(
            Relationship("Order #9021", "Memphis Hub", "delayed_at", "Delay location", 1.0, [chunk_id])
        )
    if {"Memphis Hub", "Winter Storm"} <= names:
        relationships.append(
            Relationship("Memphis Hub", "Winter Storm", "affected_by", "Weather caused hub delay", 0.9, [chunk_id])
        )
    if {"Memphis Hub", "Outdated Scanners"} <= names:
        relationships.append(
            Relationship("Memphis Hub", "Outdated Scanners", "has_issue", "Scanner outage risk", 0.8, [chunk_id])
        )
    if {"Outdated Scanners", "Back-ordered Parts"} <= names:
        relationships.append(
            Relationship("Outdated Scanners", "Back-ordered Parts", "status", "Replacement parts unavailable", 0.7, [chunk_id])
        )

    return entities, relationships


def deduplicate_entities(entities: list[Entity]) -> list[Entity]:
    # Keep the first entity per canonical key and merge provenance into it.
    merged: dict[str, Entity] = {}
    for entity in entities:
        key = canonicalize(entity.name)
        if key not in merged:
            merged[key] = entity
            continue
        merged[key].source_chunk_ids.extend(entity.source_chunk_ids)
    return list(merged.values())


def extract_graph_elements(chunks: list[str]) -> tuple[list[Entity], list[Relationship]]:
    all_entities: list[Entity] = []
    all_relationships: list[Relationship] = []
    entity_name_by_key: dict[str, str] = {}

    for chunk_id, chunk in enumerate(chunks):
        entities, relationships = extract_from_chunk(chunk_id, chunk)
        all_entities.extend(entities)
        all_relationships.extend(relationships)

    deduped_entities = deduplicate_entities(all_entities)
    for entity in deduped_entities:
        entity_name_by_key[canonicalize(entity.name)] = entity.name

    # Point every relationship endpoint at the surviving entity name.
    resolved_relationships = [
        Relationship(
            source=entity_name_by_key[canonicalize(rel.source)],
            target=entity_name_by_key[canonicalize(rel.target)],
            type=rel.type,
            description=rel.description,
            confidence=rel.confidence,
            source_chunk_ids=rel.source_chunk_ids,
        )
        for rel in all_relationships
    ]

    return deduped_entities, resolved_relationships


chunks = [
    "Order #9021 shipped via FastCarrier. The package sat at Memphis hub for five days because of a winter storm.",
    "Fast Carrier's Memphis hub uses outdated sort scanners. Replacement parts are back-ordered.",
]

entities, relationships = extract_graph_elements(chunks)
entity_names = {entity.name for entity in entities}
relationship_types = {rel.type for rel in relationships}

print("entities:", ", ".join(sorted(entity_names)))
print("relationships:", ", ".join(sorted(relationship_types)))
```

```text
entities: Back-ordered Parts, FastCarrier, Memphis Hub, Order #9021, Outdated Scanners, Winter Storm
relationships: affected_by, delayed_at, has_issue, operated_by, shipped_via, status
```

Once the entities and relationships are extracted and deduplicated, we can construct the graph data structure. Mapping isolated facts from individual chunks into an interconnected network gives later retrieval policies access to relationships that weren't explicit in any single chunk.
This function takes the lists of entities and relationships as input. Using a graph processing library like NetworkX, it adds each entity as a node and each typed relationship as a directed edge with provenance and weighting metadata:
```python
from dataclasses import dataclass, field

import networkx as nx


@dataclass
class Entity:
    name: str
    type: str
    description: str
    source_chunk_ids: list[int] = field(default_factory=list)


@dataclass
class Relationship:
    source: str
    target: str
    type: str
    description: str
    confidence: float
    source_chunk_ids: list[int] = field(default_factory=list)


def build_knowledge_graph(
    entities: list[Entity],
    relationships: list[Relationship]
) -> nx.MultiDiGraph:
    """Construct a directed property graph from extracted elements."""
    G = nx.MultiDiGraph()

    for entity in entities:
        G.add_node(entity.name,
                   type=entity.type,
                   description=entity.description,
                   source_chunk_ids=entity.source_chunk_ids)

    # The relation type doubles as the edge key, so parallel edges of
    # different types between the same two nodes stay distinct.
    for rel in relationships:
        G.add_edge(rel.source, rel.target,
                   key=rel.type,
                   type=rel.type,
                   description=rel.description,
                   weight=rel.confidence,
                   source_chunk_ids=rel.source_chunk_ids)

    return G


entities = [
    Entity("Order #9021", "Order", "Delayed customer order", [0]),
    Entity("FastCarrier", "Carrier", "Shipping carrier", [0]),
    Entity("Memphis Hub", "Warehouse", "Regional sorting hub", [0, 1]),
]
relationships = [
    Relationship("Order #9021", "FastCarrier", "shipped_via", "Carrier for order", 1.0, [0]),
    Relationship("Order #9021", "Memphis Hub", "delayed_at", "Delay location", 1.0, [0]),
]

graph = build_knowledge_graph(entities, relationships)

print("nodes:", sorted(graph.nodes))
print("edges:", sorted((src, rel_type, dst) for src, dst, rel_type in graph.edges(keys=True)))
print("Memphis Hub node type:", graph.nodes["Memphis Hub"]["type"])
print("shipped_via weight:", graph["Order #9021"]["FastCarrier"]["shipped_via"]["weight"])
```

```text
nodes: ['FastCarrier', 'Memphis Hub', 'Order #9021']
edges: [('Order #9021', 'delayed_at', 'Memphis Hub'), ('Order #9021', 'shipped_via', 'FastCarrier')]
Memphis Hub node type: Warehouse
shipped_via weight: 1.0
```

After the graph is built, we need to find natural clusters of related entities. Think of it like grouping the support tickets into themes: shipping problems, refund problems, product defects. The difference is that the algorithm decides the groups based on how densely entities connect to each other, not on keyword matching.
The Leiden algorithm identifies these clusters. It's preferred over the classic Louvain method because Louvain can produce disconnected communities (nodes in the same group that aren't connected by edges). Traag et al. show that Leiden adds a refinement phase and guarantees connected communities [6]. That avoids a structural defect before report generation, although it can't correct bad entity extraction or unsupported edges. Microsoft's GraphRAG pipeline applies hierarchical Leiden recursively until communities hit a size threshold, which is how it gets both coarse and fine-grained views of the same corpus [3].
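The disconnected-community defect is easy to test for directly. The sketch below is a hypothetical audit helper, not part of Leiden or of GraphRAG: given an undirected adjacency map and a proposed community, it runs a breadth-first search restricted to the community's members and reports whether every member was reached.

```python
from collections import deque


def is_connected_community(members: list[str], adjacency: dict[str, set[str]]) -> bool:
    """True if every member is reachable from any other using only members."""
    member_set = set(members)
    if not member_set:
        return True
    start = next(iter(member_set))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            # Only walk edges that stay inside the community.
            if neighbor in member_set and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == member_set


# Undirected toy graph: each edge is listed in both directions.
adjacency = {
    "Login": {"Login Timeout", "iOS"},
    "Login Timeout": {"Login"},
    "iOS": {"Login"},
    "Crash": {"Crash Loop"},
    "Crash Loop": {"Crash"},
}

print("login cluster:", is_connected_community(["Login", "Login Timeout", "iOS"], adjacency))
print("mixed cluster:", is_connected_community(["Login", "Crash"], adjacency))
```

A check like this can run as a regression test on any clustering output before reports are generated from it.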
The algorithm optimizes modularity ($Q$), a measure of community structure quality in the extracted graph. In plain terms, modularity asks: "Are there more edges inside each group than we'd expect by chance?" A high $Q$ indicates strong clustering in that graph; it doesn't prove that extraction captured the real-world topic correctly.
For an unweighted, undirected graph, the formula is:
$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
Where $A_{ij}$ is 1 if nodes $i$ and $j$ are directly connected and 0 otherwise; $k_i$ and $k_j$ are the number of edges each node touches; $m$ is the total number of edges in the graph; and $\delta(c_i, c_j)$ is 1 if the two nodes are in the same community, 0 otherwise. Higher modularity indicates better community structure.
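A small worked computation can make the definition tangible. The sketch below evaluates modularity pair by pair for an unweighted, undirected graph: for every ordered pair of nodes in the same community, it adds 1 if they share an edge and subtracts the chance expectation (the product of their degrees divided by twice the edge count), then normalizes by twice the edge count. The toy graph, two triangles joined by one bridge edge, is an assumption chosen so the arithmetic can be checked by hand.

```python
from itertools import product


def modularity(edges: list[tuple[str, str]], community_of: dict[str, int]) -> float:
    """Pairwise modularity for an unweighted, undirected graph without self-loops."""
    m = len(edges)
    adjacent = {frozenset(edge) for edge in edges}
    degree = {node: 0 for node in community_of}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    total = 0.0
    for i, j in product(community_of, repeat=2):
        if community_of[i] != community_of[j]:
            continue  # pairs in different communities contribute nothing
        a_ij = 1.0 if i != j and frozenset((i, j)) in adjacent else 0.0
        total += a_ij - degree[i] * degree[j] / (2 * m)
    return total / (2 * m)


# Two triangles (a-b-c and d-e-f) joined by the bridge c-d: 7 edges in total.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

print("two triangles:", round(modularity(edges, two_groups), 4))
print("single community:", round(modularity(edges, one_group), 4))
```

Splitting at the bridge gives 5/14 (about 0.357), while lumping everything into one community gives 0, which matches the intuition that the split captures more within-group edges than chance would.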
The runnable example below computes one weighted Leiden partition with python-igraph. Standard GraphRAG recursively applies community detection to build a hierarchy; keeping this example at one level makes the clustering contract visible and keeps it fast enough to run locally.
```python
import igraph as ig


def detect_communities(
    weighted_edges: list[tuple[str, str, float]],
) -> list[list[str]]:
    """Compute one Leiden partition of an undirected weighted entity graph."""
    node_names = sorted(
        {node for source, target, _ in weighted_edges for node in (source, target)}
    )
    graph = ig.Graph()
    graph.add_vertices(node_names)
    graph.add_edges([(source, target) for source, target, _ in weighted_edges])
    graph.es["weight"] = [weight for _, _, weight in weighted_edges]
    partition = graph.community_leiden(
        objective_function="modularity",
        weights="weight",
    )
    # Convert vertex indices back to entity names for each community.
    groups = [
        sorted(graph.vs[index]["name"] for index in community)
        for community in partition
    ]
    return sorted(groups, key=lambda group: group[0])


weighted_edges = [
    ("Login", "Login Timeout", 1.0),
    ("Login", "iOS", 1.0),
    ("Crash", "Crash Loop", 1.0),
    ("Payment", "Card Failure", 1.0),
    ("Refund", "Auto Refund", 1.0),
]
groups = detect_communities(weighted_edges)

print("community_count:", len(groups))
for group in groups:
    print("-", ", ".join(group))
```

```text
community_count: 4
- Auto Refund, Refund
- Card Failure, Payment
- Crash, Crash Loop
- Login, Login Timeout, iOS
```

The following diagram visualizes how entities are grouped into hierarchical communities, from specific concepts at the bottom to broad themes at the top:
This function takes the entities, their descriptions, and their relationships within a community as input. It passes these to a summarizer, returning a community report that can later be used for global search [1][3]:
```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Entity:
    name: str
    description: str


@dataclass(frozen=True)
class Relationship:
    source: str
    type: str
    target: str


class CommunitySummarizer(Protocol):
    def summarize(self, prompt: str) -> str: ...


class FakeCommunitySummarizer:
    # Canned response so the example runs without an LLM call.
    def summarize(self, prompt: str) -> str:
        return (
            "Memphis Hub delay cluster: FastCarrier orders are affected by storms "
            "and scanner reliability issues."
        )


def summarize_community(
    community_entities: list[Entity],
    community_relationships: list[Relationship],
    level: int,
    summarizer: CommunitySummarizer,
) -> str:
    entity_lines = "\n".join(
        f"- {entity.name}: {entity.description}" for entity in community_entities
    )
    relationship_lines = "\n".join(
        f"- {rel.source} {rel.type} {rel.target}" for rel in community_relationships
    )
    prompt = (
        f"Community level: {level}\n"
        f"Entities:\n{entity_lines}\n"
        f"Relationships:\n{relationship_lines}\n"
        "Summarize the main operational pattern."
    )
    return summarizer.summarize(prompt)


summary = summarize_community(
    [Entity("Memphis Hub", "Regional sorting hub")],
    [Relationship("Memphis Hub", "has_issue", "Outdated Scanners")],
    level=0,
    summarizer=FakeCommunitySummarizer(),
)

print(summary)
```

```text
Memphis Hub delay cluster: FastCarrier orders are affected by storms and scanner reliability issues.
```

GraphRAG's query engine exposes Basic Search, Local Search, Global Search, and DRIFT Search. Local and global search are the core mental model here: local search builds context for entity-centric questions, global search synthesizes over community reports, Basic Search is a text-focused baseline, and DRIFT combines community-level entry points with local exploration [7][8][9].
Local search isn't just "do NER on the query and grab neighboring nodes." In Microsoft's GraphRAG docs, local search maps the query into semantically related entities, then prioritizes a mixed context from connected entities, relationships, community reports, linked text units, and optionally covariates if claim extraction is enabled [7].
The runnable example below demonstrates a bounded neighborhood expansion for intuition. It's a custom graph-walk sketch, not an exact reproduction of the GraphRAG context builder or its ranking policy:
```python
import networkx as nx


class EntityStore:
    def __init__(self, entity_names: list[str]) -> None:
        self.entity_names = entity_names

    def similarity_search(self, query: str, k: int = 10) -> list[str]:
        # Substring matching stands in for embedding similarity.
        query_lower = query.lower()
        matches = [
            entity for entity in self.entity_names if entity.lower() in query_lower
        ]
        return matches[:k]


def expand_entity_neighborhood(
    graph: nx.MultiDiGraph, entity_names: list[str], hops: int = 1
) -> nx.MultiDiGraph:
    nodes: set[str] = set(entity_names)
    frontier: set[str] = set(entity_names)
    for _ in range(hops):
        next_frontier: set[str] = set()
        for node in frontier:
            # Follow edges in both directions from the current frontier.
            next_frontier.update(graph.successors(node))
            next_frontier.update(graph.predecessors(node))
        nodes.update(next_frontier)
        frontier = next_frontier
    return graph.subgraph(nodes).copy()


def render_local_context(
    entities: list[str],
    relationships: list[tuple[str, str, dict]],
    text_units: dict[str, list[str]],
    community_reports: dict[str, list[str]],
) -> str:
    return "\n".join(
        [
            f"Entities: {entities}",
            f"Relationships: {[(src, data['type'], dst) for src, dst, data in relationships]}",
            f"Text units: {text_units}",
            f"Reports: {community_reports}",
        ]
    )


def local_search(
    query: str,
    entity_store: EntityStore,
    graph: nx.MultiDiGraph,
    text_units: dict[str, list[str]],
    community_reports: dict[str, list[str]],
) -> str:
    mapped_entities = entity_store.similarity_search(query, k=10)
    neighborhood = expand_entity_neighborhood(graph, mapped_entities, hops=2)
    neighborhood_entities = sorted(neighborhood.nodes)
    context = render_local_context(
        entities=neighborhood_entities,
        relationships=list(neighborhood.edges(data=True)),
        text_units={entity: text_units.get(entity, []) for entity in neighborhood_entities},
        community_reports={
            entity: community_reports.get(entity, []) for entity in neighborhood_entities
        },
    )
    return f"Answer using local GraphRAG context:\n{context}"


graph = nx.MultiDiGraph()
graph.add_edge("Order #9021", "Memphis Hub", type="delayed_at")
graph.add_edge("Memphis Hub", "Outdated Scanners", type="has_issue")

answer = local_search(
    "Why was Order #9021 late?",
    EntityStore(["Order #9021", "Memphis Hub", "Outdated Scanners"]),
    graph,
    text_units={"Order #9021": ["Order arrived twelve days late."]},
    community_reports={"Memphis Hub": ["Hub has weather and scanner risks."]},
)

print(answer)
```

```text
Answer using local GraphRAG context:
Entities: ['Memphis Hub', 'Order #9021', 'Outdated Scanners']
Relationships: [('Order #9021', 'delayed_at', 'Memphis Hub'), ('Memphis Hub', 'has_issue', 'Outdated Scanners')]
Text units: {'Memphis Hub': [], 'Order #9021': ['Order arrived twelve days late.'], 'Outdated Scanners': []}
Reports: {'Memphis Hub': ['Hub has weather and scanner risks.'], 'Order #9021': [], 'Outdated Scanners': []}
```

The actual context-builder problem includes a token budget: relevant entities, relationships, text units, and reports compete for room in one generation request. A production route needs ranking and provenance, not unbounded expansion.
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    source: str
    score: float
    tokens: int
    citation: str


def pack_context(candidates: list[Candidate], token_budget: int) -> list[Candidate]:
    # Greedy packing: take candidates by descending score while they still fit.
    selected: list[Candidate] = []
    used = 0
    for candidate in sorted(candidates, key=lambda item: item.score, reverse=True):
        if used + candidate.tokens <= token_budget:
            selected.append(candidate)
            used += candidate.tokens
    return selected


candidates = [
    Candidate("relationship: delayed_at", 0.98, 30, "ticket-1842"),
    Candidate("text: scanner issue", 0.91, 55, "ticket-2031"),
    Candidate("report: carrier overview", 0.62, 80, "community-7"),
]
selected = pack_context(candidates, token_budget=90)

print("selected:", [item.source for item in selected])
print("citations:", [item.citation for item in selected])
print("tokens:", sum(item.tokens for item in selected))
```

```text
selected: ['relationship: delayed_at', 'text: scanner issue']
citations: ['ticket-1842', 'ticket-2031']
tokens: 85
```

For queries that require cross-corpus understanding, GraphRAG uses a map-reduce pattern over community reports from a selected hierarchy level. As the global-search docs describe it, community reports are batched into chunks, the map step produces rated intermediate points, and the reduce step aggregates the highest-value points [1][8]:
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatedPoint:
    text: str
    rating: int


def batch_reports(reports: list[str], batch_size: int = 2) -> list[list[str]]:
    return [reports[index:index + batch_size] for index in range(0, len(reports), batch_size)]


def map_report_batch(query: str, report_batch: list[str]) -> list[RatedPoint]:
    # Keyword rules stand in for the LLM map step; `query` is unused here.
    points: list[RatedPoint] = []
    for report in report_batch:
        report_lower = report.lower()
        if "late" in report_lower or "delay" in report_lower:
            points.append(RatedPoint(report, rating=9))
        elif "refund" in report_lower:
            points.append(RatedPoint(report, rating=6))
    return points


def select_top_points(points: list[RatedPoint], top_k: int) -> list[RatedPoint]:
    return sorted(points, key=lambda point: point.rating, reverse=True)[:top_k]


def global_search(
    query: str,
    reports_by_level: dict[int, list[str]],
    level: int,
) -> str:
    """Answer dataset-level questions with map-reduce over community reports."""
    batches = batch_reports(reports_by_level[level], batch_size=2)
    mapped_points = [
        point
        for batch in batches
        for point in map_report_batch(query, batch)
    ]
    top_points = select_top_points(mapped_points, top_k=3)
    bullets = "\n".join(f"- {point.text}" for point in top_points)
    return f"Top recurring themes for '{query}':\n{bullets}"


reports_by_level = {
    1: [
        "Late deliveries cluster around carrier weather delays.",
        "Warehouse backlogs delay fragile item packing.",
        "Refund complaints cluster around unclear label expiration.",
    ]
}

answer = global_search(
    "What are the main reasons orders miss the promised window?",
    reports_by_level,
    level=1,
)

print(answer)
```

```text
Top recurring themes for 'What are the main reasons orders miss the promised window?':
- Late deliveries cluster around carrier weather delays.
- Warehouse backlogs delay fragile item packing.
- Refund complaints cluster around unclear label expiration.
```

A useful production design doesn't treat vector search and GraphRAG as mutually exclusive. The GraphRAG stack already mixes graph structure with embeddings during both indexing and query-time context building [2][7]. One hybrid architecture looks like this:
A router is a policy to evaluate, not a guarantee. Measure supported-answer accuracy, latency, and spend for each query class. In a workload dominated by direct evidence lookups, a text route may cover most requests; an analyst-heavy workload may justify more graph/report queries.
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteResult:
    route: str
    query_class: str
    supported_accuracy: float
    p95_ms: int


def release_route(
    results: list[RouteResult],
    query_class: str,
    minimum_accuracy: float,
    maximum_p95_ms: int,
) -> str:
    eligible = [
        result for result in results
        if result.query_class == query_class
        and result.supported_accuracy >= minimum_accuracy
        and result.p95_ms <= maximum_p95_ms
    ]
    # Fail loudly instead of letting max() raise on an empty list.
    if not eligible:
        raise ValueError(f"no route meets the release gates for {query_class!r}")
    # Prefer higher accuracy; break ties with lower p95 latency.
    return max(
        eligible,
        key=lambda result: (result.supported_accuracy, -result.p95_ms),
    ).route


results = [
    RouteResult("basic", "fact_lookup", 0.96, 95),
    RouteResult("local", "fact_lookup", 0.96, 240),
    RouteResult("basic", "corpus_trend", 0.61, 92),
    RouteResult("global", "corpus_trend", 0.91, 580),
]

print("fact route:", release_route(results, "fact_lookup", 0.90, 200))
print("trend route:", release_route(results, "corpus_trend", 0.90, 700))
```

```text
fact route: basic
trend route: global
```

One subtle but important point: GraphRAG the technique doesn't require a dedicated graph database. Microsoft's reference implementation writes structured output tables to disk and builds query context from those artifacts directly. A graph database like Neo4j or Neptune becomes useful when you need custom traversals, shared KG infrastructure, or analyst-facing graph queries outside the stock pipeline [10].
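To illustrate that tabular artifacts are enough for simple lookups, the sketch below loads a relationship table into an in-memory adjacency map with only the standard library. The column names (`source`, `target`, `description`, `text_unit_ids`), the CSV format, and the pipe-separated citation field are all assumptions made for this example; they are not the schema of Microsoft's output tables.

```python
import csv
import io
from collections import defaultdict

# Hypothetical relationship table; a real pipeline would read this from disk.
RELATIONSHIPS_CSV = """source,target,description,text_unit_ids
Order #9021,FastCarrier,Carrier for order,ticket-1842
Order #9021,Memphis Hub,Delay location,ticket-1842
Memphis Hub,Outdated Scanners,Scanner issue at hub,ticket-2031
"""


def load_adjacency(table_text: str) -> dict[str, list[tuple[str, str, list[str]]]]:
    """Map each source entity to (target, description, citations) tuples."""
    adjacency: dict[str, list[tuple[str, str, list[str]]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(table_text)):
        citations = row["text_unit_ids"].split("|")
        adjacency[row["source"]].append((row["target"], row["description"], citations))
    return dict(adjacency)


adjacency = load_adjacency(RELATIONSHIPS_CSV)
for target, description, citations in adjacency["Order #9021"]:
    print(target, "|", description, "|", citations)
```

For one-hop lookups like this, a dictionary built from the table is sufficient. A graph database earns its place once traversals become deeper, shared, or ad hoc.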
Beyond simple routing, the knowledge graph layer can actively enrich vector results.
Query expansion uses entities found in initial vector results to expand the search query, finding semantically related but textually distinct content.
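A minimal sketch of that expansion step, assuming a known entity list and case-insensitive substring matching in place of a real entity linker. The `expand_query` helper and its `max_terms` cap are hypothetical.

```python
KNOWN_ENTITIES = ["Order #9021", "FastCarrier", "Memphis Hub", "Winter Storm"]


def expand_query(
    query: str,
    initial_results: list[str],
    known_entities: list[str],
    max_terms: int = 3,
) -> list[str]:
    """Return entities seen in the initial results but absent from the query."""
    query_lower = query.lower()
    added: list[str] = []
    for text in initial_results:
        text_lower = text.lower()
        for entity in known_entities:
            name = entity.lower()
            if name in text_lower and name not in query_lower and entity not in added:
                added.append(entity)
    # Cap the expansion so the follow-up search doesn't drift off topic.
    return added[:max_terms]


initial_results = [
    "Order #9021 shipped via FastCarrier. The package sat at the Memphis hub "
    "for five days because of a winter storm.",
]
extra_terms = expand_query("Why did Order #9021 arrive late?", initial_results, KNOWN_ENTITIES)

print("extra search terms:", extra_terms)
```

A second retrieval pass over these terms can surface tickets such as the scanner report that never mention the order number. Whether that pass helps or adds noise is something to measure per query class.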
Context bridging is a custom retrieval option when two retrieved chunks don't directly connect. If edges retain supporting chunks, a traversal can identify intermediate entities for retrieval and citation. For example, if one chunk mentions "Order #9021 shipped via FastCarrier" and another mentions "FastCarrier's Memphis hub has outdated scanners," a policy can inspect Order #9021 -> FastCarrier -> Memphis Hub -> Outdated Scanners, then fetch source text before making a claim about the order.
Result ranking can boost text results with short, supported paths to query entities. Treat this as a ranker feature to evaluate, not proof that a nearby node supports the answer.
The most significant barrier to adopting standard GraphRAG is the upfront indexing cost. Compared with vector-only RAG, its standard pipeline adds LLM-heavy graph extraction, summarization, community report generation, and multiple embedding passes [1][3]. Whether the resulting query quality justifies that work is an evaluation question.
To make this viable at scale, teams need to optimize the ingestion pipeline. Microsoft explicitly recommends starting with fast, inexpensive models while you learn the system, and its docs estimate that graph extraction (entity and relationship extraction plus summarization of the extracted descriptions) is roughly 75% of standard indexing cost [11][5]. If your use case is mostly global summarization, FastGraphRAG can reduce cost further: it replaces LLM-based entity extraction with NLP noun-phrase extraction (using libraries like NLTK or spaCy) and defines relationships by entity co-occurrence within a text unit. The graph is noisier and less reusable outside GraphRAG, but indexing is much cheaper [5].
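To make the co-occurrence idea concrete, here is a minimal sketch that turns noun phrases per text unit into weighted edges. It is not the FastGraphRAG implementation: the `TEXT_UNITS` data is made up, the noun phrases are assumed to be already extracted by an NLP library, and `cooccurrence_edges` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical text units with noun phrases already extracted.
# In practice an NLP library would produce these lists.
TEXT_UNITS = {
    "tu-1": ["fastcarrier", "memphis hub", "order 9021"],
    "tu-2": ["fastcarrier", "memphis hub", "scanner"],
    "tu-3": ["refund policy", "order 9021"],
}


def cooccurrence_edges(text_units):
    """Count how many text units mention each unordered pair of phrases."""
    weights = Counter()
    for phrases in text_units.values():
        # sorted(set(...)) removes repeats and gives each pair one canonical order.
        for left, right in combinations(sorted(set(phrases)), 2):
            weights[(left, right)] += 1
    return weights


edges = cooccurrence_edges(TEXT_UNITS)
for (left, right), weight in edges.most_common(3):
    print(f"{left} -- {right}: {weight}")
```

Every pair that shares a text unit becomes an edge, whether or not the text states a relationship between them. That is the source of the extra noise the paragraph above mentions.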
Microsoft Research introduced LazyGraphRAG in November 2024 [12]. It targets a central cost concern with standard GraphRAG: paying for an LLM-driven index before knowing how often graph-assisted queries will run.
LazyGraphRAG defers LLM use to query time. Its index uses NLP noun-phrase extraction and graph statistics for community structure, without entity summaries or precomputed community reports. In Microsoft's reported experiment, its indexing cost matched the vector RAG setup and was approximately 0.1% of full GraphRAG indexing cost [12]. Treat that as a benchmark result, not a universal constant for every corpus and deployment. At query time it blends vector similarity with community structure and exposes a relevance test budget that trades query cost for quality. Microsoft reports that, at evaluated budget levels, LazyGraphRAG was competitive with or better than the tested alternatives on local and global query criteria [12].
The practical takeaway is that precomputed community reports aren't the only candidate for global sensemaking. If the corpus changes frequently or global queries are rare, benchmark a lazy approach against standard GraphRAG. If reusable community reports are a product output, the standard index provides artifacts the lazy path intentionally omits.
Use observed quality and workload volume to compare indexing strategies rather than choosing from architecture labels:
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    index_cost: float  # one-time cost to build the index
    query_cost: float  # average cost per query
    supported_accuracy: float  # accuracy measured in your own evaluation


def choose_strategy(
    strategies: list[Strategy], query_count: int, accuracy_floor: float
) -> tuple[str, float]:
    # Keep only strategies that meet the quality bar.
    passing = [
        strategy for strategy in strategies
        if strategy.supported_accuracy >= accuracy_floor
    ]
    if not passing:
        raise ValueError("no strategy meets the accuracy floor")
    # Pick the cheapest passing strategy for the expected workload.
    winner = min(
        passing,
        key=lambda strategy: strategy.index_cost + query_count * strategy.query_cost,
    )
    total_cost = winner.index_cost + query_count * winner.query_cost
    return winner.name, total_cost


# Illustrative measured values from one product evaluation, not vendor benchmarks.
strategies = [
    Strategy("lazy", index_cost=1.0, query_cost=1.7, supported_accuracy=0.91),
    Strategy("precomputed_reports", index_cost=200.0, query_cost=0.8, supported_accuracy=0.93),
]

print("few queries:", choose_strategy(strategies, query_count=10, accuracy_floor=0.90))
print("many queries:", choose_strategy(strategies, query_count=500, accuracy_floor=0.90))
```

Output:

```text
few queries: ('lazy', 18.0)
many queries: ('precomputed_reports', 600.0)
```

While indexing represents the bulk of the computational expense, runtime query costs can also be higher than standard vector search. The cost depends heavily on the query strategy: local search is usually manageable, while global search can get expensive when it has to map over many community-report batches.
To manage these runtime costs, standard GraphRAG precomputes community reports at multiple hierarchical levels. At query time, evaluate whether a coarse level answers the question adequately before paying for more detailed reports. Lower levels tend to yield more thorough responses, but they can also increase report volume and LLM work [8].
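One way to express "try a coarse level first" is an escalation loop that stops at the first level whose answer passes an adequacy check. The sketch below is an assumption-heavy illustration: `answer_with_escalation`, the `synthesize` and `is_adequate` callables, and the `max_reports` budget are hypothetical names, level 0 is assumed to be the coarsest level, and the stubs stand in for an LLM call and an evaluation step.

```python
from typing import Callable, Optional


def answer_with_escalation(
    question: str,
    reports_by_level: dict[int, list[str]],
    synthesize: Callable[[str, list[str]], str],
    is_adequate: Callable[[str, str], bool],
    max_reports: int,
) -> Optional[tuple[int, str]]:
    """Try coarse levels first and move to finer levels only when needed.

    Returns (level, answer), or None when no affordable level was adequate.
    """
    for level in sorted(reports_by_level):
        reports = reports_by_level[level]
        if len(reports) > max_reports:
            # Assumes finer levels hold at least as many reports, so stop here.
            break
        answer = synthesize(question, reports)
        if is_adequate(question, answer):
            return level, answer
    return None


# Stub data and callables for illustration only.
REPORTS = {
    0: ["Delivery delays dominate complaints."],
    1: ["Carrier hub backlogs cause delays.", "Address errors cause delays."],
}


def stub_synthesize(question, reports):
    return " ".join(reports)


def stub_is_adequate(question, answer):
    # Pretend the question needs carrier-level detail.
    return "hub" in answer


result = answer_with_escalation(
    "Why are deliveries late?", REPORTS, stub_synthesize, stub_is_adequate, max_reports=10
)
print(result)
```

In a real system the adequacy check is the hard part. It would come from your evaluation set or a judged sample, not a keyword test.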
Knowledge graphs aren't static; they must evolve as the underlying document corpus changes. Keeping the graph in sync with a live, mutating dataset introduces significant engineering complexity. Current GraphRAG releases expose explicit update flows with standard-update and fast-update methods, and the output tables include fields used for incremental update merges [13][10].
Because of the high indexing and maintenance costs, GraphRAG shouldn't be treated as a default replacement for all retrieval tasks; it's a specialized tool for complex analytical workloads. Evaluating the query profile of your application is essential before committing to a knowledge graph architecture. The following decision matrix highlights where each approach shines:
| Scenario | Standard RAG (Vector) | GraphRAG |
|---|---|---|
| Fact lookup ("What is the refund policy?") | Strong baseline with low retrieval work | Additional structure may not pay off |
| Local context ("Summarize this specific ticket") | Strong baseline | Useful when entity-linked context matters |
| Global summary ("What are top 3 trends?") | Small top-k may underrepresent themes | Designed to synthesize community reports |
| Relationship-heavy question ("How does X affect Y via Z?") | Needs expansion or linked evidence | Can retrieve graph-linked evidence; test path behavior separately |
| Dataset size | Lower marginal indexing cost | Costly when graph extraction and reporting run over large corpora |
Use this framework to decide:
| Start with standard RAG (vector search) | Consider GraphRAG |
|---|---|
| Simple fact lookup ("What's the refund policy?") | Global summarization ("What are our top 3 customer complaints?") |
| Single-document questions | Cross-document reasoning ("How does Project X impact Team Y's roadmap?") |
| Prototyping or small datasets | Large, stable corpora with complex relationships |
| Real-time, cost-sensitive applications | Analytical workloads where query cost is acceptable |
| Frequently changing data with direct lookups | Repeated structural analysis or reusable reports |
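The pragmatic path: Start with vector search. It's cheaper, faster, and easier to maintain. Add the graph layer only when you hit specific limitations, such as corpus-level questions that a small top-k result set can't cover or relationship-heavy questions that need linked evidence across documents.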
By the end of this lesson, you should be able to explain GraphRAG as an engineering tradeoff, not as a buzzword.
When engineers first encounter knowledge graphs and GraphRAG, they often bring assumptions from traditional vector search or long-context LLMs. Here are the most common traps, what they look like in practice, and how to avoid them.
Symptom: You stuff 100,000 tokens into a long-context model and ask for a summary. The output misses key themes or contradicts itself.
Cause: A 1M-token context window doesn't automatically solve global sensemaking. You still have to decide what to include, very large prompts are expensive to run, and models can still underuse information buried in the middle of long contexts [14].
Fix: Benchmark selective retrieval and structured indexing against long-context prompting. Standard GraphRAG is one candidate when repeated global questions justify precomputed reports; it isn't required for every long document.
Symptom: You replace your entire vector index with a knowledge graph. Simple lookups become slow and expensive.
Cause: GraphRAG is complementary, not substitutive. Local search uses entity-description embeddings and linked text units as part of context construction. Global search adds report synthesis for questions that require coverage across many documents.
Fix: Use a hybrid router. Route simple fact lookups to vector search, entity questions to local search, and global summaries to global search.
Symptom: Your graph has three separate nodes for "FastCarrier," "Fast Carrier," and "fastcarrier.com." Queries that should traverse through the carrier fail because the path is broken.
Cause: Entity extraction from unstructured text is inherently noisy. Skipping deduplication leaves near-duplicate nodes that fragment the graph.
Fix: Implement a three-stage resolution pipeline: (1) normalize names (lowercase, strip punctuation), (2) use embeddings or string distance to find candidate duplicates, (3) use a lightweight LLM verification step to confirm whether two names refer to the same real-world entity before merging.
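The three stages might look like the following sketch. It uses `difflib.SequenceMatcher` from the Python standard library as the string-similarity step. The domain-suffix rule, the 0.85 threshold, and the `verify` callable (a stand-in for the LLM check) are assumptions to tune, not a prescribed method.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Stage 1: lowercase, drop a trailing domain suffix, strip non-alphanumerics."""
    lowered = name.lower()
    # Assumption: a company domain is treated as an alias of the company name.
    lowered = re.sub(r"\.(com|net|org)$", "", lowered)
    return re.sub(r"[^a-z0-9]", "", lowered)


def candidate_pairs(names, threshold=0.85):
    """Stage 2: propose pairs whose normalized forms are similar enough."""
    pairs = []
    for left, right in combinations(names, 2):
        score = SequenceMatcher(None, normalize(left), normalize(right)).ratio()
        if score >= threshold:
            pairs.append((left, right, round(score, 2)))
    return pairs


def resolve(names, verify, threshold=0.85):
    """Stage 3: merge only the candidate pairs that `verify` confirms.

    `verify` is a hypothetical callable (for example an LLM check) that
    returns True when two names refer to the same real-world entity.
    """
    canonical = {name: name for name in names}
    for left, right, _ in candidate_pairs(names, threshold):
        if verify(left, right):
            keep, drop = canonical[left], canonical[right]
            for name in canonical:
                if canonical[name] == drop:
                    canonical[name] = keep
    return canonical


NAMES = ["FastCarrier", "Fast Carrier", "fastcarrier.com", "FastCargo"]


def stub_verify(left, right):
    # Stand-in for an LLM verification call.
    return normalize(left) == normalize(right)


merged = resolve(NAMES, stub_verify)
for name, target in merged.items():
    print(f"{name} -> {target}")
```

Keeping verification as a separate step matters because string similarity alone will eventually propose a wrong merge, and a wrong merge corrupts every path that passes through the merged node.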
Symptom: The graph balloons to millions of nodes, most of them useless. Common words like "customer," "order," and "email" become nodes with thousands of meaningless edges.
Cause: Without a schema, the extractor treats every noun as an entity.
Fix: Define an ontology (a typed schema) before extraction. Restrict nodes to domain-relevant types like Order, Carrier, Warehouse, ProductDefect, and RefundPolicy. Filter low-information entities in a post-processing step.
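A typed filter along these lines could run after extraction. The allowed types and generic names come from the example above; the `Entity` class and the `keep_entity` function are hypothetical, and a real pipeline would likely add frequency or degree checks on top.

```python
from dataclasses import dataclass

# Types from the example ontology; the generic-name stoplist is illustrative.
ALLOWED_TYPES = {"Order", "Carrier", "Warehouse", "ProductDefect", "RefundPolicy"}
GENERIC_NAMES = {"customer", "order", "email"}


@dataclass(frozen=True)
class Entity:
    name: str
    type: str


def keep_entity(entity: Entity) -> bool:
    """Keep typed, specific entities; drop untyped or generic ones."""
    if entity.type not in ALLOWED_TYPES:
        return False
    if entity.name.lower() in GENERIC_NAMES:
        return False
    return True


extracted = [
    Entity("Order #9021", "Order"),
    Entity("order", "Order"),  # generic noun mistyped as an Order
    Entity("FastCarrier", "Carrier"),
    Entity("email", "Concept"),  # type outside the ontology
    Entity("Memphis Hub", "Warehouse"),
]

kept = [entity.name for entity in extracted if keep_entity(entity)]
print(kept)
```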
Symptom: The prototype works on 200 documents, then the first full-corpus run burns budget on extraction, summarization, and embedding passes.
Cause: Standard GraphRAG does much more than embed chunks. It asks models to extract entities, extract relationships, summarize repeated descriptions, generate community reports, and embed several downstream artifacts.
Fix: Measure indexing cost before committing to the architecture. Start with a small representative corpus, use inexpensive models while tuning prompts, consider FastGraphRAG for global-summary-heavy workloads, and set a refresh cadence instead of pretending every graph update is free.
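A rough first estimate can come from scaling a measured sample run by token count. This sketch assumes cost grows linearly with tokens, which is a simplification, and every number in it is made up for illustration.

```python
def estimate_full_index_cost(
    sample_cost: float, sample_tokens: int, corpus_tokens: int
) -> float:
    """Linear extrapolation by token count; assumes the sample is representative."""
    if sample_tokens <= 0:
        raise ValueError("sample_tokens must be positive")
    return sample_cost * corpus_tokens / sample_tokens


# Illustrative numbers only, not measured GraphRAG costs.
estimate = estimate_full_index_cost(
    sample_cost=12.0, sample_tokens=400_000, corpus_tokens=100_000_000
)
budget = 2_500.0

print(f"estimated full-corpus indexing cost: {estimate:,.2f}")
print("within budget" if estimate <= budget else "over budget: reduce scope or change method")
```

Treat the result as a planning figure. Summarization and report generation may not scale linearly with corpus size, so re-measure on a larger sample before committing.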
Symptom: Search results become stale or paths break after documents are added, deleted, or corrected.
Cause: A graph index has structure. New evidence can merge entities, change edge weights, move community boundaries, and invalidate old community reports.
Fix: Treat graph refresh as a product requirement. Track provenance with text_unit_ids, run incremental update flows where they are good enough, and schedule full rebuilds when entity resolution or community boundaries drift.
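Provenance makes it possible to work out which community reports a document change touches. The sketch below assumes you can load a mapping from entities to their `text_unit_ids` and from communities to member entities; the function name and the sample data are hypothetical.

```python
def stale_communities(
    changed_text_units: set[str],
    entity_text_units: dict[str, set[str]],
    community_entities: dict[str, set[str]],
) -> set[str]:
    """Return communities whose member entities cite changed text units."""
    touched_entities = {
        entity
        for entity, unit_ids in entity_text_units.items()
        if unit_ids & changed_text_units
    }
    return {
        community
        for community, members in community_entities.items()
        if members & touched_entities
    }


# Illustrative data, not real GraphRAG output tables.
entity_text_units = {
    "FastCarrier": {"tu-1", "tu-2"},
    "Memphis Hub": {"tu-2"},
    "Refund Policy": {"tu-3"},
}
community_entities = {
    "c-logistics": {"FastCarrier", "Memphis Hub"},
    "c-policy": {"Refund Policy"},
}

stale = stale_communities({"tu-2"}, entity_text_units, community_entities)
print(sorted(stale))
```

This only flags reports whose cited evidence changed. It does not detect shifted community boundaries, which is why the fix above still calls for scheduled full rebuilds.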
Here's a short exercise to check your understanding. Work through it on paper or in a text editor before reading the solution sketch.
Doc A: "Warehouse B handles returns for the West Coast. Warehouse B recently switched to AutoSort robots." Doc B: "AutoSort robots sometimes mislabel fragile items." Doc C: "Order #7712 (fragile glassware) was routed to Warehouse B, arrived broken, and received an approved refund."
The question: "Why did Order #7712 arrive broken, and what systemic issue might cause similar problems?"
Step 1: List the entities and relationships you'd extract from each document.
Step 2: Draw (or list) the triples that connect Order #7712 to the systemic issue.
Step 3: Explain why standard vector RAG might miss the systemic issue.
Entities: Warehouse B, AutoSort robots, Order #7712, fragile glassware, refund.
Triples:
(Order #7712) --[contains]--> (fragile glassware)
(Order #7712) --[routed_to]--> (Warehouse B)
(Warehouse B) --[uses]--> (AutoSort robots)
(AutoSort robots) --[problem]--> (mislabels fragile items)
(Order #7712) --[result]--> (arrived broken)
Multi-hop path: Order #7712 --[routed_to]--> Warehouse B --[uses]--> AutoSort robots --[problem]--> mislabels fragile items. The contains -> fragile glassware edge explains why that robot failure is relevant to this order.
Why similarity retrieval may miss it: No single document contains both "Order #7712" and "mislabels fragile items." A search for "Why did Order #7712 arrive broken?" may retrieve Doc C while missing Docs A and B because neither mentions the order. If extraction is accurate, a graph makes the path from order to warehouse to robot issue explicit and retrievable with provenance.
GraphRAG addresses a global query problem that small top-k vector retrieval can handle poorly. Standard GraphRAG adds structure and precomputed summaries for corpus-level synthesis.
The pipeline is more than graph extraction. Standard GraphRAG extracts entities and relationships, can optionally extract claims, builds hierarchical Leiden communities, generates community reports, and embeds multiple artifacts for retrieval.
Local and global search are the core mental model. Local search (entity-centric + vector) handles entity-focused questions like "Why did Order #7712 arrive broken?" Global search (map-reduce over community reports) handles broad questions like "What are the main themes?" DRIFT adds a hybrid middle ground, and Basic Search remains the plain vector fallback.
Indexing is expensive, and query cost depends on the search mode. Local search is usually manageable. Global search can get expensive if you have to map over many community-report batches.
Hybrid routing is an evaluation choice. Test simple questions on text retrieval and structural questions on graph/report context, then release routes that meet support, cost, and latency targets.
Start with vector search, add graphs when needed. GraphRAG shines for complex, interconnected datasets and global sensemaking. For simple fact retrieval, vector search remains faster and more cost-effective.
The expensive standard index isn't the only option. LazyGraphRAG (2024) moves LLM work to query time; Microsoft reported vector-RAG-like indexing cost in its comparison. Benchmark lazy and precomputed-report approaches on your query mix rather than assuming that ratio transfers unchanged.
[1] From Local to Global: A Graph RAG Approach to Query-Focused Summarization. Edge, D., et al. · 2024 · arXiv preprint
[2] GraphRAG Indexing Overview. Microsoft GraphRAG Documentation · 2025
[3] GraphRAG Default Dataflow. Microsoft GraphRAG Documentation · 2025
[4] Unifying Large Language Models and Knowledge Graphs: A Roadmap. Pan, S., et al. · 2024 · arXiv preprint
[5] GraphRAG Indexing Methods. Microsoft GraphRAG Documentation · 2025
[6] From Louvain to Leiden: guaranteeing well-connected communities. Traag, V.A., Waltman, L., & van Eck, N.J. · 2019 · Scientific Reports
[7] GraphRAG Local Search. Microsoft GraphRAG Documentation · 2025
[8] GraphRAG Global Search. Microsoft GraphRAG Documentation · 2025
[9] GraphRAG DRIFT Search. Microsoft GraphRAG Documentation · 2025
[10] GraphRAG Outputs. Microsoft GraphRAG Documentation · 2025
[11] GraphRAG Getting Started. Microsoft GraphRAG Documentation · 2025
[12] LazyGraphRAG: Setting a New Standard for Quality and Cost. Larson, J. & Truitt, S. (Microsoft Research) · 2024
[13] GraphRAG CLI Reference. Microsoft GraphRAG Documentation · 2025
[14] Lost in the Middle: How Language Models Use Long Contexts. Liu, N.F., et al. · 2023 · TACL 2023