GraphRAG & Knowledge Graphs

Learn how GraphRAG uses entity graphs, hierarchical community reports, and embeddings to retrieve evidence for relationship-heavy and corpus-level questions.

An AI platform has 10,000 incident reports, runbooks, and trace summaries. An on-call engineer asks: "Why did inference-api breach the latency SLO during release 2026.06.14?" A simple retrieval system can find the incident about that service. But when an analyst asks "What are the top three recurring reasons model-serving incidents breach latency SLOs?", a small top-k result set may represent only a few incidents, and nothing about that result set guarantees coverage of the recurring themes across the corpus.

The previous chapter made retrieval-augmented generation (RAG) adaptive: rewrite the query, generate hypothetical evidence, critique retrieved context, and retry when evidence is weak. GraphRAG changes a different layer. It indexes relationship structure and summaries alongside text so a system can retrieve connected evidence or corpus-level themes, at additional extraction and query cost. The engineering question is when that tradeoff complements vector search and when it adds needless complexity.

Microsoft's GraphRAG architecture addresses that query class by building a graph-based index of entities and relationships, pregenerating hierarchical community reports, and embedding text artifacts for retrieval [1][2][3]. In the original paper's global sensemaking evaluation, GraphRAG improved answer comprehensiveness and diversity over its vector RAG baseline [1]. Its index also supplies structured artifacts for entity-focused local context construction. A broader roadmap by Pan et al. places this work within the wider effort to combine LLMs with structured knowledge graphs [4].

Figure: GraphRAG overview. The same corpus keeps its text chunks and gains graph structure and summary reports; fact lookups route to text, entity questions to mixed local graph context, and broad trend questions to global reports.

Picture vector RAG as a filing cabinet and GraphRAG as a case board. Vector search finds chunks that look like the question. A graph index records connections and summary layers that a query method can use when evidence spans multiple documents. Standard GraphRAG local search builds ranked mixed context from graph and text artifacts; a product that needs explicit path traversal must implement and evaluate that behavior.

Why small top-k retrieval can be insufficient

The global query problem

Consider a corpus of 10,000 incident reports, runbooks, traces, and postmortems for an AI product. This corpus is the running example for the rest of the article.

Local queries (vector search handles well)

For a specific question about a single issue, vector search excels:

"What is the timeout for the embeddings API?" Retrieval path: top-5 similar documents, then answer from local context. This works because the needed evidence is concentrated in a few chunks.

The question is self-contained. A few chunks about the API contract and runbook contain everything the system needs.

Global queries (small top-k retrieval can miss coverage)

For a question that requires synthesizing across the entire corpus, small top-k vector retrieval can struggle:

"What are the top 3 recurring reasons model-serving incidents breach the latency SLO?" Failure mode: no single chunk contains a cross-corpus summary, and a top-5 result set is unlikely to represent themes across 10,000 documents. The resulting answer can be incomplete or misleading.

Why similarity-only retrieval struggles

Vector search retrieves the most similar chunks, not necessarily the most representative ones. When answering a global query, no single chunk may contain the complete answer. One way to create broader retrieval units is to add:

Entity extraction across the entire corpus
Relationship modeling between entities
Community detection to find natural clusters of related information
Hierarchical summarization at different levels of abstraction

Feature	Vector Search (Standard RAG)	GraphRAG
Available artifacts	Ranked text chunks	Text units, entities, relationships, community reports
Natural starting point	Specific evidence lookup ("What is X?")	Entity-focused or thematic analysis ("What are the trends?")
Context construction	Similarity-ranked chunks	Ranked graph/text context or report map-reduce
Indexing work	Embedding and optional sparse index	Additional extraction, clustering, reports, and embeddings
Query work	Retrieval plus generation	Depends on search mode; global search adds map-reduce calls
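The gap between "most similar" and "most representative" is easy to reproduce in a toy simulation. The themes, incident counts, and similarity scores below are invented for illustration, not measured: one theme's wording happens to echo the query, so it fills every top-k slot even though another theme is twice as common.

```python
from collections import Counter

# Invented toy corpus: theme -> (incident count, typical similarity to the query).
THEMES = {
    "cache eviction": (400, 0.72),
    "GPU queue starvation": (300, 0.70),
    "release regression": (200, 0.91),
    "network saturation": (100, 0.68),
}

def build_corpus() -> list[tuple[str, str, float]]:
    """Return (doc_id, theme, similarity) triples with slightly decaying scores."""
    docs = []
    for theme, (count, base_score) in THEMES.items():
        for index in range(count):
            docs.append((f"{theme}-{index}", theme, base_score - index * 1e-4))
    return docs

def top_k(docs: list[tuple[str, str, float]], k: int) -> list[tuple[str, str, float]]:
    # Pure similarity ranking: nothing here rewards covering different themes.
    return sorted(docs, key=lambda doc: doc[2], reverse=True)[:k]

def theme_coverage(docs: list[tuple[str, str, float]], k: int) -> float:
    """Fraction of corpus themes that appear at least once in the top-k results."""
    all_themes = {theme for _, theme, _ in docs}
    retrieved = {theme for _, theme, _ in top_k(docs, k)}
    return len(retrieved) / len(all_themes)

docs = build_corpus()
print("themes in top-5:", Counter(theme for _, theme, _ in top_k(docs, 5)))
print("theme coverage at k=5:", theme_coverage(docs, 5))  # 1 of 4 themes -> 0.25
print("most common theme overall:", Counter(theme for _, theme, _ in docs).most_common(1))
```

Raising k helps only slowly here: all 200 release-regression incidents rank ahead of everything else, so coverage stays at 0.25 until k reaches 201.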

A concrete multi-hop example

Before the definitions, compare the two approaches on a tiny incident corpus:

Incident 1842: "inference-api release 2026.06.14 ran in us-east-1 and hit p95 latency after redis-cache began evicting hot keys."

Incident 2031: "redis-cache in us-east-1 still uses the old maxmemory policy. The config migration is blocked by a compatibility test."

Incident 3155: "inference-api breached the inference latency SLO twelve minutes after deploy. Rollback restored latency."

Now ask:

"Why did inference-api breach the latency SLO, and is this likely to happen again?"

Illustrative vector-only result: Suppose retrieval finds Incident 3155 (mentions "inference-api" and "latency SLO") and Incident 1842 (mentions "inference-api" and "redis-cache"), but not Incident 2031 because it doesn't mention the service name. The answer can identify the rollout and cache eviction but miss the unresolved maxmemory policy.

Graph-enriched context: Suppose extraction produced these entities and relationships, with source-chunk provenance:

(inference-api) --[ran_in]--> (us-east-1)
(inference-api) --[depends_on]--> (redis-cache)
(inference-api) --[triggered_by]--> (release 2026.06.14)
(redis-cache) --[located_in]--> (us-east-1)
(redis-cache) --[has_issue]--> (old maxmemory policy)
(old maxmemory policy) --[blocked_by]--> (compatibility test)

An implementation that supports evidence-backed traversal could start at inference-api, follow depends_on to redis-cache, and retrieve evidence about both the eviction event and the unresolved maxmemory policy. It may then answer: "inference-api breached the latency SLO after a rollout, and source material ties the incident to cache evictions plus an unresolved cache policy." It shouldn't predict another breach unless source evidence supports that claim.

This is relationship-heavy retrieval: useful evidence is connected through entities that weren't all present in the query. Microsoft's standard local-search dataflow ranks graph and text artifacts into a context window; it isn't a promise that every answer executes a deterministic path. If explicit multi-hop paths matter, preserve edge provenance and test the traversal policy.

require-provenance-for-graph-paths.py

from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    source_chunks: tuple[str, ...]

def cited_path(edges: list[Edge]) -> tuple[bool, list[str]]:
    if any(not edge.source_chunks for edge in edges):
        return False, []
    citations = sorted({chunk for edge in edges for chunk in edge.source_chunks})
    return True, citations

path = [
    Edge("inference-api", "depends_on", "redis-cache", ("incident-1842",)),
    Edge("redis-cache", "has_issue", "old maxmemory policy", ("incident-2031",)),
]
unsupported_path = path + [
    Edge("old maxmemory policy", "will_cause", "future SLO breach", ()),
]

print("supported path:", cited_path(path))
print("unsupported prediction:", cited_path(unsupported_path))

Output

supported path: (True, ['incident-1842', 'incident-2031'])
unsupported prediction: (False, [])

What is a knowledge graph?

A knowledge graph represents information as a network of entities (nodes) and relationships (edges). Unlike a flat vector database that stores text chunks as isolated embeddings, a knowledge graph preserves the structure of how facts connect.

The fundamental unit is the triple: (Subject) - [Predicate] -> (Object). The edge direction should reflect the meaning of the relation itself, not some generic left-to-right convention.

From our running example:

(inference-api) --[depends_on]--> (redis-cache)
(redis-cache) --[has_issue]--> (old maxmemory policy)
(old maxmemory policy) --[blocked_by]--> (compatibility test)

In production, you're usually working with a property graph rather than bare triples. That means nodes and edges carry metadata like type, description, source_chunk_ids, timestamps, or confidence scores. Those extra fields matter for filtering, provenance, and ranking during retrieval.

This graph structure mirrors how operations teams reason through associations. When a query mentions redis-cache, a retrieval system can use typed relationships and provenance to distinguish service dependencies, region-specific incidents, config drift, and release side effects instead of relying on service-name similarity alone.
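A minimal sketch of that kind of filtering, assuming edges are stored as plain records with a relation type, a confidence score, and source-chunk ids. The `PropertyEdge` and `edges_for` names, the `search-api` edge, and the 0.5 confidence floor are hypothetical; the point is that typed, provenance-bearing edges let retrieval ask for one kind of relationship around redis-cache instead of everything that mentions the name.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PropertyEdge:
    source: str
    relation: str
    target: str
    confidence: float
    source_chunk_ids: tuple[str, ...] = ()  # provenance: chunks that support the edge

EDGES = [
    PropertyEdge("inference-api", "depends_on", "redis-cache", 1.0, ("incident-1842",)),
    PropertyEdge("redis-cache", "located_in", "us-east-1", 0.8, ("incident-2031",)),
    PropertyEdge("redis-cache", "has_issue", "old maxmemory policy", 0.8, ("incident-2031",)),
    # Hypothetical weak edge: low confidence and no supporting chunk.
    PropertyEdge("search-api", "mentions", "redis-cache", 0.3),
]

def edges_for(
    entity: str,
    edges: list[PropertyEdge],
    relations: Optional[set[str]] = None,
    min_confidence: float = 0.5,
) -> list[PropertyEdge]:
    """Edges touching `entity`, filtered by relation type, confidence, and provenance."""
    selected: list[PropertyEdge] = []
    for edge in edges:
        if entity not in (edge.source, edge.target):
            continue
        if relations is not None and edge.relation not in relations:
            continue
        if edge.confidence < min_confidence or not edge.source_chunk_ids:
            continue
        selected.append(edge)
    return selected

print("all typed edges:", [edge.relation for edge in edges_for("redis-cache", EDGES)])
print("dependents only:", [edge.source for edge in edges_for("redis-cache", EDGES, relations={"depends_on"})])
```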

The GraphRAG pipeline

Phase 1: Graph construction (indexing)

The indexing phase transforms raw document chunks into a structured graph index. In Microsoft's GraphRAG docs, the standard pipeline extracts entities and relationships, optionally extracts claims, builds hierarchical communities, generates community reports, and embeds the resulting artifacts for downstream retrieval [2][3]:

Figure: GraphRAG indexing turns documents into chunks, graph facts, communities, reports, and retrieval artifacts. The cost is paid upfront: text becomes graph structure, reports are prepared before any query arrives, and only then are retrieval artifacts embedded.

Step 1: Entity & relationship extraction

Figure: Extraction turns one incident chunk into typed service, cache, region, and release entities with explicit relationships and shared source-chunk references, so nodes, edges, and provenance all become first-class retrieval artifacts.

The first step is to pass each document chunk to an extraction system that identifies key entities and the relationships between them. In standard GraphRAG, this is LLM-driven. In FastGraphRAG, some of this work is replaced with cheaper NLP-based heuristics to cut indexing cost [5]. A simple extraction prompt looks like this:

step-1-entity-and-relationship-extraction.py

# Literal braces in the JSON example are doubled so str.format() fills only {chunk_text}.
EXTRACTION_PROMPT = """
Extract all entities and relationships from the following text.

Entities should include: services, datastores, regions, releases, incidents, and config issues.
Relationships should include: ran_in, depends_on, triggered_by, located_in, has_issue, blocked_by.

Text: {chunk_text}

Output as JSON:
{{
  "entities": [
    {{"name": "...", "type": "...", "description": "..."}}
  ],
  "relationships": [
    {{
      "source": "...",
      "target": "...",
      "type": "...",
      "description": "...",
      "strength": 1.0
    }}
  ]
}}
"""

prompt = EXTRACTION_PROMPT.format(
    chunk_text="redis-cache in us-east-1 still uses the old maxmemory policy."
)

In production, don't rely on raw string prompting for JSON. Use the provider's structured-output or tool-calling support together with a schema library such as Instructor (Python) or Zod (TypeScript), so every response is validated against the schema. Across thousands of chunks, this catches malformed or incomplete JSON before it reaches the graph instead of failing partway through indexing.
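Instructor and Zod perform this validation for you. The stdlib-only sketch below shows the same contract in miniature, assuming the key names from the extraction prompt; the `parse_extraction` name is illustrative, and the rule that every relationship must reference an entity from the same response is an added assumption that the prompt doesn't state.

```python
import json

ENTITY_KEYS = {"name", "type", "description"}
RELATIONSHIP_KEYS = {"source", "target", "type", "description", "strength"}

def parse_extraction(raw: str) -> dict:
    """Parse one model response and raise ValueError if it breaks the prompt's schema."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(payload, dict):
        raise ValueError("top-level value must be an object")
    entities = payload.get("entities")
    relationships = payload.get("relationships")
    if not isinstance(entities, list) or not isinstance(relationships, list):
        raise ValueError("expected 'entities' and 'relationships' lists")
    for entity in entities:
        if not isinstance(entity, dict) or not ENTITY_KEYS.issubset(entity):
            raise ValueError(f"malformed entity: {entity!r}")
    names = {entity["name"] for entity in entities}
    for rel in relationships:
        if not isinstance(rel, dict) or not RELATIONSHIP_KEYS.issubset(rel):
            raise ValueError(f"malformed relationship: {rel!r}")
        # Assumption: edges may only connect entities returned in the same response.
        if rel["source"] not in names or rel["target"] not in names:
            raise ValueError(f"dangling relationship: {rel['source']} -> {rel['target']}")
        if not isinstance(rel["strength"], (int, float)):
            raise ValueError(f"non-numeric strength: {rel['strength']!r}")
    return payload

VALID = json.dumps({
    "entities": [
        {"name": "redis-cache", "type": "Datastore", "description": "Shared cache"},
        {"name": "us-east-1", "type": "Region", "description": "Production region"},
    ],
    "relationships": [
        {"source": "redis-cache", "target": "us-east-1", "type": "located_in",
         "description": "Cache region", "strength": 1.0},
    ],
})
TRUNCATED = VALID[:40]  # simulates a response cut off mid-object
DANGLING = VALID.replace('"target": "us-east-1"', '"target": "eu-west-1"')

for label, raw in [("valid", VALID), ("truncated", TRUNCATED), ("dangling", DANGLING)]:
    try:
        parsed = parse_extraction(raw)
        print(label, "ok:", len(parsed["relationships"]), "relationship(s)")
    except ValueError as error:
        print(label, "rejected:", error)
```

Rejected chunks can be retried or logged rather than silently dropping edges from the graph.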

The runnable version below uses a deterministic extractor so you can test the graph contract without calling an LLM. In production, replace extract_from_chunk() with a structured-output model call and keep the same Entity and Relationship objects.

step-1-entity-and-relationship-extraction-2.py

from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    type: str
    description: str
    source_chunk_ids: list[int] = field(default_factory=list)

@dataclass
class Relationship:
    source: str
    target: str
    type: str
    description: str
    confidence: float
    source_chunk_ids: list[int] = field(default_factory=list)

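def canonicalize(name: str) -> str:
    # Keep only letters and digits so "redis-cache", "Redis Cache", and "redis_cache" share a key.
    return "".join(char for char in name.lower() if char.isalnum())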

def extract_from_chunk(chunk_id: int, chunk: str) -> tuple[list[Entity], list[Relationship]]:
    entities: list[Entity] = []
    relationships: list[Relationship] = []

    if "inference-api" in chunk:
        entities.append(Entity("inference-api", "Service", "Inference API service", [chunk_id]))
    if "redis-cache" in chunk:
        entities.append(Entity("Redis Cache", "Datastore", "Shared cache dependency", [chunk_id]))
    if "us-east-1" in chunk:
        entities.append(Entity("US East 1", "Region", "Production region", [chunk_id]))
    if "release 2026.06.14" in chunk:
        entities.append(Entity("Release 2026.06.14", "Release", "Inference API deployment", [chunk_id]))
    if "old maxmemory policy" in chunk:
        entities.append(Entity("Old Maxmemory Policy", "ConfigIssue", "Cache eviction policy risk", [chunk_id]))
    if "compatibility test" in chunk:
        entities.append(Entity("Compatibility Test", "TestGate", "Migration-blocking test", [chunk_id]))

    names = {entity.name for entity in entities}
    if {"inference-api", "Redis Cache"} <= names:
        relationships.append(
            Relationship("inference-api", "Redis Cache", "depends_on", "Service uses cache", 1.0, [chunk_id])
        )
    if {"inference-api", "US East 1"} <= names:
        relationships.append(
            Relationship("inference-api", "US East 1", "ran_in", "Service ran in region", 0.9, [chunk_id])
        )
    if {"inference-api", "Release 2026.06.14"} <= names:
        relationships.append(
            Relationship("inference-api", "Release 2026.06.14", "triggered_by", "Latency followed release", 0.8, [chunk_id])
        )
    if {"Redis Cache", "US East 1"} <= names:
        relationships.append(
            Relationship("Redis Cache", "US East 1", "located_in", "Cache region", 0.8, [chunk_id])
        )
    if {"Redis Cache", "Old Maxmemory Policy"} <= names:
        relationships.append(
            Relationship("Redis Cache", "Old Maxmemory Policy", "has_issue", "Eviction policy risk", 0.8, [chunk_id])
        )
    if {"Old Maxmemory Policy", "Compatibility Test"} <= names:
        relationships.append(
            Relationship("Old Maxmemory Policy", "Compatibility Test", "blocked_by", "Migration gate", 0.7, [chunk_id])
        )

    return entities, relationships

def deduplicate_entities(entities: list[Entity]) -> list[Entity]:
    merged: dict[str, Entity] = {}
    for entity in entities:
        key = canonicalize(entity.name)
        if key not in merged:
            merged[key] = entity
            continue
        merged[key].source_chunk_ids.extend(entity.source_chunk_ids)
    return list(merged.values())

def extract_graph_elements(chunks: list[str]) -> tuple[list[Entity], list[Relationship]]:
    all_entities: list[Entity] = []
    all_relationships: list[Relationship] = []
    entity_name_by_key: dict[str, str] = {}

    for chunk_id, chunk in enumerate(chunks):
        entities, relationships = extract_from_chunk(chunk_id, chunk)
        all_entities.extend(entities)
        all_relationships.extend(relationships)

    deduped_entities = deduplicate_entities(all_entities)
    for entity in deduped_entities:
        entity_name_by_key[canonicalize(entity.name)] = entity.name

    resolved_relationships = [
        Relationship(
            source=entity_name_by_key[canonicalize(rel.source)],
            target=entity_name_by_key[canonicalize(rel.target)],
            type=rel.type,
            description=rel.description,
            confidence=rel.confidence,
            source_chunk_ids=rel.source_chunk_ids,
        )
        for rel in all_relationships
    ]

    return deduped_entities, resolved_relationships

chunks = [
    "inference-api hit p95 latency in us-east-1 because redis-cache started evicting hot keys during release 2026.06.14.",
    "redis-cache in us-east-1 still uses the old maxmemory policy. Config migration is blocked by a compatibility test.",
]

entities, relationships = extract_graph_elements(chunks)
entity_names = {entity.name for entity in entities}
relationship_types = {rel.type for rel in relationships}

print("entities:", ", ".join(sorted(entity_names)))
print("relationships:", ", ".join(sorted(relationship_types)))

Output

entities: Compatibility Test, Old Maxmemory Policy, Redis Cache, Release 2026.06.14, US East 1, inference-api
relationships: blocked_by, depends_on, has_issue, located_in, ran_in, triggered_by
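The extractor above deduplicates entities but not relationships: `located_in` is emitted once per chunk, so the same edge appears twice. Here is a sketch of one possible merge policy, keyed on (source, type, target), that unions provenance and keeps the highest confidence. The `merge_relationships` name is illustrative, and keeping the first description instead of combining descriptions is a simplification.

```python
from dataclasses import dataclass, field

@dataclass
class Relationship:
    source: str
    target: str
    type: str
    description: str
    confidence: float
    source_chunk_ids: list[int] = field(default_factory=list)

def merge_relationships(relationships: list[Relationship]) -> list[Relationship]:
    """Collapse duplicate (source, type, target) edges without mutating the inputs."""
    merged: dict[tuple[str, str, str], Relationship] = {}
    for rel in relationships:
        key = (rel.source, rel.type, rel.target)
        current = merged.get(key)
        if current is None:
            # Copy the chunk list so later merges don't alter the caller's object.
            merged[key] = Relationship(
                rel.source, rel.target, rel.type, rel.description,
                rel.confidence, list(rel.source_chunk_ids),
            )
            continue
        current.confidence = max(current.confidence, rel.confidence)
        for chunk_id in rel.source_chunk_ids:
            if chunk_id not in current.source_chunk_ids:
                current.source_chunk_ids.append(chunk_id)
    return list(merged.values())

relationships = [
    Relationship("Redis Cache", "US East 1", "located_in", "Cache region", 0.8, [0]),
    Relationship("Redis Cache", "US East 1", "located_in", "Cache region", 0.8, [1]),
    Relationship("Redis Cache", "Old Maxmemory Policy", "has_issue", "Eviction policy risk", 0.8, [1]),
]
merged = merge_relationships(relationships)
for rel in merged:
    print(rel.source, rel.type, rel.target, rel.source_chunk_ids)
```

Unioning chunk ids matters more than it looks: an edge supported by two incidents is stronger evidence than one supported by a single chunk, and the merged list is what a citation step can later surface.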

Step 2: Build the knowledge graph

Once the entities and relationships are extracted and deduplicated, construct the graph data structure. Merging facts from individual chunks into one connected network lets later retrieval policies follow multi-hop paths that no single chunk states, such as inference-api to redis-cache to the old maxmemory policy.

This function takes the lists of entities and relationships as input. Using a graph processing library like NetworkX, it adds each entity as a node and each typed relationship as a directed edge with provenance and weighting metadata:

step-2-build-the-knowledge-graph.py

from dataclasses import dataclass, field

import networkx as nx

@dataclass
class Entity:
    name: str
    type: str
    description: str
    source_chunk_ids: list[int] = field(default_factory=list)

@dataclass
class Relationship:
    source: str
    target: str
    type: str
    description: str
    confidence: float
    source_chunk_ids: list[int] = field(default_factory=list)

def build_knowledge_graph(
    entities: list[Entity],
    relationships: list[Relationship]
) -> nx.MultiDiGraph:
    """Construct a directed property graph from extracted elements."""
    G = nx.MultiDiGraph()

    for entity in entities:
        G.add_node(entity.name,
                   type=entity.type,
                   description=entity.description,
                   source_chunk_ids=entity.source_chunk_ids)

    for rel in relationships:
        G.add_edge(rel.source, rel.target,
                   key=rel.type,
                   type=rel.type,
                   description=rel.description,
                   weight=rel.confidence,
                   source_chunk_ids=rel.source_chunk_ids)

    return G

entities = [
    Entity("inference-api", "Service", "Inference API service", [0]),
    Entity("Redis Cache", "Datastore", "Shared cache dependency", [0]),
    Entity("US East 1", "Region", "Production region", [0, 1]),
]
relationships = [
    Relationship("inference-api", "Redis Cache", "depends_on", "Service uses cache", 1.0, [0]),
    Relationship("inference-api", "US East 1", "ran_in", "Service ran in region", 0.9, [0]),
]

graph = build_knowledge_graph(entities, relationships)

print("nodes:", sorted(graph.nodes))
print("edges:", sorted((src, rel_type, dst) for src, dst, rel_type in graph.edges(keys=True)))
print("US East 1 node type:", graph.nodes["US East 1"]["type"])
print("depends_on weight:", graph["inference-api"]["Redis Cache"]["depends_on"]["weight"])

Output

nodes: ['Redis Cache', 'US East 1', 'inference-api']
edges: [('inference-api', 'depends_on', 'Redis Cache'), ('inference-api', 'ran_in', 'US East 1')]
US East 1 node type: Region
depends_on weight: 1.0
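Because each edge keeps its `source_chunk_ids`, the same structure can support the explicit, cited traversal that standard local search doesn't promise. The sketch below is a minimal depth-first walk, assuming edges were added with `key` set to the relation type as in `build_knowledge_graph()`; the `evidence_paths` name and the two-hop limit are illustrative choices, not part of GraphRAG.

```python
import networkx as nx

Hop = tuple[str, str, str, tuple[int, ...]]  # (source, relation, target, chunk ids)

def evidence_paths(graph: nx.MultiDiGraph, start: str, max_hops: int = 2) -> list[list[Hop]]:
    """List every outgoing relation path up to max_hops, carrying edge provenance."""
    paths: list[list[Hop]] = []

    def walk(node: str, path: list[Hop], visited: set[str]) -> None:
        if path:
            paths.append(list(path))
        if len(path) == max_hops:
            return
        for _, target, relation, data in graph.out_edges(node, keys=True, data=True):
            if target in visited:  # skip cycles
                continue
            path.append((node, relation, target, tuple(data.get("source_chunk_ids", ()))))
            walk(target, path, visited | {target})
            path.pop()

    walk(start, [], {start})
    return paths

graph = nx.MultiDiGraph()
graph.add_edge("inference-api", "Redis Cache", key="depends_on", source_chunk_ids=[0])
graph.add_edge("inference-api", "US East 1", key="ran_in", source_chunk_ids=[0])
graph.add_edge("Redis Cache", "Old Maxmemory Policy", key="has_issue", source_chunk_ids=[1])

for path in evidence_paths(graph, "inference-api"):
    # A path is citable only if every hop has at least one source chunk.
    cited = all(hop[3] for hop in path)
    chunks = sorted({chunk for hop in path for chunk in hop[3]})
    print(" -> ".join(f"{hop[1]}:{hop[2]}" for hop in path), "| cited:", cited, chunks)
```

Exhaustive enumeration grows quickly with degree, so a real traversal policy would rank or prune edges (for example by `weight`) before expanding them.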

Step 3: Community detection (Leiden algorithm)

After the graph is built, we need to find natural clusters of related entities. In an incident corpus, those clusters might become themes such as cache pressure, queue starvation, release risk, and eval flakiness. The algorithm decides the groups based on how densely entities connect to each other, not on keyword matching.

The Leiden algorithm identifies these clusters. It's preferred over the classic Louvain method because Louvain can produce badly connected or even disconnected communities (a group whose members can't all reach each other through edges inside the group). Traag et al. show that Leiden adds a refinement phase and guarantees connected communities [6]. That removes one structural defect before report generation, although it can't correct bad entity extraction or unsupported edges. Microsoft's GraphRAG pipeline applies hierarchical Leiden recursively until communities hit a size threshold, which is how it gets both coarse and fine-grained views of the same corpus [3].

One quality function Leiden can optimize is modularity ($Q$), a measure of community structure in the extracted graph. In plain terms, modularity asks: "Are there more edges inside each group than we'd expect by chance?" A high $Q$ indicates strong clustering in that graph; it doesn't prove that extraction captured the underlying topic correctly.

For an unweighted, undirected graph, the formula is:

$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$

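Where $A_{ij}$ is 1 if nodes $i$ and $j$ are directly connected and 0 otherwise; $k_i$ and $k_j$ are the degrees of nodes $i$ and $j$ (the number of edges each touches); $m$ is the total number of edges in the graph; and $\delta(c_i, c_j)$ is 1 if the two nodes are in the same community and 0 otherwise. The term $k_i k_j / 2m$ is the expected number of edges between $i$ and $j$ in a random graph with the same degrees, so higher $Q$ means more within-community edges than that null model predicts.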
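As a sanity check on the formula, the sketch below evaluates the double sum literally on a toy graph of two triangles joined by one edge (an example chosen for illustration, not taken from GraphRAG). It assumes an unweighted, undirected graph with no self-loops or duplicate edges. Splitting along the triangles gives $Q = 5/14 \approx 0.357$; a lopsided split scores lower, and putting every node in one community gives $Q = 0$ because observed and expected edge counts cancel.

```python
from itertools import product

def modularity(edges: list[tuple[str, str]], communities: list[set[str]]) -> float:
    """Q for an unweighted, undirected graph, summing over all ordered node pairs (i, j)."""
    adjacent = {frozenset(edge) for edge in edges}
    nodes = sorted({node for edge in edges for node in edge})
    degree = {node: sum(node in edge for edge in edges) for node in nodes}  # k_i
    community_of = {node: index for index, group in enumerate(communities) for node in group}
    two_m = 2 * len(edges)
    total = 0.0
    for i, j in product(nodes, repeat=2):
        if community_of[i] != community_of[j]:
            continue  # delta(c_i, c_j) = 0 contributes nothing
        a_ij = 1.0 if frozenset((i, j)) in adjacent else 0.0  # A_ii = 0: no self-loops
        total += a_ij - degree[i] * degree[j] / two_m
    return total / two_m

# Two triangles joined by the single edge c-d.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]

print("triangles:", f"{modularity(EDGES, [{'a', 'b', 'c'}, {'d', 'e', 'f'}]):.3f}")
print("lopsided:", f"{modularity(EDGES, [{'a', 'b'}, {'c', 'd', 'e', 'f'}]):.3f}")
```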

The runnable example below computes one weighted Leiden partition with python-igraph. Standard GraphRAG recursively applies community detection to build a hierarchy; keeping this example at one level makes the clustering contract visible and keeps it fast enough to run locally.

step-3-community-detection-leiden-algorithm.py

import igraph as ig

def detect_communities(
    weighted_edges: list[tuple[str, str, float]],
) -> list[list[str]]:
    """Compute one Leiden partition of an undirected weighted entity graph."""
    node_names = sorted(
        {node for source, target, _ in weighted_edges for node in (source, target)}
    )
    graph = ig.Graph()
    graph.add_vertices(node_names)
    graph.add_edges([(source, target) for source, target, _ in weighted_edges])
    graph.es["weight"] = [weight for _, _, weight in weighted_edges]
    partition = graph.community_leiden(
        objective_function="modularity",
        weights="weight",
    )
    groups = [
        sorted(graph.vs[index]["name"] for index in community)
        for community in partition
    ]
    return sorted(groups, key=lambda group: group[0])

weighted_edges = [
    ("inference-api", "Latency SLO", 1.0),
    ("inference-api", "Rollback", 1.0),
    ("Redis Cache", "Eviction Storm", 1.0),
    ("GPU Queue", "Preemption Bug", 1.0),
    ("Eval Harness", "Flaky Judge", 1.0),
]
groups = detect_communities(weighted_edges)

print("community_count:", len(groups))
for group in groups:
    print("-", ", ".join(group))

Output

community_count: 4
- Eval Harness, Flaky Judge
- Eviction Storm, Redis Cache
- GPU Queue, Preemption Bug
- Latency SLO, Rollback, inference-api

The diagram visualizes how entities group into hierarchical communities, from specific concepts at the bottom to broad themes at the top:

Figure: Service, cache, queue, and eval entities cluster into incident communities, which roll up into a few broad reports. Global search starts from those high-level reports and drills down only when the answer needs detail.

Step 4: Community summarization

This function takes the entities, their descriptions, and their relationships within a community as input. It passes these to a summarizer and returns a community report that can later be used for global search [1][3]:

step-4-community-summarization.py

from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class Entity:
    name: str
    description: str

@dataclass(frozen=True)
class Relationship:
    source: str
    type: str
    target: str

class CommunitySummarizer(Protocol):
    def summarize(self, prompt: str) -> str: ...

class FakeCommunitySummarizer:
    # Deterministic stand-in for an LLM call so the example runs without an API key.
    def summarize(self, prompt: str) -> str:
        return (
            "Cache latency cluster: inference-api depends on redis-cache, "
            "and cache evictions recur when the old maxmemory policy remains active."
        )

def summarize_community(
    community_entities: list[Entity],
    community_relationships: list[Relationship],
    level: int,
    summarizer: CommunitySummarizer,
) -> str:
    entity_lines = "\n".join(
        f"- {entity.name}: {entity.description}" for entity in community_entities
    )
    relationship_lines = "\n".join(
        f"- {rel.source} {rel.type} {rel.target}" for rel in community_relationships
    )
    prompt = (
        f"Community level: {level}\n"
        f"Entities:\n{entity_lines}\n"
        f"Relationships:\n{relationship_lines}\n"
        "Summarize the main operational pattern."
    )
    return summarizer.summarize(prompt)

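# The community includes every entity the canned summary mentions, so the report stays grounded.
summary = summarize_community(
    [
        Entity("inference-api", "Inference API service"),
        Entity("Redis Cache", "Shared cache dependency"),
        Entity("Old Maxmemory Policy", "Cache eviction policy risk"),
    ],
    [
        Relationship("inference-api", "depends_on", "Redis Cache"),
        Relationship("Redis Cache", "has_issue", "Old Maxmemory Policy"),
    ],
    level=0,
    summarizer=FakeCommunitySummarizer(),
)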

print(summary)

Output

Cache latency cluster: inference-api depends on redis-cache, and cache evictions recur when the old maxmemory policy remains active.
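Summaries are generated text, so they can mention entities that aren't in the community they summarize. A cheap guard, sketched here under the assumption that you keep a list of all known entity names, flags any known name that appears in a report but not in its community. The `ungrounded_mentions` helper is hypothetical, and matching on text with punctuation and spaces removed is a deliberate simplification that can produce false positives for short names.

```python
def normalize(text: str) -> str:
    # Same letters-and-digits key for "redis-cache", "Redis Cache", and "redis_cache".
    return "".join(char for char in text.lower() if char.isalnum())

def ungrounded_mentions(
    report: str, community_entities: list[str], known_entities: list[str]
) -> list[str]:
    """Known entity names that a report mentions but its community doesn't contain."""
    report_key = normalize(report)
    allowed = {normalize(name) for name in community_entities}
    return sorted(
        name
        for name in known_entities
        if normalize(name) in report_key and normalize(name) not in allowed
    )

REPORT = (
    "Cache latency cluster: inference-api depends on redis-cache, "
    "and cache evictions recur when the old maxmemory policy remains active."
)
KNOWN = ["inference-api", "Redis Cache", "Old Maxmemory Policy", "GPU Queue"]

# A community without inference-api shouldn't produce a report that names it.
print(ungrounded_mentions(REPORT, ["Redis Cache", "Old Maxmemory Policy"], KNOWN))
print(ungrounded_mentions(REPORT, ["inference-api", "Redis Cache", "Old Maxmemory Policy"], KNOWN))
```

A flagged report can be regenerated or sent for review before it enters the global-search index.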

Phase 2: Query processing (runtime)

GraphRAG's query engine exposes Basic Search, Local Search, Global Search, and DRIFT Search. Local and global search are the core mental model here: local search builds context for entity-centric questions, global search synthesizes over community reports, Basic Search is a text-focused baseline, and DRIFT combines community-level entry points with local exploration [7][8][9].
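The modes differ in cost and in the shape of evidence they return, so an application still has to decide which one to call. The keyword router below only shows the shape of that decision: the cue list and the `route` function are assumptions rather than GraphRAG behavior, DRIFT is omitted for brevity, and a real router would be evaluated against labeled questions.

```python
import re

# Hypothetical cue words for corpus-level questions; tune them against labeled queries.
GLOBAL_CUES = re.compile(r"\b(top|recurring|trends?|themes?|across|overall|most common)\b")

def route(question: str, known_entities: list[str]) -> str:
    """Pick 'global', 'local', or 'basic' from the question's shape (heuristic sketch)."""
    q = question.lower()
    if GLOBAL_CUES.search(q):
        return "global"  # thematic question: map-reduce over community reports
    if any(entity.lower() in q for entity in known_entities):
        return "local"  # names a graph entity: build mixed local context
    return "basic"  # plain fact lookup: text-chunk retrieval

ENTITIES = ["inference-api", "redis-cache", "us-east-1"]
for question in [
    "What is the timeout for the embeddings API?",
    "Why did inference-api breach the latency SLO?",
    "What are the top 3 recurring reasons model-serving incidents breach the latency SLO?",
]:
    print(route(question, ENTITIES), "<-", question)
```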

Local search (specific questions)

Local search goes beyond named-entity matching and one-hop neighbor expansion. In Microsoft's GraphRAG docs, local search maps the query into semantically related entities, then prioritizes a mixed context from connected entities, relationships, community reports, linked text units, and optionally covariates if claim extraction is enabled [7].

The runnable example below demonstrates a bounded neighborhood expansion for intuition. It's a custom graph-walk sketch, not an exact reproduction of the GraphRAG context builder or its ranking policy:

local-search-specific-questions.py

import networkx as nx

class EntityStore:
    """Toy entity index: substring matching stands in for embedding similarity."""

    def __init__(self, entity_names: list[str]) -> None:
        self.entity_names = entity_names

    def similarity_search(self, query: str, k: int = 10) -> list[str]:
        query_lower = query.lower()
        matches = [
            entity for entity in self.entity_names if entity.lower() in query_lower
        ]
        return matches[:k]

def expand_entity_neighborhood(
    graph: nx.MultiDiGraph, entity_names: list[str], hops: int = 1
) -> nx.MultiDiGraph:
    # Ignore mapped entities that never made it into the graph.
    nodes: set[str] = {name for name in entity_names if name in graph}
    frontier: set[str] = set(nodes)
    for _ in range(hops):
        next_frontier: set[str] = set()
        for node in frontier:
            # Follow edges in both directions: dependencies and dependents.
            next_frontier.update(graph.successors(node))
            next_frontier.update(graph.predecessors(node))
        # Only newly reached nodes need expanding on the next hop.
        next_frontier -= nodes
        nodes.update(next_frontier)
        frontier = next_frontier
    return graph.subgraph(nodes).copy()

def render_local_context(
    entities: list[str],
    relationships: list[tuple[str, str, dict]],
    text_units: dict[str, list[str]],
    community_reports: dict[str, list[str]],
) -> str:
    return "\n".join(
        [
            f"Entities: {entities}",
            f"Relationships: {[(src, data['type'], dst) for src, dst, data in relationships]}",
            f"Text units: {text_units}",
            f"Reports: {community_reports}",
        ]
    )

def local_search(
    query: str,
    entity_store: EntityStore,
    graph: nx.MultiDiGraph,
    text_units: dict[str, list[str]],
    community_reports: dict[str, list[str]],
) -> str:
    mapped_entities = entity_store.similarity_search(query, k=10)
    neighborhood = expand_entity_neighborhood(graph, mapped_entities, hops=2)
    neighborhood_entities = sorted(neighborhood.nodes)
    context = render_local_context(
        entities=neighborhood_entities,
        relationships=list(neighborhood.edges(data=True)),
        text_units={entity: text_units.get(entity, []) for entity in neighborhood_entities},
        community_reports={
            entity: community_reports.get(entity, []) for entity in neighborhood_entities
        },
    )
    return f"Answer using local GraphRAG context:\n{context}"

graph = nx.MultiDiGraph()
graph.add_edge("inference-api", "Redis Cache", type="depends_on")
graph.add_edge("Redis Cache", "Old Maxmemory Policy", type="has_issue")

answer = local_search(
    "Why did inference-api breach the latency SLO?",
    EntityStore(["inference-api", "Redis Cache", "Old Maxmemory Policy"]),
    graph,
    text_units={"inference-api": ["inference-api breached the latency SLO after release 2026.06.14."]},
    community_reports={"Redis Cache": ["Cache eviction reports point to maxmemory policy risk."]},
)

print(answer)

Output

Answer using local GraphRAG context:
Entities: ['Old Maxmemory Policy', 'Redis Cache', 'inference-api']
Relationships: [('inference-api', 'depends_on', 'Redis Cache'), ('Redis Cache', 'has_issue', 'Old Maxmemory Policy')]
Text units: {'Old Maxmemory Policy': [], 'Redis Cache': [], 'inference-api': ['inference-api breached the latency SLO after release 2026.06.14.']}
Reports: {'Old Maxmemory Policy': [], 'Redis Cache': ['Cache eviction reports point to maxmemory policy risk.'], 'inference-api': []}

The actual context-builder problem includes a token budget: relevant entities, relationships, text units, and reports compete for room in one generation request. A production route needs ranking and provenance, not unbounded expansion.

pack-ranked-local-context-under-a-budget.py

from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    source: str
    score: float
    tokens: int
    citation: str

def pack_context(candidates: list[Candidate], token_budget: int) -> list[Candidate]:
    selected: list[Candidate] = []
    used = 0
    for candidate in sorted(candidates, key=lambda item: item.score, reverse=True):
        if used + candidate.tokens <= token_budget:
            selected.append(candidate)
            used += candidate.tokens
    return selected

candidates = [
    Candidate("relationship: depends_on", 0.98, 30, "incident-1842"),
    Candidate("text: maxmemory issue", 0.91, 55, "incident-2031"),
    Candidate("report: cache overview", 0.62, 80, "community-7"),
]
selected = pack_context(candidates, token_budget=90)

print("selected:", [item.source for item in selected])
print("citations:", [item.citation for item in selected])
print("tokens:", sum(item.tokens for item in selected))

Output

selected: ['relationship: depends_on', 'text: maxmemory issue']
citations: ['incident-1842', 'incident-2031']
tokens: 85

Global search (broad questions)

For queries that require cross-corpus understanding, GraphRAG uses a map-reduce pattern over community reports from a selected hierarchy level. As the global-search docs describe it, community reports are batched into chunks, the map step produces rated intermediate points from each chunk, and the reduce step aggregates the highest-rated points into the final answer ^{[1]Reference 1From Local to Global: A Graph RAG Approach to Query-Focused Summarization.https://arxiv.org/abs/2404.16130}^{[8]Reference 8GraphRAG Global Searchhttps://microsoft.github.io/graphrag/query/global_search/}:

global-search-broad-questions.py

from dataclasses import dataclass

@dataclass(frozen=True)
class RatedPoint:
    text: str
    rating: int

def batch_reports(reports: list[str], batch_size: int = 2) -> list[list[str]]:
    return [reports[index:index + batch_size] for index in range(0, len(reports), batch_size)]

def map_report_batch(query: str, report_batch: list[str]) -> list[RatedPoint]:
    # Map step. Keyword rules stand in for the LLM call that would read each
    # batch and return rated points; in a real system the model assigns ratings.
    points: list[RatedPoint] = []
    query_lower = query.lower()
    asks_about_latency = any(term in query_lower for term in ("latency", "slo", "miss", "delay"))
    for report in report_batch:
        report_lower = report.lower()
        if asks_about_latency and any(term in report_lower for term in ("latency", "slo", "queue", "eviction")):
            points.append(RatedPoint(report, rating=9))
        elif "security" in query_lower and "credential" in report_lower:
            points.append(RatedPoint(report, rating=6))
    return points

def select_top_points(points: list[RatedPoint], top_k: int) -> list[RatedPoint]:
    # Reduce step (simplified): keep the highest-rated points. A real reduce step
    # would pass them to an LLM to synthesize the final answer.
    return sorted(points, key=lambda point: point.rating, reverse=True)[:top_k]

def global_search(
    query: str,
    reports_by_level: dict[int, list[str]],
    level: int,
) -> str:
    """Answer dataset-level questions with map-reduce over community reports."""
    batches = batch_reports(reports_by_level[level], batch_size=2)
    mapped_points = [
        point
        for batch in batches
        for point in map_report_batch(query, batch)
    ]
    top_points = select_top_points(mapped_points, top_k=3)
    bullets = "\n".join(f"- {point.text}" for point in top_points)
    return f"Top recurring themes for '{query}':\n{bullets}"

reports_by_level = {
    1: [
        "Latency breaches cluster around cache eviction storms.",
        "GPU queue starvation delays batch inference jobs.",
        "Evaluation failures cluster around flaky judge prompts.",
        "SLO breaches recur after cache config drift.",
    ]
}

answer = global_search(
    "What are the main reasons services miss the latency SLO?",
    reports_by_level,
    level=1,
)

print(answer)

Output

Top recurring themes for 'What are the main reasons services miss the latency SLO?':
- Latency breaches cluster around cache eviction storms.
- GPU queue starvation delays batch inference jobs.
- SLO breaches recur after cache config drift.

The level argument above hides a cost problem. Static global search maps over every community report at the chosen level, so cost scales with the size of that whole level even when most communities are irrelevant. Dynamic community selection starts at the top of the hierarchy, rates each report against the query, prunes irrelevant branches, and expands only the sub-communities under surviving reports. Microsoft reports that this finer-grained selection sends fewer reports into the map step and lowers token cost at comparable answer quality.^{[10]Reference 10GraphRAG: Improving global search via dynamic community selection.https://www.microsoft.com/en-us/research/blog/graphrag-improving-global-search-via-dynamic-community-selection/} The rating calls add their own work, so benchmark dynamic selection against static search on your workload.
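The dynamic-selection loop can be sketched as a top-down traversal. This is a minimal sketch under stated assumptions, not Microsoft's implementation: the `Community` tree, the word-overlap `rate_report` (a stand-in for the LLM rating call), and the `threshold` of 5 are all hypothetical, and the sketch simply keeps every report that passes the threshold.

dynamic-community-selection-sketch.py

```python
from collections import deque
from dataclasses import dataclass, field
import re


@dataclass
class Community:
    """Hypothetical community node: its report text plus sub-communities."""
    community_id: str
    report: str
    children: list["Community"] = field(default_factory=list)


def rate_report(query: str, report: str) -> int:
    """Stand-in for an LLM relevance rating on a 0-10 scale: 5 points per shared word."""
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    report_terms = set(re.findall(r"[a-z0-9]+", report.lower()))
    return min(10, 5 * len(query_terms & report_terms))


def select_communities(
    query: str, roots: list[Community], threshold: int = 5
) -> tuple[list[Community], int]:
    """Top-down selection: rate each report, prune low-rated subtrees, expand survivors."""
    selected: list[Community] = []
    rating_calls = 0
    frontier = deque(roots)
    while frontier:
        community = frontier.popleft()
        rating_calls += 1
        if rate_report(query, community.report) < threshold:
            continue  # prune: children of an irrelevant report are never rated
        selected.append(community)
        frontier.extend(community.children)  # expand only surviving branches
    return selected, rating_calls


roots = [
    Community("c0", "Cache latency incidents", [
        Community("c0.0", "Redis eviction latency spikes"),
        Community("c0.1", "CDN cache warmup"),
    ]),
    Community("c1", "Evaluation pipeline failures", [
        Community("c1.0", "Flaky judge prompts"),
    ]),
]

selected, calls = select_communities("What drives latency incidents?", roots)
print([c.community_id for c in selected], calls)  # ['c0', 'c0.0'] 4
```

Four rating calls cover the five-node tree here, and `c1.0` is never rated because its parent was pruned. On deeper hierarchies a pruned branch skips more descendants, but every rating is an extra model call, which is why the comparison against static search matters.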

Hybrid graph-vector architecture

A useful production design doesn't treat vector search and GraphRAG as mutually exclusive. The GraphRAG stack already mixes graph structure with embeddings during both indexing and query-time context building ^{[2]Reference 2GraphRAG Indexing Overviewhttps://microsoft.github.io/graphrag/index/overview/}^{[7]Reference 7GraphRAG Local Searchhttps://microsoft.github.io/graphrag/query/local_search/}. One hybrid architecture looks like this:

A routing gate receives every question. Direct fact lookups stay on text retrieval, connected-entity questions use graph neighbors plus chunks, and broad trend questions use report rollups; all three routes feed one shared answer surface. The design keeps direct lookups cheap and spends graph or report work only when the question's shape needs it.
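The routing gate can start as a rule-based classifier before you invest in a learned one. A minimal sketch, assuming route names that match the release-gate example below (`basic`, `local`, `global`) and hand-picked, hypothetical cue words; in practice the cues (or a small classifier) should come from labeled production queries.

rule-based-query-router-sketch.py

```python
import re

# Hypothetical cue words; derive real ones (or train a classifier) from labeled queries.
TREND_CUES = {"top", "recurring", "trend", "trends", "themes", "common", "across"}
RELATION_CUES = {"why", "depends", "affect", "affects", "cause", "causes", "via"}


def route_query(query: str, known_entities: set[str]) -> str:
    """Return 'global' for corpus-wide trends, 'local' for entity relationships, else 'basic'."""
    text = query.lower()
    words = set(re.findall(r"[a-z]+", text))
    if words & TREND_CUES:
        return "global"  # community-report rollups
    mentions_entity = any(entity in text for entity in known_entities)
    if mentions_entity and words & RELATION_CUES:
        return "local"  # graph neighbors plus linked chunks
    return "basic"  # plain text retrieval


entities = {"inference-api", "redis-cache"}
print(route_query("What is the API timeout?", entities))  # basic
print(route_query("Why did inference-api breach the latency SLO?", entities))  # local
print(route_query("What are the top recurring reasons incidents breach latency SLOs?", entities))  # global
```

Keyword rules misroute paraphrases easily ("What keeps going wrong with inference?" has no trend cue and lands on `basic`), so log every routing decision and score each route per query class before trusting it.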

A router is a policy to evaluate, not a guarantee. Measure supported-answer accuracy, p95 latency, and spend for each query class. In a workload dominated by direct evidence lookups, a text route may cover most requests; an analyst-heavy workload may justify more graph/report queries.

release-graph-routes-from-evaluation.py

from dataclasses import dataclass

@dataclass(frozen=True)
class RouteResult:
    route: str
    query_class: str
    supported_accuracy: float
    p95_ms: int

def release_route(
    results: list[RouteResult],
    query_class: str,
    minimum_accuracy: float,
    maximum_p95_ms: int,
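) -> str:
    eligible = [
        result for result in results
        if result.query_class == query_class
        and result.supported_accuracy >= minimum_accuracy
        and result.p95_ms <= maximum_p95_ms
    ]
    if not eligible:
        # No evaluated route clears the gate for this class; fail loudly instead of guessing.
        raise ValueError(f"no route meets the release gate for {query_class!r}")
    # Prefer the highest supported accuracy; break ties with lower p95 latency.
    return max(
        eligible,
        key=lambda result: (result.supported_accuracy, -result.p95_ms),
    ).route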

results = [
    RouteResult("basic", "fact_lookup", 0.96, 95),
    RouteResult("local", "fact_lookup", 0.96, 240),
    RouteResult("basic", "corpus_trend", 0.61, 92),
    RouteResult("global", "corpus_trend", 0.91, 580),
]

print("fact route:", release_route(results, "fact_lookup", 0.90, 200))
print("trend route:", release_route(results, "corpus_trend", 0.90, 700))

Output

fact route: basic
trend route: global

One subtle detail: GraphRAG as a technique doesn't require a dedicated graph database. Microsoft's reference implementation writes structured output tables to disk and builds query context from those artifacts directly. A graph database such as Neo4j or Amazon Neptune becomes useful when you need custom traversals, shared knowledge-graph infrastructure, or analyst-facing graph queries outside the stock pipeline ^{[11]Reference 11GraphRAG Outputshttps://microsoft.github.io/graphrag/index/outputs/}.
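To make the point concrete, the sketch below builds a one-hop neighborhood from relationship rows held in memory, with no graph database involved. The rows are hypothetical; their column names (`source`, `target`, `description`, `text_unit_ids`) are modeled on the relationships table described in the outputs docs, but check them against your GraphRAG version.

context-from-output-tables-sketch.py

```python
from collections import defaultdict

# Hypothetical rows shaped like a relationships output table.
relationship_rows = [
    {"source": "inference-api", "target": "redis-cache",
     "description": "inference-api depends on redis-cache", "text_unit_ids": ["tu-1"]},
    {"source": "redis-cache", "target": "old maxmemory policy",
     "description": "redis-cache still uses the old maxmemory policy", "text_unit_ids": ["tu-2"]},
    {"source": "eval-runner", "target": "judge prompt",
     "description": "eval-runner calls the judge prompt", "text_unit_ids": ["tu-3"]},
]


def build_adjacency(rows: list[dict]) -> dict[str, list[dict]]:
    """Index each relationship under both endpoints so lookups work in either direction."""
    adjacency: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        adjacency[row["source"]].append(row)
        adjacency[row["target"]].append(row)
    return adjacency


def one_hop_context(entity: str, adjacency: dict[str, list[dict]]) -> tuple[list[str], list[str]]:
    """Collect relationship descriptions touching `entity` and the text units that support them."""
    rows = adjacency.get(entity, [])
    descriptions = [row["description"] for row in rows]
    text_unit_ids = sorted({unit for row in rows for unit in row["text_unit_ids"]})
    return descriptions, text_unit_ids


adjacency = build_adjacency(relationship_rows)
descriptions, text_unit_ids = one_hop_context("redis-cache", adjacency)
print(descriptions)
print(text_unit_ids)  # ['tu-1', 'tu-2']
```

With real output tables you would typically load the Parquet files into a dataframe first; the lookup logic stays the same, and the returned text-unit IDs give the context builder its citations.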

Query expansion and context bridging

Beyond simple routing, the knowledge graph layer can actively enrich vector results.

Query expansion uses entities found in initial vector results to expand the search query, finding semantically related but textually distinct content.
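A minimal sketch of entity-based query expansion. It assumes a hypothetical `chunk_entities` map from each first-pass hit to the entities extracted from it, plus an optional one-hop `neighbors` map; the expanded string would feed a second retrieval pass.

entity-query-expansion-sketch.py

```python
from typing import Optional


def expand_query(
    query: str,
    hit_chunk_ids: list[str],
    chunk_entities: dict[str, list[str]],
    neighbors: Optional[dict[str, list[str]]] = None,
    max_terms: int = 5,
) -> str:
    """Append entities from first-pass hits (and optionally their neighbors) to the query."""
    neighbors = neighbors or {}
    query_lower = query.lower()
    expansion: list[str] = []
    for chunk_id in hit_chunk_ids:
        for entity in chunk_entities.get(chunk_id, []):
            for term in [entity, *neighbors.get(entity, [])]:
                # Skip terms the query already mentions and terms already added.
                if term.lower() not in query_lower and term not in expansion:
                    expansion.append(term)
    if not expansion:
        return query
    return query + " " + " ".join(expansion[:max_terms])


chunk_entities = {"chunk-17": ["inference-api", "redis-cache"]}
neighbors = {"redis-cache": ["old maxmemory policy"]}
print(expand_query("Why did inference-api breach the latency SLO?", ["chunk-17"], chunk_entities, neighbors))
# Why did inference-api breach the latency SLO? redis-cache old maxmemory policy
```

`max_terms` caps drift: neighbor entities widen recall, but each added term can also pull in off-topic chunks, so tune the cap against labeled queries.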

Context bridging is a custom retrieval option when two retrieved chunks don't directly connect. If edges retain supporting chunks, a traversal can identify intermediate entities for retrieval and citation. For example, if one chunk mentions "inference-api depends on redis-cache" and another mentions "redis-cache still uses the old maxmemory policy," a policy can inspect inference-api -> redis-cache -> old maxmemory policy, then fetch source text before making a claim about the latency incident.
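Context bridging can be sketched as a breadth-first search over edges that keep their supporting chunk. The edge list, chunk IDs, and `max_hops` limit are hypothetical; the point is that the function returns chunk IDs to fetch and cite, because the path alone is not evidence.

context-bridging-bfs-sketch.py

```python
from collections import deque
from typing import Optional

# Hypothetical edges that keep their supporting chunk: (source, relation, target, chunk_id).
Hop = tuple[str, str, str, str]
EDGES: list[Hop] = [
    ("inference-api", "depends_on", "redis-cache", "chunk-17"),
    ("redis-cache", "has_issue", "old maxmemory policy", "chunk-42"),
    ("redis-cache", "runs_in", "us-east-1", "chunk-08"),
]


def bridge(start: str, goal: str, edges: list[Hop], max_hops: int = 3) -> Optional[list[Hop]]:
    """Breadth-first search for the shortest directed path; each hop carries its supporting chunk."""
    outgoing: dict[str, list[Hop]] = {}
    for edge in edges:
        outgoing.setdefault(edge[0], []).append(edge)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue
        for edge in outgoing.get(node, []):
            target = edge[2]
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [edge]))
    return None  # no bridge within max_hops


path = bridge("inference-api", "old maxmemory policy", EDGES)
print([hop[3] for hop in path])  # chunks to fetch and cite: ['chunk-17', 'chunk-42']
```

The search follows edge direction, so the reverse query returns `None`; a real policy might also walk edges backward when the relation's meaning allows it.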

Result ranking can boost text results with short, supported paths to query entities. Treat this as a ranker feature to evaluate, not proof that a nearby node supports the answer.
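One way to express that ranker feature, assuming each result already carries the hop count to the nearest query entity (for example, from a breadth-first traversal). The `boost` weight, `max_hops` cutoff, and 1/(1 + hops) decay are hypothetical choices to tune, not values from GraphRAG.

path-length-rank-boost-sketch.py

```python
from typing import Optional


def boosted_score(similarity: float, hops: Optional[int], boost: float = 0.1, max_hops: int = 3) -> float:
    """Add a bonus that shrinks with path length to the nearest query entity; no path means no bonus."""
    if hops is None or hops > max_hops:
        return similarity
    return similarity + boost / (1 + hops)


def rerank(results: list[tuple[str, float, Optional[int]]]) -> list[str]:
    """Sort (chunk_id, similarity, hops) tuples by boosted score, highest first."""
    ranked = sorted(results, key=lambda r: boosted_score(r[1], r[2]), reverse=True)
    return [chunk_id for chunk_id, _, _ in ranked]


print(rerank([("chunk-a", 0.80, None), ("chunk-b", 0.76, 1)]))  # ['chunk-b', 'chunk-a']
```

Keep the boost small relative to the similarity gaps you see in practice, and confirm on labeled queries that it improves supported answers rather than just reshuffling results.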

Performance considerations

Indexing cost

The biggest barrier to adopting standard GraphRAG is upfront indexing cost. Compared with vector-only RAG, its standard pipeline adds LLM-heavy graph extraction, summarization, community report generation, and multiple embedding passes ^{[1]Reference 1From Local to Global: A Graph RAG Approach to Query-Focused Summarization.https://arxiv.org/abs/2404.16130}^{[3]Reference 3GraphRAG Default Dataflowhttps://microsoft.github.io/graphrag/index/default_dataflow/}. Whether the resulting query quality justifies that work is an evaluation question.

Mitigation strategies

To make this viable at scale, teams need to optimize the ingestion pipeline. Microsoft explicitly recommends starting with fast, inexpensive models while you learn the system ^{[12]Reference 12GraphRAG Getting Startedhttps://microsoft.github.io/graphrag/get_started/}. Its indexing-methods docs also estimate that graph extraction (entity and relationship extraction plus their summarization) accounts for roughly 75% of standard indexing cost ^{[5]Reference 5GraphRAG Indexing Methodshttps://microsoft.github.io/graphrag/index/methods/}. If your use case is mostly global summarization, FastGraphRAG can reduce cost further: it replaces LLM-based entity extraction with NLP noun-phrase extraction (using libraries like NLTK or spaCy) and defines relationships by entity co-occurrence within a text unit. The graph is noisier and less reusable outside GraphRAG, but indexing is much cheaper ^{[5]Reference 5GraphRAG Indexing Methodshttps://microsoft.github.io/graphrag/index/methods/}.
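The co-occurrence idea can be sketched in a few lines. This toy version is not FastGraphRAG's code: a fixed, hypothetical phrase list stands in for NLTK or spaCy noun-phrase extraction, and each edge is weighted by the number of text units in which both phrases appear.

cooccurrence-graph-sketch.py

```python
from collections import Counter
from itertools import combinations


def extract_phrases(text: str, vocabulary: set[str]) -> list[str]:
    """Stand-in for NLP noun-phrase extraction: match a fixed, hypothetical phrase list."""
    lowered = text.lower()
    return sorted(phrase for phrase in vocabulary if phrase in lowered)


def cooccurrence_graph(text_units: list[str], vocabulary: set[str]) -> Counter:
    """Weight each phrase pair by the number of text units in which both appear."""
    edges: Counter = Counter()
    for text in text_units:
        for pair in combinations(extract_phrases(text, vocabulary), 2):
            edges[pair] += 1
    return edges


vocabulary = {"inference-api", "redis cache", "maxmemory policy", "loki"}
text_units = [
    "The inference-api calls Redis Cache on every request.",
    "Redis Cache uses the old maxmemory policy.",
    "The inference-api logs to Loki.",
    "The inference-api timed out waiting on Redis Cache.",
]
edges = cooccurrence_graph(text_units, vocabulary)
print(edges.most_common(1))  # [(('inference-api', 'redis cache'), 2)]
```

Edges carry only a count, not a relation such as depends_on, which is one reason the resulting graph is noisier than an LLM-extracted one.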

LazyGraphRAG in practice

Microsoft Research introduced LazyGraphRAG in November 2024 ^{[13]Reference 13LazyGraphRAG: Setting a New Standard for Quality and Costhttps://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/}. It targets a central cost concern with standard GraphRAG: paying for an LLM-driven index before knowing how often graph-assisted queries will run.

LazyGraphRAG defers LLM use to query time. Its index uses NLP noun-phrase extraction and graph statistics for community structure, without entity summaries or precomputed community reports. In Microsoft's reported experiment, its indexing cost matched the vector RAG setup and was approximately 0.1% of full GraphRAG indexing cost ^{[13]Reference 13LazyGraphRAG: Setting a New Standard for Quality and Costhttps://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/}. Treat that as a benchmark result, not a universal constant for every corpus and deployment. At query time it blends vector similarity with community structure and exposes a relevance test budget that trades query cost for quality. Microsoft reports that, at evaluated budget levels, LazyGraphRAG was competitive with or better than the tested alternatives on local and global query criteria ^{[13]Reference 13LazyGraphRAG: Setting a New Standard for Quality and Costhttps://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/}.
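A toy loop shows how a relevance-test budget caps query-time work. It is not LazyGraphRAG's search procedure, which also uses community structure; here chunks are assumed pre-ranked by vector similarity, and `mentions_cache` is a hypothetical stand-in for an LLM relevance check.

relevance-test-budget-sketch.py

```python
from typing import Callable


def budgeted_relevance_search(
    ranked_chunks: list[str],
    is_relevant: Callable[[str], bool],
    budget: int,
) -> tuple[list[str], int]:
    """Test chunks in similarity order until the relevance-test budget is spent."""
    relevant: list[str] = []
    tests_used = 0
    for chunk in ranked_chunks:
        if tests_used >= budget:
            break
        tests_used += 1  # in a real system, each test would be a model call
        if is_relevant(chunk):
            relevant.append(chunk)
    return relevant, tests_used


def mentions_cache(chunk: str) -> bool:
    """Hypothetical stand-in for an LLM relevance check against a cache-latency query."""
    return "cache" in chunk


# Chunks assumed already ranked by vector similarity to the query.
ranked = [
    "cache eviction raised p95 latency",
    "gpu queue starvation delayed batch jobs",
    "judge prompt flakiness failed evals",
    "cache config drift raised latency",
]
print(budgeted_relevance_search(ranked, mentions_cache, budget=2))
print(budgeted_relevance_search(ranked, mentions_cache, budget=4))
```

Doubling the budget from 2 to 4 tests finds the second cache-related chunk at twice the test cost, which is the quality-versus-cost dial a relevance budget exposes.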

The practical takeaway is that precomputed community reports aren't the only candidate for global sensemaking. If the corpus changes frequently or global queries are rare, benchmark a lazy approach against standard GraphRAG. If reusable community reports are a product output, the standard index provides artifacts the lazy path intentionally omits.

Use observed quality and workload volume to compare indexing strategies rather than choosing from architecture labels:

choose-index-strategy-from-workload.py

from dataclasses import dataclass

@dataclass(frozen=True)
class Strategy:
    name: str
    index_cost: float
    query_cost: float
    supported_accuracy: float

def choose_strategy(
    strategies: list[Strategy], query_count: int, accuracy_floor: float
) -> tuple[str, float]:
    passing = [
        strategy for strategy in strategies
        if strategy.supported_accuracy >= accuracy_floor
    ]
    if not passing:
        raise ValueError("no strategy meets the accuracy floor")
    # Total cost = one-time index cost + per-query cost * expected query volume.
    winner = min(
        passing,
        key=lambda strategy: strategy.index_cost + query_count * strategy.query_cost,
    )
    total_cost = winner.index_cost + query_count * winner.query_cost
    return winner.name, total_cost

# Illustrative values standing in for one product's measured evaluation, not vendor benchmarks.
strategies = [
    Strategy("lazy", index_cost=1.0, query_cost=1.7, supported_accuracy=0.91),
    Strategy("precomputed_reports", index_cost=200.0, query_cost=0.8, supported_accuracy=0.93),
]

print("few queries:", choose_strategy(strategies, query_count=10, accuracy_floor=0.90))
print("many queries:", choose_strategy(strategies, query_count=500, accuracy_floor=0.90))

Output

few queries: ('lazy', 18.0)
many queries: ('precomputed_reports', 600.0)
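The switch between those two outputs follows from setting the totals equal: index_lazy + q * query_lazy = index_reports + q * query_reports, so q = (index_reports - index_lazy) / (query_lazy - query_reports). A small helper with the same illustrative numbers:

break-even-query-count.py

```python
import math


def break_even_queries(index_a: float, query_a: float, index_b: float, query_b: float) -> int:
    """Smallest query count at which strategy B (costlier index, cheaper queries)
    costs no more in total than strategy A."""
    if query_b >= query_a:
        raise ValueError("strategy B must have the lower per-query cost")
    return max(0, math.ceil((index_b - index_a) / (query_a - query_b)))


# Lazy (A) vs precomputed reports (B), using the illustrative costs above.
print(break_even_queries(1.0, 1.7, 200.0, 0.8))  # 222
```

Under these illustrative costs, precomputed reports win from about 222 queries per index build, provided both strategies clear the accuracy floor. Each rebuild pays the index cost again, so count queries per refresh cycle rather than per year.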

Query cost

While indexing represents the bulk of the computational expense, runtime query costs can also be higher than standard vector search. The cost depends heavily on the query strategy used:

Local search: Usually moderate. You still have to build a mixed context from entity embeddings, graph neighborhoods, text units, and community reports, but the response path is much narrower than global search ^{[7]Reference 7GraphRAG Local Searchhttps://microsoft.github.io/graphrag/query/local_search/}.
Global search: Potentially expensive. Cost grows with the number of community-report batches you need to map over and the hierarchy level you choose ^{[8]Reference 8GraphRAG Global Searchhttps://microsoft.github.io/graphrag/query/global_search/}.

Mitigation strategies

To manage these runtime costs, standard GraphRAG precomputes community reports at multiple hierarchical levels. At query time, check whether a coarse level answers the question adequately before paying for more detailed reports. Lower, more detailed levels of the hierarchy tend to yield more thorough responses, but they can also increase report volume and LLM work ^{[8]Reference 8GraphRAG Global Searchhttps://microsoft.github.io/graphrag/query/global_search/}.
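That coarse-level-first policy can be sketched as an escalation loop that counts map calls. The batch size, report counts, and `good_enough_from_level_1` adequacy check are hypothetical, and the loop assumes lower level numbers mean coarser communities; confirm how your index numbers its levels.

coarse-to-fine-level-escalation-sketch.py

```python
import math
from typing import Callable, Optional


def escalate_levels(
    query: str,
    reports_by_level: dict[int, list[str]],
    answer_is_adequate: Callable[[str, int, list[str]], bool],
    batch_size: int = 10,
) -> tuple[Optional[int], int]:
    """Try levels from coarsest to finest, stopping at the first adequate answer.

    Returns (level_used, map_calls), counting one map call per batch of reports."""
    map_calls = 0
    for level in sorted(reports_by_level):  # assumes lower numbers are coarser
        reports = reports_by_level[level]
        map_calls += math.ceil(len(reports) / batch_size)
        if answer_is_adequate(query, level, reports):
            return level, map_calls
    return None, map_calls


reports_by_level = {0: ["report"] * 3, 1: ["report"] * 25, 2: ["report"] * 120}


def good_enough_from_level_1(query: str, level: int, reports: list[str]) -> bool:
    """Hypothetical adequacy check; in practice an evaluator or answer-confidence signal."""
    return level >= 1


print(escalate_levels("top incident themes?", reports_by_level, good_enough_from_level_1))  # (1, 4)
```

Here escalation stops at level 1 after 4 map calls, versus 12 for going straight to level 2. If the coarse check usually fails, escalation costs more than starting deeper, so measure how often each level suffices.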

Graph maintenance

Knowledge graphs aren't static; they must evolve as the underlying document corpus changes. Keeping the graph in sync with a live, mutating dataset adds engineering complexity. Recent GraphRAG releases expose explicit update flows with standard-update and fast-update methods, and the output tables include fields used for incremental update merges ^{[14]Reference 14GraphRAG CLI Referencehttps://microsoft.github.io/graphrag/cli/}^{[11]Reference 11GraphRAG Outputshttps://microsoft.github.io/graphrag/index/outputs/}.

Additions: When new documents arrive, the system must extract entities and relationships, then merge them into the existing graph. New connections can shift community structure, requiring reclustering and regeneration of affected reports.
Deletions: Removing a document isn't as simple as deleting a row in a database. The system must trace and remove nodes or edges supported solely by that document. If removed evidence changes graph structure, communities and reports may need recomputation.
Strategy: Even with update support, a deployment may prefer scheduled refreshes for structural artifacts. Entity merges, community boundaries, and report summaries can shift when new documents arrive, so online reclustering is harder to operate than plain vector re-indexing.
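Deletion is easier to reason about with provenance on every edge. A minimal sketch, assuming each edge stores the set of text-unit IDs that support it (similar in spirit to the text_unit_ids fields in GraphRAG outputs) and a hypothetical map from text unit to source document. Real indexes also track entity provenance and would refresh descriptions for edges that merely lost some support.

provenance-based-deletion-sketch.py

```python
Edge = tuple[str, str]


def delete_document(
    doc_id: str,
    text_unit_docs: dict[str, str],
    edges: dict[Edge, set[str]],
) -> tuple[dict[Edge, set[str]], set[str]]:
    """Remove a document's text units, then drop edges left with no supporting text unit.

    Returns the surviving edges and the entities that lost every edge."""
    removed_units = {unit for unit, doc in text_unit_docs.items() if doc == doc_id}
    surviving = {
        edge: units - removed_units
        for edge, units in edges.items()
        if units - removed_units  # keep edges that still have support
    }
    nodes_before = {node for edge in edges for node in edge}
    nodes_after = {node for edge in surviving for node in edge}
    return surviving, nodes_before - nodes_after


text_unit_docs = {"tu-1": "doc-a", "tu-2": "doc-b", "tu-3": "doc-b"}
edges = {
    ("inference-api", "redis-cache"): {"tu-1", "tu-2"},
    ("redis-cache", "old maxmemory policy"): {"tu-3"},
}
surviving, orphaned = delete_document("doc-b", text_unit_docs, edges)
print(surviving)  # {('inference-api', 'redis-cache'): {'tu-1'}}
print(orphaned)   # {'old maxmemory policy'}
```

The orphaned entities, and the communities that contain them, are the candidates for cleanup and report regeneration.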

When to use GraphRAG

Because of the high indexing and maintenance costs, GraphRAG shouldn't be treated as a default replacement for all retrieval tasks; it's a specialized tool for complex analytical workloads. Evaluate the query profile of your application before committing to a knowledge graph architecture. This decision matrix shows where each approach fits:

Scenario	Standard RAG (Vector)	GraphRAG
Fact lookup ("What is the API timeout?")	Strong baseline with low retrieval work	Additional structure may not pay off
Local context ("Summarize this specific ticket")	Strong baseline	Useful when entity-linked context matters
Global summary ("What are top 3 trends?")	Small top-k may underrepresent themes	Designed to synthesize community reports
Relationship-heavy question ("How does X affect Y via Z?")	Needs expansion or linked evidence	Can retrieve graph-linked evidence; test path behavior separately
Dataset size	Lower marginal indexing cost	Costly when graph extraction and reporting run over large corpora

Use this framework to decide:

Start with standard RAG (vector search)	Consider GraphRAG
Simple fact lookup ("What is the API timeout?")	Global summarization ("What are our top incident themes?")
Single-document questions	Cross-document reasoning ("How does Redis config drift affect inference latency?")
Prototyping or small datasets	Large, stable corpora with complex relationships
Real-time, cost-sensitive applications	Analytical workloads where query cost is acceptable
Frequently changing data with direct lookups	Repeated structural analysis or reusable reports

The pragmatic path: Start with vector search. It's cheaper, faster, and easier to maintain. Only add the graph layer when you hit specific limitations:

Users ask questions that require synthesizing information from many documents
You need to trace influence chains (who reports to whom, what depends on what)
Global sensemaking queries are a core use case, not an edge case
The corpus is large enough to justify the indexing investment

Explain GraphRAG as a tradeoff

Treat GraphRAG as an engineering tradeoff, not a buzzword. If you can explain when graph structure earns its indexing and query cost, you can defend the design in a review or interview.

Key concepts

GraphRAG adds entities, typed relationships, and community reports on top of chunk retrieval.
Local search answers entity-centric questions with mixed graph and text context.
Global search answers corpus-level questions with map-reduce over community reports.
Hierarchical Leiden avoids disconnected communities before report generation; extraction quality still determines whether reports are useful.
LazyGraphRAG shifts most LLM cost from indexing time to query time.

Skill levels to aim for

Foundational: Explain why small top-k similarity retrieval can underrepresent global sensemaking queries.
Intermediate: Describe the GraphRAG indexing pipeline: entity extraction, relationship extraction, entity and relationship summarization, community detection, community reports, and retrieval-time embeddings.
Advanced: Explain why hierarchical Leiden matters for community detection and report generation.
Advanced: Compare Local Search, which builds mixed entity/text/report context, with Global Search, which map-reduces over community reports.
Advanced: Analyze cost and latency tradeoffs: expensive indexing buys query capabilities that plain vector search doesn't have.
Advanced: Design a hybrid graph-vector architecture where simple lookups stay cheap and structural questions get graph context.
Advanced: Explain how to compare LazyGraphRAG with a precomputed-report pipeline using measured quality, query volume, and index/query cost.

Common pitfalls

When engineers first encounter knowledge graphs and GraphRAG, they often bring assumptions from traditional vector search or long-context LLMs. These are the most common traps, what they look like in practice, and how to avoid them.

Mistake 1: "Use a bigger context window"

Symptom: You stuff 100,000 tokens into a long-context model and ask for a summary. The output misses key themes or contradicts itself.
Cause: Even a 1M-token context window doesn't automatically solve global sensemaking. You still have to decide what to include, very large prompts are expensive to run, and models can still underuse information buried in the middle of long contexts ^{[15]Reference 15Lost in the Middle: How Language Models Use Long Contextshttps://arxiv.org/abs/2307.03172}.
Fix: Benchmark selective retrieval and structured indexing against long-context prompting. Standard GraphRAG is one candidate when repeated global questions justify precomputed reports; it isn't required for every long document.

Mistake 2: Treating GraphRAG as a replacement for vector search

Symptom: You replace your entire vector index with a knowledge graph. Simple lookups become slow and expensive.
Cause: GraphRAG is complementary, not substitutive. Local search uses entity-description embeddings and linked text units as part of context construction. Global search adds report synthesis for questions that require coverage across many documents.
Fix: Use a hybrid router. Route simple fact lookups to vector search, entity questions to local search, and global summaries to global search.

Mistake 3: Ignoring entity resolution

Symptom: Your graph has three separate nodes for RedisCache, redis-cache, and redis.internal. Queries that should traverse through the cache dependency fail because the path is broken.
Cause: Entity extraction from unstructured text is inherently noisy. Skipping deduplication leaves near-duplicate nodes that fragment the graph.
Fix: Implement a three-stage resolution pipeline: (1) normalize names (lowercase, strip punctuation), (2) use embeddings or string distance to find candidate duplicates, (3) use a lightweight LLM verification step to confirm whether two names refer to the same entity before merging.
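A minimal sketch of the three stages. Assumptions: `difflib`'s string ratio stands in for embedding similarity in stage 2, the 0.8 threshold is arbitrary, and `stub_confirm` is a hypothetical placeholder for the lightweight LLM check in stage 3. Confirmed pairs are merged with a small union-find so chains of duplicates collapse to one canonical name.

entity-resolution-sketch.py

```python
import difflib
import re
from typing import Callable


def normalize(name: str) -> str:
    """Stage 1: lowercase and collapse punctuation and whitespace into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def candidate_pairs(names: list[str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Stage 2: pair names whose normalized forms are similar (string ratio, not embeddings)."""
    pairs = []
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            ratio = difflib.SequenceMatcher(None, normalize(left), normalize(right)).ratio()
            if ratio >= threshold:
                pairs.append((left, right))
    return pairs


def resolve(names: list[str], confirm: Callable[[str, str], bool]) -> dict[str, str]:
    """Stage 3: merge candidate pairs that `confirm` accepts; map each name to a canonical name."""
    parent = {name: name for name in names}

    def find(name: str) -> str:
        while parent[name] != name:
            name = parent[name]
        return name

    for left, right in candidate_pairs(names):
        if confirm(left, right):
            parent[find(right)] = find(left)
    return {name: find(name) for name in names}


def stub_confirm(left: str, right: str) -> bool:
    """Hypothetical stand-in for an LLM check: equal once hyphens are ignored."""
    return normalize(left).replace("-", "") == normalize(right).replace("-", "")


names = ["RedisCache", "redis-cache", "redis.internal", "Redis Cache"]
print(resolve(names, stub_confirm))
```

`redis.internal` stays separate: string similarity never proposes it as a candidate, so the verifier never sees it. Hostname-style aliases like that are where embedding-based candidates or a curated alias table may earn their place.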

Mistake 4: Extracting every noun as a node

Symptom: The graph balloons to millions of nodes, most of them useless. Common words like "service," "alert," and "team" become nodes with thousands of meaningless edges.
Cause: Without a schema, the extractor treats every noun as an entity.
Fix: Define an ontology (a typed schema) before extraction. Restrict nodes to domain-relevant types like Service, Datastore, Region, Incident, ConfigIssue, and Runbook. Filter low-information entities in a post-processing step.
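A schema filter can run right after extraction. The allowed types come from the list above; the `GENERIC_NAMES` stoplist and the `min_mentions` threshold are hypothetical knobs for the low-information post-processing step.

ontology-entity-filter-sketch.py

```python
from dataclasses import dataclass

ALLOWED_TYPES = {"Service", "Datastore", "Region", "Incident", "ConfigIssue", "Runbook"}
GENERIC_NAMES = {"service", "alert", "team", "system", "issue"}  # hypothetical stoplist


@dataclass(frozen=True)
class ExtractedEntity:
    name: str
    entity_type: str
    mention_count: int


def filter_entities(entities: list[ExtractedEntity], min_mentions: int = 1) -> list[ExtractedEntity]:
    """Keep typed, specific entities; drop off-schema types, generic nouns, and rare mentions."""
    return [
        entity for entity in entities
        if entity.entity_type in ALLOWED_TYPES
        and entity.name.lower() not in GENERIC_NAMES
        and entity.mention_count >= min_mentions
    ]


entities = [
    ExtractedEntity("inference-api", "Service", 14),
    ExtractedEntity("service", "Service", 900),
    ExtractedEntity("Alice", "Person", 3),
    ExtractedEntity("redis-cache", "Datastore", 9),
    ExtractedEntity("rollback-runbook", "Runbook", 1),
]
print([entity.name for entity in filter_entities(entities)])
# ['inference-api', 'redis-cache', 'rollback-runbook']
```

Raising `min_mentions` to 2 would also drop `rollback-runbook`, a rare but useful entity, so treat mention counts as a tunable signal rather than a hard rule.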

Mistake 5: Underestimating indexing cost

Symptom: The prototype works on 200 documents, then the first full-corpus run burns budget on extraction, summarization, and embedding passes.
Cause: Standard GraphRAG does much more than embed chunks. It asks models to extract entities, extract relationships, summarize repeated descriptions, generate community reports, and embed several downstream artifacts.
Fix: Measure indexing cost before committing to the architecture. Start with a small representative corpus, use inexpensive models while tuning prompts, consider FastGraphRAG for global-summary-heavy workloads, and set a refresh cadence instead of pretending every graph update is free.

Mistake 6: Ignoring graph maintenance

Symptom: Search results become stale or paths break after documents are added, deleted, or corrected.
Cause: A graph index has structure. New evidence can merge entities, change edge weights, move community boundaries, and invalidate old community reports.
Fix: Treat graph refresh as a product requirement. Track provenance with text_unit_ids, run incremental update flows where they are good enough, and schedule full rebuilds when entity resolution or community boundaries drift.

Try it yourself

Here's a short exercise to check your understanding. Work through it on paper or in a text editor before reading the solution sketch.

The corpus

Doc A: "The inference gateway routes overflow traffic to GPU Queue B during peak load."

Doc B: "GPU Queue B has a priority inversion bug that can starve batch jobs."

Doc C: "Batch job #7712 was routed through the inference gateway, waited in GPU Queue B, and timed out after 18 minutes."

The question: "Why did batch job #7712 time out, and what systemic issue might cause similar failures?"

Step 1: List the entities and relationships you'd extract from each document.

Step 2: Draw (or list) the triples that connect batch job #7712 to the systemic issue.

Step 3: Explain why standard vector RAG might miss the systemic issue.

Solution sketch

Entities: inference gateway, GPU Queue B, priority inversion bug, batch job #7712, batch job starvation, timeout.

Key triples

(batch job #7712) --[routed_through]--> (inference gateway)
(inference gateway) --[routes_to]--> (GPU Queue B)
(GPU Queue B) --[has_issue]--> (priority inversion bug)
(priority inversion bug) --[can_cause]--> (batch job starvation)
(batch job #7712) --[result]--> (timeout)

Multi-hop path: batch job #7712 --[routed_through]--> inference gateway --[routes_to]--> GPU Queue B --[has_issue]--> priority inversion bug. The queue edge explains why that scheduler bug is relevant to this specific timeout.

Why similarity retrieval may miss it: No single document contains both "batch job #7712" and "priority inversion bug." A search for "Why did batch job #7712 time out?" may retrieve Doc C while missing Docs A and B because neither mentions the job. If extraction is accurate, a graph makes the path from job to gateway to queue bug explicit and retrievable with provenance.

When the graph earns its cost

GraphRAG addresses a global query problem that small top-k vector retrieval can handle poorly. Standard GraphRAG adds structure and precomputed summaries for corpus-level synthesis.
Standard GraphRAG is more than graph extraction. It extracts entities and relationships, can optionally extract claims, builds hierarchical Leiden communities, generates community reports, and embeds multiple artifacts for retrieval.
Local search (entity-centric + vector) handles specific questions like "Why did this batch job time out?" Global search (map-reduce over community summaries) handles broad questions like "What are the main themes?" DRIFT adds a hybrid middle ground, and Basic Search remains the plain vector fallback.
Indexing is expensive, and query cost depends on the search mode. Local search is usually manageable. Global search can get expensive if you have to map over many community-report batches.
Hybrid routing is an evaluation choice. Test simple questions on text retrieval and structural questions on graph/report context, then release routes that meet support, cost, and latency targets.
Start with vector search, add graphs when needed. GraphRAG helps with complex, interconnected datasets and global sensemaking. For simple fact retrieval, vector search remains faster and cheaper.
The expensive standard index isn't the only option. LazyGraphRAG (2024) moves LLM work to query time; in Microsoft's comparison, its indexing cost matched vector RAG and was about 0.1% of full GraphRAG's. Benchmark lazy and precomputed-report approaches on your query mix rather than assuming those ratios transfer unchanged.

Start by labeling 30 real queries as fact lookup, entity-specific investigation, or corpus-wide trend. Use that label table as a release gate: ship GraphRAG only for the bucket where vector retrieval fails and graph/report context improves cited-answer quality within your latency budget.

Next Step

Continue to RAG Security & Access Control

There, you'll learn how row-level security, document ACLs, and per-user filtering in vector stores keep RAG systems from leaking confidential data. That topic builds directly on the hybrid retrieval ideas here: once you're routing queries between vector and graph layers, you need to make sure each user only sees the entities and chunks they're authorized to access.


References

From Local to Global: A Graph RAG Approach to Query-Focused Summarization.

Edge, D., et al. · 2024 · arXiv preprint

GraphRAG Indexing Overview