ceaksan

Pre-injection vs MCP Tool Loop: Context Strategies for AI Coding Agents

GrapeRoot injects context before the model starts, CodeGraphContext uses MCP tool loops. Same repo, same model, same prompts. One comes out 31% cheaper. Where's the architectural difference? Benchmark data and comparison with my own system.

Feb 21, 2026 · 7 min read

TL;DR

AI coding agents need context to understand codebases. Two fundamental strategies exist: injecting context before the model starts (pre-injection) or pulling it via tool calls while the model runs (MCP tool loop). GrapeRoot with pre-injection runs 31% cheaper on average, while CodeGraphContext with MCP tool loops costs more than even vanilla Claude. The difference is architectural: when the model starts reasoning immediately, turn count drops and token consumption decreases. But pre-injection has its own risks.

Same repo, same model (Claude Sonnet 4.6), same 20 prompts. GrapeRoot averages $0.17 per prompt, CodeGraphContext $0.27, vanilla Claude $0.25. The tool-using agent costs more than Claude without any tools. The problem isn’t the tool, it’s the architecture.

Two Different Architectures

AI coding agents need context to understand codebases. There are two fundamental strategies for providing this context:

Pre-injection: Before the Model Starts

Graph lookup happens before the model begins. Relevant files, functions, and dependencies are injected into context. The model already has the necessary information on its first turn.

Query -> Graph lookup -> Find relevant files -> Add to context -> Model starts reasoning

The model doesn’t make tool calls, doesn’t search for files, doesn’t go on discovery tours. It starts answering directly.
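A minimal sketch of that flow, assuming a tiny in-memory graph. The graph contents and the names `select_files` and `build_prompt` are hypothetical and only illustrate the idea; this is not GrapeRoot's actual API, and a real tool would rank with far more signal than keyword overlap.

```python
import re

# Hypothetical code graph: file -> symbols it defines and files it depends on.
CODE_GRAPH = {
    "auth/login.py": {"symbols": ["login", "verify_password"],
                      "deps": ["auth/tokens.py", "db/users.py"]},
    "auth/tokens.py": {"symbols": ["issue_token"], "deps": []},
    "db/users.py": {"symbols": ["get_user"], "deps": []},
    "billing/invoice.py": {"symbols": ["create_invoice"], "deps": ["db/users.py"]},
}


def select_files(query: str, graph: dict, max_files: int = 3) -> list[str]:
    """Graph lookup: match query words to symbols, then add direct dependencies."""
    words = set(re.findall(r"[a-z_]+", query.lower()))
    seeds = [path for path, node in graph.items() if words & set(node["symbols"])]
    selected: list[str] = []
    for path in seeds:
        for candidate in [path, *graph[path]["deps"]]:
            if candidate not in selected:
                selected.append(candidate)
    return selected[:max_files]


def build_prompt(query: str, graph: dict) -> str:
    """Add the selected files to the context before the model sees the query."""
    lines = [f"# file: {path} (symbols: {', '.join(graph[path]['symbols'])})"
             for path in select_files(query, graph)]
    return "\n".join(lines) + f"\n\nQuestion: {query}"


prompt = build_prompt("How does login create a session?", CODE_GRAPH)
```

Everything the model needs is decided in `select_files`, before the first turn. That is also where the approach can fail: a file that the lookup misses never reaches the model.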

MCP Tool Loop: While the Model Runs

The model starts working, decides what it needs, calls an MCP tool, reads the result, thinks again. This cycle can repeat multiple times.

Query -> Model thinks -> "I don't know this" -> Tool call -> Read result -> Think again -> ...

Every tool call is an additional turn. Every turn is additional tokens. The loop continues until the model finds the answer.
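The loop can be sketched with a scripted stand-in for the model. The tool names, their return values, and `scripted_model` are invented for illustration; a real agent would call an LLM and an MCP server here.

```python
from typing import Callable

# Hypothetical tools; names and results are made up for this sketch.
TOOLS: dict[str, Callable[[str], str]] = {
    "find_callers": lambda symbol: f"callers of {symbol}: api.handle, cli.main",
    "read_file": lambda path: f"<contents of {path}>",
}


def scripted_model(history: list[str]) -> tuple[str, str, str]:
    """Stand-in for an LLM: asks for two tools, then answers."""
    tool_results = sum(1 for message in history if message.startswith("tool:"))
    if tool_results == 0:
        return ("call", "find_callers", "parse_config")
    if tool_results == 1:
        return ("call", "read_file", "api.py")
    return ("answer", "", "parse_config is called from api.handle and cli.main")


def run_tool_loop(query: str, max_turns: int = 10) -> tuple[str, int, list[str]]:
    """Run until the model answers; return the answer, turns used, and history."""
    history = [f"user: {query}"]
    for turn in range(1, max_turns + 1):
        kind, name, arg = scripted_model(history)
        if kind == "answer":
            history.append(f"assistant: {arg}")
            return arg, turn, history
        # Both the call and its result stay in the context for every later turn.
        history.append(f"assistant: call {name}({arg})")
        history.append(f"tool: {TOOLS[name](arg)}")
    raise RuntimeError("turn limit reached without an answer")


answer, turns, history = run_tool_loop("Who calls parse_config?")
```

In this sketch two tool calls turn a one-turn answer into three turns, and the history grows by two messages per call.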

Benchmark Data

Benchmark shared by the GrapeRoot developer [1] (mid-sized repo, 20 tasks):

| Metric | Vanilla Claude | GrapeRoot | CodeGraphContext |
|---|---|---|---|
| Avg cost/prompt | $0.25 | $0.17 | $0.27 |
| Cost winner (tasks) | 3/20 | 16/20 | 1/20 |
| Quality (regex) | 66.0 | 73.8 | 66.2 |
| Quality (LLM judge) | 86.2 | 87.9 | 87.2 |
| Avg turn count | 10.6 | 8.9 | 11.7 |

Notable observations:

CodeGraphContext costs more than vanilla Claude. Because the MCP tool loop adds extra turns, using a tool can sometimes be more expensive than not using one at all. Every tool call decision, invocation, and result reading gets added to context.

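GrapeRoot leads in quality too. It is not just cheaper: it scores highest on both regex validation (73.8 vs 66.0 and 66.2) and the LLM judge (87.9 vs 86.2 and 87.2), though the LLM judge gap is under two points. Better results with fewer turns.

Turn count tracks cost. GrapeRoot 8.9, vanilla 10.6, CGC 11.7 turns. The turn ranking matches the cost ranking exactly, although three averages show a pattern, not a proven correlation.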
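The arithmetic behind these observations can be checked from the published averages. This is a rough sketch on the rounded table values only; the per-task costs are not available here, so it cannot reproduce the benchmark itself.

```python
# Averages copied from the benchmark table above.
AVG_COST = {"vanilla": 0.25, "graperoot": 0.17, "cgc": 0.27}
AVG_TURNS = {"vanilla": 10.6, "graperoot": 8.9, "cgc": 11.7}


def relative_saving(baseline: float, candidate: float) -> float:
    """Fraction of the baseline cost that the candidate saves."""
    return (baseline - candidate) / baseline


def cost_per_turn(tool: str) -> float:
    """Average cost divided by average turn count for one tool."""
    return AVG_COST[tool] / AVG_TURNS[tool]


saving_vs_vanilla = relative_saving(AVG_COST["vanilla"], AVG_COST["graperoot"])
saving_vs_cgc = relative_saving(AVG_COST["cgc"], AVG_COST["graperoot"])
print(f"{saving_vs_vanilla:.0%}")  # 32%
print(f"{saving_vs_cgc:.0%}")      # 37%

# Tools ordered from cheapest to most expensive per turn.
per_turn_ranking = sorted(AVG_COST, key=cost_per_turn)
```

The rounded values give about 32% against vanilla Claude; the 31% headline presumably comes from unrounded per-task costs. On these averages CodeGraphContext is slightly cheaper than vanilla per turn, so its higher total comes from the extra turns, which fits the turn-count argument.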

Caveats

This benchmark shouldn’t be taken at face value:

  • GrapeRoot’s own benchmark, on their own repo
  • 20 tasks, mid-sized repo (results may differ on large monorepos)
  • No independent verification
  • Task distribution (symbol lookup, endpoint tracing, architecture reasoning) covers areas where GrapeRoot excels

But the architectural argument holds: fewer turns = fewer tokens. This is true regardless of the benchmark.

Academic support exists too: Shahnovsky and Dror (2026) mapped LLM agent planning approaches to classical AI paradigms [2]. Step-by-step agents (BFS, decide at each step, analogous to the MCP tool loop) showed a 38.41% overall success rate, while plan-ahead agents (DFS, plan upfront, analogous to pre-injection) reached 89% element accuracy. The two figures are different metrics, so the comparison is suggestive rather than direct, but it hints that pre-injection isn’t just cheaper, it’s potentially more accurate.

Architecture of Three Tools

GrapeRoot: Dual Graph + Pre-injection

GrapeRoot maintains two separate graphs:

Code Graph (Information Graph): Files, functions, and dependencies. The project’s static structure. Which file depends on which, how functions call each other.

Session Graph (Action Graph): What the model has read and edited in the current session. This graph grows as the session progresses. After the first question, instead of re-reading the same files, the model is guided by the session graph.

The session graph idea is powerful: when the model reads a file, the graph remembers it. We solve a simpler version of this with PreCompact/SessionStart hooks: before session compaction, the changed files, branch, and git status are saved, then automatically restored when a new session starts. GrapeRoot’s session graph is much more sophisticated: not just a file list, but a ranking by relevance × recency × edit weight. If the same file is needed in the next query, it’s served from cache instead of being re-read, which keeps the context window from filling up with the same information repeatedly.
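One way to read "relevance × recency × edit weight" is as a product of three factors per file. The sketch below is an assumption about how such a score could be built: the exponential decay, the half-life, and the edit bonus are my own illustrative choices, not GrapeRoot's published formula.

```python
from dataclasses import dataclass


@dataclass
class SessionEntry:
    path: str
    relevance: float  # 0..1 match against the current query (assumed given)
    last_turn: int    # turn at which the file was last read or edited
    edits: int        # number of edits in this session


def score(entry: SessionEntry, current_turn: int, half_life: float = 5.0) -> float:
    """relevance x recency x edit weight, with assumed decay and bonus shapes."""
    recency = 0.5 ** ((current_turn - entry.last_turn) / half_life)
    edit_weight = 1.0 + 0.5 * entry.edits
    return entry.relevance * recency * edit_weight


def rank(entries: list[SessionEntry], current_turn: int) -> list[str]:
    """Return paths ordered from highest to lowest score."""
    ordered = sorted(entries, key=lambda e: score(e, current_turn), reverse=True)
    return [entry.path for entry in ordered]


entries = [
    SessionEntry("core/parser.py", relevance=0.9, last_turn=2, edits=0),
    SessionEntry("api/routes.py", relevance=0.6, last_turn=9, edits=2),
    SessionEntry("README.md", relevance=0.4, last_turn=10, edits=0),
]
ranking = rank(entries, current_turn=10)
```

With these made-up numbers the recently edited `api/routes.py` outranks the more relevant but stale `core/parser.py`, which is the behavior a plain file list cannot express.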

Integration: Works with Claude Code and Codex CLI via the MCP protocol. It is started with dgc /path/to/project, which launches a local MCP server.

CodeGraphContext: Graph DB + MCP Tool Loop

CodeGraphContext (CGC) uses the classic MCP tool approach:

  • AST parsing with Tree-sitter (14 languages)
  • Store in Graph DB (KuzuDB default, FalkorDB or Neo4j optional)
  • Serve to AI as MCP server
  • Let the model decide what it needs

Strength: Relationship query support (caller/callee, class hierarchy, call chain). The model can ask “who calls this function?” and get a complete answer from the graph.

Weakness: Every query is a tool call. The model decides, calls, reads the result, thinks. This loop gets added to context every turn.

code-review-graph: Blast Radius + MCP

A third approach: code-review-graph parses AST with Tree-sitter, stores in local SQLite, and performs “blast radius” analysis.

When a file changes, it finds all files that call it, inherit from it, and test it. During code review, the model reads only the affected files.
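Blast radius analysis amounts to walking dependency edges backwards from the changed file. A minimal sketch, assuming a small edge list; the edges, the function name, and the depth limit are hypothetical, and whether code-review-graph follows dependents transitively is an assumption here.

```python
from collections import deque

# Hypothetical edges: (source, relation, target) means "source <relation> target".
EDGES = [
    ("api/routes.py", "calls", "core/client.py"),
    ("core/retry.py", "inherits", "core/client.py"),
    ("tests/test_client.py", "tests", "core/client.py"),
    ("cli/main.py", "calls", "api/routes.py"),
    ("docs/build.py", "calls", "docs/theme.py"),
]


def blast_radius(changed: str, edges: list[tuple[str, str, str]],
                 max_depth: int = 2) -> set[str]:
    """Files that depend on `changed`, found by breadth-first search on reversed edges."""
    dependents: dict[str, set[str]] = {}
    for source, _relation, target in edges:
        dependents.setdefault(target, set()).add(source)

    seen = {changed}
    queue = deque([(changed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for dependent in dependents.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append((dependent, depth + 1))
    return seen - {changed}


affected = blast_radius("core/client.py", EDGES)
```

Here a change to `core/client.py` reaches its caller, its subclass, its test, and one indirect caller, while the unrelated docs files stay out of the review context.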

Benchmark results are impressive:

| Scenario | Token reduction |
|---|---|
| Code review (httpx) | 26.2x |
| Code review (FastAPI) | 8.1x |
| Next.js monorepo coding | 49.1x (739K -> 15K tokens) |

This also uses MCP tool loops but with a different focus: not general search, but change impact analysis.

Martin Fowler’s Context Engineering Taxonomy

Martin Fowler’s context engineering article [3] defines three context loading strategies:

Pre-injection (Always Loaded)

CLAUDE.md files are loaded deterministically at session start. Every session, always. The project’s fundamental rules and structure live here.

What GrapeRoot does is the same, but at the code level: not just rules, but relevant files loaded at session start.

Reactive (On-Demand)

Skills and MCP servers are loaded based on the model’s decision. They activate when the model says “I need this.”

CodeGraphContext and dnomia-knowledge (my own system) fall into this category. The model expresses what it needs via tool calls, and the MCP server returns results.

Agent-Triggered (Deterministic)

Hooks run at fixed lifecycle points. Not dependent on the model’s decision, triggered by events.

dnomia-knowledge’s PostToolUse hook is in this category: automatic interaction logging after every Read/Edit. The model doesn’t request this, the hook triggers.

Trade-offs of Each Approach

| | Pre-injection | MCP Tool Loop | Hook-based |
|---|---|---|---|
| Strength | Few turns, low cost | Flexibility, model decides | Deterministic, model-independent |
| Weakness | Wrong file selection risk | Many turns, high cost | Passive, only collects data |
| Best scenario | Structural queries | Tasks requiring exploration | Long-term pattern analysis |
| Worst scenario | Complex, multi-file tasks | Simple, single-file queries | Situations needing immediate decisions |

When pre-injection breaks: If the query requires 10 different files and the graph selects only 3, the model reasons with incomplete information. In a tool loop, the model can say “I need more information” and make additional calls.

When tool loops are unnecessary: For simple structural queries like “what are this function’s parameters?”, the model gets the answer on the first tool call, but the loop mechanism still adds extra turns (deciding, calling, reading).

The Fourth Way: Hybrid

The best strategy is probably not a single approach, but a combination of all three:

  1. Session start: Inject structural summaries of the most frequently accessed files using trace hot data (pre-injection)
  2. During work: Deepen with MCP tools, specific searches (reactive)
  3. In the background: Collect interaction data with hooks, improve pre-injection quality for subsequent sessions (agent-triggered)

This is cyclical: hooks collect data, data improves pre-injection, pre-injection reduces turn count, lower turn count produces better data.
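The cycle can be sketched as a trace log that hooks write to and that the next session reads from. `TraceLog`, `session_preamble`, and the summaries are hypothetical names and data for illustration; this is not how dnomia-knowledge or GrapeRoot store traces.

```python
from collections import Counter


class TraceLog:
    """Collects file reads (written by a hook) and reports the hottest files."""

    def __init__(self) -> None:
        self.reads: Counter[str] = Counter()

    def record(self, path: str) -> None:
        # Called deterministically after every file read, not by the model.
        self.reads[path] += 1

    def hot_files(self, n: int = 3) -> list[str]:
        return [path for path, _count in self.reads.most_common(n)]


def session_preamble(trace: TraceLog, summaries: dict[str, str], n: int = 3) -> str:
    """Pre-inject short structural summaries of the hottest files."""
    lines = [f"{path}: {summaries.get(path, 'no summary yet')}"
             for path in trace.hot_files(n)]
    return "Frequently used files:\n" + "\n".join(lines)


trace = TraceLog()
for path in ["core/client.py", "api/routes.py", "core/client.py",
             "README.md", "core/client.py", "api/routes.py"]:
    trace.record(path)

preamble = session_preamble(
    trace,
    {"core/client.py": "HTTP client, retry logic", "api/routes.py": "route table"},
    n=2,
)
```

The model never asks for this data. The hook records it in one session, and the preamble spends it at the start of the next, leaving MCP tools for whatever the preamble did not cover.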

Where Does dnomia-knowledge Stand?

Currently in the MCP tool loop category. But several features bring it closer to hybrid:

  • Interaction boost: Frequently accessed files rank higher in search results. This is, indirectly, “learning from past sessions.”
  • PreToolUse hook: Blocks large file reads and redirects to search. This is a deterministic intervention that preserves the context window.
  • Trace analytics: Hot files, gaps, and decay data accumulate. This data is a ready source for pre-injection.
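The PreToolUse idea in the list above can be sketched as a pure decision function. The event fields (`tool_name`, `tool_input`, `file_path`) are modeled on the shape of a hook payload and should be treated as assumptions, as should the 50,000-byte threshold and the function name; this is not dnomia-knowledge's actual hook.

```python
import os
from typing import Callable

MAX_BYTES = 50_000  # assumed threshold, chosen only for illustration


def check_read(event: dict,
               size_of: Callable[[str], int] = os.path.getsize,
               max_bytes: int = MAX_BYTES) -> tuple[bool, str]:
    """Return (allowed, message). Only file reads above the threshold are blocked."""
    if event.get("tool_name") != "Read":
        return True, ""
    path = event.get("tool_input", {}).get("file_path", "")
    try:
        size = size_of(path)
    except OSError:
        return True, ""  # unknown size: do not block, let the read fail normally
    if size > max_bytes:
        return False, (f"{path} is {size} bytes; "
                       "search the index instead of reading the whole file.")
    return True, ""


allowed, message = check_read(
    {"tool_name": "Read", "tool_input": {"file_path": "logs/build.log"}},
    size_of=lambda _path: 200_000,  # fake size so the sketch runs without a real file
)
```

The decision depends only on the event and the file size, so it behaves the same in every session regardless of what the model intended.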

The missing piece has been partially addressed: trace-informed review was added to court and critique skills. Before a review starts, dnomia-knowledge trace hot and trace gaps are called automatically, with hot files evaluated as high-risk change areas and gaps as documentation debt. Full pre-injection doesn’t exist yet, but trace data is now fed into decision processes.

Cost or Quality?

If you use Claude Code Max (fixed monthly fee), the per-token cost argument is moot. But turn count still matters:

  • Fewer turns = faster completion. 9 turns vs 12 turns, noticeable difference in total time.
  • Fewer turns = less context consumption. Every turn gets added to context. Fewer turns fill the context window more slowly.
  • Less context consumption = less compression. When the context window fills, conversation compression kicks in and previous information is lost.

So even if the cost calculation doesn’t directly apply, the turn reduction that pre-injection provides improves the quality of the experience.
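A rough model shows why a few extra turns cost more than they seem to. It assumes every turn sends the whole accumulated context as input and that each turn adds a fixed number of tokens; the token figures are invented, and prompt caching is ignored.

```python
def cumulative_input_tokens(turns: int, base: int, added_per_turn: int) -> int:
    """Total input tokens processed across a session under the assumptions above."""
    total = 0
    context = base
    for _ in range(turns):
        total += context           # the whole context is sent as input this turn
        context += added_per_turn  # the reply and tool results join the context
    return total


# Illustrative numbers: 2,000 tokens of initial context, 1,500 added per turn.
nine_turns = cumulative_input_tokens(9, base=2_000, added_per_turn=1_500)
twelve_turns = cumulative_input_tokens(12, base=2_000, added_per_turn=1_500)
print(nine_turns)    # 72000
print(twelve_turns)  # 123000
```

Under these assumptions, going from 9 to 12 turns is a third more turns but roughly 70% more input tokens processed, because every added turn re-sends everything before it.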

Conclusion

Context engineering is the next competitive frontier for AI coding tools. Most current tools use MCP tool loops (CodeGraphContext, dnomia-knowledge, code-review-graph). GrapeRoot chose a different path with pre-injection and stood out in benchmarks.

But there is no single “best” approach. It varies based on task complexity, codebase size, and use case. In the future, all tools will likely evolve toward a hybrid approach: static context via pre-injection, dynamic deepening via MCP, background data collection via hooks.

Footnotes

  1. GrapeRoot benchmark: Reddit post, benchmark repo. 20 tasks, Claude Sonnet 4.6, mid-sized repo.
  2. Shahnovsky, O., Dror, R. (2026). AI Planning Framework for LLM-Based Web Agents. arXiv. Step-by-step (BFS) vs plan-ahead (DFS) agent comparison, with 794 human-labeled trajectories.
  3. Martin Fowler, “Context Engineering for Coding Agents”: https://martinfowler.com/articles/exploring-gen-ai/context-engineering-coding-agents.html

Key Takeaways

  1. Pre-injection (GrapeRoot) is 31% cheaper on average, up to 90% on some tasks. Because the model doesn't enter a tool call loop, it starts reasoning directly.
  2. MCP tool loop (CodeGraphContext) can cost more than even vanilla Claude. Every tool call adds an extra turn, every turn adds extra tokens.
  3. Session graph (tracking what the model has read) is the most interesting idea. It prevents re-discovering the same files over and over.
  4. A hybrid approach is probably best: initial context via pre-injection, deepening via MCP tools.

Frequently Asked Questions (FAQ)

Is pre-injection always better?

No. If the wrong files are selected, tokens are wasted. Pre-injection depends on the initial routing quality of the graph. For complex, multi-file tasks, MCP tool loops are more flexible because the model decides what it needs.

Are these benchmarks reliable?

It's GrapeRoot's own benchmark, on their own repo, with 20 tasks. No independent verification. But the architectural argument (fewer turns = fewer tokens) is logically consistent.

Does cost matter if I use Claude Code Max?

If you're not paying per token, the cost argument is moot. But turn count still matters: fewer turns = faster completion, less context window consumption, less compression risk.

Which category does dnomia-knowledge fall into?

MCP tool loop. But interaction boost brings it closer to session tracking. Pre-injection mode doesn't exist yet, but context injection at session start using trace hot data is planned.