Same repo, same model (Claude Sonnet 4.6), same 20 prompts. GrapeRoot averages $0.17 per prompt, CodeGraphContext $0.27, vanilla Claude $0.25. CodeGraphContext, an agent equipped with a code-graph tool, costs more than Claude with no tools at all. The problem isn’t the tool; it’s the architecture.
Two Different Architectures
AI coding agents need context to understand codebases. There are two fundamental strategies for providing this context:
Pre-injection: Before the Model Starts
Graph lookup happens before the model begins. Relevant files, functions, and dependencies are injected into context. The model already has the necessary information on its first turn.
Query -> Graph lookup -> Find relevant files -> Add to context -> Model starts reasoning
The model doesn’t make tool calls, doesn’t search for files, doesn’t go on discovery tours. It starts answering directly.
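The flow above can be sketched in a few lines. This is a minimal illustration, not GrapeRoot's actual implementation: the graph layout, the keyword matching, and the names `CODE_GRAPH` and `pre_inject` are all assumptions made for the example.

```python
# Hypothetical code graph: file -> keywords it defines and files it depends on.
CODE_GRAPH = {
    "auth/login.py": {"keywords": {"login", "session"}, "deps": ["auth/tokens.py"]},
    "auth/tokens.py": {"keywords": {"token", "jwt"}, "deps": []},
    "billing/invoice.py": {"keywords": {"invoice", "tax"}, "deps": []},
}


def pre_inject(query: str, graph: dict) -> list[str]:
    """Return the files to place in context before the model's first turn."""
    words = set(query.lower().split())
    selected: list[str] = []
    for path, node in graph.items():
        if words & node["keywords"]:
            # Add the matching file, then its direct dependencies.
            for candidate in [path, *node["deps"]]:
                if candidate not in selected:
                    selected.append(candidate)
    return selected


context_files = pre_inject("why does login fail", CODE_GRAPH)
print(context_files)
```

The lookup runs once, outside the model. Whatever it returns is all the model sees on its first turn, which is why selection quality matters so much for this strategy.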
MCP Tool Loop: While the Model Runs
The model starts working, decides what it needs, calls an MCP tool, reads the result, thinks again. This cycle can repeat multiple times.
Query -> Model thinks -> "I don't know this" -> Tool call -> Read result -> Think again -> ...
Every tool call is an additional turn. Every turn is additional tokens. The loop continues until the model finds the answer.
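A toy cost model shows why extra turns get expensive. It assumes each turn re-sends the starting context plus everything accumulated in earlier turns; the function name and all token counts are hypothetical, and real billing (caching, output tokens) is ignored.

```python
def total_input_tokens(base_context: int, tokens_per_turn: int, turns: int) -> int:
    """Sum the input tokens sent across all turns.

    Assumes each turn re-sends the base context plus everything
    accumulated in earlier turns (tool calls, results, reasoning).
    """
    total = 0
    for turn in range(turns):
        total += base_context + turn * tokens_per_turn
    return total


# Hypothetical numbers: 5,000-token starting context, 2,000 tokens added per turn.
few_turns = total_input_tokens(5_000, 2_000, turns=9)
many_turns = total_input_tokens(5_000, 2_000, turns=12)
print(few_turns, many_turns)
```

Under these assumptions, going from 9 to 12 turns (a third more) raises total input tokens from 117,000 to 192,000, roughly 64% more, because later turns carry everything that came before them.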
Benchmark Data
Benchmark shared by the GrapeRoot developer [1] (mid-sized repo, 20 tasks):
| Metric | Vanilla Claude | GrapeRoot | CodeGraphContext |
|---|---|---|---|
| Avg cost/prompt | $0.25 | $0.17 | $0.27 |
| Cost winner | 3/20 | 16/20 | 1/20 |
| Quality (regex) | 66.0 | 73.8 | 66.2 |
| Quality (LLM judge) | 86.2 | 87.9 | 87.2 |
| Avg turn count | 10.6 | 8.9 | 11.7 |
Notable observations:
CodeGraphContext costs more than vanilla Claude. Because the MCP tool loop adds extra turns, using a tool can sometimes be more expensive than not using one at all. Every tool call decision, invocation, and result reading gets added to context.
GrapeRoot leads in quality too. It isn’t just cheaper: it scores highest on both regex validation and the LLM judge, although the LLM-judge margin is small (87.9 vs 87.2 and 86.2). Better results with fewer turns.
Turn count tracks cost. GrapeRoot averages 8.9 turns, vanilla 10.6, CodeGraphContext 11.7. The ranking by turn count matches the ranking by cost exactly.
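The relationships in the table can be checked with a few lines of arithmetic. The figures are copied from the table; because those averages are rounded, the derived percentages may differ by a point from values computed on the raw data.

```python
# Averages copied from the benchmark table above (rounded values).
results = {
    "Vanilla Claude": {"cost": 0.25, "turns": 10.6},
    "GrapeRoot": {"cost": 0.17, "turns": 8.9},
    "CodeGraphContext": {"cost": 0.27, "turns": 11.7},
}

baseline = results["Vanilla Claude"]["cost"]

# Cost relative to vanilla Claude, in percent (negative means cheaper).
relative_cost = {
    name: round((r["cost"] - baseline) / baseline * 100)
    for name, r in results.items()
}

# Average cost of a single turn for each setup.
cost_per_turn = {name: round(r["cost"] / r["turns"], 4) for name, r in results.items()}

# Compare the two rankings.
by_cost = sorted(results, key=lambda n: results[n]["cost"])
by_turns = sorted(results, key=lambda n: results[n]["turns"])

print(relative_cost)
print(cost_per_turn)
print(by_cost == by_turns)
```

With these rounded inputs GrapeRoot comes out about 32% below vanilla and CodeGraphContext about 8% above. Cost per turn is similar across all three (roughly $0.019 to $0.024), which is consistent with turn count driving most of the difference.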
Caveats
This benchmark shouldn’t be taken at face value:
- GrapeRoot’s own benchmark, on their own repo
- 20 tasks, mid-sized repo (results may differ on large monorepos)
- No independent verification
- Task distribution (symbol lookup, endpoint tracing, architecture reasoning) covers areas where GrapeRoot excels
But the architectural argument holds: all else being equal, fewer turns means fewer tokens. That is true regardless of the benchmark.
Academic support exists too: Shahnovsky and Dror (2026) mapped LLM agent planning approaches to classical AI paradigms [2]. Step-by-step agents (BFS, deciding at each step, comparable to the MCP tool loop) showed a 38.41% overall success rate, while plan-ahead agents (DFS, planning upfront, comparable to pre-injection) reached 89% element accuracy. The two figures are different metrics, so the comparison is suggestive rather than exact, but it points the same way: pre-injection isn’t just cheaper, it’s potentially more accurate.
Architecture of Three Tools
GrapeRoot: Dual Graph + Pre-injection
GrapeRoot maintains two separate graphs:
Code Graph (Information Graph): Files, functions, and dependencies. The project’s static structure. Which file depends on which, how functions call each other.
Session Graph (Action Graph): What the model has read and edited in the current session. This graph grows as the session progresses. After the first question, instead of re-reading the same files, the model is guided by the session graph.
The session graph idea is powerful: when the model reads a file, the graph remembers it. We solve a simpler version of this with PreCompact/SessionStart hooks: before session compaction, the changed files, branch, and git status are saved, then automatically restored when a new session starts. GrapeRoot’s session graph is a much more sophisticated version: not just a file list, but a ranking by relevance × recency × edit weight. If the same file is needed in the next query, it’s served from cache instead of being re-read, which keeps the context window from filling up with the same information repeatedly.
Integration: Works with Claude Code and Codex CLI via MCP. It is started with dgc /path/to/project, which launches a local MCP server.
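A ranking of the "relevance x recency x edit weight" kind could look like the sketch below. The source does not say how GrapeRoot computes each factor, so the exponential decay, the `1 + edits` weight, the half-life value, and all names here are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class SessionNode:
    path: str
    relevance: float      # 0..1, match against the current query (assumed given)
    turns_since_use: int  # how many turns ago the file was last touched
    edits: int            # number of edits made to the file this session


def score(node: SessionNode, half_life: float = 5.0) -> float:
    """Combine relevance x recency x edit weight into one score."""
    recency = 0.5 ** (node.turns_since_use / half_life)  # exponential decay
    edit_weight = 1.0 + node.edits                       # edited files count more
    return node.relevance * recency * edit_weight


session = [
    SessionNode("api/routes.py", relevance=0.9, turns_since_use=10, edits=0),
    SessionNode("api/models.py", relevance=0.6, turns_since_use=0, edits=2),
    SessionNode("README.md", relevance=0.2, turns_since_use=1, edits=0),
]
ranked = sorted(session, key=score, reverse=True)
print([n.path for n in ranked])
```

In this example the recently edited `api/models.py` outranks the more relevant but stale `api/routes.py`, which is the behavior a multiplicative score is meant to produce.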
CodeGraphContext: Graph DB + MCP Tool Loop
CodeGraphContext (CGC) uses the classic MCP tool approach:
- AST parsing with Tree-sitter (14 languages)
- Store in Graph DB (KuzuDB default, FalkorDB or Neo4j optional)
- Serve to AI as MCP server
- Let the model decide what it needs
Strength: Relationship query support (caller/callee, class hierarchy, call chain). The model can ask “who calls this function?” and get a complete answer from the graph.
Weakness: Every query is a tool call. The model decides, calls, reads the result, thinks. This loop gets added to context every turn.
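To make the caller/callee idea concrete, here is a minimal sketch that builds a "who calls this function?" index. It swaps in Python's standard-library `ast` module for Tree-sitter and a plain dictionary for the graph database, so it illustrates the query, not CodeGraphContext's implementation. `SOURCE` and `build_callers` are names invented for the example.

```python
import ast
from collections import defaultdict

SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def main():
    parse("data.txt")
    load("other.txt")
'''


def build_callers(source: str) -> dict[str, set[str]]:
    """Map each called function name to the functions that call it."""
    callers: dict[str, set[str]] = defaultdict(set)
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, ast.FunctionDef):
            for node in ast.walk(func):
                # Only direct name calls like load(...); method calls are skipped.
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    callers[node.func.id].add(func.name)
    return callers


callers = build_callers(SOURCE)
print(sorted(callers["load"]))
```

The index is built once. In the tool-loop design each lookup against it is still a separate tool call, with its own turn.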
code-review-graph: Blast Radius + MCP
A third approach: code-review-graph parses AST with Tree-sitter, stores in local SQLite, and performs “blast radius” analysis.
When a file changes, it finds all files that call it, inherit from it, and test it. During code review, the model reads only the affected files.
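Blast radius analysis amounts to a walk over reverse dependencies. The sketch below uses an in-memory dictionary instead of SQLite and follows dependents transitively; whether code-review-graph goes beyond direct dependents is not stated in the source, so treat the transitive walk, the `DEPENDENTS` data, and the function name as assumptions.

```python
from collections import deque

# Hypothetical reverse-dependency edges: file -> files that call, inherit from, or test it.
DEPENDENTS = {
    "core/http.py": ["core/client.py", "tests/test_http.py"],
    "core/client.py": ["app/service.py", "tests/test_client.py"],
    "app/service.py": [],
    "tests/test_http.py": [],
    "tests/test_client.py": [],
    "docs/build.py": [],
}


def blast_radius(changed: str, dependents: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk over reverse dependencies of a changed file."""
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in affected and dep != changed:
                affected.add(dep)
                queue.append(dep)
    return affected


print(sorted(blast_radius("core/http.py", DEPENDENTS)))
```

Only the files reachable from the change end up in the review context. Unrelated files such as `docs/build.py` are never read, which is where the token reduction comes from.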
Benchmark results are impressive:
| Scenario | Token reduction |
|---|---|
| Code review (httpx) | 26.2x |
| Code review (FastAPI) | 8.1x |
| Next.js monorepo coding | 49.1x (739K -> 15K tokens) |
This also uses MCP tool loops but with a different focus: not general search, but change impact analysis.
Martin Fowler’s Context Engineering Taxonomy
The context engineering article on Martin Fowler’s site [3] describes three context loading strategies:
Pre-injection (Always Loaded)
CLAUDE.md files are loaded deterministically at session start. Every session, always. The project’s fundamental rules and structure live here.
What GrapeRoot does is the same, but at the code level: not just rules, but relevant files loaded at session start.
Reactive (On-Demand)
Skills and MCP servers are loaded based on the model’s decision. They activate when the model says “I need this.”
CodeGraphContext and dnomia-knowledge fall in this category. The model expresses what it needs via tool calls, the MCP server returns results.
Agent-Triggered (Deterministic)
Hooks run at fixed lifecycle points. Not dependent on the model’s decision, triggered by events.
dnomia-knowledge’s PostToolUse hook is in this category: automatic interaction logging after every Read/Edit. The model doesn’t request this, the hook triggers.
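A hook of this kind can be very small. The sketch below shows only the logging decision, as a pure function. The payload fields `tool_name` and `tool_input.file_path` are assumptions about what the hook receives, and the log format is invented for illustration; it is not dnomia-knowledge's actual schema.

```python
import json
import time

TRACKED_TOOLS = {"Read", "Edit"}


def record_interaction(event: dict, log: list[dict]) -> bool:
    """Append a log entry if the event is a tracked tool call.

    The `tool_name` / `tool_input.file_path` fields are assumptions
    about the hook payload, not a documented schema.
    """
    if event.get("tool_name") not in TRACKED_TOOLS:
        return False
    log.append({
        "tool": event["tool_name"],
        "path": event.get("tool_input", {}).get("file_path"),
        "ts": time.time(),
    })
    return True


log: list[dict] = []
payload = '{"tool_name": "Read", "tool_input": {"file_path": "src/app.py"}}'
record_interaction(json.loads(payload), log)
record_interaction({"tool_name": "Bash", "tool_input": {"command": "ls"}}, log)
print(len(log), log[0]["path"])
```

The model never asks for this and never sees it happen. The log fills up as a side effect of normal work.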
Trade-offs of Each Approach
| | Pre-injection | MCP Tool Loop | Hook-based |
|---|---|---|---|
| Strength | Few turns, low cost | Flexibility, model decides | Deterministic, model-independent |
| Weakness | Wrong file selection risk | Many turns, high cost | Passive, only collects data |
| Best scenario | Structural queries | Tasks requiring exploration | Long-term pattern analysis |
| Worst scenario | Complex, multi-file tasks | Simple, single-file queries | Situations needing immediate decisions |
When pre-injection breaks: If the query requires 10 different files and the graph selects only 3, the model reasons with incomplete information. In a tool loop, the model can say “I need more information” and make additional calls.
When tool loops are unnecessary: For simple structural queries like “what are this function’s parameters?”, the model gets the answer on the first tool call, but the loop mechanism still adds extra turns (deciding, calling, reading).
The Fourth Way: Hybrid
The best strategy is probably not a single approach, but a combination of all three:
- Session start: Inject structural summaries of the most frequently accessed files using trace hot data (pre-injection)
- During work: Deepen with MCP tools, specific searches (reactive)
- In the background: Collect interaction data with hooks, improve pre-injection quality for subsequent sessions (agent-triggered)
This is cyclical: hooks collect data, data improves pre-injection, pre-injection reduces turn count, lower turn count produces better data.
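The first step of that cycle, turning hook-collected data into a pre-injection list, might look like the following. It is a sketch under stated assumptions: the interaction log is a flat list of file paths, summary sizes in tokens are known up front, and selection is a simple greedy fill of a token budget. All names and numbers are hypothetical.

```python
from collections import Counter


def pick_hot_files(interactions: list[str], sizes: dict[str, int], budget: int) -> list[str]:
    """Choose the most frequently accessed files that fit a token budget."""
    chosen: list[str] = []
    used = 0
    for path, _count in Counter(interactions).most_common():
        cost = sizes[path]
        if used + cost <= budget:
            chosen.append(path)
            used += cost
    return chosen


# Hypothetical log collected by hooks in earlier sessions, and summary sizes in tokens.
interactions = ["a.py", "b.py", "a.py", "c.py", "a.py", "b.py"]
sizes = {"a.py": 800, "b.py": 1_500, "c.py": 400}

print(pick_hot_files(interactions, sizes, budget=1_500))
```

Here `b.py` is accessed more often than `c.py` but does not fit in the remaining budget, so the smaller file is injected instead. A real selector would have to weigh that trade-off more carefully.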
Where Does dnomia-knowledge Stand?
Currently in the MCP tool loop category. But several features bring it closer to hybrid:
- Interaction boost: Frequently accessed files rank higher in search results. This is, indirectly, “learning from past sessions.”
- PreToolUse hook: Blocks large file reads and redirects to search. This is a deterministic intervention that preserves the context window.
- Trace analytics: Hot files, gaps, and decay data accumulate. This data is a ready source for pre-injection.
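The large-file guard from the list above reduces to a size check made before the read happens. The threshold, the function name, and the returned dictionary are illustrative assumptions; the actual hook protocol and dnomia-knowledge's limits are not described in the source.

```python
MAX_READ_BYTES = 50_000  # hypothetical threshold


def check_read(path: str, size_bytes: int, limit: int = MAX_READ_BYTES) -> dict:
    """Decide whether a file read should proceed or be redirected to search."""
    if size_bytes > limit:
        return {
            "allow": False,
            "reason": f"{path} is {size_bytes} bytes (limit {limit}); use search instead",
        }
    return {"allow": True, "reason": ""}


print(check_read("src/small.py", 2_000))
print(check_read("data/dump.json", 900_000)["allow"])
```

Because the decision depends only on file size, it runs the same way every time regardless of what the model intended. That is what makes it a deterministic intervention.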
The missing piece has been partially addressed: trace-informed review was added to court and critique skills. Before a review starts, dnomia-knowledge trace hot and trace gaps are called automatically, with hot files evaluated as high-risk change areas and gaps as documentation debt. Full pre-injection doesn’t exist yet, but trace data is now fed into decision processes.
Cost or Quality?
If you use Claude Code Max (fixed monthly fee), the per-token cost argument is moot. But turn count still matters:
- Fewer turns = faster completion. 9 turns versus 12 is a noticeable difference in total time.
- Fewer turns = less context consumption. Every turn gets added to context. Fewer turns fill the context window more slowly.
- Less context consumption = less compression. When the context window fills, conversation compression kicks in and previous information is lost.
So even if the cost calculation doesn’t directly apply, the turn reduction that pre-injection provides improves the quality of the experience.
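The effect on context consumption can be put into numbers with a back-of-the-envelope calculation. The window size, starting context, and tokens per turn below are hypothetical, and the model assumes every turn adds the same amount to the context.

```python
def tasks_before_compaction(window: int, base_context: int,
                            tokens_per_turn: int, turns_per_task: int) -> int:
    """Whole tasks that fit in the context window before compression is needed."""
    turns_available = (window - base_context) // tokens_per_turn
    return turns_available // turns_per_task


# Hypothetical numbers: 200,000-token window, 20,000 tokens loaded at start,
# 4,000 tokens added per turn.
print(tasks_before_compaction(200_000, 20_000, 4_000, turns_per_task=9))
print(tasks_before_compaction(200_000, 20_000, 4_000, turns_per_task=12))
```

With these inputs a 9-turn average fits five tasks into one session before compression, while a 12-turn average fits three.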
Conclusion
Context engineering is the next competitive frontier for AI coding tools. Most current tools use MCP tool loops (CodeGraphContext, dnomia-knowledge, code-review-graph). GrapeRoot chose a different path with pre-injection and stood out in benchmarks.
But there is no single “best” approach. It varies based on task complexity, codebase size, and use case. In the future, all tools will likely evolve toward a hybrid approach: static context via pre-injection, dynamic deepening via MCP, background data collection via hooks.
Source code:
- dnomia-knowledge (hybrid search + interaction tracking)
- GrapeRoot (dual graph + pre-injection)
- CodeGraphContext (graph DB + MCP)
- code-review-graph (blast radius + MCP)
Related Posts
- AI Agent Protocols Guide: Positioning MCP alongside 5 other open protocols
- Context Engineering Ecosystem: MCP’s role in the four-layer context architecture
Footnotes
1. GrapeRoot benchmark: Reddit post, benchmark repo. 20 tasks, Claude Sonnet 4.6, mid-sized repo.
2. Shahnovsky, O., Dror, R. (2026). AI Planning Framework for LLM-Based Web Agents. arXiv. Step-by-step (BFS) vs plan-ahead (DFS) agent comparison, with 794 human-labeled trajectories.
3. Martin Fowler, “Context Engineering for Coding Agents”: https://martinfowler.com/articles/exploring-gen-ai/context-engineering-coding-agents.html
Key Takeaways
- Pre-injection (GrapeRoot) is 31% cheaper on average, up to 90% on some tasks. Because the model doesn't enter a tool call loop, it starts reasoning directly.
- MCP tool loop (CodeGraphContext) can cost more than even vanilla Claude. Every tool call adds an extra turn, every turn adds extra tokens.
- Session graph (tracking what the model has read) is the most interesting idea. It prevents re-discovering the same files over and over.
- A hybrid approach is probably best: initial context via pre-injection, deepening via MCP tools.
FAQ
Is pre-injection always better?
No. If the wrong files are selected, tokens are wasted. Pre-injection depends on the initial routing quality of the graph. For complex, multi-file tasks, MCP tool loops are more flexible because the model decides what it needs.
Are these benchmarks reliable?
It's GrapeRoot's own benchmark, on their own repo, with 20 tasks. No independent verification. But the architectural argument (fewer turns = fewer tokens) is logically consistent.
Does cost matter if I use Claude Code Max?
If you're not paying per token, the cost argument is moot. But turn count still matters: fewer turns = faster completion, less context window consumption, less compression risk.
Which category does dnomia-knowledge fall into?
MCP tool loop. But interaction boost brings it closer to session tracking. Pre-injection mode doesn't exist yet, but context injection at session start using trace hot data is planned.