Context Engineering for AI Coding Agents: From Static Documents to a Living Ecosystem

TL;DR

A single CLAUDE.md file is not enough for an AI agent to work correctly. I built a four-layer ecosystem: (1) static references (CLAUDE.md + architecture.md), (2) JIT semantic search (mcp-code-search + dnomia-knowledge), (3) decision governance (/court + ADRs), (4) learning loops (forge retro). In this post I explain what each layer does, how I applied it in a real project (3,600-line architecture.md, 155 ADRs), and why tools like Repomix became unnecessary.

The biggest productivity loss when working with AI coding agents is not writing code; it is re-introducing the project every session. A CLAUDE.md file is a good start, but it’s not enough. In this post, I explain the four-layer context engineering ecosystem I built through real project experience.

Problem: The Agent Starts From Scratch Every Session

When working on a large project with Claude Code, Cursor, or GitHub Copilot, you experience this cycle:

1. Agent starts scanning code
2. Makes wrong inferences due to surface-level scanning (confuses similar names in different modules)
3. Context window starts filling with tool outputs
4. You correct, agent applies the correction
5. Session ends, context is lost
6. Next session: go to 1

2025-2026 research¹ shows that models perform significantly better when fed structured, persistent reference points compared to repo scanning. This gave rise to the “context engineering” discipline: maximizing the signal-to-noise ratio in the agent’s context window.

But a single CLAUDE.md file doesn’t solve this alone. You need to systematically manage what is where in the project, why it’s that way, and when to access that information.

Four-Layer Ecosystem

The structure I built through experimentation in a real SaaS project (event tracking platform, monorepo, 992 source files, 155 ADRs):

Layer	What	How	When
1. Static references	CLAUDE.md + architecture.md	Loaded automatically at session start	Always
2. JIT search	mcp-code-search + dnomia-knowledge	Semantic search when agent needs it	On demand
3. Decision governance	/court + ADRs	Before new features or architectural decisions	At decision time
4. Learning loop	forge retro	After completing work	At work completion

Each layer feeds the next. Static references are the agent’s starting point, JIT search is the deepening tool, decision governance ensures consistency, and the learning loop keeps the entire system current.

Layer 1: Static References

CLAUDE.md: Giving Instructions to the Agent

CLAUDE.md is the file the agent automatically reads at the start of every session. Its content is “what to do, how to do it” instructions:

Performance rules (“don’t use barrel imports”, “use Promise.all for parallel calls”)
Workflow rules (“enter plan mode”, “write spec”, “create ADR”)
Deployment commands (fully executable strings)
Boundaries (do/don’t lists)

CLAUDE.md’s job is to manage the agent. Not to describe the project’s structure.

architecture.md: Describing the Project’s Reality

Discovering this distinction took me several sessions. I tried putting project structure, module maps, data flow into CLAUDE.md. The file grew, readability dropped, and the agent started confusing which information was a rule versus a reference.

Solution: CLAUDE.md for agent instructions, architecture.md for project reference. They complement each other but live in separate files.

architecture.md contents:

Section	What It Describes
Stack and Dependencies	Technology stack with exact version numbers
Monorepo Structure	Directory tree, file sizes, responsibilities
Module Map	Each module’s responsibility, dependencies, key files
Data Flow	How data flows through the system (edge to DB, DB to destination)
Data Model	Structured summary of the Prisma schema
Infrastructure	Platform, region, ports, proxy, tunnel information
Architectural Decisions	Major decisions and why they were made (with ADR references)
Performance Rules	Actual application state (applied/not applied)
Code Hotspots	Most frequently changed files based on git change frequency
Related Notes	Cross-references to vault and repo documentation

In my project, this file reached ~3,600 lines. That sounds like a lot, but the agent doesn’t read the entire file every time. It jumps to the section it needs, and the heading hierarchy makes that easy.
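The section jump described above can be sketched as a small heading lookup. This is a minimal illustration, assuming architecture.md uses Markdown `#` headings; the function name `extract_section` and the sample content (including the port value) are made up for the example and are not taken from the project.

```python
import re

# Matches Markdown ATX headings such as "## Infrastructure".
# Assumption: headings inside fenced code blocks are not handled specially.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

def extract_section(markdown: str, title: str) -> str:
    """Return the section under `title`, including its nested subsections."""
    lines = markdown.splitlines()
    start = None
    level = 0
    for i, line in enumerate(lines):
        match = HEADING.match(line)
        if not match:
            continue
        if start is None:
            # Case-insensitive match on the heading text.
            if match.group(2).lower() == title.lower():
                start, level = i, len(match.group(1))
        elif len(match.group(1)) <= level:
            # A heading at the same or a higher level closes the section.
            return "\n".join(lines[start:i]).strip()
    if start is None:
        raise KeyError(title)
    return "\n".join(lines[start:]).strip()

# Hypothetical sample document.
SAMPLE = """# Architecture
## Infrastructure
### Ports
worker: 8288
## Data Model
User, Event
"""

print(extract_section(SAMPLE, "Infrastructure"))
```

Only the requested section enters the context, which is the point of keeping the file strictly hierarchical.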

Repomix Experience: Why It Became Unnecessary

While writing architecture.md, I also tried Repomix. Repomix is a tool that packages the codebase into a single Markdown file, extracting function signatures with Tree-sitter.

Results:

Mode	File Count	Tokens	Assessment
Full (entire repo)	992	1.9M	10x the context window
Compressed (entire repo)	992	1.25M	Still unusable
Compressed (TS/TSX only)	489	90K	Theoretically fits but half the context
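The assessments in the table are simple ratios against the context window. The sketch below reproduces that arithmetic under one assumption: a 200,000-token window, which the post does not state but which is consistent with "10x" for 1.9M tokens. The dictionary keys are my own labels.

```python
# Assumption: a 200K-token context window (not stated in the post).
CONTEXT_WINDOW = 200_000

# Token counts from the table above.
REPOMIX_RUNS = {
    "full": 1_900_000,
    "compressed": 1_250_000,
    "compressed_ts_only": 90_000,
}

def window_ratio(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """How many context windows a dump of `tokens` tokens would occupy."""
    return tokens / window

for mode, tokens in REPOMIX_RUNS.items():
    print(f"{mode}: {window_ratio(tokens):.2f}x of the window")
# full: 9.50x, compressed: 6.25x, compressed_ts_only: 0.45x
```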

Repomix’s directory tree and git-change-count ranking were useful. I extracted hotspot files from there. But its main value is in tools that “can’t read files” (ChatGPT web interface, web Claude). Claude Code already reads files directly. Structured architecture.md + JIT search provides more targeted information than Repomix’s flat dump.
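A change-frequency ranking like the one mentioned above can also be derived from plain git history, without Repomix. The sketch below is that swapped-in approach, not what Repomix does internally: it counts file names in the output of `git log --name-only --pretty=format:`. The function name and the sample paths are hypothetical.

```python
from collections import Counter

def hotspots(git_log_output: str, top: int = 3) -> list[tuple[str, int]]:
    """Rank files by how many commits touched them.

    Expects the text printed by `git log --name-only --pretty=format:`,
    which is one file path per line with blank lines between commits.
    """
    counts = Counter(
        line.strip() for line in git_log_output.splitlines() if line.strip()
    )
    return counts.most_common(top)

# Hypothetical log output covering three commits.
SAMPLE_LOG = """
src/collect/worker.ts
src/lib/consent.ts

src/collect/worker.ts

src/collect/worker.ts
src/lib/consent.ts
prisma/schema.prisma
"""

print(hotspots(SAMPLE_LOG, top=2))
# [('src/collect/worker.ts', 3), ('src/lib/consent.ts', 2)]
```

In a real repository the input would come from running the git command, for example through `subprocess.run(..., capture_output=True, text=True)`.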

The Practical Impact of Separation

Before architecture.md, the agent’s investigation of the Inngest proxy worker structure took ~20 minutes (SSH attempts, API endpoint guessing, port scanning). After architecture.md, the agent directly reads the relevant section: container name, ports, tunnel routes, env vars. All in one place.

Similarly, answering “why did we switch to Neon?” used to require scanning 155 ADRs. Now architecture.md’s “Architectural Decisions” section has a summary with the ADR-127 reference, and the agent opens the ADR file if needed.

Layer 2: JIT Semantic Search

Static references are loaded every session, but not all information fits in static files. In a 992-file codebase, the answer to “where is consent checking done?” might be spread across 4-5 different files. Writing this into architecture.md would bloat the file. Having the agent grep for it every time consumes context window.

Solution: a semantic search layer where the agent can pull only relevant information on demand.

mcp-code-search: Semantic Search for Code

mcp-code-search is a semantic code search server that connects to Claude Code via MCP (Model Context Protocol).

How it works:

Directory scan -> Tree-sitter AST parse -> Chunk (function/class/method) -> Embed (jina-v2-base-code) -> LanceDB -> Hybrid search
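The "AST parse -> Chunk" step of this pipeline can be illustrated with a stand-in. mcp-code-search uses Tree-sitter; the sketch below instead uses Python's built-in `ast` module, so it only handles Python source and is not the tool's actual implementation. Embedding and storage are left out, and the sample source is made up.

```python
import ast

def chunk_source(source: str) -> list[dict]:
    """Split Python source into one chunk per function, method, or class."""
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                # Exact source text of the node: this is what would be embedded.
                "text": ast.get_source_segment(source, node),
            })
    # ast.walk is breadth-first, so restore source order.
    return sorted(chunks, key=lambda c: c["start_line"])

# Hypothetical source file.
SAMPLE = '''
class RateLimiter:
    def allow(self, key):
        return True

def check_consent(user):
    return user.get("consent", False)
'''

for chunk in chunk_source(SAMPLE):
    print(chunk["kind"], chunk["name"], chunk["start_line"], chunk["end_line"])
```

Note that a method is emitted both on its own and as part of its class chunk; whether to deduplicate that overlap is a design choice this sketch does not make.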

When the agent asks for “authentication middleware” or “rate limiting implementation”, mcp-code-search matches on meaning rather than on literal strings, unlike grep. It finds the “rate-limiter.ts” file, but it can also surface files whose names give no hint yet implement the “token bucket” pattern.

Feature	Detail
Chunking	Tree-sitter AST (40+ languages)
Embedding	jina-embeddings-v2-base-code (768 dim, code-focused)
Storage	LanceDB (local, zero network)
Search	Hybrid: vector similarity + FTS, RRF merge (k=60)
Incremental indexing	Hash-based, only changed files
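The "RRF merge (k=60)" entry refers to Reciprocal Rank Fusion, where each result scores the sum of 1 / (k + rank) over every ranking it appears in. Below is a minimal sketch of that formula, not the server's own code; the function name, the tie-breaking rule, and the sample ids are assumptions for illustration.

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically (an arbitrary choice).
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical chunk ids from the two search backends.
vector_hits = ["a", "b", "c"]
fts_hits = ["b", "d", "a"]

print(rrf_merge([vector_hits, fts_hits]))
# ['b', 'a', 'd', 'c']
```

Results that both backends agree on ("a" and "b") rise above results that only one backend returned, which is why the merge needs no score normalization between vector similarity and FTS.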

Why grep isn’t enough: For “where is consent checking done?”, grep searches for the word consent and returns 47 matches. mcp-code-search approaches the same question semantically and returns the 5-10 most relevant chunks. At most 10 snippets enter the context window instead of 47 grep matches.

dnomia-knowledge: Semantic Search for Knowledge Base

dnomia-knowledge is a knowledge management MCP server that indexes Markdown, MDX, and code files.

How it differs from mcp-code-search:

	mcp-code-search	dnomia-knowledge
Focus	Code files	Markdown + code + web content
Embedding	jina-v2-base-code (code-focused)	multilingual-e5-base (multilingual)
Storage	LanceDB	SQLite + FTS5 + sqlite-vec
Chunking	AST-based (function/class)	Heading-based (## and ###)
Extra features	Find similar code	Knowledge graph, web indexing
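The heading-based chunking in the table can be sketched as follows. This is an illustration of the idea under stated assumptions, not dnomia-knowledge's code: every `##` or `###` heading starts a new chunk, deeper headings stay in the body, and text before the first `##` is dropped. The function name and sample note are hypothetical.

```python
import re

# Matches only level-2 and level-3 headings ("## " and "### ").
CHUNK_HEADING = re.compile(r"^(#{2,3})\s+(.+)$")

def chunk_by_headings(markdown: str) -> list[dict]:
    """Split a Markdown document into chunks at ## and ### headings."""
    chunks: list[dict] = []
    current = None
    for line in markdown.splitlines():
        match = CHUNK_HEADING.match(line)
        if match:
            if current is not None:
                chunks.append(current)
            current = {
                "heading": match.group(2).strip(),
                "level": len(match.group(1)),
                "body": [],
            }
        elif current is not None:
            current["body"].append(line)
    if current is not None:
        chunks.append(current)
    for chunk in chunks:
        chunk["body"] = "\n".join(chunk["body"]).strip()
    return chunks

# Hypothetical vault note.
NOTE = """# Notes
intro
## Consent
Checked at the edge.
### Storage
Stored per user.
## Billing
Monthly invoices.
"""

for chunk in chunk_by_headings(NOTE):
    print(chunk["level"], chunk["heading"], "->", chunk["body"])
```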

They work together: the agent directs architectural questions to dnomia-knowledge and implementation questions to mcp-code-search. Both are connected via MCP, and the agent decides which is more appropriate. dnomia-knowledge also performs developer interaction tracking: it records which files are read most and which searches return zero results, then applies an interaction boost to personalize search results.

Progressive Disclosure: Revealing Information Gradually

Even a 1M token context window degrades in performance when filled with too much information². That’s why “pull what’s needed” beats “load everything”.

In the ecosystem, this works as follows:

Session start: CLAUDE.md + architecture.md loaded automatically (static, always needed)
First question: Agent reads the relevant section from architecture.md (jump-to pointer)
Deepening: Agent sends semantic query to mcp-code-search or dnomia-knowledge (JIT)
Decision needed: Agent opens the ADR file (on-demand)

At each step, only the needed information enters the context. This is the opposite of Repomix’s “put everything in one file” approach.

Layer 3: Decision Governance

Knowing the codebase structure and being able to search isn’t enough. If you can’t answer “why are we using Inngest instead of Redis?”, the agent might one day want to add Redis. Or try to solve a problem using a method you previously evaluated and rejected.

/court: Structured Evaluation

/court is an evaluation skill that runs before new features or architectural decisions. It applies the Decision Gate framework’s 8 criteria and delivers a verdict of GO, DEFER, or KILL.

But /court’s real value for context engineering is that every decision is recorded as an ADR. The answer to “why did we make this decision?” no longer gets lost when a session’s context is discarded. When the agent returns to the same topic in the future, it can read the previous evaluation and its rationale.

ADRs: Living Constraints

Architecture Decision Records are usually thought of as passive logs. But in the agent ecosystem, they’re active constraints:

Architectural boundaries: “No direct DB queries from the collect worker, must go through Hyperdrive” (ADR-062)
Data processing constraints: “PII cannot be logged in plaintext, AES-256-GCM + blind index” (ADR-137)
Rejected alternatives: “Redis cache evaluated, rejected due to operational burden” (ADR-128)

The agent sees the summary in architecture.md’s “Architectural Decisions” section. If detail is needed, it opens the ADR file. This prevents the same discussion from recurring.
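To make "ADRs as active constraints" concrete, here is a minimal sketch of an ADR index that is checked against a proposal before work starts. The three entries are the ones quoted above; the `kind` labels, the keyword lists, and the naive substring matching are my assumptions, and a real setup would use the semantic search layer instead.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ADR:
    number: int
    kind: str                  # hypothetical labels: "boundary", "data", "rejected"
    summary: str
    keywords: tuple[str, ...]  # hypothetical trigger words for this sketch

ADR_INDEX = [
    ADR(62, "boundary",
        "No direct DB queries from the collect worker, must go through Hyperdrive",
        ("collect worker", "hyperdrive")),
    ADR(137, "data",
        "PII cannot be logged in plaintext, AES-256-GCM + blind index",
        ("pii",)),
    ADR(128, "rejected",
        "Redis cache evaluated, rejected due to operational burden",
        ("redis",)),
]

def relevant_adrs(proposal: str, index: list[ADR] = ADR_INDEX) -> list[ADR]:
    """Return ADRs whose keywords appear in the proposal (naive substring match)."""
    text = proposal.lower()
    return [adr for adr in index if any(word in text for word in adr.keywords)]

for adr in relevant_adrs("Add a Redis cache in front of the API"):
    print(f"ADR-{adr.number:03d} ({adr.kind}): {adr.summary}")
# ADR-128 (rejected): Redis cache evaluated, rejected due to operational burden
```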

My project has 155 ADRs. Instead of writing each one into architecture.md, I summarized 12 major decisions in 4 categories and added the ADR index as a reference. The agent starts from the summary, deepens if needed.

“Kernel of Truth” Workflow

I didn’t write all 155 ADRs from scratch. Most were created with the “Kernel of Truth” pattern: I write one sentence (“We switched to Neon because there was a Docker port bypass security vulnerability”), the agent expands it into a structured ADR format. Writing effort is minimal, but the decision record is permanent.
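The starting point of that workflow can be sketched as a skeleton that the agent then fills in. The section names below (Context, Decision, Consequences) follow a common ADR layout and are an assumption, since the post does not specify the project's ADR format; the title and date in the example are placeholders.

```python
from datetime import date

# Hypothetical ADR layout; the project's real template may differ.
ADR_TEMPLATE = """ADR-{number:03d}: {title}
Date: {day}
Status: Accepted

Kernel of truth
{kernel}

Context
TODO (agent expands from the kernel)

Decision
TODO

Consequences
TODO
"""

def adr_skeleton(number: int, title: str, kernel: str, day: date) -> str:
    """Render an ADR skeleton around a single human-written sentence."""
    return ADR_TEMPLATE.format(
        number=number, title=title, kernel=kernel.strip(), day=day.isoformat()
    )

print(adr_skeleton(
    127,
    "Switch to Neon",  # placeholder title
    "We switched to Neon because there was a Docker port bypass security vulnerability",
    date(2026, 1, 1),  # placeholder date
))
```

The human-written sentence stays verbatim in the record, so the rationale remains traceable even after the agent has expanded the other sections.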

Layer 4: Learning Loop

The first three layers provide information. The fourth layer keeps information current.

Forge Retro: Extracting Patterns from Completed Work

The last step of the Forge pipeline, /retro, extracts permanent patterns from completed features:

Read the court decision (why did we GO?)
Examine the implementation (what changed?)
Check critique findings (what issues came up?)
If a permanent pattern exists, add it to CLAUDE.md or architecture.md
Clean up temporary information

Critical point: retro doesn’t just grow the knowledge base, it also prunes it. When a pattern has repeated 3 times, retro promotes it to a rule in CLAUDE.md. When something turns out to be a temporary workaround, retro deletes it. Upsert logic, not append.
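The upsert-and-prune behavior can be sketched as a pure function over a rule set. This is an illustration under assumptions, not Forge's implementation: rules are modeled as a dictionary keyed by a short id, and the observation fields (`key`, `rule`, `occurrences`, `temporary`) are hypothetical names.

```python
def retro(rules: dict[str, str], observations: list[dict], threshold: int = 3) -> dict[str, str]:
    """Apply retro findings to a rule set: upsert repeated patterns, prune workarounds."""
    updated = dict(rules)  # leave the input untouched
    for obs in observations:
        key = obs["key"]
        if obs.get("temporary"):
            # Temporary workaround: remove it if it was ever recorded.
            updated.pop(key, None)
        elif obs.get("occurrences", 0) >= threshold:
            # Repeated pattern: insert it, or overwrite the older wording.
            updated[key] = obs["rule"]
    return updated

# Hypothetical rule set and findings.
current_rules = {
    "imports": "don't use barrel imports",
    "tmp-proxy": "restart the proxy manually after deploy",
}
findings = [
    {"key": "parallel", "rule": "use Promise.all for parallel calls", "occurrences": 3},
    {"key": "tmp-proxy", "temporary": True},
    {"key": "naming", "rule": "prefix workers with wk-", "occurrences": 1},
]

print(retro(current_rules, findings))
# {'imports': "don't use barrel imports", 'parallel': 'use Promise.all for parallel calls'}
```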

Closing the Loop

Session start: CLAUDE.md + architecture.md loaded
    |
Working: deepening via JIT search
    |
Decision: evaluation via /court -> ADR record
    |
Work complete: pattern extraction via /retro
    |
Update: CLAUDE.md / architecture.md updated
    |
Next session: current references loaded

With each iteration, references become slightly more accurate, slightly more current. The agent does slightly less discovery and slightly more production each session.

Research Foundation

I didn’t invent this ecosystem from scratch. Findings compiled from 2025-2026 research (academic papers, Gemini Deep Research, GPT-4o analysis, Kimi K2.5 research) formed the foundation:

Codified Context approach¹: A three-layer system tested in a 108,000-line C# project (Hot Memory, Domain Expert Agents, Cold Memory). Critical finding: documentation is infrastructure, it requires maintenance like code.

AGENTS.md ecosystem: Different config files for different AI tools (CLAUDE.md, AGENTS.md, .cursorrules) but all serving the same purpose: giving the agent structured context. “Nearest-Wins” model: root file provides global standards, subdirectory files provide local guidance.
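The “Nearest-Wins” lookup can be sketched as a walk from the edited file's directory up to the repository root. This is a minimal illustration of the lookup only; whether a given tool merges the files it finds or picks just one is tool-specific and not modeled here. The function name and example paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

def nearest_config(file_path: str, config_paths: set[str], name: str = "AGENTS.md") -> Optional[str]:
    """Return the closest config file at or above the file's directory."""
    directory = PurePosixPath(file_path).parent
    # Check the file's own directory first, then each ancestor up to the root.
    for folder in [directory, *directory.parents]:
        candidate = str(folder / name)
        if candidate in config_paths:
            return candidate
    return None

# Hypothetical repo with a root file and one subdirectory override.
configs = {"AGENTS.md", "apps/web/AGENTS.md"}

print(nearest_config("apps/web/src/page.tsx", configs))      # apps/web/AGENTS.md
print(nearest_config("packages/db/schema.prisma", configs))  # AGENTS.md
```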

Hybrid approach consensus: All sources converge on the same point: “what” is auto-generated (schema, types, dependency graph), “why” is human-written (design decisions, constraints, trade-offs). Together, they give the agent the full picture.

Progressive disclosure: Opening information only at the moment of need rather than loading it all at once. Jump-to pointers, executable search commands, nested overrides.

What Worked and What Didn’t in Practice

Investment	Result
architecture.md (~3,600 lines)	Agent’s discovery time dropped significantly. Especially for infrastructure questions (ports, proxy, tunnel), direct reference instead of trial-and-error.
ADR index (155 decisions)	Recurring discussions ended. Being able to say “we evaluated this before” is very valuable.
mcp-code-search	More accurate than grep, especially for “find all places that do X” queries.
dnomia-knowledge	Very useful for vault notes and documentation search. Complementary when combined with code search.
/court	In a codebase audit, 6 out of 28 tasks were eliminated or deferred. Filters bad ideas early.
Repomix	Became unnecessary except for hotspot analysis. Claude Code can already read files.
Forge retro	Not enough data yet (new). Concept is correct, too early for impact measurement.

Template

If you want to adapt this structure to your own project, here’s the minimum starting set:

Small projects (single service, 50-100 files):

CLAUDE.md (rules + commands)
A single-page structure summary in architecture.md is sufficient

Medium projects (monorepo, 200-500 files):

CLAUDE.md + architecture.md (separate files)
mcp-code-search (semantic code search)
ADRs (for major decisions)

Large projects (500+ files, multiple services):

All four layers
architecture.md with module map, data flow, infrastructure, architectural decisions
dnomia-knowledge (documentation + code search)
/court + forge pipeline

In every case: CLAUDE.md gives instructions, architecture.md describes reality. This distinction is fundamental. If you want to start with a structured template rather than writing architecture.md from scratch, check out the Living Architecture template I derived from these experiences.

References

Open Source Tools

living-architecture: Project-agnostic architecture.md template
mcp-code-search: Semantic code search MCP server
dnomia-knowledge: Knowledge management MCP server
forge: Memory-backed decision-to-delivery pipeline

Academic and Industry Sources

Codified Context: Infrastructure for AI Agents in a Complex Codebase (arxiv.org/html/2602.20478v1)
AgenticAKM: Agentic Architecture Knowledge Management (arxiv.org/html/2602.04445v1)
AGENTS.md Standard (aihero.dev)
C4 Model (c4model.com)
Repomix (github.com/yamadashy/repomix)

Living Architecture: Project-agnostic architecture.md template (10 core sections, 11 optional modules, 3 depth levels)
Living Architecture Documentation for AI Agents: The research foundation of this ecosystem (Codified Context, AGENTS.md, C4 Model, Repomix, ADR, SDD comparison)
Claude Code Context Management: Context window optimization with MCP tools
Decision Gate v2: Detailed explanation of the /court skill
Forge Pipeline: Memory-backed development pipeline
Developer Interaction Tracking: dnomia-knowledge’s trace analytics system, hot files, knowledge gaps and interaction boost
AI-Powered Codebase Audit: How CLAUDE.md and context engineering are used in 6-track audit process
AI Agent Protocols Guide: Positioning MCP alongside 5 other open protocols

Footnotes

¹ “Codified Context: Infrastructure for AI Agents in a Complex Codebase” (arxiv.org/html/2602.20478v1). A three-layer system tested in a 108,000-line C# project.
² Context window performance degradation at higher fill ratios has been observed across multiple benchmarks. “Lost in the Middle” (Liu et al., 2023) was among the first studies to document this phenomenon.

Key Takeaways

01 CLAUDE.md tells the agent what to do, architecture.md describes the project's actual structure. They serve different purposes and cannot replace each other.
02 Instead of loading all information into the context window, pulling only what's needed via JIT (just-in-time) semantic search is more efficient.
03 Structured evaluation before decisions (/court) and recording decisions as ADRs ensures the agent remains consistent in the future.
04 Without a learning loop, the ecosystem stagnates. Forge retro extracts permanent patterns from completed work and updates the rules, closing the loop.

Frequently Asked Questions (FAQ)

What is context engineering?

The discipline of maximizing the signal-to-noise ratio in an AI agent's context window. Giving the agent the right information, at the right time, in the right format. A systematic infrastructure approach beyond prompt engineering.

What is the difference between CLAUDE.md and architecture.md?

CLAUDE.md gives instructions to the agent: 'don't use barrel imports', 'use Promise.all for parallel calls'. architecture.md describes the project's reality: which module is where, how data flows, which decisions were made and why. One says 'how to work', the other says 'what you're working with'.

Are tools like Repomix unnecessary?

For tools that can read files, like Claude Code, yes. Repomix packages a 992-file repo into 1.9M tokens, 10x the context window. Even the compressed, TS/TSX-only version is 90K tokens, about half the context window. Structured architecture.md + JIT search provides the same information much more efficiently. Repomix is still useful for giving repo context to tools that can't read files (ChatGPT web, web Claude).

How much effort does building this ecosystem require?

Writing architecture.md from scratch takes a few sessions (in my case ~3,600 lines). But the agent itself helps: scanning and summarizing ADRs, analyzing the codebase to produce module maps. After the initial investment, maintenance cost is low because only the relevant section gets updated with each major change.

Is this much structure necessary for a solo developer?

It's even more necessary for a solo developer. If you had a team, someone could tell you 'we made that decision for this reason'. Working solo, the AI agent is your teammate, but it starts from scratch every session. This ecosystem prevents the agent from rediscovering the project every time.
