The biggest productivity loss when working with AI coding agents is not writing code; it is re-introducing the project every session. A CLAUDE.md file is a good start, but it is not enough. In this post, I explain the four-layer context engineering ecosystem I built through real project experience.
Problem: The Agent Starts From Scratch Every Session
When working on a large project with Claude Code, Cursor, or GitHub Copilot, you experience this cycle:
1. Agent starts scanning code
2. Makes wrong inferences due to surface-level scanning (confuses similar names in different modules)
3. Context window starts filling with tool outputs
4. You correct, agent applies the correction
5. Session ends, context is lost
6. Next session: go to 1
2025-2026 research[1] shows that models perform significantly better when fed structured, persistent reference points than when they rely on repo scanning. This gave rise to the “context engineering” discipline: maximizing the signal-to-noise ratio in the agent’s context window.
But a single CLAUDE.md file doesn’t solve this alone. You need to systematically manage what is where in the project, why it’s that way, and when to access that information.
Four-Layer Ecosystem
The structure I built through experimentation in a real SaaS project (event tracking platform, monorepo, 992 source files, 155 ADRs):
| Layer | What | How | When |
|---|---|---|---|
| 1. Static references | CLAUDE.md + architecture.md | Loaded automatically at session start | Always |
| 2. JIT search | mcp-code-search + dnomia-knowledge | Semantic search when agent needs it | On demand |
| 3. Decision governance | /court + ADRs | Before new features or architectural decisions | At decision time |
| 4. Learning loop | forge retro | After completing work | At work completion |
Each layer feeds the next. Static references are the agent’s starting point, JIT search is the deepening tool, decision governance ensures consistency, and the learning loop keeps the entire system current.
Layer 1: Static References
CLAUDE.md: Giving Instructions to the Agent
CLAUDE.md is the file the agent automatically reads at the start of every session. Its content is “what to do, how to do it” instructions:
- Performance rules (“don’t use barrel imports”, “use Promise.all for parallel calls”)
- Workflow rules (“enter plan mode”, “write spec”, “create ADR”)
- Deployment commands (fully executable strings)
- Boundaries (do/don’t lists)
CLAUDE.md’s job is to manage the agent. Not to describe the project’s structure.
architecture.md: Describing the Project’s Reality
Discovering this distinction took me several sessions. I tried putting project structure, module maps, and data flow into CLAUDE.md. The file grew, readability dropped, and the agent started confusing which information was a rule and which was a reference.
Solution: CLAUDE.md for agent instructions, architecture.md for project reference. They complement each other but live in separate files.
architecture.md contents:
| Section | What It Describes |
|---|---|
| Stack and Dependencies | Technology stack with exact version numbers |
| Monorepo Structure | Directory tree, file sizes, responsibilities |
| Module Map | Each module’s responsibility, dependencies, key files |
| Data Flow | How data flows through the system (edge to DB, DB to destination) |
| Data Model | Structured summary of the Prisma schema |
| Infrastructure | Platform, region, ports, proxy, tunnel information |
| Architectural Decisions | Major decisions and why they were made (with ADR references) |
| Performance Rules | Actual application state (applied/not applied) |
| Code Hotspots | Most frequently changed files based on git change frequency |
| Related Notes | Cross-references to vault and repo documentation |
In my project, this file reached ~3,600 lines. That sounds like a lot, but the agent doesn’t read the entire file every time. It jumps to the section it needs, and the heading hierarchy makes this easy.
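The "jump to the section it needs" behaviour can be approximated with a heading index. This is a minimal sketch, assuming architecture.md uses Markdown `#` headings; `heading_index` and `read_section` are hypothetical helper names, not part of Claude Code or any tool named in this post.

```python
import re

# Matches Markdown ATX headings such as "## Data Flow".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

def heading_index(markdown_text):
    """Return (level, title, line_number) for every heading outside code fences."""
    index = []
    in_fence = False
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            index.append((len(match.group(1)), match.group(2), lineno))
    return index

def read_section(markdown_text, title):
    """Return one section (heading, body, nested subsections) or None if absent."""
    lines = markdown_text.splitlines()
    index = heading_index(markdown_text)
    for pos, (level, name, lineno) in enumerate(index):
        if name.lower() != title.lower():
            continue
        end = len(lines)
        # The section ends at the next heading of the same or a higher level.
        for next_level, _, next_lineno in index[pos + 1:]:
            if next_level <= level:
                end = next_lineno - 1
                break
        return "\n".join(lines[lineno - 1:end])
    return None
```

Calling `read_section(text, "Infrastructure")` returns only that section and its subsections, so a long file costs context in proportion to the section read rather than to the whole document.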
Repomix Experience: Why It Became Unnecessary
While writing architecture.md, I also tried Repomix. Repomix is a tool that packages the codebase into a single Markdown file, extracting function signatures with Tree-sitter.
Results:
| Mode | File Count | Tokens | Assessment |
|---|---|---|---|
| Full (entire repo) | 992 | 1.9M | 10x the context window |
| Compressed (entire repo) | 992 | 1.25M | Still unusable |
| Compressed (TS/TSX only) | 489 | 90K | Theoretically fits but half the context |
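The assessments in the table follow from simple arithmetic. The sketch below assumes a 200K-token context window, which this post does not state explicitly; it is the size most consistent with the "10x" and "half the context" figures.

```python
# Assumption: a 200K-token context window (not stated in the post).
CONTEXT_WINDOW_TOKENS = 200_000

# Token counts taken from the Repomix results table.
repomix_outputs = {
    "Full (entire repo)": 1_900_000,
    "Compressed (entire repo)": 1_250_000,
    "Compressed (TS/TSX only)": 90_000,
}

def window_ratio(tokens, window=CONTEXT_WINDOW_TOKENS):
    """How many context windows a given token count would occupy."""
    return tokens / window

for mode, tokens in repomix_outputs.items():
    print(f"{mode}: {window_ratio(tokens):.2f}x of the window")
```

Under that assumption, the three modes come out at 9.50x, 6.25x, and 0.45x of the window.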
Repomix’s directory tree and git-change-count ranking were useful. I extracted hotspot files from there. But its main value is in tools that “can’t read files” (ChatGPT web interface, web Claude). Claude Code already reads files directly. Structured architecture.md + JIT search provides more targeted information than Repomix’s flat dump.
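A change-frequency ranking like the one used for the hotspot list can also be derived directly from git history. This is a minimal sketch and a stand-in, not how Repomix computes its ranking; `count_changes` and `hotspots` are hypothetical names.

```python
import subprocess
from collections import Counter

def count_changes(git_log_output):
    """Count path occurrences in `git log --name-only --pretty=format:` output."""
    counts = Counter()
    for line in git_log_output.splitlines():
        path = line.strip()
        if path:  # blank lines separate commits
            counts[path] += 1
    return counts

def hotspots(repo_dir=".", top=10):
    """Return the `top` most frequently changed paths as (path, count) pairs."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return count_changes(result.stdout).most_common(top)
```

Running `hotspots(".", top=20)` inside a repository gives a list that can be pasted into the Code Hotspots section of architecture.md. Renamed files are counted under each name they had.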
The Practical Impact of Separation
Before architecture.md, the agent’s investigation of the Inngest proxy worker structure took ~20 minutes (SSH attempts, API endpoint guessing, port scanning). After architecture.md, the agent directly reads the relevant section: container name, ports, tunnel routes, env vars. All in one place.
Similarly, answering “why did we switch to Neon?” used to require scanning 155 ADRs. Now architecture.md’s “Architectural Decisions” section has a summary with the ADR-127 reference, and the agent opens the ADR file if needed.
Layer 2: JIT Semantic Search
Static references are loaded every session, but not all information fits in static files. In a 992-file codebase, the answer to “where is consent checking done?” might be spread across 4-5 different files. Writing this into architecture.md would bloat the file. Having the agent grep for it every time consumes the context window.
Solution: a semantic search layer where the agent can pull only relevant information on demand.
mcp-code-search: Semantic Search for Code
mcp-code-search is a semantic code search server that connects to Claude Code via MCP (Model Context Protocol).
How it works:
Directory scan -> Tree-sitter AST parse -> Chunk (function/class/method) -> Embed (jina-v2-base-code) -> LanceDB -> Hybrid search
When the agent asks for “find authentication middleware” or “rate limiting implementation”, mcp-code-search, unlike grep, matches on meaning. It finds the “rate-limiter.ts” file but can also surface files with unrelated names that use the “token bucket” pattern.
| Feature | Detail |
|---|---|
| Chunking | Tree-sitter AST (40+ languages) |
| Embedding | jina-embeddings-v2-base-code (768 dim, code-focused) |
| Storage | LanceDB (local, zero network) |
| Search | Hybrid: vector similarity + FTS, RRF merge (k=60) |
| Incremental indexing | Hash-based, only changed files |
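The hash-based incremental indexing row can be illustrated with a small planning function. This is a sketch under assumptions: the table only says "hash-based", so SHA-256 and the manifest layout are my choices, and `plan_reindex` is a hypothetical name rather than an mcp-code-search API.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Hash file contents; SHA-256 is an assumption, not confirmed by the tool."""
    return hashlib.sha256(data).hexdigest()

def plan_reindex(files, manifest):
    """Decide what to re-embed.

    files:    {path: bytes} for the current working tree
    manifest: {path: hash} saved by the previous indexing run
    Returns (to_index, to_delete, new_manifest).
    """
    new_manifest = {path: content_hash(data) for path, data in files.items()}
    # New or modified files: hash missing from, or different to, the old manifest.
    to_index = sorted(p for p, h in new_manifest.items() if manifest.get(p) != h)
    # Files that disappeared: their chunks should be dropped from the store.
    to_delete = sorted(p for p in manifest if p not in new_manifest)
    return to_index, to_delete, new_manifest
```

Only the paths in `to_index` need to be parsed and embedded again, which is what keeps re-indexing cheap after a small change.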
Why grep isn’t enough: For “where is consent checking done?”, grep searches for the word consent and returns 47 results. mcp-code-search approaches the same question semantically and returns the 5-10 most relevant chunks. At most 10 snippets enter the context window instead of 47 grep hits.
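The "RRF merge (k=60)" step in the feature table refers to Reciprocal Rank Fusion, which combines ranked lists using only positions, not raw scores. Below is a minimal sketch of the standard formula, score = sum of 1 / (k + rank). The 1-based ranks, the tie-break by id, and the name `rrf_merge` are my assumptions, not details confirmed for mcp-code-search.

```python
def rrf_merge(rankings, k=60, top_n=10):
    """Fuse several ranked lists of chunk ids (best first) with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep output deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in ordered[:top_n]]

# Hypothetical result lists from the vector search and the full-text search.
vector_hits = ["a", "b", "c"]
fts_hits = ["b", "d", "a"]
print(rrf_merge([vector_hits, fts_hits]))  # ['b', 'a', 'd', 'c']
```

Chunks found by both searches ("a" and "b") rise above chunks found by only one, which is the point of a hybrid setup: neither the embedding model nor the keyword index has to be right on its own.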
dnomia-knowledge: Semantic Search for Knowledge Base
dnomia-knowledge is a knowledge management MCP server that indexes Markdown, MDX, and code files.
How it differs from mcp-code-search:
| | mcp-code-search | dnomia-knowledge |
|---|---|---|
| Focus | Code files | Markdown + code + web content |
| Embedding | jina-v2-base-code (code-focused) | multilingual-e5-base (multilingual) |
| Storage | LanceDB | SQLite + FTS5 + sqlite-vec |
| Chunking | AST-based (function/class) | Heading-based (## and ###) |
| Extra features | Find similar code | Knowledge graph, web indexing |
They work together: the agent directs architectural questions to dnomia-knowledge and implementation questions to mcp-code-search. Both are connected via MCP, and the agent decides which is more appropriate. dnomia-knowledge also tracks developer interaction: it records which files are read most and which searches return zero results, and it applies an interaction boost to personalize search results.
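The heading-based chunking listed in the comparison table (splitting on `##` and `###`) can be sketched as follows. This is an illustration under assumptions, not dnomia-knowledge's actual implementation: `chunk_by_heading` is a hypothetical name, empty sections are dropped, and code fences are not treated specially.

```python
import re

# Only level-2 and level-3 headings start a new chunk.
CHUNK_HEADING = re.compile(r"^(##|###)\s+(.+)$")

def chunk_by_heading(markdown_text):
    """Split Markdown into chunks, one per ## or ### section with non-empty text."""
    chunks = []
    title, body = "(preamble)", []

    def flush():
        # Emit the section collected so far, unless it is empty.
        if any(line.strip() for line in body):
            chunks.append({"title": title, "text": "\n".join(body).strip()})

    for line in markdown_text.splitlines():
        match = CHUNK_HEADING.match(line)
        if match:
            flush()
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

Each chunk would then be embedded and stored separately, so a query returns one section of a note instead of the whole file.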
Progressive Disclosure: Revealing Information Gradually
Even a 1M token context window degrades in performance when filled with too much information[2]. That’s why “pull what’s needed” beats “load everything”.
In the ecosystem, this works as follows:
- Session start: CLAUDE.md + architecture.md loaded automatically (static, always needed)
- First question: Agent reads the relevant section from architecture.md (jump-to pointer)
- Deepening: Agent sends semantic query to mcp-code-search or dnomia-knowledge (JIT)
- Decision needed: Agent opens the ADR file (on-demand)
At each step, only the needed information enters the context. This is the opposite of Repomix’s “put everything in one file” approach.
Layer 3: Decision Governance
Knowing the codebase structure and being able to search isn’t enough. If you can’t answer “why are we using Inngest instead of Redis?”, the agent might one day want to add Redis. Or try to solve a problem using a method you previously evaluated and rejected.
/court: Structured Evaluation
/court is an evaluation skill that runs before new features or architectural decisions. It applies the Decision Gate framework’s 8 criteria and delivers a verdict of GO, DEFER, or KILL.
But /court’s real value for context engineering: every decision is recorded as an ADR. The answer to “why did we make this decision?” doesn’t get lost in session-based context. When the agent returns to the same topic in the future, it can read the previous evaluation and its rationale.
ADRs: Living Constraints
Architecture Decision Records are usually thought of as passive logs. But in the agent ecosystem, they’re active constraints:
- Architectural boundaries: “No direct DB queries from the collect worker, must go through Hyperdrive” (ADR-062)
- Data processing constraints: “PII cannot be logged in plaintext, AES-256-GCM + blind index” (ADR-137)
- Rejected alternatives: “Redis cache evaluated, rejected due to operational burden” (ADR-128)
The agent sees the summary in architecture.md’s “Architectural Decisions” section. If detail is needed, it opens the ADR file. This prevents the same discussion from recurring.
My project has 155 ADRs. Instead of writing each one into architecture.md, I summarized 12 major decisions in 4 categories and added the ADR index as a reference. The agent starts from the summary, deepens if needed.
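The "summary first, ADR file on demand" pattern amounts to a small lookup over an index. The sketch below uses the three ADRs quoted above; the `Adr` type, the `kind` labels, and both function names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Adr:
    number: int
    kind: str     # "boundary", "constraint", or "rejected" (my own labels)
    summary: str

# The three examples from the list above; a real index would cover all ADRs.
ADR_INDEX = [
    Adr(62, "boundary", "No direct DB queries from the collect worker, must go through Hyperdrive"),
    Adr(137, "constraint", "PII cannot be logged in plaintext, AES-256-GCM + blind index"),
    Adr(128, "rejected", "Redis cache evaluated, rejected due to operational burden"),
]

def prior_decisions(topic, index=ADR_INDEX):
    """Return ADRs whose summary mentions the topic (case-insensitive substring match)."""
    needle = topic.lower()
    return [adr for adr in index if needle in adr.summary.lower()]

def format_reference(adr):
    """Render the one-line pointer that would appear in architecture.md."""
    return f"ADR-{adr.number:03d} ({adr.kind}): {adr.summary}"

for adr in prior_decisions("redis"):
    print(format_reference(adr))
```

A lookup for "redis" surfaces ADR-128 before the agent proposes a cache that was already evaluated and rejected. In practice the semantic search layer plays this role; plain substring matching is used here only to keep the sketch short.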
“Kernel of Truth” Workflow
I didn’t write all 155 ADRs from scratch. Most were created with the “Kernel of Truth” pattern: I write one sentence (“We switched to Neon because there was a Docker port bypass security vulnerability”), the agent expands it into a structured ADR format. Writing effort is minimal, but the decision record is permanent.
Layer 4: Learning Loop
The first three layers provide information. The fourth layer keeps information current.
Forge Retro: Extracting Patterns from Completed Work
The last step of the Forge pipeline, /retro, extracts permanent patterns from completed features:
- Read the court decision (why did we GO?)
- Examine the implementation (what changed?)
- Check critique findings (what issues came up?)
- If a permanent pattern exists, add it to CLAUDE.md or architecture.md
- Clean up temporary information
Critical point: retro doesn’t grow the knowledge base, it prunes it. It says “this pattern repeated 3 times, it should be a rule” and adds it to CLAUDE.md. It says “this was a temporary workaround” and deletes it. Upsert logic, not append.
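"Upsert logic, not append" can be made concrete with a small function. This is a sketch under assumptions: the rule store is modelled as a plain dict, the threshold of 3 comes from the example above, and `retro_upsert` is a hypothetical name, not the Forge implementation.

```python
PROMOTE_AFTER = 3  # from the post: "this pattern repeated 3 times, it should be a rule"

def retro_upsert(rules, observations, temporary=()):
    """Return an updated rule set.

    rules:        {key: rule text} currently in CLAUDE.md / architecture.md
    observations: list of (key, rule text) patterns seen in completed work
    temporary:    keys of workarounds that should be deleted
    """
    updated = dict(rules)
    counts, latest = {}, {}
    for key, text in observations:
        counts[key] = counts.get(key, 0) + 1
        latest[key] = text
    for key, seen in counts.items():
        # Promote repeated patterns; refresh the wording of rules that already exist.
        if seen >= PROMOTE_AFTER or key in updated:
            updated[key] = latest[key]
    for key in temporary:
        updated.pop(key, None)  # prune instead of letting the file grow
    return updated
```

Because rules are keyed, running the same retro twice leaves the rule set unchanged, whereas an append-only log would gain duplicates.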
Closing the Loop
Session start: CLAUDE.md + architecture.md loaded
|
Working: deepening via JIT search
|
Decision: evaluation via /court -> ADR record
|
Work complete: pattern extraction via /retro
|
Update: CLAUDE.md / architecture.md updated
|
Next session: current references loaded
With each iteration, references become slightly more accurate, slightly more current. The agent does slightly less discovery and slightly more production each session.
Research Foundation
I didn’t invent this ecosystem from scratch. Findings compiled from 2025-2026 research (academic papers, Gemini Deep Research, GPT-4o analysis, Kimi K2.5 research) formed the foundation:
Codified Context approach[1]: A three-layer system tested in a 108,000-line C# project (Hot Memory, Domain Expert Agents, Cold Memory). Critical finding: documentation is infrastructure, it requires maintenance like code.
AGENTS.md ecosystem: Different config files for different AI tools (CLAUDE.md, AGENTS.md, .cursorrules) but all serving the same purpose: giving the agent structured context. “Nearest-Wins” model: root file provides global standards, subdirectory files provide local guidance.
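The "Nearest-Wins" lookup can be sketched as a walk from the edited file up to the repository root. This is a minimal illustration, assuming repo-relative POSIX paths; `nearest_config` is a hypothetical helper, and real tools may also merge the root file with the nearest one rather than picking a single file.

```python
from pathlib import PurePosixPath

def nearest_config(file_path, config_dirs, name="AGENTS.md"):
    """Return the closest config file at or above file_path's directory, or None.

    config_dirs: directories (repo-relative, "." for the root) that contain `name`.
    """
    dirs = {PurePosixPath(d) for d in config_dirs}
    current = PurePosixPath(file_path).parent
    while True:
        if current in dirs:
            return str(current / name)
        if current == current.parent:  # reached the repo root without a match
            return None
        current = current.parent

# Hypothetical monorepo layout: one root file plus a local override.
config_dirs = {".", "apps/web"}
print(nearest_config("apps/web/src/page.tsx", config_dirs))
print(nearest_config("packages/db/schema.prisma", config_dirs))
```

The first call resolves to `apps/web/AGENTS.md` (local guidance), the second falls back to the root `AGENTS.md` (global standards).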
Hybrid approach consensus: All sources converge on the same point: “what” is auto-generated (schema, types, dependency graph), “why” is human-written (design decisions, constraints, trade-offs). Together, they give the agent the full picture.
Progressive disclosure: Opening information only at the moment of need rather than loading it all at once. Jump-to pointers, executable search commands, nested overrides.
What Worked and What Didn’t in Practice
| Investment | Result |
|---|---|
| architecture.md (~3,600 lines) | Agent’s discovery time dropped significantly. Especially for infrastructure questions (ports, proxy, tunnel), direct reference instead of trial-and-error. |
| ADR index (155 decisions) | Recurring discussions ended. Being able to say “we evaluated this before” is very valuable. |
| mcp-code-search | More accurate than grep, especially for “find all places that do X” queries. |
| dnomia-knowledge | Very useful for vault notes and documentation search. Complementary when combined with code search. |
| /court | In a codebase audit, 6 out of 28 tasks were eliminated or deferred. Filters bad ideas early. |
| Repomix | Became unnecessary except for hotspot analysis. Claude Code can already read files. |
| Forge retro | Not enough data yet (new). Concept is correct, too early for impact measurement. |
Template
If you want to adapt this structure to your own project, here’s the minimum starting set:
Small projects (single service, 50-100 files):
- CLAUDE.md (rules + commands)
- A single-page structure summary in architecture.md is sufficient
Medium projects (monorepo, 200-500 files):
- CLAUDE.md + architecture.md (separate files)
- mcp-code-search (semantic code search)
- ADRs (for major decisions)
Large projects (500+ files, multiple services):
- All four layers
- architecture.md with module map, data flow, infrastructure, architectural decisions
- dnomia-knowledge (documentation + code search)
- /court + forge pipeline
In every case: CLAUDE.md gives instructions, architecture.md describes reality. This distinction is fundamental. If you want to start with a structured template rather than writing architecture.md from scratch, check out the Living Architecture template I derived from these experiences.
References
Open Source Tools
- living-architecture: Project-agnostic architecture.md template
- mcp-code-search: Semantic code search MCP server
- dnomia-knowledge: Knowledge management MCP server
- forge: Memory-backed decision-to-delivery pipeline
Academic and Industry Sources
- Codified Context: Infrastructure for AI Agents in a Complex Codebase (arxiv.org/html/2602.20478v1)
- AgenticAKM: Agentic Architecture Knowledge Management (arxiv.org/html/2602.04445v1)
- AGENTS.md Standard (aihero.dev)
- C4 Model (c4model.com)
- Repomix (github.com/yamadashy/repomix)
Related Posts
- Living Architecture: Project-agnostic architecture.md template (10 core sections, 11 optional modules, 3 depth levels)
- Living Architecture Documentation for AI Agents: The research foundation of this ecosystem (Codified Context, AGENTS.md, C4 Model, Repomix, ADR, SDD comparison)
- Claude Code Context Management: Context window optimization with MCP tools
- Decision Gate v2: Detailed explanation of the /court skill
- Forge Pipeline: Memory-backed development pipeline
- Developer Interaction Tracking: dnomia-knowledge’s trace analytics system, hot files, knowledge gaps and interaction boost
- AI-Powered Codebase Audit: How CLAUDE.md and context engineering are used in a 6-track audit process
- AI Agent Protocols Guide: Positioning MCP alongside 5 other open protocols
Footnotes
1. “Codified Context: Infrastructure for AI Agents in a Complex Codebase” (arxiv.org/html/2602.20478v1). A three-layer system tested in a 108,000-line C# project.
2. Context window performance degradation at higher fill ratios has been observed across multiple benchmarks. “Lost in the Middle” (Liu et al., 2023) was among the first studies to document this phenomenon.
Key Takeaways
- CLAUDE.md tells the agent what to do; architecture.md describes the project's actual structure. They serve different purposes and cannot replace each other.
- Instead of loading all information into the context window, pulling only what's needed via JIT (just-in-time) semantic search is more efficient.
- Structured evaluation before decisions (/court) and recording decisions as ADRs ensure the agent remains consistent in the future.
- Without a learning loop, the ecosystem stagnates. Forge retro extracts permanent patterns from completed work and updates the rules, closing the loop.
FAQ
What is context engineering?
The discipline of maximizing the signal-to-noise ratio in an AI agent's context window: giving the agent the right information, at the right time, in the right format. It is a systematic infrastructure approach that goes beyond prompt engineering.
What is the difference between CLAUDE.md and architecture.md?
CLAUDE.md gives instructions to the agent: 'don't use barrel imports', 'use Promise.all for parallel calls'. architecture.md describes the project's reality: which module is where, how data flows, which decisions were made and why. One says 'how to work', the other says 'what you're working with'.
Are tools like Repomix unnecessary?
For tools that can read files directly, such as Claude Code, mostly yes. Repomix packages a 992-file repo into 1.9M tokens, about 10x the context window. Even compressed and limited to TS/TSX files, the output is 90K tokens, roughly half the context window. Structured architecture.md + JIT search provides the same information much more efficiently. Repomix is still useful for giving repo context to tools that can't read files (ChatGPT web, web Claude).
How much effort does building this ecosystem require?
Writing architecture.md from scratch takes a few sessions (in my case ~3,600 lines). But the agent itself helps: scanning and summarizing ADRs, analyzing the codebase to produce module maps. After the initial investment, maintenance cost is low because only the relevant section gets updated with each major change.
Is this much structure necessary for a solo developer?
It's even more necessary for a solo developer. If you had a team, someone could tell you 'we made that decision for this reason'. Working solo, the AI agent is your teammate, but it starts from scratch every session. This ecosystem prevents the agent from rediscovering the project every time.