Context
AI coding agents (Claude Code, Gemini CLI, Cursor, GitHub Copilot) make incorrect inferences by superficially scanning code in large projects. They confuse similarly named functions across different modules, treat unused legacy code as active, and guess infrastructure details.
Research from 2025-2026 shows that models perform significantly better when given structured, persistent reference points than when left to scan the repo [1]. This gave rise to the discipline of “context engineering”: maximizing the signal-to-noise ratio in the agent’s context window.
In this post, I examine 11 different approaches compiled from parallel investigations with GPT-4o, Kimi K2.5, and Gemini Deep Research, academic papers, and industry practices. This research formed the foundation of the Living Architecture template.
1. Codified Context: Documentation Is Infrastructure
A three-layer system tested on a 108,000-line C# project [1] was one of the most concrete findings I came across in this research:
| Layer | Content | Agent Behavior |
|---|---|---|
| Hot Memory | Conventions, retrieval hooks, orchestration protocols | Loaded automatically every session |
| Domain Expert Agents | Project-specific specialist agents (database, frontend, API) | Activated when relevant domain is queried |
| Cold Memory | 34 on-demand specification documents | Only the relevant document is pulled |
In practice, Hot Memory corresponds to the CLAUDE.md or AGENTS.md file, and Cold Memory corresponds to architecture.md, ADRs, and spec documents. The agent pulls only what’s relevant rather than loading every document into context.
Critical finding: documentation is infrastructure. It is a “load-bearing artifact” that, like code, requires continuous maintenance and is essential for agents to produce correct output [1].
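The hot/cold split can be sketched in a few lines. This is a minimal illustration, not the system from the cited study: the `ContextStore` class, the keyword matching, and the sample document texts are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Toy model of the hot/cold memory split (all names are illustrative)."""
    hot: str                                             # always loaded, e.g. the text of CLAUDE.md
    cold: dict[str, str] = field(default_factory=dict)   # topic keyword -> spec document text

    def build(self, task: str) -> str:
        """Return hot memory plus only the cold documents whose topic appears in the task."""
        words = set(task.lower().split())
        selected = [doc for topic, doc in self.cold.items() if topic in words]
        return "\n\n".join([self.hot, *selected])


# Hypothetical project content, invented for the example.
store = ContextStore(
    hot="Use pnpm. Run tests before committing.",
    cold={
        "database": "DB spec: Postgres, migrations live in db/migrations.",
        "frontend": "Frontend spec: React, components live in src/ui.",
    },
)
context = store.build("Add an index to the database schema")
print(context)
```

Here the database spec is pulled in and the frontend spec stays out of the window. A real setup would likely select documents by retrieval hooks or semantic search rather than plain keyword matching.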
2. AGENTS.md Ecosystem
AGENTS.md [2], standardized by the Agentic AI Foundation in 2024-2025, serves as a “README for machines.” Different files exist for different AI tools, but they’re all complementary:
| File | Tool | Feature | Strategy |
|---|---|---|---|
| AGENTS.md | Universal | Multi-tool compatibility | Root-level global rules |
| CLAUDE.md | Claude Code | @imports, recursive referencing | Hierarchical, progressive disclosure |
| .cursor/rules/*.mdc (legacy: .cursorrules) | Cursor | YAML frontmatter | Path-specific rule activation |
| .github/copilot-instructions.md | GitHub Copilot | VS Code integration | Project-level customization [3] |
| SPEC.md | General SDD | Living contract | Defines “what” and “why” |
| runtime.md | Autonomous agents | Checkpoint tracking | Live progress and risk logging |
“Nearest-Wins” model: In a monorepo, root-level AGENTS.md provides global standards (commit format, security gates), while nested AGENTS.md files provide local stack guidance. Claude Code’s CLAUDE.md hierarchy follows the same model.
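The lookup order behind “Nearest-Wins” can be sketched as a walk from the edited file up to the repo root. The function name and the sample paths are hypothetical, and how each tool actually merges or overrides the files it finds may differ; this only shows the ordering idea.

```python
from pathlib import PurePosixPath


def resolve_agents_files(target: str, existing: set[str]) -> list[str]:
    """List the AGENTS.md files that apply to `target`, nearest directory first."""
    found = []
    for directory in PurePosixPath(target).parents:   # nearest parent first, repo root last
        candidate = str(directory / "AGENTS.md")
        if candidate in existing:
            found.append(candidate)
    return found


# Hypothetical monorepo layout.
repo_files = {"AGENTS.md", "apps/web/AGENTS.md", "apps/api/AGENTS.md"}
chain = resolve_agents_files("apps/web/src/Button.tsx", repo_files)
print(chain)  # ['apps/web/AGENTS.md', 'AGENTS.md']
```

The first entry carries the local stack guidance, and the last one carries the global standards that apply everywhere.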
CodeAI.md [4] offers a similar approach: a single file at the repo root defines architecture, naming conventions, integration patterns, and anti-patterns. Its slogan is “Stop re-explaining your architecture in every prompt.”
3. Six Essential Content Areas
Analysis of over 2,500 successful agent config files yields six essential content areas [2]:
- Commands: Exact executable strings (install, build, test, lint). Put these at the top of the file, because this is the first information the agent will read
- Testing: Frameworks, coverage expectations, mocking patterns
- Project Structure: High-level directory tree map
- Code Style: Not explanations but “Show, Don’t Tell” code snippets
- Git Workflow: Branch naming, commit format, PR checklist
- Boundaries: Do/Don’t lists, operational limits
These six areas are the scope of CLAUDE.md or AGENTS.md. The project’s architectural structure, inter-module relationships, data flow, infrastructure topology, and tech debt don’t fit into these areas. Architectural information should live in a separate file (architecture.md).
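A small check can confirm that a config file covers all six areas. The sketch below assumes each area appears as a Markdown heading with exactly the names used in the list above; real files may name their sections differently, and the sample content is invented.

```python
import re

REQUIRED_AREAS = ["Commands", "Testing", "Project Structure",
                  "Code Style", "Git Workflow", "Boundaries"]


def missing_areas(markdown: str) -> list[str]:
    """Return the required areas that have no matching heading in the file."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    }
    return [area for area in REQUIRED_AREAS if area.lower() not in headings]


# Hypothetical, incomplete AGENTS.md.
sample = """# AGENTS.md
## Commands
pnpm install && pnpm test
## Testing
Vitest, colocated *.test.ts files
## Boundaries
Never edit files under vendor/
"""
gaps = missing_areas(sample)
print(gaps)  # ['Project Structure', 'Code Style', 'Git Workflow']
```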
4. C4 Model + AI Alignment
The C4 model [5] (Context, Containers, Components, Code) offers a hierarchy that maps well onto what AI agents need. Implemented as text-based Mermaid.js diagrams, the views are:
- Version-controllable (visible in Git diffs)
- Token-efficient (text instead of images)
- Readable and editable by agents
- Natively rendered in GitHub, GitLab, and VS Code
| C4 Level | Agent Information | Usage |
|---|---|---|
| Context (C1) | External systems, actors, global dependencies | Project’s relationship with the world |
| Container (C2) | DBs, microservices, frontends, queues | Service topology |
| Component (C3) | Module boundaries, service layers | Internal structure |
| Dynamic | Sequence flows for complex business logic | Flow detail |
The C4X tool supports AI_Agent, Memory, and Tool nodes in Mermaid syntax [6]. The agent can “see” its own role in the architecture.
There’s also academic progress: Szczepanik and Chudziak’s work [7] investigates automated generation of C4 diagrams using multi-agent LLM systems. It measures the quality of architectural diagrams by combining structural validation and semantic evaluation methods.
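To show why text-based diagrams are cheap to generate and diff, here is a sketch that renders a container-level (C2) view as Mermaid text. It uses Mermaid’s plain `flowchart` syntax rather than a dedicated C4 diagram type, and the function name, container names, and protocols are invented for the example.

```python
def to_mermaid(containers: dict[str, str], calls: list[tuple[str, str, str]]) -> str:
    """Render containers and their calls as Mermaid flowchart text."""
    lines = ["flowchart LR"]
    for node_id, label in containers.items():
        lines.append(f'    {node_id}["{label}"]')                      # one node per container
    for source, target, description in calls:
        lines.append(f"    {source} -->|{description}| {target}")      # labeled edge per call
    return "\n".join(lines)


# Hypothetical service topology.
diagram = to_mermaid(
    containers={"web": "Web App", "api": "API Service", "db": "PostgreSQL"},
    calls=[("web", "api", "HTTPS/JSON"), ("api", "db", "SQL")],
)
print(diagram)
```

Because the output is plain text, adding a queue or removing a service shows up as a one-line change in a Git diff.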
5. Automatic Context Compression (Repomix + Tree-sitter)
Repomix [8] packs codebases into a single AI-friendly file. It uses Tree-sitter to extract only core code signatures and structures:
| Compression Level | Information Retained | Token Savings |
|---|---|---|
| Raw packing | Entire file contents | 0% |
| Signature extraction | Class/function names, types | 60-70% |
| Structural map | Directory tree, file purposes | 90%+ |
Results from testing on a real 992-file project:
| Mode | File Count | Tokens | Assessment |
|---|---|---|---|
| Full (entire repo) | 992 | 1.9M | Roughly 10x a 200K context window |
| Compressed (entire repo) | 992 | 1.25M | Still unusable |
| Compressed (TS/TSX only) | 489 | 90K | Fits in theory but takes nearly half of a 200K window |
Repomix’s directory tree and git-change-count ranking are useful for hotspot analysis. However, for tools that can read files themselves, such as Claude Code, Gemini CLI, or Cursor, a structured architecture.md combined with JIT semantic search provides more targeted information than Repomix’s flat dump.
Repomix’s real value is with tools that can’t read files (the ChatGPT web interface, Claude on the web), where it can supply the entire repo context as a single file.
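The “signature extraction” level is easiest to understand with a small example. The sketch below is not Repomix or Tree-sitter: it swaps in Python’s standard-library `ast` module, handles Python source only, and exists purely to illustrate keeping declarations while dropping bodies.

```python
import ast


def extract_signatures(source: str) -> list[str]:
    """Keep only class and function headers, in source order, dropping all bodies."""
    nodes = [
        node for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    nodes.sort(key=lambda node: node.lineno)   # ast.walk is breadth-first, so restore file order
    signatures = []
    for node in nodes:
        if isinstance(node, ast.ClassDef):
            signatures.append(f"class {node.name}")
        else:
            args = ", ".join(arg.arg for arg in node.args.args)
            signatures.append(f"def {node.name}({args})")
    return signatures


# Hypothetical module to compress.
source = '''
class Cart:
    def add(self, item, qty):
        self.items.append((item, qty))

def total(cart):
    return sum(qty for _, qty in cart.items)
'''
signatures = extract_signatures(source)
print(signatures)  # ['class Cart', 'def add(self, item, qty)', 'def total(cart)']
```

The agent still learns what exists and how to call it, which is usually enough to decide which file to open next.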
6. Hybrid Approach: Consensus
All research sources (GPT-4o analysis, Kimi K2.5 investigation, Gemini Deep Research, academic papers) converge on the same point:
| Layer | Content | Method |
|---|---|---|
| Auto-generate | Schema, types, dependency graph, API signatures | Update via Repomix/CI hook when code changes |
| Human-write | Design decisions, constraints, trade-offs, “why” | ADR format, “Kernel of Truth” workflow |
| Staleness detect | Document freshness tracking | Git hook or CI check |
“What” is auto-generated, “why” is human-written. Together they give the agent the full picture.
The practical application of this consensus: in architecture.md, sections like Stack & Dependencies and Module Map can be partially auto-generated (from package.json, directory scanning). Sections like Constraints & Trade-offs and Known Tech Debt should be human-written. Staleness detection can be achieved through PR checks or daily review integration.
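Staleness detection reduces to comparing two dates per section: when the documentation last changed and when the code it describes last changed. In the sketch below the section names come from the text, but the dates, the 14-day grace period, and the function name are assumptions; in a real check the dates could come from something like `git log -1 --format=%cI -- <path>`.

```python
from datetime import datetime, timedelta


def is_stale(doc_updated: datetime, code_updated: datetime,
             grace: timedelta = timedelta(days=14)) -> bool:
    """Flag a doc section when the code it describes changed well after the doc did."""
    return code_updated - doc_updated > grace


# Hypothetical last-change dates per architecture.md section.
doc_dates = {
    "Module Map": datetime(2026, 1, 10),
    "Stack & Dependencies": datetime(2026, 2, 20),
}
code_dates = {
    "Module Map": datetime(2026, 3, 1),
    "Stack & Dependencies": datetime(2026, 2, 25),
}
stale = [section for section in doc_dates
         if is_stale(doc_dates[section], code_dates[section])]
print(stale)  # ['Module Map']
```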
7. ADRs: From Passive Log to Active Governance
Architecture Decision Records [9] are no longer just records of past decisions but executable constraints for agents:
- Architectural boundaries: Which module can depend on which, which service accesses which database
- Data handling constraints: PII logging requirements, encryption standards
- Error propagation patterns: Prevents the agent from producing creative but silent error modes
- Rejected alternatives: “We evaluated this approach before and rejected it for this reason”
“Kernel of Truth” Workflow: The developer writes one sentence (“Use Redis for rate limiting to handle distributed spikes”), and the agent expands it into a structured ADR. For a solo developer, this keeps the writing burden minimal while still producing a permanent decision record.
Title: Use Inngest instead of Redis
Status: Accepted
Context: Queue system needed for background jobs
Decision: Inngest has less operational overhead than Redis in serverless environments
Consequences: Vendor lock-in risk exists but reduced management cost offsets it
My ADR and OpenSpec post covers ADRs in the agent context in more depth, and the Decision Gate post examines structured evaluation in detail.
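The ADR fields shown above map naturally onto a small data structure. This sketch assumes a hypothetical `skeleton_from_kernel` helper that seeds a record from the one-sentence kernel and leaves the remaining fields as TODO markers for the agent or the author to fill in; the “Proposed” status is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class ADR:
    title: str
    status: str
    context: str
    decision: str
    consequences: str

    def render(self) -> str:
        """Render the record using the five fields from the ADR example."""
        return "\n".join([
            f"Title: {self.title}",
            f"Status: {self.status}",
            f"Context: {self.context}",
            f"Decision: {self.decision}",
            f"Consequences: {self.consequences}",
        ])


def skeleton_from_kernel(kernel: str) -> ADR:
    """Seed an ADR from a one-sentence kernel; TODO fields still need to be written."""
    return ADR(
        title=kernel.rstrip("."),
        status="Proposed",
        context="TODO",
        decision=kernel,
        consequences="TODO",
    )


adr = skeleton_from_kernel("Use Redis for rate limiting to handle distributed spikes.")
print(adr.render())
```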
8. Spec-Driven Development (SDD)
As Addy Osmani also emphasizes [10]: “You start with a plan. Before prompting anything, you write a design doc or spec.” SDD treats documentation not as an afterthought but as the starting point of every task, in four phases:
- Specify: You define the goals; the agent produces SPEC.md (user journeys, success metrics)
- Plan: You set the constraints; the agent produces a technical plan (multiple variants)
- Tasks: The agent breaks the plan into small, reviewable chunks
- Implement: The agent works through the chunks one by one while the developer does focused review
SDD is the fundamental distinction between “agentic engineering” [11] and “vibe coding”. When you tell the agent what you want in a structured way, output quality improves significantly.
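One way to keep the four phases from collapsing into ad hoc prompting is to gate each phase on the ones before it. The gate below is an assumption layered on top of the phase list, not something the SDD sources prescribe, and the names are illustrative.

```python
from enum import IntEnum


class Phase(IntEnum):
    SPECIFY = 1
    PLAN = 2
    TASKS = 3
    IMPLEMENT = 4


def can_enter(phase: Phase, approved: set[Phase]) -> bool:
    """A phase may start only when every earlier phase has been reviewed and approved."""
    return all(earlier in approved for earlier in Phase if earlier < phase)


approved = {Phase.SPECIFY, Phase.PLAN}
print(can_enter(Phase.TASKS, approved))      # True: spec and plan are approved
print(can_enter(Phase.IMPLEMENT, approved))  # False: tasks have not been approved yet
```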
9. Progressive Disclosure and JIT Indexing
Even a 200K or 1M token context window degrades in performance when overloaded with information [12]. The solution is to reveal information only when it is needed rather than loading everything upfront.
| Technique | Mechanism | Benefit |
|---|---|---|
| Jump-to pointers | Reference files by path, not content | Token savings |
| Executable search | rg/grep commands in AGENTS.md | Just-in-time discovery |
| Nested overrides | Per-module local AGENTS.md | Only relevant rules loaded |
| Sampling rules | Read entry points first | Human-like “mental mapping” |
VS Code’s Copilot customization [3] works on the same principle: .github/copilot-instructions.md holds project-level customization, and .instructions.md files hold path-specific rules. Instead of putting all information in one place, it uses a layered structure that activates only when needed.
Claude Code’s CLAUDE.md hierarchy is the most advanced example of this model: root-level global rules, directory-level overrides, pulling other files on demand with @imports.
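Jump-to pointers can be sketched as a map from topics to file paths, where content is read only when a task asks for it. The function below is a minimal illustration under stated assumptions: the pointer names and paths are hypothetical, character counts stand in for tokens, and the `read` callback replaces real file access so the example runs without a repo.

```python
from typing import Callable


def load_on_demand(pointers: dict[str, str], needed: list[str],
                   read: Callable[[str], str], budget_chars: int) -> dict[str, str]:
    """Read only the referenced files a task needs, stopping once the budget is spent."""
    loaded: dict[str, str] = {}
    used = 0
    for topic in needed:
        path = pointers.get(topic)
        if path is None:
            continue                      # no pointer for this topic, nothing to load
        text = read(path)
        if used + len(text) > budget_chars:
            break                         # budget exhausted, leave the rest unloaded
        loaded[path] = text
        used += len(text)
    return loaded


# Hypothetical in-memory "files" standing in for the repo.
files = {
    "docs/architecture.md": "A" * 400,
    "docs/adr/0001.md": "B" * 300,
    "docs/runbook.md": "C" * 500,
}
pointers = {
    "architecture": "docs/architecture.md",
    "decisions": "docs/adr/0001.md",
    "ops": "docs/runbook.md",
}
loaded = load_on_demand(pointers, ["architecture", "decisions"],
                        files.__getitem__, budget_chars=1000)
print(list(loaded))  # ['docs/architecture.md', 'docs/adr/0001.md']
```

The runbook is never read because the task did not ask for it, which is the whole point: the pointer costs a few tokens, the content costs nothing until it is needed.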
10. Google Code Wiki
Google’s Code Wiki system, announced in 2025 [13]:
- Creates a structured wiki for each repository
- Automatically updates after every commit
- Interactive, AI-powered documentation
Setting up full automation as a solo developer is challenging, but the concept is sound: documentation should be code-dependent. It should update with every change, not be left to manual maintenance.
A more accessible implementation of this vision is to integrate architecture.md into the CI/CD pipeline, using PR checks to map changed files to architecture.md sections. This can’t match the full automation of Google Code Wiki, but staleness detection establishes the same feedback loop.
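Mapping changed files to sections can be as simple as a table of glob patterns. In this sketch the section names follow the architecture.md sections mentioned in this post, while the patterns, file paths, and function name are assumptions; a PR check would pass in the list of files changed in the pull request.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from architecture.md sections to the paths they describe.
SECTION_PATTERNS = {
    "Stack & Dependencies": ["package.json", "pnpm-lock.yaml"],
    "Module Map": ["src/*"],
    "Data Flow": ["src/api/*", "db/migrations/*"],
}


def sections_to_review(changed_files: list[str]) -> list[str]:
    """Return the sections whose patterns match at least one changed file."""
    return [
        section
        for section, patterns in SECTION_PATTERNS.items()
        if any(fnmatchcase(path, pattern)
               for path in changed_files for pattern in patterns)
    ]


print(sections_to_review(["src/api/routes/users.ts"]))  # ['Module Map', 'Data Flow']
print(sections_to_review(["README.md"]))                # []
```

Note that `fnmatchcase` lets `*` match across directory separators, so `src/*` covers nested files as well. The check can then comment on the PR with the sections to review rather than failing the build.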
11. Markdown Optimization for LLMs
The format of documents provided to agents also affects performance. Considerations when writing Markdown for LLMs:
| Rule | Why |
|---|---|
| Language tags required in fenced code blocks | Tells the model unambiguously which language the block contains |
| Never skip heading levels (e.g., H1 straight to H3) | Breaks the outline hierarchy the model uses to relate sections |
| Plain-text alternative/summary for images | Non-multimodal models can’t read images |
| RFC 2119 constraints (MUST, SHOULD, MAY) | Clarifies the level of certainty |
| Action-oriented verbs: “ask”, “search”, “check” | Makes it easier for the agent to interpret as executable instructions |
These rules also apply when writing CLAUDE.md, AGENTS.md, or architecture.md. A structured, consistent format improves the quality of the agent’s information parsing and usage.
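Two of the rules in the table are mechanical enough to lint. The sketch below checks for skipped heading levels and for opening code fences without a language tag; it is a simplified checker written for this post, not a full Markdown parser, and it ignores indented or tilde fences.

```python
import re

FENCE = "`" * 3   # built at runtime so this example can itself live inside a code fence


def lint_markdown(text: str) -> list[str]:
    """Report skipped heading levels and opening code fences without a language tag."""
    problems = []
    previous_level = 0
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(FENCE):
            if not in_fence and line.strip() == FENCE:
                problems.append(f"line {number}: code fence has no language tag")
            in_fence = not in_fence
            continue
        if in_fence:
            continue                      # ignore '#' characters inside code blocks
        match = re.match(r"(#{1,6})\s", line)
        if match:
            level = len(match.group(1))
            if previous_level and level > previous_level + 1:
                problems.append(
                    f"line {number}: heading jumps from H{previous_level} to H{level}")
            previous_level = level
    return problems


sample = "\n".join([
    "# Title", "### Details", FENCE, "pnpm test", FENCE,
    "## Usage", FENCE + "bash", "pnpm build", FENCE,
])
for problem in lint_markdown(sample):
    print(problem)
```

For the sample, this reports the H1 to H3 jump on line 2 and the untagged fence on line 3, while the tagged `bash` fence passes.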
Approach Comparison
| Approach | Strength | Weakness | When to Use |
|---|---|---|---|
| Codified Context | Academically grounded, three-layer model | Specific to large projects | Enterprise/large projects |
| AGENTS.md/CLAUDE.md | Tool integration, hierarchical | Architectural info out of scope | Every project, for rules |
| C4 + Mermaid.js | Visual, hierarchical, VCS-compatible | Maintenance burden | Complex architectures |
| Repomix | Single command, full repo | Token explosion | Tools that can’t read files |
| ADRs | Decision record, agent constraint | Accumulates, requires maintenance | Every major decision |
| SDD | Structured from the start | Heavy for small tasks | New features/architectural changes |
| Google Code Wiki | Full automation | Not publicly available yet | (Future potential) |
| Progressive Disclosure | Token efficient, scalable | Setup complexity | Large projects, monorepo |
| architecture.md | Single file, structured, scalable | Initial writing effort | Every project |
Application: Living Architecture
I converted these research findings into a project-agnostic template: Living Architecture.
The template is a practical synthesis of the approaches above:
- From Codified Context: Layered memory model (Hot Memory = CLAUDE.md, Cold Memory = architecture.md)
- From AGENTS.md ecosystem: Structured format, heading hierarchy
- From C4 Model: Hierarchical sectioning (Stack, Module Map, Data Flow)
- From Hybrid approach: Auto-generatable sections + human-written sections
- From Progressive disclosure: Per-section depth (L1/L2/L3), agent jumps to the section it needs
- From ADRs: Constraints & Trade-offs section, decision references
The template has 10 core sections, 11 optional modules, and 3 depth levels. It scales from small static sites to large monorepos and is available as open source on GitHub [14].
Sources
Open Source Tools
- Living Architecture: Project-agnostic architecture.md template [14]
- mcp-code-search: Semantic code search MCP server
- decision-gate: Multi-AI evaluation framework
- forge: Memory-backed decision-delivery pipeline
Related Posts
- Living Architecture: Structured Architecture Documentation for AI Coding Agents
- Context Engineering Ecosystem
- ADR and OpenSpec: Decision Management in the AI Era
- Decision Gate: Multi-AI Tribunal
- Claude Code Context Management
Footnotes
1. Vasilopoulos, A. “Codified Context: Infrastructure for AI Agents in a Complex Codebase” (2026). A study testing a three-layer documentation system on a 108,000-line C# project. arxiv.org/abs/2602.20478
2. “A Complete Guide to AGENTS.md” (aihero.dev). Content areas and best practices standardized by the Agentic AI Foundation, based on analysis of 2,500+ agent config files. aihero.dev/a-complete-guide-to-agents-md
3. VS Code Copilot Customization. Instruction files and context engineering guide for project-level AI customization. code.visualstudio.com/docs/copilot/copilot-customization
4. CodeAI.md. A documentation framework that defines architecture, naming conventions, and integration patterns by placing a single file at the repo root. codeai.md
5. Brown, S. “The C4 Model for Visualising Software Architecture”. Software architecture visualization with four hierarchical abstraction levels (software systems, containers, components, code). c4model.com
6. Szczepanik, K. & Chudziak, J.A. “Collaborative LLM Agents for C4 Software Architecture Design Automation” (2025). A study investigating automated generation of C4 diagrams using multi-agent LLM systems. Accepted at HICSS-59. arxiv.org/abs/2510.22787
7. Szczepanik & Chudziak’s work [6] also references the C4X tool. C4X supports AI_Agent, Memory, and Tool nodes in Mermaid syntax.
8. Repomix. A tool that packs codebases into a single AI-friendly file, extracting function signatures via Tree-sitter. 22.6k GitHub stars. github.com/yamadashy/repomix
9. Henderson, J.P. “Architecture Decision Record”. ADR templates, practical examples, and team implementation guide. github.com/joelparkerhenderson/architecture-decision-record
10. Osmani, A. “How to write a good spec for AI agents” (2026). Core principles of writing structured specs for agents. addyosmani.com/blog/good-spec
11. Osmani, A. “Agentic Engineering” (2026). A post defining the distinction between “vibe coding” and disciplined AI-assisted development. addyosmani.com/blog/agentic-engineering
12. Liu, N. F. et al. “Lost in the Middle: How Language Models Use Long Contexts” (2023). A study documenting how model performance degrades as context window occupancy increases.
13. Krzaczyński, R. “Google Launches Code Wiki, an AI-Driven System for Continuous, Interactive Code Documentation” (InfoQ, 2025). Google’s system that creates a structured wiki for each repository and automatically updates after every commit. infoq.com/news/2025/11/google-code-wiki
14. Living Architecture GitHub repository. Project-agnostic architecture.md template: 10 core sections, 11 optional modules, 3 depth levels. github.com/ceaksan/living-architecture
Key Takeaways
- The Codified Context approach defines documentation as infrastructure: a structure requiring continuous maintenance, like code, essential for agents to produce correct output.
- AGENTS.md, CLAUDE.md, and .cursorrules target different tools but all solve the same problem. The “Nearest-Wins” model layers global and local rules in monorepos.
- Repomix packs a 992-file repo into 1.9M tokens. For agents that can read files, structured architecture.md + JIT search is more efficient.
- All research sources reach the same consensus: “what” (schema, types, dependency graph) is auto-generated, “why” (design decisions, constraints) is human-written.
FAQ
What is context engineering?
Context engineering is the discipline of maximizing the signal-to-noise ratio in an AI agent’s context window: giving the agent the right information, at the right time, in the right format. It is a systematic infrastructure approach that goes beyond prompt engineering.
What is the difference between AGENTS.md and CLAUDE.md?
AGENTS.md is a tool-agnostic format standardized by the Agentic AI Foundation. CLAUDE.md is specific to Claude Code and supports advanced features like @imports and recursive referencing. Both can be used in the same project: AGENTS.md for global rules, CLAUDE.md for Claude-specific instructions.
When is Repomix useful, and when is it unnecessary?
Repomix is useful for giving repo context to tools that can’t read files (ChatGPT web, web Claude). For tools that can read files, like Claude Code or Cursor, structured architecture.md + JIT semantic search is more efficient. Repomix’s directory tree and hotspot analysis are useful in all cases.
Why are ADRs important for agents?
ADRs are no longer just passive decision logs but executable constraints for agents. Architectural boundaries, data handling rules, and rejected alternatives prevent the agent from repeating the same debates. The “Kernel of Truth” workflow keeps the writing burden minimal.
Is there a practical output from this research?
Yes. I converted these research findings into a project-agnostic template, Living Architecture, with 10 core sections, 11 optional modules, and 3 depth levels. It is available as open source on GitHub.