ceaksan

Living Architecture Documentation for AI Coding Agents: Research, Approaches, and Tools

Codified Context, AGENTS.md ecosystem, C4 Model, Repomix, ADRs, Spec-Driven Development, Google Code Wiki. A comparative analysis of 11 different approaches to giving AI agents architectural context, backed by research.

Mar 6, 2026 10 min read
TL;DR

AI coding agents make incorrect inferences by superficially scanning code in large projects. Several approaches address this: Codified Context (three-layer memory), the AGENTS.md ecosystem, C4 Model + Mermaid.js, Repomix + Tree-sitter, ADRs, Spec-Driven Development, Google Code Wiki, and progressive disclosure. I compare these approaches using findings compiled from 2025-2026 research (academic papers, Gemini Deep Research, industry practices). Consensus: 'what' is auto-generated, 'why' is human-written. This research formed the foundation of the [Living Architecture](/en/living-architecture-ai-architectural-documentation) template.

Context

AI coding agents (Claude Code, Gemini CLI, Cursor, GitHub Copilot) make incorrect inferences by superficially scanning code in large projects. They confuse similarly named functions across different modules, treat unused legacy code as active, and guess infrastructure details.

2025-2026 research shows that models perform significantly better when fed structured, persistent reference points compared to repo scanning [1]. This gave rise to the “context engineering” discipline: maximizing the signal-to-noise ratio in the agent’s context window.

In this post, I examine 11 different approaches compiled from parallel investigations with GPT-4o, Kimi K2.5, and Gemini Deep Research, academic papers, and industry practices. This research formed the foundation of the Living Architecture template.

1. Codified Context: Documentation Is Infrastructure

A three-layer system tested on a 108,000-line C# project [1] was one of the most concrete findings I came across in this research:

| Layer | Content | Agent Behavior |
| --- | --- | --- |
| Hot Memory | Conventions, retrieval hooks, orchestration protocols | Loaded automatically every session |
| Domain Expert Agents | Project-specific specialist agents (database, frontend, API) | Activated when relevant domain is queried |
| Cold Memory | 34 on-demand specification documents | Only the relevant document is pulled |

In practice, Hot Memory corresponds to the CLAUDE.md or AGENTS.md file. Cold Memory corresponds to architecture.md, ADRs, and spec documents. The agent pulls only what’s relevant rather than loading every document into context.
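The hot/cold split can be illustrated with a minimal sketch. This is not the paper's implementation: the `ContextStore` class, its keyword-matching rule, and the sample documents are all hypothetical, chosen only to show that hot memory is always included while cold documents are pulled by relevance.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Toy model of the hot/cold split; every name here is hypothetical."""

    hot: str  # always loaded (for example, the contents of CLAUDE.md)
    cold: dict[str, str] = field(default_factory=dict)  # topic -> spec document

    def build_context(self, query: str) -> str:
        """Return hot memory plus only the cold documents whose topic appears in the query."""
        words = set(query.lower().split())
        relevant = [doc for topic, doc in self.cold.items() if topic in words]
        return "\n\n".join([self.hot, *relevant])


store = ContextStore(
    hot="Conventions: run tests before commit.",
    cold={
        "database": "Spec: all writes go through the repository layer.",
        "frontend": "Spec: components live under src/ui.",
    },
)
context = store.build_context("Add a database migration")
```

Here `context` contains the conventions and the database spec but not the frontend spec. A real system would likely use semantic retrieval instead of exact word matching.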

Critical finding: documentation is infrastructure. It is a “load-bearing artifact” that requires continuous maintenance, like code, and is essential for agents to produce correct output [1].

2. AGENTS.md Ecosystem

AGENTS.md [2], standardized by the Agentic AI Foundation in 2024-2025, serves as a “README for machines.” Different files exist for different AI tools, but they’re all complementary:

| File | Tool | Feature | Strategy |
| --- | --- | --- | --- |
| AGENTS.md | Universal | Multi-tool compatibility | Root-level global rules |
| CLAUDE.md | Claude Code | @imports, recursive referencing | Hierarchical, progressive disclosure |
| .cursorrules | Cursor | YAML frontmatter | Path-specific rule activation |
| .github/copilot-instructions.md | GitHub Copilot | VS Code integration | Project-level customization [3] |
| SPEC.md | General SDD | Living contract | Defines “what” and “why” |
| runtime.md | Autonomous agents | Checkpoint tracking | Live progress and risk logging |

“Nearest-Wins” model: In a monorepo, root-level AGENTS.md provides global standards (commit format, security gates), while nested AGENTS.md files provide local stack guidance. Claude Code’s CLAUDE.md hierarchy follows the same model.
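As a rough sketch of how a “Nearest-Wins” lookup could work, the function below walks from a target file up to the repo root and collects instruction files, nearest first. The function name and the resolution order are assumptions for illustration; actual tools may merge or prioritize these files differently.

```python
from pathlib import Path


def applicable_agent_files(target: Path, repo_root: Path, name: str = "AGENTS.md") -> list[Path]:
    """Return instruction files that apply to `target`, nearest directory first."""
    target, repo_root = target.resolve(), repo_root.resolve()
    found = []
    current = target if target.is_dir() else target.parent
    while True:
        candidate = current / name
        if candidate.is_file():
            found.append(candidate)
        # Stop at the repo root, or at the filesystem root if target lies outside the repo.
        if current == repo_root or current == current.parent:
            break
        current = current.parent
    return found
```

For a file under `apps/web/`, the list would start with `apps/web/AGENTS.md` (local stack guidance) and end with the root `AGENTS.md` (global standards), so a consumer can let earlier entries override later ones.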

CodeAI.md [4] offers a similar approach: placing a single file at the repo root to define architecture, naming conventions, integration patterns, and anti-patterns. Its slogan: “Stop re-explaining your architecture in every prompt.”

3. Six Essential Content Areas

An analysis of over 2,500 successful agent config files yields six essential content areas [2]:

  1. Commands: Exact executable strings (install, build, test, lint). Put these at the top of the file; this is the first information the agent will access
  2. Testing: Frameworks, coverage expectations, mocking patterns
  3. Project Structure: High-level directory tree map
  4. Code Style: Not explanations but “Show, Don’t Tell” code snippets
  5. Git Workflow: Branch naming, commit format, PR checklist
  6. Boundaries: Do/Don’t lists, operational limits

These six areas are the scope of CLAUDE.md or AGENTS.md. The project’s architectural structure, inter-module relationships, data flow, infrastructure topology, and tech debt don’t fit into these areas. Architectural information should live in a separate file (architecture.md).

4. C4 Model + AI Alignment

The C4 model [5] (Context, Containers, Components, Code) offers an ideal hierarchy for AI agents. When implemented as text-based diagrams with Mermaid.js:

  • Version-controllable (visible in Git diffs)
  • Token-efficient (text instead of images)
  • Readable and editable by agents
  • Natively rendered in GitHub, GitLab, and VS Code

| C4 Level | Agent Information | Usage |
| --- | --- | --- |
| Context (C1) | External systems, actors, global dependencies | Project’s relationship with the world |
| Container (C2) | DBs, microservices, frontends, queues | Service topology |
| Component (C3) | Module boundaries, service layers | Internal structure |
| Dynamic | Sequence flows for complex business logic | Flow detail |

The C4X tool supports AI_Agent, Memory, and Tool nodes in Mermaid syntax [7], so the agent can “see” its own role in the architecture.

There’s also academic progress: Szczepanik and Chudziak’s work [6] investigates automated generation of C4 diagrams using multi-agent LLM systems. It measures the quality of architectural diagrams by combining structural validation and semantic evaluation methods.
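To make the text-based-diagram point concrete, here is a minimal sketch that emits a container-level (C2) view as a plain Mermaid flowchart. It uses ordinary `flowchart` syntax rather than C4X or any C4-specific extension, and the `to_mermaid` helper and the sample services are hypothetical.

```python
def to_mermaid(containers: dict[str, str], relations: list[tuple[str, str, str]]) -> str:
    """Render a container-level (C2) view as a Mermaid flowchart."""
    lines = ["flowchart LR"]
    for node_id, label in containers.items():
        lines.append(f'    {node_id}["{label}"]')
    for source, target, label in relations:
        lines.append(f"    {source} -->|{label}| {target}")
    return "\n".join(lines)


diagram = to_mermaid(
    containers={"web": "Web App", "api": "API Service", "db": "PostgreSQL"},
    relations=[("web", "api", "HTTPS"), ("api", "db", "SQL")],
)
print(diagram)
```

The result is a handful of short text lines that diff cleanly in Git and that an agent can read or edit directly.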

5. Automatic Context Compression (Repomix + Tree-sitter)

Repomix [8] packs codebases into a single AI-friendly file. It uses Tree-sitter to extract only core code signatures and structures:

| Compression Level | Information Retained | Token Savings |
| --- | --- | --- |
| Raw packing | Entire file contents | 0% |
| Signature extraction | Class/function names, types | 60-70% |
| Structural map | Directory tree, file purposes | 90%+ |
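The idea behind signature extraction can be sketched without Tree-sitter. The example below swaps in Python's built-in `ast` module and works only on Python source, so it is a stand-in for the technique and not a description of how Repomix does it; `extract_signatures` is a hypothetical helper.

```python
import ast


def extract_signatures(source: str) -> list[str]:
    """Keep only class and function headers, dropping the bodies."""
    signatures = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            signatures.append(f"class {node.name}")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            signatures.append(f"def {node.name}({args})")
    return signatures


sample = '''
class Cart:
    def add(self, item, qty):
        self.items.append((item, qty))

def total(cart):
    return sum(q for _, q in cart.items)
'''
signatures = extract_signatures(sample)
```

The bodies disappear and only names and parameters remain, which is where the token savings in the middle row of the table come from.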

Results from testing on a real 992-file project:

| Mode | File Count | Tokens | Assessment |
| --- | --- | --- | --- |
| Full (entire repo) | 992 | 1.9M | 10x the context window |
| Compressed (entire repo) | 992 | 1.25M | Still unusable |
| Compressed (TS/TSX only) | 489 | 90K | Theoretically fits but takes half the context |
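The assessments can be checked with simple arithmetic. The sketch below assumes a 200K-token context window (the post does not state which window the test used) and reads the TS/TSX-only row as roughly 90K tokens; both figures are assumptions.

```python
CONTEXT_WINDOW = 200_000  # assumption: a 200K-token window


def context_share(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the context window that `tokens` would occupy."""
    return tokens / window


measurements = {"full": 1_900_000, "compressed": 1_250_000, "ts_tsx_only": 90_000}
shares = {mode: context_share(tokens) for mode, tokens in measurements.items()}
for mode, share in shares.items():
    print(f"{mode}: {share:.2f}x of the window")
```

Under these assumptions the full dump is about 9.5 times the window, the compressed dump about 6.25 times, and the TS/TSX-only dump a little under half of it.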

Repomix’s directory tree and git-change-count ranking are useful for hotspot analysis. However, for tools that can read files like Claude Code, Gemini CLI, or Cursor, structured architecture.md + JIT semantic search provides more targeted information than Repomix’s flat dump.

Repomix’s real value is in tools that can’t read files (ChatGPT web interface, web Claude): being able to provide the entire repo context as a single file.

6. Hybrid Approach: Consensus

All research sources (GPT-4o analysis, Kimi K2.5 investigation, Gemini Deep Research, academic papers) converge on the same point:

| Layer | Content | Method |
| --- | --- | --- |
| Auto-generate | Schema, types, dependency graph, API signatures | Update via Repomix/CI hook when code changes |
| Human-write | Design decisions, constraints, trade-offs, “why” | ADR format, “Kernel of Truth” workflow |
| Staleness detect | Document freshness tracking | Git hook or CI check |

“What” is auto-generated, “why” is human-written. Together they give the agent the full picture.

The practical application of this consensus: in architecture.md, sections like Stack & Dependencies and Module Map can be partially auto-generated (from package.json, directory scanning). Sections like Constraints & Trade-offs and Known Tech Debt should be human-written. Staleness detection can be achieved through PR checks or daily review integration.
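As a minimal sketch of the auto-generated half, the function below renders a Stack & Dependencies section from the text of a package.json. The `render_stack_section` name, the Markdown heading level, and the bullet format are assumptions for illustration, not the template's actual output.

```python
import json


def render_stack_section(package_json: str) -> str:
    """Render a 'Stack & Dependencies' section from package.json text."""
    manifest = json.loads(package_json)
    lines = ["## Stack & Dependencies", ""]
    for group in ("dependencies", "devDependencies"):
        for name, version in sorted(manifest.get(group, {}).items()):
            lines.append(f"- {name} {version} ({group})")
    return "\n".join(lines)


example = '{"dependencies": {"react": "^19.0.0"}, "devDependencies": {"vitest": "^3.0.0"}}'
section = render_stack_section(example)
print(section)
```

A CI hook could regenerate this block on every change, leaving the human-written sections untouched.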

7. ADRs: From Passive Log to Active Governance

Architecture Decision Records [9] are no longer just records of past decisions but executable constraints for agents:

  • Architectural boundaries: Which module can depend on which, which service accesses which database
  • Data handling constraints: PII logging requirements, encryption standards
  • Error propagation patterns: Prevents the agent from producing creative but silent error modes
  • Rejected alternatives: “We evaluated this approach before and rejected it for this reason”
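One way to picture an ADR acting as an executable constraint is a dependency-boundary check. In this sketch the `ALLOWED_DEPENDENCIES` table stands in for rules distilled from ADRs, and the module names are hypothetical; how the observed imports are collected is left out.

```python
# Hypothetical rule set distilled from ADRs: module -> modules it may import.
ALLOWED_DEPENDENCIES = {
    "api": {"services"},
    "services": {"repositories"},
    "repositories": set(),
}


def boundary_violations(imports: dict[str, set[str]]) -> list[str]:
    """List imports that cross a boundary the ADR-derived rules do not allow."""
    violations = []
    for module, targets in sorted(imports.items()):
        allowed = ALLOWED_DEPENDENCIES.get(module, set())
        for target in sorted(targets - allowed):
            violations.append(f"{module} -> {target}")
    return violations


observed = {"api": {"services", "repositories"}, "services": {"repositories"}}
violations = boundary_violations(observed)
print(violations)  # ['api -> repositories']
```

Run in CI or by the agent before committing, a check like this turns “which module can depend on which” from prose into a failing test.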

“Kernel of Truth” Workflow: The developer writes one sentence (“Use Redis for rate limiting to handle distributed spikes”), and the agent expands it into a structured ADR. The solo developer gets a permanent decision record with minimal writing burden.

Title: Use Inngest instead of Redis
Status: Accepted
Context: Queue system needed for background jobs
Decision: Inngest has less operational overhead than Redis in serverless environments
Consequences: Vendor lock-in risk exists but reduced management cost offsets it
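The record format above can be modeled as a small data structure. The sketch below seeds a skeleton from a one-sentence kernel and leaves the remaining fields for the agent to expand; the `ADR` class, the `skeleton_from_kernel` helper, and the "Proposed" default status are assumptions, not part of any ADR standard.

```python
from dataclasses import dataclass


@dataclass
class ADR:
    title: str
    context: str
    decision: str
    consequences: str
    status: str = "Proposed"

    def render(self) -> str:
        """Render the record in the five-field layout shown above."""
        return "\n".join([
            f"Title: {self.title}",
            f"Status: {self.status}",
            f"Context: {self.context}",
            f"Decision: {self.decision}",
            f"Consequences: {self.consequences}",
        ])


def skeleton_from_kernel(kernel: str) -> ADR:
    """Seed an ADR from a one-sentence kernel; TODO fields are left for expansion."""
    return ADR(title=kernel.rstrip("."), context="TODO", decision=kernel, consequences="TODO")


adr = skeleton_from_kernel("Use Redis for rate limiting to handle distributed spikes.")
print(adr.render())
```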

My ADR and OpenSpec post covers ADRs in the agent context in more depth, and the Decision Gate post examines structured evaluation in detail.

8. Spec-Driven Development (SDD)

As Addy Osmani also emphasizes [10]: “You start with a plan. Before prompting anything, you write a design doc or spec.” SDD treats documentation not as an afterthought but as the starting point of every task:

  1. Specify: Define goals, the agent produces SPEC.md (user journeys, success metrics)
  2. Plan: Set constraints, the agent produces a technical plan (multiple variants)
  3. Tasks: The agent breaks the plan into small, reviewable chunks
  4. Implement: The agent solves chunks one by one, the developer does focused review

SDD is the fundamental distinction between “agentic engineering” [11] and “vibe coding”. When you tell the agent what you want in a structured way, output quality improves significantly.

9. Progressive Disclosure and JIT Indexing

Even a 200K or 1M token context window degrades in performance when overloaded with information [12]. The solution is to reveal information only when needed, rather than loading everything upfront.

| Technique | Mechanism | Benefit |
| --- | --- | --- |
| Jump-to pointers | Reference files by path, not content | Token savings |
| Executable search | rg/grep commands in AGENTS.md | Just-in-time discovery |
| Nested overrides | Per-module local AGENTS.md | Only relevant rules loaded |
| Sampling rules | Read entry points first | Human-like “mental mapping” |
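The “executable search” and “jump-to pointer” rows can be combined in a small sketch: a grep-like lookup that returns `path:line` pointers instead of file contents. In practice an agent would more likely run `rg` or `grep` directly; the `find_symbol` helper and its default suffix list are hypothetical.

```python
from pathlib import Path


def find_symbol(root: Path, symbol: str, suffixes: tuple[str, ...] = (".py", ".ts", ".tsx")) -> list[str]:
    """Return 'path:line' pointers for lines mentioning `symbol`, not the file bodies."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for number, line in enumerate(lines, start=1):
            if symbol in line:
                hits.append(f"{path.relative_to(root).as_posix()}:{number}")
    return hits
```

Only the pointers enter the context window; the agent opens a file when a pointer turns out to matter.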

VS Code’s Copilot customization [3] works on the same principle: .github/copilot-instructions.md for project-level customization, .instructions.md files for path-specific rules. Instead of putting all information in one place, it uses a layered structure that activates when needed.

Claude Code’s CLAUDE.md hierarchy is the most advanced example of this model: root-level global rules, directory-level overrides, pulling other files on demand with @imports.

10. Google Code Wiki

Google’s Code Wiki system, announced in 2025 [13]:

  • Creates a structured wiki for each repository
  • Automatically updates after every commit
  • Interactive, AI-powered documentation

Setting up full automation as a solo developer is challenging, but the concept is sound: documentation should be coupled to the code. It should update with every change, not be left to manual maintenance.

A more accessible implementation of this vision is to integrate architecture.md into the CI/CD pipeline, using PR checks to map changed files to architecture.md sections. While it can’t achieve the full automation of Google Code Wiki, it’s possible to establish the same feedback loop through staleness detection.
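A PR check along these lines could look like the sketch below. The `SECTION_PATTERNS` mapping is hypothetical (the “Infrastructure” section name in particular is an assumption), and obtaining the list of changed files from the CI system is left out.

```python
from fnmatch import fnmatch

# Hypothetical mapping from path patterns to architecture.md section titles.
SECTION_PATTERNS = {
    "package.json": "Stack & Dependencies",
    "src/*": "Module Map",
    "infra/*": "Infrastructure",
}


def sections_to_review(changed_files: list[str], doc_name: str = "architecture.md") -> set[str]:
    """Sections whose mapped files changed in a PR that did not touch the doc itself."""
    if doc_name in changed_files:
        return set()  # the doc was edited in the same change; assume it was reviewed
    return {
        section
        for pattern, section in SECTION_PATTERNS.items()
        for path in changed_files
        if fnmatch(path, pattern)
    }


stale = sections_to_review(["src/api/users.ts", "README.md"])
```

Here `stale` is `{"Module Map"}`, which the check could post as a reminder on the PR. Treating any edit to architecture.md as sufficient review is a simplification; a stricter check would compare the edited sections against the flagged ones.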

11. Markdown Optimization for LLMs

The format of documents provided to agents also affects performance. Considerations when writing Markdown for LLMs:

| Rule | Why |
| --- | --- |
| Language tags required in fenced code blocks | LLM parses as a single token unit |
| Never skip heading levels (e.g., jumping from H1 to H3) | Disrupts the attention mechanism |
| Plain-text alternative/summary for images | Non-multimodal models can’t read images |
| RFC 2119 constraints (MUST, SHOULD, MAY) | Clarifies the level of certainty |
| Action-oriented verbs: “ask”, “search”, “check” | Makes it easier for the agent to interpret as executable instructions |

These rules also apply when writing CLAUDE.md, AGENTS.md, or architecture.md. A structured, consistent format improves the quality of the agent’s information parsing and usage.
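Two of these rules are mechanical enough to lint. The sketch below flags opening code fences without a language tag and headings that skip a level; `lint_markdown` is a hypothetical helper that handles only backtick fences and ATX (`#`) headings.

```python
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown


def lint_markdown(text: str) -> list[str]:
    """Flag untagged opening code fences and skipped heading levels."""
    problems = []
    in_fence = False
    previous_level = 0
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if not in_fence and stripped == FENCE:
                problems.append(f"line {number}: code fence has no language tag")
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # ignore '#' comments and other content inside code blocks
        match = re.match(r"^(#{1,6})\s", line)
        if match:
            level = len(match.group(1))
            if previous_level and level > previous_level + 1:
                problems.append(f"line {number}: heading jumps from H{previous_level} to H{level}")
            previous_level = level
    return problems
```

Running this over CLAUDE.md, AGENTS.md, or architecture.md in a pre-commit hook would catch both issues before an agent ever reads the file.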

Approach Comparison

| Approach | Strength | Weakness | When to Use |
| --- | --- | --- | --- |
| Codified Context | Academically grounded, three-layer model | Specific to large projects | Enterprise/large projects |
| AGENTS.md/CLAUDE.md | Tool integration, hierarchical | Architectural info out of scope | Every project, for rules |
| C4 + Mermaid.js | Visual, hierarchical, VCS-compatible | Maintenance burden | Complex architectures |
| Repomix | Single command, full repo | Token explosion | Tools that can’t read files |
| ADRs | Decision record, agent constraint | Accumulates, requires maintenance | Every major decision |
| SDD | Structured from the start | Heavy for small tasks | New features/architectural changes |
| Google Code Wiki | Full automation | Not publicly available yet | (Future potential) |
| Progressive Disclosure | Token efficient, scalable | Setup complexity | Large projects, monorepo |
| architecture.md | Single file, structured, scalable | Initial writing effort | Every project |

Application: Living Architecture

I converted these research findings into a project-agnostic template: Living Architecture.

The template is a practical synthesis of the approaches above:

  • From Codified Context: Layered memory model (Hot Memory = CLAUDE.md, Cold Memory = architecture.md)
  • From AGENTS.md ecosystem: Structured format, heading hierarchy
  • From C4 Model: Hierarchical sectioning (Stack, Module Map, Data Flow)
  • From Hybrid approach: Auto-generatable sections + human-written sections
  • From Progressive disclosure: Per-section depth (L1/L2/L3), agent jumps to the section it needs
  • From ADRs: Constraints & Trade-offs section, decision references

10 core sections, 11 optional modules, 3 depth levels. Scales from small static sites to large monorepos. Available as open source on GitHub [14].

Sources

Footnotes

  1. Vasilopoulos, A. “Codified Context: Infrastructure for AI Agents in a Complex Codebase” (2026). A study testing a three-layer documentation system on a 108,000-line C# project. arxiv.org/abs/2602.20478
  2. “A Complete Guide to AGENTS.md” (aihero.dev). Content areas and best practices standardized by the Agentic AI Foundation, based on analysis of 2,500+ agent config files. aihero.dev/a-complete-guide-to-agents-md
  3. VS Code Copilot Customization. Instruction files and context engineering guide for project-level AI customization. code.visualstudio.com/docs/copilot/copilot-customization
  4. CodeAI.md. A documentation framework that defines architecture, naming conventions, and integration patterns by placing a single file at the repo root. codeai.md
  5. Brown, S. “The C4 Model for Visualising Software Architecture”. Software architecture visualization with four hierarchical abstraction levels (software systems, containers, components, code). c4model.com
  6. Szczepanik, K. & Chudziak, J.A. “Collaborative LLM Agents for C4 Software Architecture Design Automation” (2025). A study investigating automated generation of C4 diagrams using multi-agent LLM systems. Accepted at HICSS-59. arxiv.org/abs/2510.22787
  7. Szczepanik & Chudziak’s work [6] also references the C4X tool. C4X supports AI_Agent, Memory, and Tool nodes in Mermaid syntax.
  8. Repomix. A tool that packs codebases into a single AI-friendly file, extracting function signatures via Tree-sitter. 22.6k GitHub stars. github.com/yamadashy/repomix
  9. Henderson, J.P. “Architecture Decision Record”. ADR templates, practical examples, and team implementation guide. github.com/joelparkerhenderson/architecture-decision-record
  10. Osmani, A. “How to write a good spec for AI agents” (2026). Core principles of writing structured specs for agents. addyosmani.com/blog/good-spec
  11. Osmani, A. “Agentic Engineering” (2026). A post defining the distinction between “vibe coding” and disciplined AI-assisted development. addyosmani.com/blog/agentic-engineering
  12. Liu, N. F. et al. “Lost in the Middle: How Language Models Use Long Contexts” (2023). A study showing that model performance degrades when relevant information sits in the middle of a long context.
  13. Krzaczyński, R. “Google Launches Code Wiki, an AI-Driven System for Continuous, Interactive Code Documentation” (InfoQ, 2025). Google’s system that creates a structured wiki for each repository and automatically updates after every commit. infoq.com/news/2025/11/google-code-wiki
  14. Living Architecture GitHub repository. Project-agnostic architecture.md template: 10 core sections, 11 optional modules, 3 depth levels. github.com/ceaksan/living-architecture
Key Takeaways
  • 01 The Codified Context approach defines documentation as infrastructure: a structure requiring continuous maintenance, like code, essential for agents to produce correct output.
  • 02 AGENTS.md, CLAUDE.md, .cursorrules target different tools but all solve the same problem. The 'Nearest-Wins' model layers global and local rules in monorepos.
  • 03 Repomix packs a 992-file repo into 1.9M tokens. For agents that can read files, structured architecture.md + JIT search is more efficient.
  • 04 All research sources reach the same consensus: 'what' (schema, types, dependency graph) is auto-generated, 'why' (design decisions, constraints) is human-written.
Frequently Asked Questions (FAQ)
What is context engineering?

Context engineering is the discipline of maximizing the signal-to-noise ratio in an AI agent's context window: giving the agent the right information, at the right time, in the right format. It is a systematic infrastructure approach that goes beyond prompt engineering.

What is the difference between AGENTS.md and CLAUDE.md?

AGENTS.md is a tool-agnostic format standardized by the Agentic AI Foundation. CLAUDE.md is specific to Claude Code, supporting advanced features like @imports and recursive referencing. Both can be used in the same project: AGENTS.md for global rules, CLAUDE.md for Claude-specific instructions.

When is Repomix useful, when is it unnecessary?

Repomix is useful for giving repo context to tools that can't read files (ChatGPT web, web Claude). For tools that can read files like Claude Code or Cursor, structured architecture.md + JIT semantic search is more efficient. Repomix's directory tree and hotspot analysis are useful in all cases.

Why are ADRs important for agents?

ADRs are no longer just passive decision logs but executable constraints for agents. Architectural boundaries, data handling rules, and rejected alternatives prevent the agent from repeating the same debates. The 'Kernel of Truth' workflow keeps the writing burden minimal.

Is there a practical output from this research?

Yes. I converted these research findings into a project-agnostic template: Living Architecture. 10 core sections, 11 optional modules, 3 depth levels. Available as open source on GitHub.