Living Architecture Documentation for AI Coding Agents: Research, Approaches, and Tools

TL;DR

AI coding agents make incorrect inferences by superficially scanning code in large projects. To solve this, approaches like Codified Context (three-layer memory), AGENTS.md ecosystem, C4 Model + Mermaid.js, Repomix + Tree-sitter, ADRs, Spec-Driven Development, Google Code Wiki, and progressive disclosure exist. I examine these findings, compiled from 2025-2026 research (academic papers, Gemini Deep Research, industry practices), in a comparative analysis. Consensus: 'what' is auto-generated, 'why' is human-written. This research formed the foundation of the [Living Architecture](/en/living-architecture-ai-architectural-documentation) template.

Context

AI coding agents (Claude Code, Gemini CLI, Cursor, GitHub Copilot) make incorrect inferences by superficially scanning code in large projects. They confuse similarly named functions across different modules, treat unused legacy code as active, and guess infrastructure details.

2025-2026 research shows that models perform significantly better when fed structured, persistent reference points compared to repo scanning¹. This gave rise to the “context engineering” discipline: maximizing the signal-to-noise ratio in the agent’s context window.

In this post, I examine 11 different approaches compiled from parallel investigations with GPT-4o, Kimi K2.5, and Gemini Deep Research, academic papers, and industry practices. This research formed the foundation of the Living Architecture template.

1. Codified Context: Documentation Is Infrastructure

A three-layer system tested on a 108,000-line C# project¹ was one of the most concrete findings I came across in this research:

Layer	Content	Agent Behavior
Hot Memory	Conventions, retrieval hooks, orchestration protocols	Loaded automatically every session
Domain Expert Agents	Project-specific specialist agents (database, frontend, API)	Activated when relevant domain is queried
Cold Memory	34 on-demand specification documents	Only the relevant document is pulled

In practice, Hot Memory corresponds to the CLAUDE.md or AGENTS.md file, and Cold Memory to architecture.md, ADRs, and spec documents. The agent pulls only what’s relevant rather than loading every document into context.
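The layering can be sketched in a few lines. This is a minimal illustration, not the implementation from the paper: the `ContextStore` class, its field names, and the way documents are requested are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextStore:
    """Hypothetical three-layer store; names are illustrative, not from the paper."""
    hot: str                                               # always-loaded conventions (e.g. CLAUDE.md text)
    experts: dict[str, str] = field(default_factory=dict)  # domain -> specialist instructions
    cold: dict[str, str] = field(default_factory=dict)     # document name -> specification text

    def build_context(self, domain: Optional[str], wanted_docs: list[str]) -> str:
        parts = [self.hot]                  # hot memory goes into every session
        if domain in self.experts:          # expert layer only for the queried domain
            parts.append(self.experts[domain])
        for name in wanted_docs:            # cold memory: only the requested documents
            if name in self.cold:
                parts.append(self.cold[name])
        return "\n\n".join(parts)
```

The point of the sketch is the asymmetry: `hot` is unconditional, while everything else must be asked for by domain or by document name.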

Critical finding: documentation is infrastructure. It is a “load-bearing artifact” that requires continuous maintenance, like code, and is essential for agents to produce correct output¹.

2. AGENTS.md Ecosystem

AGENTS.md², standardized by the Agentic AI Foundation in 2024-2025, serves as a “README for machines.” Different files exist for different AI tools, but they’re all complementary:

File	Tool	Feature	Strategy
AGENTS.md	Universal	Multi-tool compatibility	Root-level global rules
CLAUDE.md	Claude Code	@imports, recursive referencing	Hierarchical, progressive disclosure
.cursorrules	Cursor	YAML frontmatter	Path-specific rule activation
.github/copilot-instructions.md	GitHub Copilot	VS Code integration	Project-level customization³
SPEC.md	General SDD	Living contract	Defines “what” and “why”
runtime.md	Autonomous agents	Checkpoint tracking	Live progress and risk logging

“Nearest-Wins” model: In a monorepo, root-level AGENTS.md provides global standards (commit format, security gates), while nested AGENTS.md files provide local stack guidance. Claude Code’s CLAUDE.md hierarchy follows the same model.
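One way to picture the nearest-wins lookup is a walk from the edited file up to the repo root. The function below is a sketch under that assumption; `applicable_agent_files` is a hypothetical helper, and real tools may resolve and merge these files differently.

```python
from pathlib import Path


def applicable_agent_files(target: Path, repo_root: Path, name: str = "AGENTS.md") -> list[Path]:
    """Return guidance files ordered from nearest to farthest for a path inside repo_root."""
    target = target.resolve()
    repo_root = repo_root.resolve()
    found = []
    directory = target if target.is_dir() else target.parent
    while True:
        candidate = directory / name
        if candidate.is_file():
            found.append(candidate)
        # Stop at the repo root, or at the filesystem root if target lies outside the repo.
        if directory == repo_root or directory == directory.parent:
            break
        directory = directory.parent
    return found
```

The first element is the most local file, so a caller that wants local rules to override global ones can apply the list in reverse order.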

CodeAI.md⁴ offers a similar approach: a single file at the repo root defines architecture, naming conventions, integration patterns, and anti-patterns. Its slogan is “Stop re-explaining your architecture in every prompt.”

3. Six Essential Content Areas

Six essential areas derived from analysis of over 2,500 successful agent config files²:

Commands: Exact executable strings (install, build, test, lint). Put these at the top of the file, since this is the first information the agent will access
Testing: Frameworks, coverage expectations, mocking patterns
Project Structure: High-level directory tree map
Code Style: Not explanations but “Show, Don’t Tell” code snippets
Git Workflow: Branch naming, commit format, PR checklist
Boundaries: Do/Don’t lists, operational limits

These six areas are the scope of CLAUDE.md or AGENTS.md. The project’s architectural structure, inter-module relationships, data flow, infrastructure topology, and tech debt don’t fit into these areas. Architectural information should live in a separate file (architecture.md).
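A quick completeness check for the six areas could look like the sketch below. It assumes each area appears as a Markdown heading with exactly the name used in the list above, which is a simplification; `missing_areas` is a hypothetical helper.

```python
import re

REQUIRED_AREAS = ["Commands", "Testing", "Project Structure", "Code Style", "Git Workflow", "Boundaries"]


def missing_areas(markdown: str) -> list[str]:
    """List required areas that have no matching Markdown heading (case-insensitive)."""
    headings = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    }
    return [area for area in REQUIRED_AREAS if area.lower() not in headings]
```

Run against a CLAUDE.md or AGENTS.md, an empty result means every area has at least a heading; it says nothing about the quality of what sits under it.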

4. C4 Model + AI Alignment

The C4 model⁵ (Context, Containers, Components, Code) offers an ideal hierarchy for AI agents. When implemented as text-based diagrams with Mermaid.js:

Version-controllable (visible in Git diffs)
Token-efficient (text instead of images)
Readable and editable by agents
Natively rendered in GitHub, GitLab, and VS Code

C4 Level	Agent Information	Usage
Context (C1)	External systems, actors, global dependencies	Project’s relationship with the world
Container (C2)	DBs, microservices, frontends, queues	Service topology
Component (C3)	Module boundaries, service layers	Internal structure
Dynamic	Sequence flows for complex business logic	Flow detail
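To show why text-based diagrams are cheap to generate and diff, here is a small sketch that renders a Container-level (C2) view as Mermaid flowchart text. It uses plain `flowchart` syntax rather than any C4-specific notation, and the function name and inputs are assumptions for illustration.

```python
def container_diagram(containers: dict[str, str], edges: list[tuple[str, str, str]]) -> str:
    """Render a C2-style container view as Mermaid flowchart text."""
    lines = ["flowchart LR"]
    for node_id, label in containers.items():
        lines.append(f'    {node_id}["{label}"]')            # one node per container
    for source, target, description in edges:
        lines.append(f"    {source} -->|{description}| {target}")  # labeled dependency
    return "\n".join(lines)
```

Because the output is a handful of text lines, adding a container or a dependency shows up as a one-line change in a Git diff.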

The C4X tool supports AI_Agent, Memory, and Tool nodes in Mermaid syntax⁷. The agent can “see” its own role in the architecture.

There’s also academic progress: Szczepanik and Chudziak’s work⁶ investigates automated generation of C4 diagrams using multi-agent LLM systems. It measures the quality of architectural diagrams by combining structural validation and semantic evaluation methods.

5. Automatic Context Compression (Repomix + Tree-sitter)

Repomix⁸ packs codebases into a single AI-friendly file. It uses Tree-sitter to extract only core code signatures and structures:

Compression Level	Information Retained	Token Savings
Raw packing	Entire file contents	0%
Signature extraction	Class/function names, types	60-70%
Structural map	Directory tree, file purposes	90%+
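The “signature extraction” level can be illustrated without Tree-sitter. The sketch below swaps in Python’s standard `ast` module and works only on Python source, so it is a stand-in for the idea, not a description of how Repomix is implemented.

```python
import ast


def extract_signatures(source: str) -> list[str]:
    """Keep only class and function headers, in source order, dropping bodies."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            found.append((node.lineno, f"class {node.name}"))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)  # positional parameter names only
            found.append((node.lineno, f"def {node.name}({args})"))
    return [text for _, text in sorted(found)]
```

Everything inside the function bodies is discarded, which is where the token savings at this level come from.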

Results from testing on a real 992-file project:

Mode	File Count	Tokens	Assessment
Full (entire repo)	992	1.9M	10x the context window
Compressed (entire repo)	992	1.25M	Still unusable
Compressed (TS/TSX only)	489	90K	Theoretically fits but takes half the context
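The assessments in the table follow from simple arithmetic. The sketch below assumes a 200K-token context window, which is an assumption for the example (it is one of the window sizes mentioned later in this post); the token counts are the ones from the table.

```python
CONTEXT_WINDOW = 200_000  # assumed window size in tokens

MEASUREMENTS = {
    "full": 1_900_000,
    "compressed": 1_250_000,
    "compressed_ts_only": 90_000,
}


def window_share(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """How many context windows a payload of this size would occupy."""
    return tokens / window


def savings(before: int, after: int) -> float:
    """Fraction of tokens removed by going from `before` to `after`."""
    return 1 - after / before
```

Under that assumption the full dump is 9.5 windows, the compressed dump 6.25, and the TS/TSX-only dump 0.45, which matches the “10x”, “still unusable”, and “half the context” assessments.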

Repomix’s directory tree and git-change-count ranking are useful for hotspot analysis. However, for tools that can read files like Claude Code, Gemini CLI, or Cursor, structured architecture.md + JIT semantic search provides more targeted information than Repomix’s flat dump.

Repomix’s real value is in tools that can’t read files (ChatGPT web interface, web Claude): being able to provide the entire repo context as a single file.

6. Hybrid Approach: Consensus

All research sources (GPT-4o analysis, Kimi K2.5 investigation, Gemini Deep Research, academic papers) converge on the same point:

Layer	Content	Method
Auto-generate	Schema, types, dependency graph, API signatures	Update via Repomix/CI hook when code changes
Human-write	Design decisions, constraints, trade-offs, “why”	ADR format, “Kernel of Truth” workflow
Staleness detect	Document freshness tracking	Git hook or CI check

“What” is auto-generated, “why” is human-written. Together they give the agent the full picture.

The practical application of this consensus: in architecture.md, sections like Stack & Dependencies and Module Map can be partially auto-generated (from package.json, directory scanning). Sections like Constraints & Trade-offs and Known Tech Debt should be human-written. Staleness detection can be achieved through PR checks or daily review integration.
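As an illustration of the auto-generated half, the sketch below renders a Stack & Dependencies section from the text of a package.json. The output layout is an assumption for the example, not the format the Living Architecture template prescribes.

```python
import json


def render_stack_section(package_json_text: str) -> str:
    """Render a 'Stack & Dependencies' section from package.json contents."""
    package = json.loads(package_json_text)
    lines = ["Stack & Dependencies", ""]
    for group in ("dependencies", "devDependencies"):
        entries = package.get(group, {})
        if not entries:
            continue                      # skip groups the project does not declare
        lines.append(f"{group}:")
        for name in sorted(entries):      # stable order keeps diffs small
            lines.append(f"- {name} {entries[name]}")
    return "\n".join(lines)
```

A CI hook could regenerate this block whenever package.json changes, leaving the human-written sections untouched.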

7. ADRs: From Passive Log to Active Governance

Architecture Decision Records⁹ are no longer just records of past decisions but executable constraints for agents:

Architectural boundaries: Which module can depend on which, which service accesses which database
Data handling constraints: PII logging requirements, encryption standards
Error propagation patterns: Prevents the agent from producing creative but silent error modes
Rejected alternatives: “We evaluated this approach before and rejected it for this reason”

“Kernel of Truth” Workflow: The developer writes one sentence (“Use Redis for rate limiting to handle distributed spikes”) and the agent expands it into a structured ADR. The solo developer gets a permanent decision record with minimal writing burden.

Title: Use Inngest instead of Redis
Status: Accepted
Context: Queue system needed for background jobs
Decision: Inngest has less operational overhead than Redis in serverless environments
Consequences: Vendor lock-in risk exists but reduced management cost offsets it
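The five-field shape above is easy to model. The sketch below seeds an ADR from a one-sentence kernel and leaves the remaining fields as placeholders for the agent or the author to fill in; the `ADR` class, `skeleton_from_kernel`, and the “Proposed” default status are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ADR:
    title: str
    context: str
    decision: str
    consequences: str
    status: str = "Proposed"   # assumed default until the decision is accepted

    def render(self) -> str:
        return "\n".join([
            f"Title: {self.title}",
            f"Status: {self.status}",
            f"Context: {self.context}",
            f"Decision: {self.decision}",
            f"Consequences: {self.consequences}",
        ])


def skeleton_from_kernel(kernel: str) -> ADR:
    """Seed an ADR from a one-sentence kernel; TODO fields are left for expansion."""
    return ADR(title=kernel.rstrip("."), context="TODO", decision=kernel, consequences="TODO")
```

The kernel sentence is preserved verbatim in the Decision field, so whatever the agent adds around it, the developer’s original wording stays on record.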

My ADR and OpenSpec post covers ADRs in the agent context in more depth, and the Decision Gate post examines structured evaluation in detail.

8. Spec-Driven Development (SDD)

As Addy Osmani also emphasizes¹⁰: “You start with a plan. Before prompting anything, you write a design doc or spec.” SDD treats documentation not as an afterthought but as the starting point of every task:

Specify: Define goals, the agent produces SPEC.md (user journeys, success metrics)
Plan: Set constraints, the agent produces a technical plan (multiple variants)
Tasks: The agent breaks the plan into small, reviewable chunks
Implement: The agent solves chunks one by one, the developer does focused review

SDD is the fundamental distinction between “agentic engineering”¹¹ and “vibe coding”. When you tell the agent what you want in a structured way, output quality improves significantly.

9. Progressive Disclosure and JIT Indexing

Even a 200K or 1M token context window degrades in performance when overloaded with information¹². The solution is to reveal information only when needed rather than loading everything upfront.

Technique	Mechanism	Benefit
Jump-to pointers	Reference files by path, not content	Token savings
Executable search	rg/grep commands in AGENTS.md	Just-in-time discovery
Nested overrides	Per-module local AGENTS.md	Only relevant rules loaded
Sampling rules	Read entry points first	Human-like “mental mapping”
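The first two rows combine naturally: a search returns pointers, and the agent opens only the files it decides it needs. The sketch below is a pure-Python stand-in for an rg/grep command; `find_pointers` and its defaults are hypothetical.

```python
import re
from pathlib import Path


def find_pointers(root: Path, pattern: str, glob: str = "**/*.py", limit: int = 20) -> list[str]:
    """Return 'path:line' pointers for regex matches instead of file contents."""
    regex = re.compile(pattern)
    pointers = []
    for path in sorted(root.glob(glob)):
        if not path.is_file():
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                pointers.append(f"{path.relative_to(root).as_posix()}:{number}")
                if len(pointers) >= limit:   # cap output so the result stays small
                    return pointers
    return pointers
```

Each pointer costs a few tokens regardless of how large the underlying file is, which is the saving the table refers to.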

VS Code’s Copilot customization³ works on the same principle: .github/copilot-instructions.md for project-level customization, .instructions.md files for path-specific rules. Instead of putting all information in one place, it uses a layered structure that activates when needed.

Claude Code’s CLAUDE.md hierarchy is the most advanced example of this model: root-level global rules, directory-level overrides, pulling other files on demand with @imports.

10. Google Code Wiki

Google’s Code Wiki system, announced in 2025¹³:

Creates a structured wiki for each repository
Automatically updates after every commit
Interactive, AI-powered documentation

Setting up full automation as a solo developer is challenging, but the concept is sound: documentation should be coupled to the code. It should update with every change, not be left to manual maintenance.

A more accessible implementation of this vision is to integrate architecture.md into the CI/CD pipeline, with a PR check that maps changed files to architecture.md sections. While this can’t achieve the full automation of Google Code Wiki, it establishes the same feedback loop through staleness detection.
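A PR check of this kind can be very small. The sketch below assumes a hand-maintained mapping from path patterns to section names; the patterns, the “Infrastructure” section name, and the `sections_to_review` helper are hypothetical, and the list of changed files would come from the CI system.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from path patterns to architecture.md section names.
SECTION_MAP = {
    "package.json": "Stack & Dependencies",
    "src/*": "Module Map",
    "infra/*": "Infrastructure",
}


def sections_to_review(changed_files: list[str], doc_path: str = "architecture.md") -> list[str]:
    """Return sections that may be stale; empty if the doc changed in the same PR."""
    if doc_path in changed_files:
        return []                              # the author already touched the doc
    hits = []
    for path in changed_files:
        for pattern, section in SECTION_MAP.items():
            # fnmatch-style '*' also matches '/', so 'src/*' covers nested paths
            if fnmatchcase(path, pattern) and section not in hits:
                hits.append(section)
    return hits
```

A non-empty result can be posted as a PR comment or used to fail the check, depending on how strict the project wants to be.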

11. Markdown Optimization for LLMs

The format of documents provided to agents also affects performance. Considerations when writing Markdown for LLMs:

Rule	Why
Language tags required in fenced code blocks	Lets the model identify the block’s language and treat it as one unit
Never skip heading levels (H1 to H3)	Breaks the outline the model uses to infer document structure
Plain-text alternative/summary for images	Non-multimodal models can’t read images
RFC 2119 constraints (MUST, SHOULD, MAY)	Clarifies the level of certainty
Action-oriented verbs: “ask”, “search”, “check”	Makes it easier for the agent to interpret as executable instructions

These rules also apply when writing CLAUDE.md, AGENTS.md, or architecture.md. A structured, consistent format improves the quality of the agent’s information parsing and usage.
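The first two rules in the table are mechanical enough to lint. The sketch below is a deliberately simple checker, assuming ATX-style `#` headings and triple-backtick fences; `lint_markdown` is a hypothetical helper, not part of any existing tool.

```python
import re

FENCE = "`" * 3  # triple backtick, built this way to keep the example easy to embed


def lint_markdown(text: str) -> list[str]:
    """Flag untagged opening code fences and skipped heading levels."""
    problems = []
    in_fence = False
    previous_level = 0
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if not in_fence and stripped == FENCE:
                problems.append(f"line {number}: code fence has no language tag")
            in_fence = not in_fence
            continue
        if in_fence:
            continue                       # ignore '#' comments inside code blocks
        match = re.match(r"(#{1,6})\s", line)
        if match:
            level = len(match.group(1))
            if previous_level and level > previous_level + 1:
                problems.append(f"line {number}: heading jumps from H{previous_level} to H{level}")
            previous_level = level
    return problems
```

Run in CI against CLAUDE.md, AGENTS.md, or architecture.md, it catches format drift before an agent reads the file.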

Approach Comparison

Approach	Strength	Weakness	When to Use
Codified Context	Academically grounded, three-layer model	Specific to large projects	Enterprise/large projects
AGENTS.md/CLAUDE.md	Tool integration, hierarchical	Architectural info out of scope	Every project, for rules
C4 + Mermaid.js	Visual, hierarchical, VCS-compatible	Maintenance burden	Complex architectures
Repomix	Single command, full repo	Token explosion	Tools that can’t read files
ADRs	Decision record, agent constraint	Accumulates, requires maintenance	Every major decision
SDD	Structured from the start	Heavy for small tasks	New features/architectural changes
Google Code Wiki	Full automation	Not publicly available yet	(Future potential)
Progressive Disclosure	Token efficient, scalable	Setup complexity	Large projects, monorepo
architecture.md	Single file, structured, scalable	Initial writing effort	Every project

Application: Living Architecture

I converted these research findings into a project-agnostic template: Living Architecture.

The template is a practical synthesis of the approaches above:

From Codified Context: Layered memory model (Hot Memory = CLAUDE.md, Cold Memory = architecture.md)
From AGENTS.md ecosystem: Structured format, heading hierarchy
From C4 Model: Hierarchical sectioning (Stack, Module Map, Data Flow)
From Hybrid approach: Auto-generatable sections + human-written sections
From Progressive disclosure: Per-section depth (L1/L2/L3), agent jumps to the section it needs
From ADRs: Constraints & Trade-offs section, decision references

10 core sections, 11 optional modules, 3 depth levels. Scales from small static sites to large monorepos. Available as open source on GitHub¹⁴.

Sources

Open Source Tools

Living Architecture: Project-agnostic architecture.md template¹⁴
mcp-code-search: Semantic code search MCP server
decision-gate: Multi-AI evaluation framework
forge: Memory-backed decision-delivery pipeline

Footnotes

1. Vasilopoulos, A. “Codified Context: Infrastructure for AI Agents in a Complex Codebase” (2026). A study testing a three-layer documentation system on a 108,000-line C# project. arxiv.org/abs/2602.20478
2. “A Complete Guide to AGENTS.md” (aihero.dev). Content areas and best practices standardized by the Agentic AI Foundation, based on analysis of 2,500+ agent config files. aihero.dev/a-complete-guide-to-agents-md
3. VS Code Copilot Customization. Instruction files and context engineering guide for project-level AI customization. code.visualstudio.com/docs/copilot/copilot-customization
4. CodeAI.md. A documentation framework that defines architecture, naming conventions, and integration patterns by placing a single file at the repo root. codeai.md
5. Brown, S. “The C4 Model for Visualising Software Architecture”. Software architecture visualization with four hierarchical abstraction levels (software systems, containers, components, code). c4model.com
6. Szczepanik, K. & Chudziak, J.A. “Collaborative LLM Agents for C4 Software Architecture Design Automation” (2025). A study investigating automated generation of C4 diagrams using multi-agent LLM systems. Accepted at HICSS-59. arxiv.org/abs/2510.22787
7. Szczepanik & Chudziak’s work⁶ also references the C4X tool. C4X supports AI_Agent, Memory, and Tool nodes in Mermaid syntax.
8. Repomix. A tool that packs codebases into a single AI-friendly file, extracting function signatures via Tree-sitter. 22.6k GitHub stars. github.com/yamadashy/repomix
9. Henderson, J.P. “Architecture Decision Record”. ADR templates, practical examples, and team implementation guide. github.com/joelparkerhenderson/architecture-decision-record
10. Osmani, A. “How to write a good spec for AI agents” (2026). Core principles of writing structured specs for agents. addyosmani.com/blog/good-spec
11. Osmani, A. “Agentic Engineering” (2026). A post defining the distinction between “vibe coding” and disciplined AI-assisted development. addyosmani.com/blog/agentic-engineering
12. Liu, N. F. et al. “Lost in the Middle: How Language Models Use Long Contexts” (2023). A study documenting how model performance degrades when relevant information sits in the middle of a long context rather than at its beginning or end.
13. Krzaczyński, R. “Google Launches Code Wiki, an AI-Driven System for Continuous, Interactive Code Documentation” (InfoQ, 2025). Google’s system that creates a structured wiki for each repository and automatically updates after every commit. infoq.com/news/2025/11/google-code-wiki
14. Living Architecture GitHub repository. Project-agnostic architecture.md template: 10 core sections, 11 optional modules, 3 depth levels. github.com/ceaksan/living-architecture

Key Takeaways

01 The Codified Context approach defines documentation as infrastructure: a structure requiring continuous maintenance, like code, essential for agents to produce correct output.
02 AGENTS.md, CLAUDE.md, .cursorrules target different tools but all solve the same problem. The 'Nearest-Wins' model layers global and local rules in monorepos.
03 Repomix packs a 992-file repo into 1.9M tokens. For agents that can read files, structured architecture.md + JIT search is more efficient.
04 All research sources reach the same consensus: 'what' (schema, types, dependency graph) is auto-generated, 'why' (design decisions, constraints) is human-written.

Frequently Asked Questions (FAQ)

What is context engineering?

The discipline of maximizing the signal-to-noise ratio in an AI agent's context window. Giving the agent the right information, at the right time, in the right format. A systematic infrastructure approach beyond prompt engineering.

What is the difference between AGENTS.md and CLAUDE.md?

AGENTS.md is a tool-agnostic format standardized by the Agentic AI Foundation. CLAUDE.md is specific to Claude Code, supporting advanced features like @imports and recursive referencing. Both can be used in the same project: AGENTS.md for global rules, CLAUDE.md for Claude-specific instructions.

When is Repomix useful, and when is it unnecessary?

Repomix is useful for giving repo context to tools that can't read files (ChatGPT web, web Claude). For tools that can read files like Claude Code or Cursor, structured architecture.md + JIT semantic search is more efficient. Repomix's directory tree and hotspot analysis are useful in all cases.

Why are ADRs important for agents?

ADRs are no longer just passive decision logs but executable constraints for agents. Architectural boundaries, data handling rules, and rejected alternatives prevent the agent from repeating the same debates. The 'Kernel of Truth' workflow keeps the writing burden minimal.

Is there a practical output from this research?

Yes. I converted these research findings into a project-agnostic template: Living Architecture. 10 core sections, 11 optional modules, 3 depth levels. Available as open source on GitHub.
