Which Files Do You Actually Read? Developer Interaction Tracking

TL;DR

I automatically log every file read and edit using Claude Code's PostToolUse hook. The accumulated data is examined through four analyses: hot files (most accessed), knowledge gaps (zero-result searches), decay (declining interest), and query patterns (most frequent searches). Even a week's worth of data reveals the project's true center of gravity.

I edited store.py 66 times this week and read the project’s README.md 79 times; I didn’t know that. People can’t accurately estimate their own work patterns. Without data or specific tracking, it’s hard to know which files are the project’s true center of gravity.

The Problem: A Project’s Real Map Isn’t in LOC

Which files in a project are the most important? If you look at line count, large files stand out. If you look at git churn, the most-committed files rise to the top. But neither of these measures “what you actually look at most as a developer.”

The true center of gravity is hidden in your behavior:

Which file do you open every day?
Which file do you edit over and over?
Which of your searches return empty?
Which files have you stopped looking at?

Answering these questions requires interaction data. On a small project, of course, you remember enough of your own activity to make a reasonable guess.

The Solution: Log Every Read and Edit

In the dnomia-knowledge project, I use Claude Code’s hook mechanism to automatically record every file interaction. This system is part of the learning loop layer in my context engineering ecosystem. The system has three components:

1. PostToolUse Hook

Whenever Claude Code calls the Read or Edit tool, the PostToolUse hook kicks in:

# hooks/post_tool_use.py (simplified)
interaction = InteractionType.READ if tool_name == "Read" else InteractionType.EDIT
batch = [
    (chunk_id, interaction, f"hook:{tool_name}", project_id, rel_path)
    for chunk_id in chunk_ids
]
store.batch_log_interactions(batch)

The hook identifies which project the file belongs to, retrieves the file’s indexed chunks, and creates an interaction record for each chunk. The entire process takes 10-30ms.
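As a rough sketch of the first step: Claude Code hands command hooks a JSON payload on stdin, and for Read and Edit calls that payload carries `tool_name` and `tool_input.file_path`. The helper below turns such a payload into an interaction type and a project-relative path. `parse_event` and its `project_root` argument are hypothetical names for illustration, not the project's actual code; the chunk lookup and the DB write are left out.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def parse_event(raw: str, project_root: str) -> Optional[Tuple[str, str]]:
    """Return (interaction, relative_path), or None if the event is not loggable."""
    try:
        event = json.loads(raw)
        tool_name = event.get("tool_name")
        if tool_name not in ("Read", "Edit"):
            return None
        file_path = event.get("tool_input", {}).get("file_path")
        if not file_path:
            return None
        # Raises ValueError when the file lives outside the project root.
        rel_path = Path(file_path).relative_to(project_root)
        interaction = "read" if tool_name == "Read" else "edit"
        return interaction, rel_path.as_posix()
    except Exception:
        # Catch-all: a logging hook should never break the editing session.
        return None


payload = json.dumps(
    {"tool_name": "Edit", "tool_input": {"file_path": "/repo/src/store.py"}}
)
print(parse_event(payload, "/repo"))
```

In a real hook the `raw` string would come from `sys.stdin.read()`; anything the function cannot interpret is silently skipped rather than raised.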

2. Search Logging

Every MCP search call is also logged:

# search.py
store.log_search(query, project_id, domain, chunk_ids, result_count)
store.batch_log_interactions(
    [(r.chunk_id, InteractionType.SEARCH_HIT, "search", r.project_id, r.file_path)
     for r in results]
)

The query itself is logged (what was searched, how many results were returned), and each chunk in the results is marked as search_hit.

3. SQLite Storage

Two tables:

chunk_interactions:
  chunk_id    | project_id | file_path       | interaction | source_tool | timestamp
  142         | my-project | src/store.py    | edit        | hook:Edit   | 2026-03-18 10:23
  89          | my-project | README.md       | read        | hook:Read   | 2026-03-18 10:25

search_log:
  query                          | project_id | result_count | timestamp
  authentication middleware      | my-project | 5            | 2026-03-18 10:30
  frontmatter title slug         | ceaksan    | 0            | 2026-03-17 09:41

All data lives in a local SQLite file. No cloud service, no API calls.
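A minimal sketch of what these two tables could look like in SQLite, assuming the columns shown above. The column types, the default timestamp, and the index are my assumptions, and `open_db` is a hypothetical helper rather than the project's real schema code.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS chunk_interactions (
    chunk_id    INTEGER NOT NULL,
    project_id  TEXT    NOT NULL,
    file_path   TEXT    NOT NULL,
    interaction TEXT    NOT NULL,  -- 'read', 'edit' or 'search_hit'
    source_tool TEXT    NOT NULL,  -- e.g. 'hook:Edit' or 'search'
    timestamp   TEXT    NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS search_log (
    query        TEXT    NOT NULL,
    project_id   TEXT    NOT NULL,
    result_count INTEGER NOT NULL,
    timestamp    TEXT    NOT NULL DEFAULT (datetime('now'))
);
-- Assumed index: every trace query filters on a time window.
CREATE INDEX IF NOT EXISTS idx_interactions_time
    ON chunk_interactions (timestamp);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_db()
conn.executemany(
    "INSERT INTO chunk_interactions "
    "(chunk_id, project_id, file_path, interaction, source_tool, timestamp) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (142, "my-project", "src/store.py", "edit", "hook:Edit", "2026-03-18 10:23:00"),
        (89, "my-project", "README.md", "read", "hook:Read", "2026-03-18 10:25:00"),
    ],
)
row_count = conn.execute("SELECT COUNT(*) FROM chunk_interactions").fetchone()[0]
print(row_count)
```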

Four Analyses: Trace Analytics

The accumulated data is queried from four different angles:

trace hot: Most Accessed Files

dnomia-knowledge trace hot

                        Hot Files (last 30 days)
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━┳━━━━┳━━━━┳━━━━━━━┓
┃ #   ┃ File                              ┃  R ┃  E ┃  S ┃ Total ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━╇━━━━╇━━━━╇━━━━━━━┩
│ 1   │ dnomia-knowledge README.md        │ 79 │ 37 │  0 │   116 │
│ 2   │ dnomia-knowledge store.py         │  0 │ 66 │  0 │    66 │
│ 3   │ turkish-diacritics README.md      │ 36 │ 12 │  0 │    48 │
│ 4   │ dnomia-knowledge server.py        │ 44 │  0 │  0 │    44 │
│ 5   │ ceaksan post-normalize.ts         │ 18 │  9 │  0 │    27 │
└─────┴───────────────────────────────────┴────┴────┴────┴───────┘

This table comes from real data (13 indexed projects, 30-day window). A few observations:

README.md is the hottest file. 79 reads, 37 edits. This file is the project’s “face” and is constantly updated.

store.py is edit-only. 66 edits, zero reads. This makes sense: I already know store.py’s structure, so I edit it directly with each change without reading it for reference.

server.py is read-only. 44 reads, zero edits. The opposite: I read server.py as a reference but rarely change it.

Cross-project data. Different projects appear in the same table (dnomia-knowledge, turkish-diacritics, ceaksan). You can see the work intensity across all projects at a glance.
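A table like this can be produced with a single aggregate query. The sketch below assumes the `chunk_interactions` columns described earlier and simply counts rows per file; how the real `trace hot` command counts (per chunk or per file access) is not something I am asserting here, and `hot_files` is a hypothetical function name.

```python
import sqlite3

HOT_FILES_SQL = """
SELECT project_id,
       file_path,
       SUM(interaction = 'read')       AS reads,
       SUM(interaction = 'edit')       AS edits,
       SUM(interaction = 'search_hit') AS search_hits,
       COUNT(*)                        AS total
FROM chunk_interactions
WHERE timestamp >= datetime('now', ?)
GROUP BY project_id, file_path
ORDER BY total DESC
LIMIT ?
"""


def hot_files(conn: sqlite3.Connection, days: int = 30, limit: int = 5):
    """Return (project, file, reads, edits, search_hits, total) rows."""
    return conn.execute(HOT_FILES_SQL, (f"-{days} days", limit)).fetchall()


# Tiny in-memory demo with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunk_interactions (chunk_id INTEGER, project_id TEXT, "
    "file_path TEXT, interaction TEXT, source_tool TEXT, "
    "timestamp TEXT DEFAULT (datetime('now')))"
)
conn.executemany(
    "INSERT INTO chunk_interactions "
    "(chunk_id, project_id, file_path, interaction, source_tool) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "demo", "README.md", "read", "hook:Read"),
        (1, "demo", "README.md", "read", "hook:Read"),
        (1, "demo", "README.md", "edit", "hook:Edit"),
        (2, "demo", "store.py", "edit", "hook:Edit"),
        (2, "demo", "store.py", "edit", "hook:Edit"),
    ],
)
for row in hot_files(conn):
    print(row)
```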

trace gaps: Zero-Result Searches

dnomia-knowledge trace gaps

                 Knowledge Gaps (last 30 days)
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ #   ┃ Query                                         ┃ Count ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ 1   │ date calendar pubDate display card            │     1 │
│ 2   │ frontmatter title slug                        │     1 │
│ 3   │ frontmatter başlık yapısı tüm bölümler        │     1 │
└─────┴───────────────────────────────────────────────┴───────┘

This analysis shows where search fails. Three different queries returned zero results.

“frontmatter title slug” returning zero results suggests that Astro components or layout files are outside the indexing scope, most likely because these file types are missing from the .knowledge.toml config.

“date calendar pubDate display card” is a similar gap. The sought information likely exists in an Astro component but hasn’t been indexed.

Knowledge gaps analysis provides concrete clues for expanding indexing coverage. Zero-result searches are your knowledge base’s blind spots.
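The gap report boils down to filtering `search_log` on `result_count = 0`. A minimal sketch, assuming the table layout shown earlier; `knowledge_gaps` is a hypothetical name and the demo rows are made up.

```python
import sqlite3

GAPS_SQL = """
SELECT query, COUNT(*) AS times
FROM search_log
WHERE result_count = 0
  AND timestamp >= datetime('now', ?)
GROUP BY query
ORDER BY times DESC, query
"""


def knowledge_gaps(conn: sqlite3.Connection, days: int = 30):
    """Return (query, times) for searches that came back empty."""
    return conn.execute(GAPS_SQL, (f"-{days} days",)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_log (query TEXT, project_id TEXT, "
    "result_count INTEGER, timestamp TEXT DEFAULT (datetime('now')))"
)
conn.executemany(
    "INSERT INTO search_log (query, project_id, result_count) VALUES (?, ?, ?)",
    [
        ("frontmatter title slug", "ceaksan", 0),
        ("frontmatter title slug", "ceaksan", 0),
        ("authentication middleware", "my-project", 5),
    ],
)
print(knowledge_gaps(conn))
```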

trace decay: Declining Interest

dnomia-knowledge trace decay

Two time windows are compared: the last 30 days vs the previous 30 days. The analysis surfaces files that were heavily accessed in the earlier window but are no longer touched.

This analysis produces meaningful results when enough data has accumulated (at least 60 days). Possible scenarios:

Deprecated code: Modules that were once active but are no longer used
Knowledge staleness: An area you worked on intensively, but your knowledge is no longer current
Completed work: A feature is done, and you no longer return to those files (this is normal)
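The window comparison can be sketched in a few lines once per-file counts exist for both periods. The thresholds below (`min_previous`, `max_ratio`) are hypothetical knobs I picked for illustration; the source does not say how the real command decides what counts as decayed.

```python
def decayed_files(
    previous: dict[str, int],
    recent: dict[str, int],
    min_previous: int = 5,
    max_ratio: float = 0.2,
) -> list[tuple[str, int, int]]:
    """Return (path, count_before, count_now) for files whose activity collapsed.

    A file qualifies when it had at least `min_previous` interactions in the
    earlier window and at most `max_ratio` of that in the recent one.
    """
    decayed = []
    for path, before in previous.items():
        if before < min_previous:
            continue  # never really active, so nothing to decay from
        now = recent.get(path, 0)
        if now <= before * max_ratio:
            decayed.append((path, before, now))
    # Largest absolute drop first.
    return sorted(decayed, key=lambda item: item[1] - item[2], reverse=True)


previous_window = {"legacy/importer.py": 40, "store.py": 10, "notes.md": 2}
recent_window = {"legacy/importer.py": 3, "store.py": 9}
print(decayed_files(previous_window, recent_window))
```

The result only flags candidates. Whether a flagged file is deprecated, stale knowledge, or simply finished work still needs a human judgment.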

trace queries: Search Patterns

dnomia-knowledge trace queries

                     Top Queries (last 30 days)
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━┓
┃ #   ┃ Query                                 ┃ Count ┃ Avg Results ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━┩
│ 1   │ GTM Zaraz server-side tracking        │     1 │         5.0 │
│ 2   │ etkinlik takibinde farklı yaklaşımlar │     1 │        10.0 │
│ 3   │ return IndexResult                    │     1 │        10.0 │
└─────┴───────────────────────────────────────┴───────┴─────────────┘

Still early stage (each query only once). Over time, recurring queries will emerge. These point to information areas that are frequently needed but not quickly accessible. For instance, the “GTM Zaraz server-side tracking” query may have been generated while searching for the event data tracking approaches post.
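The Count and Avg Results columns map directly onto `COUNT(*)` and `AVG(result_count)` over `search_log`. A minimal sketch under the same schema assumptions as before, with a hypothetical `top_queries` function and made-up rows:

```python
import sqlite3

TOP_QUERIES_SQL = """
SELECT query,
       COUNT(*)                    AS times,
       ROUND(AVG(result_count), 1) AS avg_results
FROM search_log
WHERE timestamp >= datetime('now', ?)
GROUP BY query
ORDER BY times DESC, query
LIMIT ?
"""


def top_queries(conn: sqlite3.Connection, days: int = 30, limit: int = 10):
    """Return (query, times, avg_results), most repeated queries first."""
    return conn.execute(TOP_QUERIES_SQL, (f"-{days} days", limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_log (query TEXT, project_id TEXT, "
    "result_count INTEGER, timestamp TEXT DEFAULT (datetime('now')))"
)
conn.executemany(
    "INSERT INTO search_log (query, project_id, result_count) VALUES (?, ?, ?)",
    [
        ("return IndexResult", "dnomia-knowledge", 10),
        ("return IndexResult", "dnomia-knowledge", 5),
        ("GTM Zaraz server-side tracking", "ceaksan", 5),
    ],
)
print(top_queries(conn))
```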

Combining with Git Churn: Crossover Analysis

Interaction data alone is valuable. But when combined with git churn data, a stronger signal emerges.

dnomia-knowledge analyze crossover

Each file is evaluated on two axes:

Churn: How much it changes in git (insertions + deletions)
Read: How much it’s read (trace hot data)

Six signals are derived from these two dimensions:

Signal	Churn	Read	Meaning
HOT	High	High	The project’s heart. Changes a lot and is read a lot.
BLIND	High	Zero	Danger. Actively changing but never looked at.
TURBULENT	High	Low	Risky. Changes a lot, monitored little.
STABLE	Low	High	Reference code. Changes little, read a lot.
ZOMBIE	Zero	Low	Dead code candidate. Never changes, rarely looked at.
COLD	Low	Low	Inactive. Normal state.

The BLIND signal is the most critical. If a file is actively changing but never read, it means those changes aren’t being reviewed. Bug risk is high.
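The table can be read as a small decision function. The sketch below is one possible encoding, not the project's implementation: the numeric thresholds that separate "High" from "Low" are hypothetical, and the combinations the table does not list (for example zero churn with high reads) are resolved by my own assumption, noted in the comments.

```python
def classify(churn: int, reads: int, churn_high: int = 100, read_high: int = 10) -> str:
    """Map a file's git churn and read count to one of the six signals.

    churn: insertions + deletions in the window; reads: read interactions.
    churn_high / read_high are assumed cut-offs for "High".
    """
    if churn >= churn_high:
        if reads == 0:
            return "BLIND"      # changing a lot, never looked at
        return "HOT" if reads >= read_high else "TURBULENT"
    if reads >= read_high:
        return "STABLE"         # assumption: zero churn + high reads is also STABLE
    if churn == 0:
        return "ZOMBIE"         # assumption: zero reads counts as "Low" here
    return "COLD"


for name, churn, reads in [
    ("store.py", 500, 0),
    ("server.py", 5, 44),
    ("old_util.py", 0, 1),
]:
    print(name, classify(churn, reads))
```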

Interaction Data Improves Search Quality

This data isn’t collected just for analytics purposes. It directly affects search results.

Interaction Boost

As a final step, an interaction boost is applied to the hybrid search (FTS5 + vector + RRF) results:

for r in results:
    count = interaction_counts.get(r.chunk_id, 0)
    bonus = 0.1 * min(count, 10) / 10
    r.score = r.score + bonus

Frequently read or edited files rank higher in search results. This creates a personal search engine: two developers working on the same codebase get different rankings for the same query.
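Here is the same boost as a self-contained sketch that can be run on its own. The `Result` dataclass and the re-sort after boosting are my assumptions (the snippet above only shows the score update), and the scores are invented for the demo. Note that the bonus saturates: anything beyond 10 interactions earns the same maximum of 0.1.

```python
from dataclasses import dataclass


@dataclass
class Result:
    chunk_id: int
    file_path: str
    score: float


def apply_interaction_boost(
    results: list[Result],
    interaction_counts: dict[int, int],
    max_bonus: float = 0.1,
    cap: int = 10,
) -> list[Result]:
    """Add up to `max_bonus` to each score, then re-rank by the boosted score."""
    for r in results:
        count = interaction_counts.get(r.chunk_id, 0)
        r.score += max_bonus * min(count, cap) / cap
    return sorted(results, key=lambda r: r.score, reverse=True)


results = [
    Result(chunk_id=1, file_path="db/pool.py", score=0.50),
    Result(chunk_id=2, file_path="store.py", score=0.45),
]
# store.py's chunk was touched 66 times; the other chunk never.
ranked = apply_interaction_boost(results, {2: 66})
print([r.file_path for r in ranked])
```

With these numbers the frequently edited file overtakes a slightly better raw match, which is the personalization effect described above.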

Why It Works

When you search for “database connection”, there are probably multiple files in your project related to this concept. But store.py, the file you work on every day, is probably the one you’re looking for. Interaction boost knows this because you’ve edited store.py 66 times.

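This is loosely a single-user analogue of collaborative filtering: ranking is personalized from your own interaction history rather than from other users’ behavior. Like Netflix’s “based on what you’ve watched” recommendations, but for code files.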

Setup

1. Install dnomia-knowledge

git clone https://github.com/ceaksan/dnomia-knowledge.git
cd dnomia-knowledge
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e .

2. Index your project

dnomia-knowledge index /path/to/your/project

3. Register Claude Code hooks

Add to your ~/.claude/settings.json file:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Read|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/.venv/bin/python -m dnomia_knowledge.hooks.post_tool_use"
          }
        ]
      }
    ]
  }
}

4. Work and accumulate data

Work in your normal Claude Code sessions. The hook collects data in the background. After a few days:

dnomia-knowledge trace hot
dnomia-knowledge trace gaps

Observations and Takeaways

Even with just a week’s data, I learned the following:

The project’s true center of gravity differs from LOC. The largest file isn’t necessarily the most important one; for day-to-day work, the file with the most interactions is.

README’s importance is underestimated. It is the hottest file, with 116 interactions. README isn’t just the project’s “face”; it’s an active working document.

Read and edit patterns provide different information. Edit-only means you know the file well and don’t need a reference. Read-only means you use it as a reference but don’t change it. Both together means an active work area.

Zero-result searches are your knowledge base’s blind spots. Every zero-result query points to an area that’s either not indexed or insufficiently documented.

Technical Details

Data size: Each interaction record is ~100 bytes. 200 interactions per day = ~600KB per month.
Hook duration: 10-30ms (DB lookup + insert). Doesn’t slow down Claude Code.
Window: trace hot/gaps/queries default to 30 days. trace decay compares 30 days vs previous 30 days.
Cross-project: All projects in a single DB. Filtering with --project flag is possible.
Hook safety: With a catch-all exception handler, the hook never crashes. If an error occurs, it logs to stderr without affecting the Claude Code session.
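The data-size figure is simple arithmetic, worked through here with the numbers stated above (about 100 bytes per record, 200 interactions per day). The 30-day month and the decimal units (1 KB = 1000 bytes) are my assumptions.

```python
BYTES_PER_RECORD = 100   # approximate size of one interaction row
RECORDS_PER_DAY = 200    # file reads and edits per day

per_day = BYTES_PER_RECORD * RECORDS_PER_DAY  # bytes written per day
per_month = per_day * 30                      # assuming a 30-day month
per_year = per_month * 12

print(f"per day:   {per_day / 1000:.0f} KB")
print(f"per month: {per_month / 1000:.0f} KB")
print(f"per year:  {per_year / 1_000_000:.1f} MB")
```

Because the hook writes one record per indexed chunk of a file, the real row count per file access can be higher than one, so treat this as an order-of-magnitude estimate.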

Source code: dnomia-knowledge

Key Takeaways

01 README.md turned out to be the hottest file with 116 interactions. Among code files, store.py (66 edits) was far ahead. You wouldn't find this ranking by measuring LOC.
02 Knowledge gaps analysis shows where the search engine fails. The query 'frontmatter title slug' returning zero results indicates gaps in indexing coverage.
03 Interaction data isn't just analytics; it also improves search quality. Frequently read files automatically rank higher in search results.
04 All data stays in local SQLite. No cloud service, no API calls, no privacy concerns.

Frequently Asked Questions (FAQ)

Does this system work outside Claude Code?

The hook mechanism is specific to Claude Code (PreToolUse/PostToolUse). But the Store and trace analytics modules are standalone Python. You can feed the same data through a different editor's plugin API.

How much space does interaction data take?

Each interaction record is approximately 100 bytes. If you do 200 file reads/edits per day, that's ~600KB per month. ~7MB per year. Negligible for SQLite.

Does the hook affect performance?

The PostToolUse hook runs after every Read/Edit. DB lookup and insert total 10-30ms. It doesn't noticeably slow down Claude Code.

What is decay analysis for?

It compares interactions from the last 30 days with the previous 30 days. It reveals files that were heavily accessed before but are no longer touched. These are typically deprecated code, areas where your knowledge has gone stale, or simply completed work.
