Argus: Make Your AI Coding Assistant's Web Searches Visible

TL;DR

AI coding assistants run web searches and page fetches during a conversation, but those calls are never recorded anywhere. You cannot see what was searched, which source was fetched how many times, or how searches shift across sessions. Argus captures every WebSearch and WebFetch call through Claude Code's PreToolUse/PostToolUse hooks, writes them to a local SQLite database, and lets you query it from the CLI. 68-day snapshot: 2032 queries, 177 sessions, 20 projects. The argus analyze command turns the raw log into four signals: content candidate clusters, session efficiency, knowledge gaps, and missed connections. These signals overlap with runtime failure patterns: repeated fetches suggest a loop, falling efficiency suggests exploration drift, and an abandoned search suggests a premature conclusion.

AI coding assistants run web searches during a conversation. When they are unsure about something they search, open a doc, pull up a Reddit thread. But those searches stay invisible. There is no record showing what was searched, which results came back, or how many times the same page was fetched. Argus was written precisely to close that gap: it captures every WebSearch and WebFetch call, stores it locally, and lets you query it from the CLI.

The Invisible Problem

When you work with an AI assistant, its search behavior is a closed box. The assistant says “let me look this up,” and a few seconds later it comes back with an answer. What happened in between stays invisible: which query was written, how many results came back, which source was read, and whether the same doc was fetched again. These questions go unanswered.

This invisibility makes two things impossible. First, you cannot tell what the assistant researches often. If it keeps returning to the same topic, that is a sign of either a knowledge gap or a topic worth writing about. Second, you cannot notice the inefficiency in its search behavior: fetching the same source three times, abandoning one topic for another without resolving it, or searching externally for an answer that was already in your own notes. None of this gets measured.

I had already built a system that logs AI mistakes. Coffee Debt logs the mistakes the assistant makes in JSONL format and extracts patterns. Argus is the search-side counterpart of the same philosophy: it makes the search behavior visible rather than the errors.

What Argus Does

Argus is a small CLI tool that plugs into Claude Code’s hook system. The working principle is simple:

Automatic capture: PreToolUse and PostToolUse hooks listen for WebSearch and WebFetch calls. A record is created when the query is sent and when the result returns.
Local storage: All data sits in a single SQLite file. No cloud, no external dependency, the data never leaves the machine.
Search history: Past queries can be filtered by date, project, assistant, and keyword.
Pattern analysis: argus analyze turns the raw log into interpretable signals.
Export: JSON and CSV output, for further analysis.

It works across multiple assistants. Claude Code is the primary target, but there are hook integrations for Kimi CLI and OpenCode too.
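The capture-and-store step can be sketched in a few lines. This is not Argus's code (Argus is installed through npm, and its internals are not shown here); it is a minimal Python illustration of the idea. The payload keys (`tool_name`, `tool_input`, `session_id`, `hook_event_name`, `cwd`) are my assumption about what a hook receives, and the `web_calls` table is a hypothetical layout.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical table layout, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS web_calls (
    id INTEGER PRIMARY KEY,
    ts TEXT NOT NULL,
    session_id TEXT,
    event TEXT,
    kind TEXT,
    target TEXT,
    project_dir TEXT
)
"""

def record_hook_event(conn, payload):
    """Store one hook payload; return True only for web search/fetch calls."""
    tool = payload.get("tool_name")
    if tool not in ("WebSearch", "WebFetch"):
        return False  # every other tool call is ignored
    tool_input = payload.get("tool_input") or {}
    if tool == "WebSearch":
        kind, target = "search", tool_input.get("query")
    else:
        kind, target = "fetch", tool_input.get("url")
    conn.execute(
        "INSERT INTO web_calls (ts, session_id, event, kind, target, project_dir) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),
            payload.get("session_id"),
            payload.get("hook_event_name"),
            kind,
            target,
            payload.get("cwd"),
        ),
    )
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")  # a real tool would use a file on disk
conn.execute(SCHEMA)
stored = record_hook_event(conn, {
    "session_id": "demo-session",
    "hook_event_name": "PreToolUse",
    "tool_name": "WebSearch",
    "tool_input": {"query": "astro redirects old slug"},
    "cwd": "/tmp/demo-project",
})
print(stored, conn.execute("SELECT kind, target FROM web_calls").fetchall())
```

Filtering on the tool name inside the handler keeps the hook cheap: anything that is not a web search or fetch returns immediately without touching the database.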

Installation

git clone https://github.com/ceaksan/argus.git
cd argus
npm install
npm run build
npm link

argus hook install

From this point on, every WebSearch and WebFetch in Claude Code is logged silently in the background. No step beyond installation is needed.

68 Days of Evidence

I have been running Argus on my own usage since March 26. As of June 1, the accumulated data looks like this:

Metric	Value
Total queries	2032 (1283 search + 749 fetch)
Sessions	177
Projects	20
Period	68 days (March 26 - June 1)
Top searching project	ceaksan-v4.0 (1183, 58%)
Second	DNM_Projects (662, 33%)

Assistant breakdown:

Assistant	Queries
claude-code	1936
kimi-code	88
opencode	8
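The numbers in the two tables can be cross-checked with a little arithmetic. The snippet below is only a sanity check on the figures quoted above, written in Python for convenience; it is not part of Argus.

```python
# Figures copied from the tables above.
total = 2032
by_type = {"search": 1283, "fetch": 749}
by_assistant = {"claude-code": 1936, "kimi-code": 88, "opencode": 8}
by_project = {"ceaksan-v4.0": 1183, "DNM_Projects": 662}

def share(count, total):
    """Percentage of the total, rounded to a whole number."""
    return round(100 * count / total)

# Both breakdowns should add up to the same total.
assert sum(by_type.values()) == total
assert sum(by_assistant.values()) == total

project_shares = {name: share(n, total) for name, n in by_project.items()}
other_projects = total - sum(by_project.values())  # the remaining 18 projects

print(project_shares)   # {'ceaksan-v4.0': 58, 'DNM_Projects': 33}
print(other_projects)   # 187
```

The two largest projects account for roughly nine in ten queries, which is one more reason to read the snapshot as a description of one person's usage rather than a general pattern.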

There is an honest limit here. Cross-assistant data exists but is not sustained. The kimi-code records are all fetches with no searches, and they are squeezed into a three-day window in April. So this data cannot support an “assistant comparison” claim: it comes from a single user, is filtered by what the hooks see, and is a small sample. With the data at hand, Argus is an observation tool, not a statistical benchmarking platform.

From the Log to Failure-Observability

The raw log on its own tells a limited story. A line saying “this query ran, this page was fetched” is a data point, not an interpretation. argus analyze turns this log into four signals, and these signals connect to the behavior patterns the assistant produces while it works.

In LLM Behavioral Failure Modes, the pillar I wrote earlier, I described how models drift over long runs. The signals Argus produces are the observable traces of those patterns on the search side. In an upcoming post I will place these signals alongside different failure taxonomies; there Argus will represent a narrow but concrete slice of the WebSearch and WebFetch layer.

Four Signals, Real Output

Content Candidate Clusters

Repeated searches on the same topic from different angles form a cluster. Over the 68-day period four clusters stood out: Looker Studio Community Connector experiences, the Chrome third-party cookie deprecation timeline, marketing mix modeling practitioner experiences, and Astro redirect patterns.

── Content Signals (4 clusters) ──
  "looker studio community connector experience problems" (3 queries)
    depth: 3 unique angles, 0 repeated fetches
  "Chrome third-party cookie deprecation timeline" (3 queries)
    depth: 3 unique angles, 1 repeated fetches
  "marketing mix modeling meridian robyn pymc practitioner" (3 queries)
    depth: 3 unique angles, 2 repeated fetches
  "astro content collection aliases redirects old-slug" (3 queries)
    depth: 3 unique angles, 3 repeated fetches

The “repeated fetches” count here matters. Repeatedly pulling the same source shows the assistant returning to the same place to solve one topic. I relate this to the step-repetition and loop patterns in the behavioral failure modes I wrote about earlier. A high repeat count may signal both that the topic is worth writing about and that the assistant struggled with it.
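To make the idea of a cluster concrete, here is one way such grouping could work. How argus analyze actually clusters queries is not described in this post, so treat this as a stand-in: a greedy pass that groups queries by word overlap (Jaccard similarity) with a hypothetical threshold of 0.3, plus a helper that counts fetches beyond the first for each URL.

```python
from collections import Counter

def tokens(query):
    """Lowercased set of whitespace-separated words."""
    return set(query.lower().split())

def jaccard(a, b):
    """Overlap between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_queries(queries, threshold=0.3):
    """Greedy pass: a query joins the first cluster whose first query overlaps enough."""
    clusters = []
    for q in queries:
        for cluster in clusters:
            if jaccard(tokens(q), tokens(cluster[0])) >= threshold:
                cluster.append(q)
                break
        else:
            clusters.append([q])  # no cluster matched, start a new one
    return clusters

def repeated_fetches(urls):
    """Number of fetches beyond the first one for each URL."""
    return sum(n - 1 for n in Counter(urls).values())

sample = [
    "astro content collection redirects",
    "astro redirects old slug",
    "chrome third-party cookie deprecation timeline",
    "chrome cookie deprecation date",
]
print(cluster_queries(sample))
print(repeated_fetches(["a", "a", "b", "a"]))
```

With this sample the four queries fall into two clusters of two, and the URL list contains two repeated fetches. Word overlap is crude (it misses synonyms and rephrasings), which is why I call it a stand-in rather than the method.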

Session Efficiency

Each session is scored by its ratio of repeated queries and fetches. Short, focused sessions land at 90% and above, while the score drops in long exploration sessions.

── Session Efficiency ──
  Session 0e171..  │ 63 queries │ score: 70%
  Session 9bb6c..  │ 24 queries │ score: 72%
  Session 46085..  │ 20 queries │ score: 79%

A 63-query session dropping to 70% is notable. In long sessions the assistant reformulates the query again and again, circling the same topic. I read this as exploration drift. The score is a numeric indicator of how efficiency erodes as a session grows longer.
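The exact scoring formula is not spelled out here, so the following is one plausible reading, stated as an assumption: efficiency is the share of calls in a session that are not exact repeats of an earlier call. Under that reading, a 63-call session with 44 distinct calls rounds to 70%.

```python
def efficiency(calls):
    """Share of calls that are not exact repeats of an earlier call (assumed formula)."""
    if not calls:
        return 1.0  # an empty session has nothing to repeat
    normalized = [c.strip().lower() for c in calls]
    return len(set(normalized)) / len(normalized)

# Hypothetical session: 44 distinct calls plus 19 repeats of the first one.
session = [f"query {i}" for i in range(44)] + ["query 0"] * 19
print(len(session), round(100 * efficiency(session)))  # 63 70
```

Exact matching only catches literal repeats. A reformulated query ("astro redirects" versus "astro redirect old slug") would count as new here, so a real score would need some notion of similarity to capture the circling described above.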

Knowledge Gaps

Some queries are abandoned without finding a satisfying result. argus analyze flags these queries when the dnomia-knowledge bridge is active. A low match score against your own notes points to either a real knowledge gap or the assistant leaving a topic half-done and jumping to a premature conclusion.

Missed Connections

The last signal is the most interesting one. The dnomia-knowledge bridge checks whether the answer to a query searched externally was actually inside your own notes. A high match score captures the “the answer was inside but the assistant asked outside” case. This is an observable example of turning outward without using the knowledge already in context.
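The two bridge-based signals are two ends of the same score: a low match suggests a gap, a high match suggests a missed connection. A minimal sketch of that split, assuming a match score between 0 and 1 and two hypothetical cutoffs (neither the scale nor the thresholds come from Argus or the dnomia-knowledge bridge):

```python
def classify(match_score, low=0.3, high=0.7):
    """Label an external query by how well it matches your own notes (assumed 0..1 scale)."""
    if match_score <= low:
        return "knowledge_gap"        # the notes have little to say on this topic
    if match_score >= high:
        return "missed_connection"    # the answer was probably already in the notes
    return "unclear"                  # middle band: not strong evidence either way

# Hypothetical scores, for illustration only.
for query, score in [("looker studio connector quotas", 0.1),
                     ("astro redirects old slug", 0.85),
                     ("cookie deprecation timeline", 0.5)]:
    print(query, "->", classify(score))
```

Keeping an explicit middle band avoids forcing every query into one of the two stories when the match is ambiguous.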

Roadmap

Argus is open source, ISC licensed. With this release argus analyze and argus stats became filterable by assistant. Next up are cross-assistant comparison reports and wider use of the semantic bridge. As data accumulates the signals will sharpen; the value of an observation tool is the patterns that emerge over time.

Source code and installation: github.com/ceaksan/argus

Key Takeaways

01 AI assistants' web searches are invisible by default. Argus captures them via PreToolUse/PostToolUse hooks and writes them to local SQLite, zero cloud dependency.
02 A 68-day snapshot gives a concrete observation ground: 2032 queries, 177 sessions, 20 projects. 58% of queries concentrate in a single project.
03 argus analyze turns the raw log into four signals: content candidate clusters, session efficiency score, knowledge gaps, and missed connections. The raw log gives data, the analysis gives understanding.
04 Signals overlap with runtime failure patterns: repeatedly fetching the same source suggests a loop, falling efficiency in long sessions suggests exploration drift, and an abandoned search is a sign of a premature conclusion.
05 Cross-assistant data exists but is not yet sustained. Single user, hook-filtered, small sample. Argus is an observation tool, not a generalizable benchmark.

Frequently Asked Questions (FAQ)

What is Argus?

Argus is an open source CLI tool that logs and analyzes the web searches AI coding assistants make. It hooks into Claude Code's PreToolUse and PostToolUse events to capture every WebSearch and WebFetch call, storing them in a local SQLite database. There is no cloud dependency; the data never leaves your machine. Kimi CLI and OpenCode integrations are also available.

What data does Argus record?

Each search record holds these fields: timestamp, session id, assistant name, type (search or fetch), query text, trigger text, results, and project directory. This structure lets you filter searches by project, session, and assistant, and track how they change over time.
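As an illustration of how those fields support filtering, here is a small Python sketch with an in-memory SQLite table. The column names are my own rendering of the fields listed above, not Argus's actual schema, and the sample rows (including their timestamps and URLs) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column names mirroring the fields described above.
conn.execute("""
    CREATE TABLE records (
        ts TEXT, session_id TEXT, assistant TEXT, kind TEXT,
        query_text TEXT, trigger_text TEXT, results TEXT, project_dir TEXT
    )
""")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [  # invented sample rows
        ("2000-01-01T10:00:00Z", "s1", "claude-code", "search",
         "astro redirects old slug", "user asked about redirects", "[]", "/work/site"),
        ("2000-01-01T10:01:00Z", "s1", "claude-code", "fetch",
         "https://example.com/docs/redirects", "follow-up fetch", "[]", "/work/site"),
        ("2000-01-02T09:00:00Z", "s2", "kimi-code", "fetch",
         "https://example.com/pricing", "pricing check", "[]", "/work/other"),
    ],
)

def find(conn, assistant=None, project_dir=None, keyword=None):
    """Filter records by any combination of assistant, project, and keyword."""
    clauses, params = [], []
    if assistant:
        clauses.append("assistant = ?")
        params.append(assistant)
    if project_dir:
        clauses.append("project_dir = ?")
        params.append(project_dir)
    if keyword:
        clauses.append("query_text LIKE ?")
        params.append(f"%{keyword}%")
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = "SELECT ts, assistant, kind, query_text FROM records" + where + " ORDER BY ts"
    return conn.execute(sql, params).fetchall()

print(find(conn, assistant="claude-code", keyword="redirects"))
```

Building the WHERE clause from placeholders keeps user-supplied keywords out of the SQL text itself, which matters even for a local tool.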

What does argus analyze do?

It turns the raw search log into four signals. First, content candidate clusters: repeated searches on the same topic from different angles signal a topic worth writing about. Second, session efficiency: it scores each session by its ratio of repeated queries and fetches. Third, knowledge gaps: through the dnomia-knowledge bridge, it flags abandoned queries that have a low match in your own notes. Fourth, missed connections: the same bridge flags topics searched externally that already had an answer in your own notes.

Does Argus work with a single assistant only?

No. Claude Code is the primary target, but there are hook integrations for Kimi CLI and OpenCode too. The argus stats --assistant and argus analyze --assistant commands let you filter by assistant. Right now most of the data comes from Claude Code; cross-assistant comparison is still early.
