AI Pair Programming: Coffee Debt, Gamifying Error Tracking

TL;DR

Every AI mistake is a bean, and 5 beans make one coffee debt. Four hook scripts auto-capture errors: tool failures, user corrections, and blocked destructive commands. Everything is logged in JSONL format together with context proxies. 14 days of data show that Bash is the most error-prone tool (14), user_correction is the most expensive error type (12x), morning hours are the peak time for errors, and 11 destructive commands were blocked. Deep analysis cross-references 4 data sources to reveal where, when, and why errors cluster. Data instead of frustration.

Key Takeaways

01 Coffee Debt turns AI mistakes into observability data. 1 mistake = 1 bean, 5 beans = 1 coffee debt. The debt is cumulative and never resets.
02 4 hook scripts auto-capture errors: PostToolUse (Edit/Write/Bash failures), UserPromptSubmit (user corrections), PreToolUse (destructive command blocking), SessionStart (banner).
03 User correction is the most expensive error type. A tool error costs time, but a user correction costs both time and trust.
04 Context proxies (prompt_count, session_kb) log the conditions under which errors occur. This makes it possible to test hypotheses like 'do errors increase as context grows.'
05 Deep analysis cross-references 4 data sources: coffee-log (errors), hermes.db (search history), state.db (code complexity), knowledge.db (file interactions). Errors aren't analyzed in isolation but within their full context.
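
The kind of summary these takeaways rely on (errors per tool, per error type, per hour of day) can be sketched as a small pass over the JSONL log. This is only an illustration: the field names `ts`, `tool`, and `error_type` are assumptions, since the article does not show the actual record schema, and the three sample records are made up for the demo.

```python
import json
from collections import Counter
from datetime import datetime


def summarize(lines):
    """Count errors per tool, per error type, and per hour of day.

    Assumes each line is a JSON object with hypothetical fields:
    "ts" (ISO 8601 timestamp), "tool", and "error_type".
    """
    by_tool, by_type, by_hour = Counter(), Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the log
        rec = json.loads(line)
        by_tool[rec.get("tool", "unknown")] += 1
        by_type[rec.get("error_type", "unknown")] += 1
        by_hour[datetime.fromisoformat(rec["ts"]).hour] += 1
    return by_tool, by_type, by_hour


# Made-up sample records, only to show the shape of the output.
sample = [
    '{"ts": "2025-01-06T09:15:00", "tool": "Bash", "error_type": "tool_failure"}',
    '{"ts": "2025-01-06T09:40:00", "tool": "Bash", "error_type": "blocked_command"}',
    '{"ts": "2025-01-06T14:05:00", "tool": "Edit", "error_type": "tool_failure"}',
]
by_tool, by_type, by_hour = summarize(sample)
print(by_tool.most_common(1))  # [('Bash', 2)]
print(by_hour.most_common(1))  # [(9, 2)]
```

With a real log, the same function could be fed `open("coffee-log.jsonl", encoding="utf-8")` instead of the sample list.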

Frequently Asked Questions (FAQ)

+ What is Coffee Debt?

Coffee Debt is a hook-based system that turns Claude Code mistakes into gamified observability data. Every AI error (tool failure, user correction, blocked destructive command) adds a bean. 5 beans = 1 coffee debt. The debt is cumulative and never resets. The goal isn't punishment; it's data collection.
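
The bean arithmetic is simple enough to write down. Here is a minimal sketch, assuming the only state kept is a running total of beans; the function name `coffee_debt` is hypothetical and not taken from the actual hook scripts.

```python
BEANS_PER_COFFEE = 5  # from the article: 5 beans = 1 coffee debt


def coffee_debt(total_beans):
    """Return (coffees_owed, beans_toward_next_coffee) for a bean total.

    The total only ever grows, which matches "cumulative, never resets".
    """
    if total_beans < 0:
        raise ValueError("bean total cannot be negative")
    return divmod(total_beans, BEANS_PER_COFFEE)


print(coffee_debt(14))  # (2, 4): two coffees owed, four beans toward the third
```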

+ Which errors add beans to Coffee Debt?

Three sources: (1) Tool failures: an Edit command can't find a match, a Bash command returns a non-zero exit code, or a Write operation fails. (2) User corrections: Turkish or English correction phrases are detected in the prompt (yanlis and hatali, Turkish for "wrong" and "faulty"; that's wrong; you hallucinated; etc.). (3) Blocked commands: destructive commands like rm -rf, git reset --hard, or git push --force are stopped by the hook.
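
The three checks could look roughly like the sketch below. It is an illustration under assumptions, not the article's hook code: the phrase list contains only the examples named above, the regular expressions cover only the three commands named above, and all function names are hypothetical.

```python
import re

# Only the example phrases from the text; the real list is presumably longer.
CORRECTION_PHRASES = ("yanlis", "hatali", "that's wrong", "you hallucinated")

# Only the three destructive commands named in the text.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-(?:rf|fr)\b"),
    re.compile(r"\bgit\s+reset\s+--hard\b"),
    re.compile(r"\bgit\s+push\b.*--force(?![\w-])"),
]


def is_tool_failure(exit_code):
    """A Bash command counts as failed when its exit code is non-zero."""
    return exit_code != 0


def is_user_correction(prompt):
    """Case-insensitive substring match against the correction phrases."""
    text = prompt.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(phrase in text for phrase in CORRECTION_PHRASES)


def is_destructive(command):
    """True if the command matches any destructive pattern."""
    return any(p.search(command) for p in DESTRUCTIVE_PATTERNS)


print(is_user_correction("No, that's wrong, the file is elsewhere"))  # True
print(is_destructive("git push --force origin main"))                 # True
print(is_destructive("git push origin main"))                         # False
```

Substring matching is deliberately crude and will produce some false positives; for a system whose goal is data collection rather than punishment, that trade-off may be acceptable.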

+ What does deep analysis do?

It cross-references 4 data sources: coffee-log.jsonl (error records); hermes.db (web search history: was there a search within 5 minutes before the error?); state.db (file complexity and static analysis results: are error-prone files also complex?); and knowledge.db (file interaction history: which files are touched most often?). This cross-referencing provides deeper clues about why errors occur.
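
As an illustration of one such cross-reference, the "search within 5 minutes before the error" check can be sketched with plain timestamps. The schema of hermes.db is not shown in the article, so this sketch assumes the search times have already been loaded into a sorted list; the function name and the sample timestamps are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # from the article: 5 minutes before the error


def had_recent_search(error_time, sorted_search_times, window=WINDOW):
    """True if any search falls in [error_time - window, error_time].

    sorted_search_times must be sorted in ascending order.
    """
    lo = bisect_left(sorted_search_times, error_time - window)
    hi = bisect_right(sorted_search_times, error_time)
    return hi > lo


# Made-up timestamps for the demo.
searches = sorted([datetime(2025, 1, 6, 9, 12), datetime(2025, 1, 6, 13, 0)])
print(had_recent_search(datetime(2025, 1, 6, 9, 15), searches))  # True
print(had_recent_search(datetime(2025, 1, 6, 14, 5), searches))  # False
```

Binary search keeps each lookup logarithmic in the number of searches, though for 14 days of data a linear scan would work just as well.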

+ What are context proxies for?

The prompt_count and session_kb fields added to each error record show the conditions under which the error occurred. prompt_count is the number of prompts in the session, session_kb is the size of the session JSONL file (a proxy for context window usage). With this data, hypotheses like 'does the error rate increase as context grows' or 'do errors cluster as the session gets longer' become testable.
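
Computing the two proxies might look like the sketch below. It rests on an assumption the article does not confirm: that each line of the session JSONL file is a JSON object and that user prompts can be recognized by a `"type": "user"` field. The function name and the demo file contents are hypothetical.

```python
import json
import os
import tempfile


def context_proxies(session_path):
    """Return prompt_count and session_kb for a session JSONL file.

    Assumption: a line whose JSON object has "type" == "user" is one prompt.
    """
    session_kb = round(os.path.getsize(session_path) / 1024, 1)
    prompt_count = 0
    with open(session_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partially written last line
            if isinstance(entry, dict) and entry.get("type") == "user":
                prompt_count += 1
    return {"prompt_count": prompt_count, "session_kb": session_kb}


# Demo with a throwaway session file containing two user prompts.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"type": "user", "text": "fix the test"}\n')
    tmp.write('{"type": "assistant", "text": "done"}\n')
    tmp.write('{"type": "user", "text": "that\'s wrong"}\n')
    demo_path = tmp.name

proxies = context_proxies(demo_path)
os.unlink(demo_path)
print(proxies["prompt_count"])  # 2
```

Merging this dictionary into each error record before it is appended to the log is what makes the later bucketing possible, for example comparing error counts for sessions below and above a chosen `session_kb` threshold.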

ai developer-tools
