The Problem: Gathering Information, Not Making Decisions
Every morning, the same cycle: I sit down at my computer but spend the first half hour gathering information instead of making decisions. Which emails came in on Gmail, what’s on the calendar today, which project has issues, what’s the status of yesterday’s leftover tasks, what’s been published in the feeds I follow.
None of this is hard. But it’s scattered. Five different tools, five different tabs, five different context switches. And cognitive load accumulates before any real work has begun.
I built Chief of Staff to solve this problem: a local AI assistant that runs overnight, collects data, classifies it, and delivers a ready briefing each morning.
The Result: The Briefing That Greets Me Every Morning
Before getting into technical details, I want to show the output the system produces. Every morning when I open Obsidian, a Daily Note like this is waiting for me:
# 2026-03-09
## Calendar
- 10:00-11:00 Client X meeting (Calendly) **prep needed**
- 14:00-14:30 Deploy review
- Free: 07:00-10:00, 11:00-14:00, 14:30-18:00
## Project Status
- OK: leetty, ceaksan
- validough: 3 errors (Neon connection timeout)
## Feed Highlights
- [Interesting Article](https://example.com) (Tech Blog) ~5min
- [Another Post](https://example.com) (Dev Weekly) ~3min
## Classified Tasks
### DISPATCH (AI handles)
- [ ] Client A email reply, meeting confirmation #email
- [ ] Blog research note #content
### PREP (80% ready, you finish)
- [ ] Client Y hosting migration reply, draft ready #email
- [ ] validough Neon timeout, summary + fix direction #dev
### YOURS (your brain needed)
- [ ] Client X meeting prep
- [ ] leetty checkout flow fix #dev
### SKIP (not today)
- [ ] validough onboarding wizard, P3, far deadline
## Carried Over
- [ ] [P2] Blog post publish, pending 2 days
Calendar, project statuses, classified tasks, items carried over from previous days. All in one place, collected and sorted. I just approve or adjust, then start working.
An Overview of the Options
There are multiple ways to build this kind of automation. Each approach has its own strengths and weaknesses.
Fully Autonomous Agent Systems
Systems where you define a task and expect the agent to solve it end-to-end. Appealing in theory, problematic in practice. Agents often enter loops, costs spiral unpredictably, and results are hard to anticipate. In a solo entrepreneur context, loss of control is an unacceptable risk.
No-Code Automation Platforms
Platforms that connect services through drag-and-drop interfaces. They excel at cloud-to-cloud integrations. However, when it comes to local file system access (like my Obsidian vault), custom classification logic, or LLM integration, they either hit their limits or require complex workarounds.
AI Assistant Platforms
Configurable AI assistant services. Useful for tasks like creating email drafts and calendar management. But data passes through their servers. For my consulting clients’ data and my own projects, this is a privacy risk. Monthly subscription costs add up, and customization options remain limited.
Agent Frameworks
Developer-focused frameworks that enable defining agent workflows in code. Powerful and flexible tools. But several issues arise for my situation:
- Security and privacy: Both my own and my clients’ data are involved
- Cost: Infrastructure overhead of unused features
- No generalizable routine: Between consulting and my own projects, there’s no workflow that reduces to a general routine. Assigning tasks via chat doesn’t fit how I work
- Overengineering risk: A purpose-built flow is more optimized and controllable than a ton of unused features
Build Your Own
Building your own pipeline with cron jobs, Python scripts, and API calls. The most flexible approach, but also the one requiring the most development time. Instead of writing everything from scratch, an orchestration layer over existing tools can be more efficient.
Local / Open Source Model Alternatives
Running models like Llama, Mistral, or Qwen locally through tools like Ollama. No cloud dependency, and data never leaves the machine. However, these models’ tool calling and long-context management capabilities are currently limited. They may suffice for simple classification tasks, but they’re not yet mature enough for complex orchestration. This balance may shift over time.
| Approach | Strength | Weakness (for my constraints) |
|---|---|---|
| Fully autonomous agent | Minimal intervention | Loss of control, cost uncertainty |
| No-code automation | Quick setup | Weak local file access, limited LLM integration |
| AI assistant platform | Ready to use | Privacy risk, limited customization |
| Agent framework | Flexible, powerful | Overengineering, unnecessary complexity |
| Your own scripts | Full control | Development time, maintenance burden |
| Local models | Privacy, zero cost | Tool calling and context management still limited |
My Choice: Lightweight and Purpose-Built
After evaluating these options, I decided to write my own solution. But not entirely from scratch.
I already had mini-services built for different needs. My own knowledge base MCP server for information management [1], my own scraper for pulling data from the web, my own semantic code search MCP server for searching codebases [2]. Each one works independently and is accessible as an MCP server or API. I have similar mini-services in other areas too: design tokens, geometric pattern generation, e-commerce operations.
I use this ecosystem both to develop my own projects and to spot potential issues between developer and user perspectives. Using my own products daily shortens the feedback loop.
What was missing was an orchestration layer. A flow that brings existing pieces together, runs overnight, and produces a ready briefing by morning. Chief of Staff is that layer.
The principles behind my choice:
- Lightweight: I don’t want to carry the weight of features I don’t use. Specific situations are better handled case by case
- Local-first: My data stays on my computer. Client data going to third-party servers is unacceptable
- Portable: I want a solution I can carry with me, one that can run off USB power. I don’t want to be tied to a single computer
- Reviewable: I want to continuously review, adjust, and control the flow. Not a set-and-forget system, but a tool I actively manage
- Cost control: Each step’s cost is predetermined, no surprise bills
Why Claude Code
I evaluated these criteria when choosing a model:
| Criterion | Claude Code | GPT + API | Gemini + API | Local Model (Ollama) |
|---|---|---|---|---|
| MCP support | Native, built-in | None (custom integration required) | None | None (experimental) |
| Non-interactive mode | claude -p | API call required | API call required | ollama run |
| Budget control | --max-budget-usd flag | Manual tracking | Manual tracking | No cost |
| Tool calling | Strong | Strong | Strong | Limited |
| Gmail/Calendar access | MCP connector (no OAuth) | OAuth + API key required | OAuth + API key required | Manual integration |
| Local execution | Yes (CLI) | No (API) | No (API) | Yes |
Claude Code’s most decisive advantage is MCP connectors. I access Gmail and Google Calendar data through Claude’s built-in MCP connections. No need to create a project in Google Cloud Console, manage OAuth credentials, or store API keys. I connect once via /mcp in Claude Code, and everything works from there.
With claude -p (non-interactive mode), I pass a prompt file and run it. The --max-budget-usd flag sets the maximum cost for each run. This makes it possible to run automatically overnight via cron job or launchd.
# Overnight pipeline
claude -p prompts/collect.md --max-budget-usd 2.00 # Data collection
claude -p prompts/classifier.md --max-budget-usd 1.50 # Classification
An important note: the system is designed to be model-agnostic. Prompt files are plain markdown, the data layer is SQLite, collectors are Python scripts. The dependency on Claude Code is limited to MCP connectors and claude -p calls. Running the prompt files through Ollama with Llama or Mistral is technically possible. Gmail API and Google Calendar API can be used directly instead of MCP connectors (additional OAuth setup required). As local models’ tool calling capabilities mature, this transition will become easier.
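The two overnight commands can also be driven from a small Python runner, which is convenient when launchd or cron should call a single entry point. This is a minimal sketch, not the project's actual run.sh: `build_command` and `run_pipeline` are hypothetical names, the sketch assumes the prompt file's text is handed to `claude -p` as the prompt, and stopping after the first failed step is an assumption of the example.

```python
import subprocess
from pathlib import Path

# (step name, prompt file, budget cap in USD), mirroring the two commands above.
STEPS = [
    ("collect", "prompts/collect.md", 2.00),
    ("classify", "prompts/classifier.md", 1.50),
]


def build_command(prompt_text: str, budget_usd: float) -> list[str]:
    """Build the argv for one non-interactive run with a budget cap."""
    return ["claude", "-p", prompt_text, "--max-budget-usd", f"{budget_usd:.2f}"]


def run_pipeline(steps=STEPS, runner=subprocess.run) -> list[tuple[str, int]]:
    """Run the steps in order and return (step name, exit code) pairs.

    Stops at the first non-zero exit code, since classification has
    nothing useful to do if collection failed (an assumption of this sketch).
    """
    results = []
    for name, prompt_file, budget in steps:
        # Assumption: the prompt file's text is handed to -p as the prompt.
        prompt_text = Path(prompt_file).read_text(encoding="utf-8")
        exit_code = runner(build_command(prompt_text, budget)).returncode
        results.append((name, exit_code))
        if exit_code != 0:
            break
    return results
```

Passing `runner` as a parameter keeps the sketch testable without the CLI installed: a stand-in function can record the commands instead of executing them.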
System Architecture
Chief of Staff consists of three independent layers. Each layer produces value on its own; they don’t depend on each other.
               Claude Code (claude -p)
        ┌───────────────────────────────────┐
        │          MCP Connectors           │
        │  ┌──────────┐  ┌──────────────┐   │
        │  │  Gmail   │  │    Google    │   │
        │  │   MCP    │  │ Calendar MCP │   │
        │  └────┬─────┘  └──────┬───────┘   │
        └───────┼───────────────┼───────────┘
                │               │
┌───────────────┼───────────────┼──────────────────┐
│               ▼               ▼                  │
│  ┌────────────────────────────────────────────┐  │
│  │              cos.db (SQLite)               │  │
│  │           Single source of truth           │  │
│  └─────────────────────┬──────────────────────┘  │
│                        │                         │
│  ┌──────────┐  ┌───────▼────────┐  ┌──────────┐  │
│  │   Feed   │  │    Renderer    │  │   Task   │  │
│  │Collector │  │  SQLite → MD   │  │Collector │  │
│  │ (Python) │  │    (Python)    │  │ (Python) │  │
│  └──────────┘  └───────┬────────┘  └──────────┘  │
└────────────────────────┼─────────────────────────┘
                         │
              ┌──────────▼──────────┐
              │ Classifier (Sonnet) │
              └──────────┬──────────┘
                         │
              ┌──────────▼──────────┐
              │    Morning Sweep    │
              │  (Opus + subagent)  │
              └──────────┬──────────┘
                         │
              ┌──────────▼──────────┐
              │ Day Block (Sonnet)  │
              └─────────────────────┘
Layer 1: Overnight Collection
Runs automatically at 06:00 via launchd. A single claude -p session collects Gmail and Calendar data via MCP and writes to SQLite. Then Python scripts run sequentially:
| Source | Method | What It Collects |
|---|---|---|
| Gmail | MCP connector | Actionable emails from the last 24 hours |
| Google Calendar | MCP connector | Today’s and tomorrow’s events across all calendars |
| RSS feeds | Python (Miniflux REST API) | Unread feed entries |
| Obsidian tasks | Python (file scanning) | Incomplete tasks |
| Project health | Python (custom scripts) | Project status, error count, last deploy |
If a source fails, the others continue running. The Daily Note shows a warning for the failed source.
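The failure isolation described above can be sketched as a loop that catches errors per source. This is an illustrative example under assumptions: `run_collectors` and `render_warnings` are hypothetical names, each collector is modeled as a function returning the number of items it collected, and the warning line layout is invented for the example.

```python
from typing import Callable


def run_collectors(
    collectors: dict[str, Callable[[], int]],
) -> tuple[dict[str, int], dict[str, str]]:
    """Run each collector in turn; a failure in one never stops the rest.

    Returns (items collected per source, warning text per failed source).
    """
    collected: dict[str, int] = {}
    warnings: dict[str, str] = {}
    for source, collect in collectors.items():
        try:
            collected[source] = collect()
        except Exception as exc:  # isolate any collector failure
            warnings[source] = f"{source} collection failed: {exc}"
    return collected, warnings


def render_warnings(warnings: dict[str, str]) -> list[str]:
    """Format failed sources as Daily Note lines (the layout is an assumption)."""
    return [f"- WARNING: {message}" for message in warnings.values()]
```

Catching the broad `Exception` is deliberate here: the goal is that no single source, whatever its failure mode, can prevent the briefing from being produced.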
Layer 1.5: Overnight Classification
After collection completes, Claude Sonnet classifies pending items:
| Class | Meaning | Example |
|---|---|---|
| DISPATCH | AI can handle entirely | Meeting confirmation reply, research note |
| PREP | AI does 80%, I finish | Complex email draft, error analysis |
| YOURS | My brain required | Strategy decisions, pricing, live meetings |
| SKIP | Not today | Low priority, far deadline |
Classification happens overnight. When I sit down in the morning, the sorted plan is already waiting.
Layer 2: Morning Sweep
Runs on demand when I trigger it. Claude Opus shows the classified plan, and I approve or adjust it. Subagents then run in parallel for approved DISPATCH and PREP tasks:
| Agent | Scope | Safety |
|---|---|---|
| Email Agent | Creates Gmail drafts (MCP) | Never sends |
| Dev Prep Agent | Error summary + fix direction | Read-only |
| Content Agent | Blog draft, research note | Writes to specific folders only |
| Calendar Agent | Meeting prep note | Read-only |
Layer 3: Day Block
Triggered after the Morning Sweep. Places remaining YOURS and PREP tasks into free calendar blocks. It writes to a dedicated “AI Plan” Google Calendar, so nothing mixes with the main calendar.
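Since Day Block is not finished yet, the following is only a sketch of how the placement could work: compute the free gaps between calendar events, then fit tasks into them first-fit. The function names, the 07:00-18:00 working window (taken from the sample Daily Note), and the task durations are assumptions for illustration.

```python
from datetime import datetime, timedelta

FMT = "%H:%M"


def free_slots(events, day_start="07:00", day_end="18:00"):
    """Return the gaps between (start, end) events as ("HH:MM", "HH:MM") pairs."""
    cursor = datetime.strptime(day_start, FMT)
    end_of_day = datetime.strptime(day_end, FMT)
    slots = []
    for start, end in sorted(events):
        # Clamp events to the working day so gaps never run past day_end.
        start_dt = min(datetime.strptime(start, FMT), end_of_day)
        end_dt = min(datetime.strptime(end, FMT), end_of_day)
        if start_dt > cursor:
            slots.append((cursor.strftime(FMT), start_dt.strftime(FMT)))
        cursor = max(cursor, end_dt)
    if cursor < end_of_day:
        slots.append((cursor.strftime(FMT), end_of_day.strftime(FMT)))
    return slots


def place_tasks(tasks, slots):
    """Greedy first-fit: put each (title, minutes) task into the earliest slot with room."""
    remaining = [[datetime.strptime(s, FMT), datetime.strptime(e, FMT)] for s, e in slots]
    plan, unplaced = [], []
    for title, minutes in tasks:
        need = timedelta(minutes=minutes)
        for slot in remaining:
            if slot[1] - slot[0] >= need:
                block_end = slot[0] + need
                plan.append((slot[0].strftime(FMT), block_end.strftime(FMT), title))
                slot[0] = block_end  # shrink the slot from the front
                break
        else:
            unplaced.append(title)
    return plan, unplaced
```

With the two meetings from the sample Daily Note, `free_slots([("10:00", "11:00"), ("14:00", "14:30")])` yields the same three free ranges listed there. First-fit is the simplest possible strategy; it ignores priorities and energy levels, which a real implementation would probably want to consider.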
Data Layer: SQLite
All data lives in SQLite. Obsidian is just the view layer, not the database.
-- Domain tables (source-specific)
CREATE TABLE emails (
    id TEXT PRIMARY KEY,
    thread_id TEXT,
    subject TEXT,
    sender TEXT,
    snippet TEXT,
    labels TEXT, -- JSON array
    received_at TEXT NOT NULL,
    raw_payload JSON
);
-- Work queue (pipeline lifecycle)
CREATE TABLE work_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    domain_type TEXT NOT NULL, -- email, event, task, health, feed
    domain_id TEXT NOT NULL,
    priority TEXT, -- P1, P2, P3, P4
    status TEXT NOT NULL DEFAULT 'pending',
    content_hash TEXT,
    collected_at TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(domain_type, domain_id)
);
-- Classifications (audit trail)
CREATE TABLE classifications (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    queue_id INTEGER NOT NULL,
    category TEXT NOT NULL, -- dispatch, prep, yours, skip
    reason TEXT,
    model TEXT,
    FOREIGN KEY (queue_id) REFERENCES work_queue(id)
);
Domain tables (emails, events, tasks, health_checks, feeds) hold source-specific structured data. work_queue tracks each item’s status in the pipeline. classifications records the reasoning behind each classification decision and which model made it. content_hash prevents re-classifying unchanged content.
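To make the role of content_hash and the unique constraint concrete, here is a minimal sketch using Python's sqlite3 module against the work_queue table above. The `enqueue` helper, the choice of SHA-256, and resetting status to 'pending' when content changes are assumptions for illustration, not necessarily how the project does it.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE work_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    domain_type TEXT NOT NULL,
    domain_id TEXT NOT NULL,
    priority TEXT,
    status TEXT NOT NULL DEFAULT 'pending',
    content_hash TEXT,
    collected_at TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(domain_type, domain_id)
);
"""


def content_hash(text: str) -> str:
    """Stable fingerprint of an item's content (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def enqueue(conn, domain_type, domain_id, content, priority=None) -> str:
    """Insert or refresh a queue row; returns 'new', 'changed', or 'unchanged'."""
    digest = content_hash(content)
    cur = conn.execute(
        "INSERT OR IGNORE INTO work_queue "
        "(domain_type, domain_id, priority, content_hash) VALUES (?, ?, ?, ?)",
        (domain_type, domain_id, priority, digest),
    )
    if cur.rowcount == 1:
        return "new"
    # The row already exists: re-queue it only if the content really changed.
    cur = conn.execute(
        "UPDATE work_queue SET content_hash = ?, status = 'pending' "
        "WHERE domain_type = ? AND domain_id = ? AND content_hash IS NOT ?",
        (digest, domain_type, domain_id, digest),
    )
    return "changed" if cur.rowcount == 1 else "unchanged"
```

Running the same collector twice leaves exactly one row per item, and an unchanged item keeps its status, so the classifier is not asked (or billed) to look at it again.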
Safety Model
| Rule | Implementation |
|---|---|
| Never sends emails | Email Agent only creates drafts (gmail_create_draft) |
| Budget cap | --max-budget-usd flag on every claude -p call |
| Mutex | shlock prevents concurrent runs |
| Idempotent | INSERT OR IGNORE + unique index prevents duplicate inserts |
| Dry run | --dry-run preview on Day Block |
| Failure isolation | One source failing doesn’t affect others |
| Human approval | Morning Sweep shows classifications, waits for approval |
| Scoped writes | Content Agent writes to specific vault folders only |
Is there a risk of an email being misclassified? Yes. But the system never acts autonomously. At Morning Sweep, all classifications come to me with their reasoning. Nothing runs without my approval. Critical decisions (pricing, strategy, contracts) are defined as force_yours in the config, preventing the AI from classifying them as DISPATCH or PREP.
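A force_yours rule of this kind can be applied as a post-processing step on the model's answer. The sketch below is hypothetical: the keyword list, the matching on the subject line, and the function name are assumptions, and the real config format is not shown in this post.

```python
# Hypothetical keyword list; the real force_yours config may look different.
FORCE_YOURS = ("pricing", "strategy", "contract")


def apply_force_yours(subject: str, category: str, keywords=FORCE_YOURS) -> tuple[str, str]:
    """Return (final category, reason).

    Items matching a critical keyword can never stay in 'dispatch' or 'prep';
    'yours' and 'skip' are left untouched.
    """
    lowered = subject.lower()
    for keyword in keywords:
        if keyword in lowered and category in ("dispatch", "prep"):
            return "yours", f"force_yours: matched '{keyword}'"
    return category, "model classification kept"
```

Enforcing the rule in code rather than only in the prompt means a misbehaving model cannot talk its way around it, and the returned reason can be stored in the classifications table alongside the model's own reasoning.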
Cost
| Component | Cost |
|---|---|
| Overnight collection + classification (Sonnet) | ~$1-3/day |
| Morning sweep (Opus) | ~$1-3/day |
| Day block (Sonnet) | ~$0.25-1/day |
| Google APIs | Free (via MCP) |
| Total | ~$2-7/day |
These numbers stay under control thanks to budget caps. If a step exceeds its allocated budget, it stops gracefully rather than crashing.
Current Status
Working layers:
- SQLite schema and data layer (9 tables, 5 views)
- Calendar, Gmail, Feed, Task, Health, Radar collectors
- Cloudflare (Workers + Pages) and Coolify (apps, services, databases) health monitoring
- Renderer (SQLite to Obsidian Daily Note generation)
- Classifier prompt and flow
- Parallel sweep orchestrator with 4 domain agents
- Pipeline runner (run.sh, cos-brief.sh) with healthchecks.io monitoring
- Weekly stats digest and scheduling insights
- Interactive setup wizard (setup_wizard.py)
Not yet complete:
- Day Block (writing time blocks to calendar)
- Retry logic (exponential backoff for failed agent runs)
- Vercel and Neon health collectors
The system runs every night and prepares my briefing every morning. Since the incomplete layers are independent, they don’t break the existing flow.
Update: Parallel Agents and the Ecosystem (March 20, 2026)
Since publishing this post, three things changed.
Others built similar systems
Jim Prosser, a non-technical consultant from Marin County, built his version in 36 hours and wrote about it on Medium and LinkedIn. Anthropic published an official Claude Agent SDK cookbook using a Chief of Staff scenario. Someone on Reddit built a skill that separates planning from building.
I compared all of them with my implementation. The patterns are converging: everyone lands on some variant of dispatch/prep/yours/skip classification. The differences are in the data layer and safety model. Prosser stores everything in Todoist (third-party dependency), the Anthropic cookbook uses CSV files, the Reddit skill has no persistence at all. My system’s SQLite intermediate layer with content hash dedup, idempotent inserts, and audit trail classifications is the most robust of the four. What I was missing was parallel agent execution.
Subagents now run in parallel
I replaced the monolithic sweep prompt with an async Python orchestrator (collectors/orchestrator.py). Four domain-specific agents run concurrently with semaphore-based concurrency control:
| Agent | Model | Budget | What it does |
|---|---|---|---|
| Calendar | Sonnet | $0.50 | Meeting prep notes |
| Health | Sonnet | $0.50 | Error analysis, fix direction |
| Task | Sonnet | $0.50 | Task completion notes, research outlines |
| Feed | Sonnet | $0.50 | Actionable feed summaries |
Each agent gets its own prompt file, budget cap, timeout, and log file. If one agent fails, others still complete. Results are collected and imported to cos.db in a single transaction. The orchestrator supports --sequential mode for debugging and --dry-run for testing without running agents.
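The concurrency pattern can be sketched with asyncio as follows. This is not the code in collectors/orchestrator.py: the function names are hypothetical, and `work` stands in for whatever actually launches an agent (in the real system, presumably a `claude -p` subprocess with its own prompt file and budget).

```python
import asyncio

AGENTS = ["calendar", "health", "task", "feed"]


async def run_agent(name, work, semaphore, timeout_s):
    """Run one agent under the shared semaphore with its own timeout."""
    async with semaphore:
        try:
            result = await asyncio.wait_for(work(name), timeout=timeout_s)
            return name, "ok", result
        except asyncio.TimeoutError:
            return name, "timeout", None
        except Exception as exc:  # one failing agent must not affect the others
            return name, "failed", str(exc)


async def sweep(work, agents=AGENTS, max_parallel=2, timeout_s=5.0):
    """Run all agents concurrently, at most max_parallel at a time."""
    semaphore = asyncio.Semaphore(max_parallel)
    results = await asyncio.gather(
        *(run_agent(agent, work, semaphore, timeout_s) for agent in agents)
    )
    return {name: (status, payload) for name, status, payload in results}
```

Because every failure is converted into a result tuple inside `run_agent`, `asyncio.gather` always returns a complete set of outcomes, which can then be written to cos.db in one transaction.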
Email agent is classification-only
After testing, I decided not to auto-create Gmail drafts. Emails are still classified and shown in the Daily Note, but no agent touches Gmail. I handle email responses manually after reviewing the briefing. The email agent prompt exists in the codebase but is excluded from the orchestrator’s dispatch map. Re-enabling it is a one-line change.
The overnight pipeline now runs collect, classify, and render only. Sweep is triggered manually after reviewing the Daily Note. This matches how I actually work: I want to see the plan before anything fires.
Platform health monitoring across the stack
The health collector now monitors infrastructure beyond individual projects. Two platform-level scripts check Cloudflare and Coolify resources automatically:
| Platform | What it monitors | Method |
|---|---|---|
| Cloudflare Workers | Error rates, invocation counts | GraphQL Analytics API |
| Cloudflare Pages | Deployment status | REST API |
| Coolify Apps | Container status (running/healthy) | REST API via Cloudflare Tunnel |
| Coolify Services | Service health | REST API via Cloudflare Tunnel |
| Coolify Databases | Database status | REST API via Cloudflare Tunnel |
Each platform script outputs a JSON array. The health collector runs them alongside per-project health scripts and writes everything to cos.db. If a Worker’s error rate exceeds 10%, it shows as P1 in the Daily Note. If a Coolify container exits or becomes unhealthy, same treatment.
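The priority rule can be expressed as a small pure function over one entry of a platform script's JSON output. The field names (`kind`, `errors`, `invocations`, `status`) and the status strings are hypothetical, chosen only to illustrate the 10% threshold and the container rule described above.

```python
def health_priority(check: dict) -> str:
    """Map one platform check to a Daily Note priority.

    Field names and status strings are assumptions of this sketch.
    """
    if check["kind"] == "worker":
        invocations = check.get("invocations", 0)
        # Avoid dividing by zero for Workers that received no traffic.
        error_rate = check.get("errors", 0) / invocations if invocations else 0.0
        return "P1" if error_rate > 0.10 else "OK"
    if check["kind"] == "container":
        return "P1" if check.get("status") in ("exited", "unhealthy") else "OK"
    return "OK"
```

Keeping the rule separate from the collector makes the threshold easy to test and to tune per platform later.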
The architecture is extensible: adding a new platform (Vercel, Neon, Railway) means adding one script to PLATFORM_SCRIPTS and a config section. No changes to the collector or renderer needed.
Setup wizard
The project now includes an interactive setup wizard (setup_wizard.py). It reads config.example.toml as a template, walks through each section with sensible defaults, and generates config.toml. It also initializes the SQLite database and optionally installs the macOS launchd agent for overnight scheduling.
Every step is skippable. Optional integrations (Miniflux, Coolify, Cloudflare, healthchecks.io) are prompted separately. A --validate mode checks an existing config without interactive prompts. No external dependencies, stdlib only.
Closing
Chief of Staff isn’t the right solution for everyone. Agent frameworks or no-code platforms may be more suitable for many people. In my case, there’s no workflow between consulting and my own projects that reduces to a general routine. Routine situations are already covered by other approaches I use. Specific situations are better handled case by case.
What this approach gives me is control. I can continuously review the flow, change it when needed, and avoid carrying the weight of unused features. A lightweight, purpose-built, portable system. When I sit down in the morning, I start making decisions instead of gathering information.
I’ll cover my other work in this area, the mini-services ecosystem, and the portable AI setup in future posts.
The project has been published as open source on GitHub [3].
Footnotes
1. dnomia-knowledge: Project-based semantic knowledge management MCP server
2. mcp-code-search: Local semantic code search MCP server
3. Chief of Staff GitHub repository
- Automating information gathering accelerates the decision-making process
- MCP connectors eliminate the need for OAuth setup or API key management
- The system is model-agnostic: prompt files and SQLite make it portable to any LLM
- Each layer works independently and delivers value on its own
What is Chief of Staff?
A local-first AI assistant built for solo entrepreneurs. It collects Gmail, calendar, RSS feeds and Obsidian tasks overnight, classifies them, and presents a ready-made briefing each morning. Built on Claude Code, Python and SQLite, fully open source.
What data sources does it support?
Gmail and Google Calendar (via MCP connectors), Miniflux RSS feeds (REST API), Obsidian vault tasks (Python grep), and project health checks (customizable scripts).
Why not use OpenClaw or a similar framework?
Security, privacy, cost and portability constraints. Consulting and personal projects don't reduce to a generalizable routine. A lightweight, purpose-built solution is more optimized and controllable than a large framework with unused features.
Does it work with models other than Claude?
The system is designed to be model-agnostic. Prompt files and the SQLite layer can work with any LLM. Local models like Llama, Mistral or Qwen via Ollama are also viable options.
What does it cost per day?
Roughly $2-7 per day. Collection and classification run on Sonnet, the morning sweep runs on Opus. Budget caps keep each step's cost under control.