Chief of Staff: A Local-First AI Assistant That Prepares Your Daily Workflow

TL;DR

Instead of spending every morning gathering information from multiple sources, I built Chief of Staff so I can sit down and start making decisions. It uses claude -p to collect Gmail and calendar data via MCP, adds RSS feeds and tasks through Python scripts, writes everything to SQLite, classifies items with Claude Sonnet, and renders the result as an Obsidian Daily Note. Local-first and open source.

The Problem: Gathering Information, Not Making Decisions

Every morning, the same cycle: I sit down at my computer but spend the first half hour gathering information instead of making decisions. Which emails came in on Gmail, what’s on the calendar today, which project has issues, what’s the status of yesterday’s leftover tasks, what’s been published in the feeds I follow.

None of this is hard. But it’s scattered. Five different tools, five different tabs, five different context switches. And cognitive load accumulates before any real work has begun.

I built Chief of Staff to solve this problem: a local AI assistant that runs overnight, collects data, classifies it, and delivers a ready briefing each morning.

The Result: The Briefing That Greets Me Every Morning

Before getting into technical details, I want to show the output the system produces. Every morning when I open Obsidian, a Daily Note like this is waiting for me:

# 2026-03-09

## Calendar

- 10:00-11:00 Client X meeting (Calendly) **prep needed**
- 14:00-14:30 Deploy review
- Free: 07:00-10:00, 11:00-14:00, 14:30-18:00

## Project Status

- OK: leetty, ceaksan
- validough: 3 errors (Neon connection timeout)

## Feed Highlights

- [Interesting Article](https://example.com) (Tech Blog) ~5min
- [Another Post](https://example.com) (Dev Weekly) ~3min

## Classified Tasks

### DISPATCH (AI handles)

- [ ] Client A email reply, meeting confirmation #email
- [ ] Blog research note #content

### PREP (80% ready, you finish)

- [ ] Client Y hosting migration reply, draft ready #email
- [ ] validough Neon timeout, summary + fix direction #dev

### YOURS (your brain needed)

- [ ] Client X meeting prep
- [ ] leetty checkout flow fix #dev

### SKIP (not today)

- [ ] validough onboarding wizard, P3, far deadline

## Carried Over

- [ ] [P2] Blog post publish, pending 2 days

Calendar, project statuses, classified tasks, items carried over from previous days. All in one place, collected and sorted. I just approve or adjust, then start working.

An Overview of the Options

There are multiple ways to build this kind of automation. Each approach has its own strengths and weaknesses.

Fully Autonomous Agent Systems

Systems where you define a task and expect the agent to solve it end-to-end. Appealing in theory, problematic in practice. Agents often enter loops, costs spiral unpredictably, and results are hard to anticipate. In a solo entrepreneur context, loss of control is an unacceptable risk.

No-Code Automation Platforms

Platforms that connect services through drag-and-drop interfaces. They excel at cloud-to-cloud integrations. However, when it comes to local file system access (like my Obsidian vault), custom classification logic, or LLM integration, they either hit their limits or require complex workarounds.

AI Assistant Platforms

Configurable AI assistant services. Useful for tasks like creating email drafts and calendar management. But data passes through their servers. For my consulting clients’ data and my own projects, this is a privacy risk. Monthly subscription costs add up, and customization options remain limited.

Agent Frameworks

Developer-focused frameworks that enable defining agent workflows in code. Powerful and flexible tools. But several issues arise for my situation:

Security and privacy: Both my own and my clients’ data are involved
Cost: Infrastructure overhead of unused features
No generalizable routine: Between consulting and my own projects, there’s no workflow that reduces to a general routine. Assigning tasks via chat doesn’t fit how I work
Overengineering risk: A purpose-built flow is more optimized and controllable than a framework carrying a ton of unused features

Build Your Own

Building your own pipeline with cron jobs, Python scripts, and API calls. The most flexible approach, but also the one requiring the most development time. Instead of writing everything from scratch, an orchestration layer over existing tools can be more efficient.

Local / Open Source Model Alternatives

Running models like Llama, Mistral, or Qwen locally through tools like Ollama. No cloud dependency, and data never leaves the machine. However, these models’ tool calling and long context management capabilities are currently limited. They may suffice for simple classification tasks, but they’re not yet mature enough for complex orchestration. This balance may shift over time.

Approach	Strength	Weakness (for my constraints)
Fully autonomous agent	Minimal intervention	Loss of control, cost uncertainty
No-code automation	Quick setup	Weak local file access, limited LLM integration
AI assistant platform	Ready to use	Privacy risk, limited customization
Agent framework	Flexible, powerful	Overengineering, unnecessary complexity
Your own scripts	Full control	Development time, maintenance burden
Local models	Privacy, zero cost	Tool calling and context management still limited

My Choice: Lightweight and Purpose-Built

After evaluating these options, I decided to write my own solution. But not entirely from scratch.

I already had mini-services built for different needs. My own knowledge base MCP server for information management ¹, my own scraper for pulling data from the web, my own semantic code search MCP server for searching codebases ². Each one works independently and is accessible as an MCP server or API. I have similar mini-services in other areas too: design tokens, geometric pattern generation, e-commerce operations.

I use this ecosystem both to develop my own projects and to spot potential issues between developer and user perspectives. Using my own products daily shortens the feedback loop.

What was missing was an orchestration layer. A flow that brings existing pieces together, runs overnight, and produces a ready briefing by morning. Chief of Staff is that layer.

The principles behind my choice:

Lightweight: I don’t want to carry the weight of features I don’t use. Specific situations are better handled case by case
Local-first: My data stays on my computer. Client data going to third-party servers is unacceptable
Portable: I want a solution I can carry with me, one that can run off USB power. I don’t want to be tied to a single computer
Reviewable: I want to continuously review, adjust, and control the flow. Not a set-and-forget system, but a tool I actively manage
Cost control: Each step’s cost is capped in advance, no surprise bills

Why Claude Code

I evaluated these criteria when choosing a model:

Criterion	Claude Code	GPT + API	Gemini + API	Local Model (Ollama)
MCP support	Native, built-in	None (custom integration required)	None	None (experimental)
Non-interactive mode	`claude -p`	API call required	API call required	`ollama run`
Budget control	`--max-budget-usd` flag	Manual tracking	Manual tracking	No cost
Tool calling	Strong	Strong	Strong	Limited
Gmail/Calendar access	MCP connector (no OAuth setup)	OAuth + API key required	OAuth + API key required	Manual integration
Local execution	Yes (CLI)	No (API)	No (API)	Yes

Claude Code’s most decisive advantage is MCP connectors. I access Gmail and Google Calendar data through Claude’s built-in MCP connections. No need to create a project in Google Cloud Console, manage OAuth credentials, or store API keys. I connect once via /mcp in Claude Code, and everything works from there.

With claude -p (non-interactive mode), I pass a prompt file and run it. The --max-budget-usd flag sets the maximum cost for each run. This makes it possible to run automatically overnight via cron job or launchd.

# Overnight pipeline
claude -p prompts/collect.md --max-budget-usd 2.00    # Data collection
claude -p prompts/classifier.md --max-budget-usd 1.50  # Classification
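
The same two steps can also be driven from Python, which makes it easier to attach timeouts and logging. The sketch below is an illustration, not the project's runner: `build_claude_command` and `run_step` are hypothetical helpers, and reading the prompt file and passing its text as the prompt argument is an assumption about how the prompt reaches the CLI. Only the `claude -p` and `--max-budget-usd` flags come from the commands above.

```python
import subprocess
from pathlib import Path


def build_claude_command(prompt_text: str, max_budget_usd: float) -> list[str]:
    """Build the argv for one non-interactive run with a budget cap."""
    return [
        "claude",
        "-p", prompt_text,                            # non-interactive mode
        "--max-budget-usd", f"{max_budget_usd:.2f}",  # cost ceiling for this run
    ]


def run_step(prompt_file: Path, max_budget_usd: float, timeout_s: int = 900) -> int:
    """Run one pipeline step and return its exit code (hypothetical helper)."""
    prompt_text = prompt_file.read_text(encoding="utf-8")
    cmd = build_claude_command(prompt_text, max_budget_usd)
    # The timeout value is an assumption; it guards against a hung overnight run.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    return result.returncode
```

Passing the arguments as a list avoids shell quoting problems when a prompt contains quotes or newlines.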

An important note: the system is designed to be model-agnostic. Prompt files are plain markdown, the data layer is SQLite, collectors are Python scripts. The dependency on Claude Code is limited to MCP connectors and claude -p calls. Running the prompt files through Ollama with Llama or Mistral is technically possible. Gmail API and Google Calendar API can be used directly instead of MCP connectors (additional OAuth setup required). As local models’ tool calling capabilities mature, this transition will become easier.

System Architecture

Chief of Staff consists of three independent layers. Each layer produces value on its own; they don’t depend on each other.

              Claude Code (claude -p)
              ┌───────────────────────────────────┐
              │  MCP Connectors                   │
              │  ┌──────────┐  ┌───────────────┐  │
              │  │  Gmail   │  │  Google       │  │
              │  │  MCP     │  │  Calendar MCP │  │
              │  └────┬─────┘  └──────┬────────┘  │
              └───────┼───────────────┼───────────┘
                      │               │
    ┌─────────────────┼───────────────┼───────────────────┐
    │                 ▼               ▼                   │
    │  ┌───────────────────────────────────────────────┐  │
    │  │              cos.db (SQLite)                  │  │
    │  │              Single source of truth           │  │
    │  └─────────────────────┬─────────────────────────┘  │
    │                        │                            │
    │  ┌──────────┐  ┌───────▼────────┐  ┌─────────────┐  │
    │  │  Feed    │  │   Renderer     │  │  Task       │  │
    │  │Collector │  │  SQLite → MD   │  │ Collector   │  │
    │  │ (Python) │  │  (Python)      │  │ (Python)    │  │
    │  └──────────┘  └───────┬────────┘  └─────────────┘  │
    └────────────────────────┼────────────────────────────┘
                             │
                  ┌──────────▼──────────┐
                  │ Classifier (Sonnet) │
                  └──────────┬──────────┘
                             │
                  ┌──────────▼──────────┐
                  │  Morning Sweep      │
                  │  (Opus + subagent)  │
                  └──────────┬──────────┘
                             │
                  ┌──────────▼──────────┐
                  │  Day Block (Sonnet) │
                  └─────────────────────┘

Layer 1: Overnight Collection

Runs automatically at 06:00 via launchd. A single claude -p session collects Gmail and Calendar data via MCP and writes to SQLite. Then Python scripts run sequentially:

Source	Method	What It Collects
Gmail	MCP connector	Actionable emails from the last 24 hours
Google Calendar	MCP connector	Today’s and tomorrow’s events across all calendars
RSS feeds	Python (Miniflux REST API)	Unread feed entries
Obsidian tasks	Python (file scanning)	Incomplete tasks
Project health	Python (custom scripts)	Project status, error count, last deploy

If a source fails, the others continue running. The Daily Note shows a warning for the failed source.
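
The failure isolation described above can be sketched as a loop that catches each collector's error and records it for the renderer. This is a minimal illustration: the function names, the return shape, and the warning text are assumptions, not the project's actual code.

```python
from typing import Callable


def run_collectors(
    collectors: dict[str, Callable[[], int]],
) -> tuple[dict[str, int], dict[str, str]]:
    """Run every collector; record failures instead of aborting the run."""
    collected: dict[str, int] = {}
    failed: dict[str, str] = {}
    for name, collect in collectors.items():
        try:
            collected[name] = collect()  # number of items collected
        except Exception as exc:  # keep the failure local to this source
            failed[name] = f"{type(exc).__name__}: {exc}"
    return collected, failed


def render_warnings(failed: dict[str, str]) -> list[str]:
    """Produce one Daily Note warning line per failed source."""
    return [
        f"- WARNING: {name} collection failed ({reason})"
        for name, reason in sorted(failed.items())
    ]
```

Catching at the collector boundary means a Miniflux outage, for example, costs one section of the note rather than the whole briefing.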

Layer 1.5: Overnight Classification

After collection completes, Claude Sonnet classifies pending items:

Class	Meaning	Example
DISPATCH	AI can handle entirely	Meeting confirmation reply, research note
PREP	AI does 80%, I finish	Complex email draft, error analysis
YOURS	My brain required	Strategy decisions, pricing, live meetings
SKIP	Not today	Low priority, far deadline

Classification happens overnight. When I sit down in the morning, the sorted plan is already waiting.

Layer 2: Morning Sweep

Runs on demand, when I trigger it. Claude Opus shows the classified plan, and I approve or adjust. Subagents then run in parallel for the approved DISPATCH and PREP tasks:

Agent	Scope	Safety
Email Agent	Creates Gmail drafts (MCP)	Never sends
Dev Prep Agent	Error summary + fix direction	Read-only
Content Agent	Blog draft, research note	Writes to specific folders only
Calendar Agent	Meeting prep note	Read-only

Layer 3: Day Block

Triggered after the Morning Sweep. Places remaining YOURS and PREP tasks into free calendar blocks. It writes to a dedicated “AI Plan” Google Calendar, so the plan never mixes with the main calendar.
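
Finding the free blocks is a simple gap computation over the day's events. The sketch below is one way to do it, assuming a 07:00-18:00 working window (taken from the sample Daily Note) and events given as plain start and end times. The function names are hypothetical.

```python
from datetime import time


def free_blocks(events, day_start=time(7, 0), day_end=time(18, 0)):
    """Return the gaps between busy events inside the working day.

    `events` is a list of (start, end) datetime.time pairs in any order.
    """
    free = []
    cursor = day_start  # earliest moment not yet known to be busy
    for start, end in sorted(events):
        if start > cursor:
            free.append((cursor, min(start, day_end)))
        cursor = max(cursor, end)  # overlapping events extend the busy span
        if cursor >= day_end:
            break
    if cursor < day_end:
        free.append((cursor, day_end))
    return free


def format_blocks(blocks):
    """Format blocks the way the Daily Note lists them."""
    return ", ".join(f"{start:%H:%M}-{end:%H:%M}" for start, end in blocks)
```

With the two meetings from the sample note (10:00-11:00 and 14:00-14:30), this yields the same three free blocks shown in the Calendar section.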

Data Layer: SQLite

All data lives in SQLite. Obsidian is just the view layer, not the database.

-- Domain tables (source-specific)
CREATE TABLE emails (
    id TEXT PRIMARY KEY,
    thread_id TEXT,
    subject TEXT,
    sender TEXT,
    snippet TEXT,
    labels TEXT,           -- JSON array
    received_at TEXT NOT NULL,
    raw_payload JSON
);

-- Work queue (pipeline lifecycle)
CREATE TABLE work_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    domain_type TEXT NOT NULL,  -- email, event, task, health, feed
    domain_id TEXT NOT NULL,
    priority TEXT,              -- P1, P2, P3, P4
    status TEXT NOT NULL DEFAULT 'pending',
    content_hash TEXT,
    collected_at TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(domain_type, domain_id)
);

-- Classifications (audit trail)
CREATE TABLE classifications (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    queue_id INTEGER NOT NULL,
    category TEXT NOT NULL,     -- dispatch, prep, yours, skip
    reason TEXT,
    model TEXT,
    FOREIGN KEY (queue_id) REFERENCES work_queue(id)
);

Domain tables (emails, events, tasks, health_checks, feeds) hold source-specific structured data. work_queue tracks each item’s status in the pipeline. classifications records the reasoning behind each classification decision and which model made it. content_hash prevents re-classifying unchanged content.
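
To make the dedup behavior concrete, here is a small sketch against the work_queue table above. It uses SHA-256 for content_hash and resets an item to pending when its content changes. Both choices are assumptions for illustration, and `enqueue` is a hypothetical helper.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE work_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    domain_type TEXT NOT NULL,
    domain_id TEXT NOT NULL,
    priority TEXT,
    status TEXT NOT NULL DEFAULT 'pending',
    content_hash TEXT,
    collected_at TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(domain_type, domain_id)
);
"""


def content_hash(text: str) -> str:
    """Hash the content so unchanged items can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def enqueue(conn, domain_type, domain_id, content, priority=None) -> str:
    """Insert or refresh a queue row. Returns 'new', 'changed', or 'unchanged'."""
    digest = content_hash(content)
    cur = conn.execute(
        "INSERT OR IGNORE INTO work_queue "
        "(domain_type, domain_id, priority, content_hash) VALUES (?, ?, ?, ?)",
        (domain_type, domain_id, priority, digest),
    )
    if cur.rowcount == 1:
        return "new"
    # Row already exists: only touch it when the content actually changed.
    cur = conn.execute(
        "UPDATE work_queue SET content_hash = ?, status = 'pending' "
        "WHERE domain_type = ? AND domain_id = ? AND content_hash IS NOT ?",
        (digest, domain_type, domain_id, digest),
    )
    return "changed" if cur.rowcount == 1 else "unchanged"
```

Running a collector twice over the same inbox then produces no new rows and no new classification work.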

Safety Model

Rule	Implementation
Never sends emails	Email Agent only creates drafts (`gmail_create_draft`)
Budget cap	`--max-budget-usd` flag on every `claude -p` call
Mutex	`shlock` prevents concurrent runs
Idempotent	`INSERT OR IGNORE` + unique index prevents duplicate inserts
Dry run	`--dry-run` preview on Day Block
Failure isolation	One source failing doesn’t affect others
Human approval	Morning Sweep shows classifications, waits for approval
Scoped writes	Content Agent writes to specific vault folders only

Is there a risk of an email being misclassified? Yes. But the system never acts autonomously. At Morning Sweep, all classifications come to me with their reasoning. Nothing runs without my approval. Critical decisions (pricing, strategy, contracts) are defined as force_yours in the config, preventing the AI from classifying them as DISPATCH or PREP.
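
A force_yours rule can be applied as a post-processing step after the model returns its category. The sketch below assumes a simple keyword match; the keyword list and the function name are hypothetical, and the real config format may differ.

```python
# Hypothetical config values; the actual force_yours entries live in the config.
FORCE_YOURS_KEYWORDS = ("pricing", "strategy", "contract")


def apply_force_yours(category: str, text: str, keywords=FORCE_YOURS_KEYWORDS) -> str:
    """Override DISPATCH or PREP when the item touches a protected topic."""
    if category not in {"dispatch", "prep"}:
        return category  # yours and skip are left as the model decided
    lowered = text.lower()
    if any(keyword in lowered for keyword in keywords):
        return "yours"
    return category
```

Keeping the override outside the prompt means it holds even when the model misjudges an item.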

Cost

Component	Cost
Overnight collection + classification (Sonnet)	~$1-3/day
Morning sweep (Opus)	~$1-3/day
Day block (Sonnet)	~$0.25-1/day
Google APIs	Free (via MCP)
Total	~$2-7/day

These numbers stay under control thanks to budget caps. If a step exceeds its allocated budget, it stops gracefully rather than crashing.

Current Status

Working layers:

SQLite schema and data layer (9 tables, 5 views)
Calendar, Gmail, Feed, Task, Health, Radar collectors
Cloudflare (Workers + Pages) and Coolify (apps, services, databases) health monitoring
Renderer (SQLite to Obsidian Daily Note generation)
Classifier prompt and flow
Parallel sweep orchestrator with 4 domain agents
Pipeline runner (run.sh, cos-brief.sh) with healthchecks.io monitoring
Weekly stats digest and scheduling insights
Interactive setup wizard (setup_wizard.py)

Not yet complete:

Day Block (writing time blocks to calendar)
Retry logic (exponential backoff for failed agent runs)
Vercel and Neon health collectors

The system runs every night and prepares my briefing every morning. Since the incomplete layers are independent, they don’t break the existing flow.

Update: Parallel Agents and the Ecosystem (March 20, 2026)

Since publishing this post, several things changed.

Others built similar systems

Jim Prosser, a non-technical consultant from Marin County, built his version in 36 hours and wrote about it on Medium and LinkedIn. Anthropic published an official Claude Agent SDK cookbook using a Chief of Staff scenario. Someone on Reddit built a skill that separates planning from building.

I compared all of them with my implementation. The patterns are converging: everyone lands on some variant of dispatch/prep/yours/skip classification. The differences are in the data layer and safety model. Prosser stores everything in Todoist (third-party dependency), the Anthropic cookbook uses CSV files, the Reddit skill has no persistence at all. My system’s SQLite intermediate layer with content hash dedup, idempotent inserts, and audit trail classifications is the most robust of the four. What I was missing was parallel agent execution.

Subagents now run in parallel

I replaced the monolithic sweep prompt with an async Python orchestrator (collectors/orchestrator.py). Four domain-specific agents run concurrently with semaphore-based concurrency control:

Agent	Model	Budget	What it does
Calendar	Sonnet	$0.50	Meeting prep notes
Health	Sonnet	$0.50	Error analysis, fix direction
Task	Sonnet	$0.50	Task completion notes, research outlines
Feed	Sonnet	$0.50	Actionable feed summaries

Each agent gets its own prompt file, budget cap, timeout, and log file. If one agent fails, others still complete. Results are collected and imported to cos.db in a single transaction. The orchestrator supports --sequential mode for debugging and --dry-run for testing without running agents.
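
The concurrency pattern can be sketched with asyncio. This is not the code in collectors/orchestrator.py. It is a minimal illustration in which each agent is an async callable (in the real system it would presumably launch a claude -p process), and the names `run_agent` and `run_sweep` are hypothetical.

```python
import asyncio


async def run_agent(name: str, work, semaphore: asyncio.Semaphore, timeout_s: float) -> dict:
    """Run one agent under the shared concurrency limit; never raise."""
    async with semaphore:  # at most N agents hold the semaphore at once
        try:
            output = await asyncio.wait_for(work(), timeout=timeout_s)
            return {"agent": name, "ok": True, "output": output}
        except Exception as exc:  # includes the timeout raised by wait_for
            return {"agent": name, "ok": False, "error": type(exc).__name__}


async def run_sweep(agents: dict, max_concurrent: int = 2, timeout_s: float = 5.0) -> list[dict]:
    """Run all agents concurrently and return results in input order."""
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [run_agent(name, work, semaphore, timeout_s) for name, work in agents.items()]
    return await asyncio.gather(*tasks)
```

Because each agent converts its own exception into a result record, one failing or slow agent does not cancel the others, and all results can be imported together afterwards.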

Email agent is classification-only

After testing, I decided not to auto-create Gmail drafts. Emails are still classified and shown in the Daily Note, but no agent touches Gmail. I handle email responses manually after reviewing the briefing. The email agent prompt exists in the codebase but is excluded from the orchestrator’s dispatch map. Re-enabling it is a one-line change.

The overnight pipeline now runs collect, classify, and render only. Sweep is triggered manually after reviewing the Daily Note. This matches how I actually work: I want to see the plan before anything fires.

Platform health monitoring across the stack

The health collector now monitors infrastructure beyond individual projects. Two platform-level scripts check Cloudflare and Coolify resources automatically:

Platform	What it monitors	Method
Cloudflare Workers	Error rates, invocation counts	GraphQL Analytics API
Cloudflare Pages	Deployment status	REST API
Coolify Apps	Container status (running/healthy)	REST API via Cloudflare Tunnel
Coolify Services	Service health	REST API via Cloudflare Tunnel
Coolify Databases	Database status	REST API via Cloudflare Tunnel

Each platform script outputs a JSON array. The health collector runs them alongside per-project health scripts and writes everything to cos.db. If a Worker’s error rate exceeds 10%, it shows as P1 in the Daily Note. If a Coolify container exits or becomes unhealthy, same treatment.
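
The escalation rule can be expressed as two small functions. The 10% threshold comes from the text above; the status strings and function names are assumptions for illustration and may not match what the Coolify API returns.

```python
def worker_priority(errors: int, invocations: int, threshold: float = 0.10):
    """Return 'P1' when a Worker's error rate exceeds the threshold, else None."""
    if invocations <= 0:
        return None  # no traffic, so no meaningful error rate
    return "P1" if errors / invocations > threshold else None


def container_priority(status: str):
    """Return 'P1' for containers that exited or report unhealthy (assumed labels)."""
    return "P1" if status in {"exited", "unhealthy"} else None
```

The comparison is strict, so a Worker sitting exactly at 10% is not escalated.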

The architecture is extensible: adding a new platform (Vercel, Neon, Railway) means adding one script to PLATFORM_SCRIPTS and a config section. No changes to the collector or renderer needed.

Setup wizard

The project now includes an interactive setup wizard (setup_wizard.py). It reads config.example.toml as a template, walks through each section with sensible defaults, and generates config.toml. It also initializes the SQLite database and optionally installs the macOS launchd agent for overnight scheduling.

Every step is skippable. Optional integrations (Miniflux, Coolify, Cloudflare, healthchecks.io) are prompted separately. A --validate mode checks an existing config without interactive prompts. No external dependencies, stdlib only.

Closing

Chief of Staff isn’t the right solution for everyone. Agent frameworks or no-code platforms may be more suitable for many people. In my case, the work I do across consulting and my own projects doesn’t reduce to a general routine. The parts that are routine are already covered by other tools, and the rest is better handled case by case.

What this approach gives me is control. I can continuously review the flow, change it when needed, and avoid carrying the weight of unused features. A lightweight, purpose-built, portable system. When I sit down in the morning, I start making decisions instead of gathering information.

I’ll cover my other work in this area, the mini-services ecosystem, and the portable AI setup in future posts.

The project has been published as open source on GitHub ³.

Footnotes

¹ dnomia-knowledge: Project-based semantic knowledge management MCP server
² mcp-code-search: Local semantic code search MCP server
³ Chief of Staff GitHub repository

Version Info

v0.2.0

Changelog

v0.2.0 (2026-03-20): Parallel sweep orchestrator, domain-specific subagents

v0.1.0 (2026-01-03): Initial release: Gmail, calendar, RSS and task collection pipeline

Key Takeaways

01 Automating information gathering accelerates the decision-making process
02 MCP connectors eliminate the need for OAuth setup or API key management
03 The system is model-agnostic: prompt files and SQLite make it portable to any LLM
04 Each layer works independently and delivers value on its own

Frequently Asked Questions (FAQ)

What is Chief of Staff?

A local-first AI assistant built for solo entrepreneurs. It collects Gmail, calendar, RSS feeds and Obsidian tasks overnight, classifies them, and presents a ready-made briefing each morning. Built on Claude Code, Python and SQLite, fully open source.

What data sources does it support?

Gmail and Google Calendar (via MCP connectors), Miniflux RSS feeds (REST API), Obsidian vault tasks (Python grep), and project health checks (customizable scripts).

Why not use OpenClaw or a similar framework?

Security, privacy, cost and portability constraints. Consulting and personal projects don't reduce to a generalizable routine. A lightweight, purpose-built solution is more optimized and controllable than a large framework with unused features.

Does it work with models other than Claude?

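The system is designed to be model-agnostic. Prompt files and the SQLite layer can work with any LLM. Local models like Llama, Mistral or Qwen via Ollama are possible too, though their tool calling is still limited and Gmail and Calendar access would need direct API integration instead of MCP connectors.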

What does it cost per day?

Roughly $2-7 per day. Collection and classification run on Sonnet, the morning sweep runs on Opus. Budget caps keep each step's cost under control.
