Rolync217/chief

 
 

Chief - AI Chief of Staff for Live Conversations

Chief is a desktop overhearing agent for high-stakes calls. It listens locally, remembers prior conversations, decides when a single fact is worth surfacing, and can prepare background actions while the user keeps talking.

Built at the Call My Agent Hackathon at Y Combinator, May 2026.

The core design constraint is latency: nothing expensive sits between an utterance ending and the whisper card appearing. Memory retrieval, belief updates, claim extraction, and action planning all run in background threads so the decision path stays small.


Architecture

Electron mic capture
        |
        v
AudioWorklet VAD -> finalized utterance -> ElevenLabs Scribe
        |                                      |
        |                                      v
        |                              Python live loop
        |                                      |
        |                 +--------------------+--------------------+
        |                 |                    |                    |
        v                 v                    v                    v
 Belief updater     Moss prefetch       Decision agent       Action agent
  Claude Haiku       background          Claude Haiku         Claude Sonnet
        |                 |                    |                    |
        +-----------------+--------------------+                    |
                                      |                              |
                                      v                              v
                               Whisper card                  Action card / tool

Important implementation choices:

  • Belief updates are debounced so partial STT does not flood Claude.
  • Memory blocks use Anthropic prompt caching.
  • Moss and Supermemory never block the decision call.
  • The decision model has a hard timeout and fails silent.
  • Sonnet only runs off the hot path.
  • Irreversible actions, especially email sends and phone calls, require approval.
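The debounce in the first bullet can be pictured with a small trailing-edge debouncer. This is a minimal sketch, not the repo's implementation: the `Debouncer` class, its `wait_s` parameter, and the use of `threading.Timer` are all assumptions made for illustration.

```python
import threading


class Debouncer:
    """Call `fn` with the most recent argument after `wait_s` seconds of quiet.

    Hypothetical helper: the class name and parameters are illustrative only.
    """

    def __init__(self, fn, wait_s):
        self._fn = fn
        self._wait_s = wait_s
        self._timer = None
        self._lock = threading.Lock()

    def submit(self, arg):
        with self._lock:
            if self._timer is not None:
                # A newer update supersedes the one still waiting.
                self._timer.cancel()
            self._timer = threading.Timer(self._wait_s, self._fn, args=(arg,))
            self._timer.daemon = True
            self._timer.start()
```

With a belief-update function passed as `fn`, a burst of transcript updates results in a single model call carrying the latest text, once the burst has gone quiet for `wait_s` seconds.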

Why Chief is different

Most meeting AI falls into one of three buckets:

  • Meeting notepads help you remember what happened.
  • Live overlays give you a chat box, transcript, or suggestions while the call is happening.
  • Personal assistants / wrappers wait for an explicit instruction, then call tools.

Chief is trying to be a different category: a live chief of staff that knows when to interrupt and when to stay silent. The product is not "AI in a meeting." The product is the judgment layer between a live transcript and the user's attention.

That distinction matters because the failure mode is not just a bad answer. The failure mode is breaking the conversation at the wrong time. Chief is designed around the idea that silence is the default output, and a whisper only appears when one sentence would materially change the moment.

Compared with Granola

Granola is a good reference point for the "AI notepad" category: capture meetings, enhance notes, organize them, and chat with meeting history. Chief uses memory too, but the center of gravity is different.

Chief does not primarily produce a better artifact after the meeting. It uses memory before and during the meeting to catch live contradictions, surface exact prior commitments, and trigger follow-up work while the conversation is still happening.

Example:

Past call: "Marcus said runway was 18 months."
Live call: "We're tracking about 12 months of runway right now."
Chief: "He said 18 months runway on the April call. He just said 12."

The note is not the product. The timely intervention is the product.

Compared with Cluely

Cluely is a good reference point for the "live AI overlay" category: real-time insights, transcript, suggested responses, fact checks, and Ask AI during a session. Chief overlaps at the surface level because it also listens live and can show a card.

The architectural difference is that Chief does not make the user drive the assistant mid-call. It does not rely on "ask a question" or "click a dynamic action" as the main interaction. It continuously prepares context in the background, then makes a cheap end-of-utterance decision: whisper or stay silent.

Chief is optimized for:

  • Intervention timing: act only after a finalized utterance, when the conversational pause can absorb a whisper.
  • Low cognitive load: one sentence, no chat thread, no menu of suggestions.
  • Action, not just coaching: draft the email, pull the doc, check the calendar, place a side call.
  • Memory-backed precision: cite the exact prior claim or commitment that matters now.

Compared with a generic assistant wrapper

A generic assistant can summarize a transcript or call a tool once prompted. Chief has to solve a harder streaming problem:

  1. The user may not know they need help yet.
  2. The useful fact may come from a previous conversation, not the current transcript.
  3. The assistant must not interrupt on every plausible hook.
  4. Slow tools and large models cannot sit on the whisper path.
  5. The system must know whether the user or counterparty made a commitment before acting.

That is why Chief is split into Ears, Memory, The Decision, and The Hands. The parts that are expensive or uncertain run continuously in the background. The part that touches the user's attention is tiny, fast, and conservative.


Actual features

  • Bot-free desktop listening: Chief runs as an Electron app, captures local microphone audio, and does not need to join the meeting as a participant.
  • Always-on-top whisper orb: a compact overlay appears only when Chief has something useful to say, then collapses back into a listening state.
  • Main Call dashboard: live transcript, active counterparty, hydrated memory count, retrieval status, and whisper history.
  • Side Call dashboard: shows outbound AgentPhone call state, transcript, and result while the main call continues.
  • Voice identity: local voiceprint enrollment labels utterances as user or counterparty, preventing Chief from acting on the wrong person's commitments.
  • Counterparty memory briefing: at conversation start, Chief resolves the counterparty and hydrates prior claims into a frozen briefing.
  • Memory-backed contradiction detection: seeded and Supermemory-backed facts are retrieved through Moss and injected into the decision context.
  • Conservative whisper policy: the decision agent returns JSON only, and the result is either silent or a single one-sentence whisper.
  • Background action agent: email, calendar, document, search, and side-call tools run off-path so the whisper card is never blocked by real-world tool latency.
  • Approval hierarchy: low-risk actions can execute, drafts surface as cards, and irreversible actions such as sending email or placing a call require approval.
  • Demo fallback controls: global shortcuts can pause Chief, force a scripted utterance, open dashboards, or flush queued actions at call end.
  • Graceful degradation: missing sponsor keys do not break the demo; integrations fall back to static facts, cards, stubs, or simulated call state.

The hard part: live timing

The central technical problem is not transcription. It is deciding exactly when the assistant should spend the user's attention.

Chief treats a conversation as a stream of possible intervention moments. Every finalized utterance is a candidate, but almost all candidates should resolve to silence. To make that work, the system separates background preparation from the hot decision:

  • Before the utterance ends, belief updates and memory retrieval are already being prepared.
  • When the utterance ends, the decision agent does only a cheap yes/no call over current state.
  • After the decision, slow actions can run in the background and surface cards later.

This is why the architecture avoids the tempting simple version: send the whole transcript to a large model after every sentence and ask "what should I do?" That version is too slow, too expensive, and too likely to speak when it should not.

Chief instead uses these constraints:

  • End-of-utterance is the trigger: the whisper decision fires only when a turn is complete, not on every token or audio frame.
  • Background state is always warm: belief, retrieval, and claim extraction run separately so the decision call reads state rather than building it.
  • Frozen memory during a call: the relevant counterparty facts are snapshotted at conversation start so mid-call writes cannot change what the decision agent sees.
  • Small model on the hot path: Haiku handles the timing-sensitive decision; Sonnet is reserved for background actions.
  • Hard timeout and silent failure: if the model or network is late, Chief stays silent unless the deterministic demo fallback is confident.
  • One-sentence output budget: the UI cannot become another chat app. The card has room for one useful intervention.

The core evaluation question is: would a highly competent chief of staff slide a post-it note across the table right now? If not, Chief should do nothing.


Demo beats

The repo is tuned around a live hackathon demo with visible, legible moments:

  1. Conversation starts: Chief resolves the counterparty, hydrates prior memory, and can brief the user before the call gets moving.
  2. Contradiction appears: the counterparty says, "We're tracking about 12 months of runway right now." Chief remembers that Marcus said 18 months on the April call and whispers the discrepancy.
  3. User commits to follow up: if the user says they will send a cap table, deck, recap, or other written artifact, Chief prepares an AgentMail-backed draft card.
  4. User needs live help from someone else: if the user says they should call someone for a quick answer, Chief prepares an AgentPhone side call, shows the Side Call dashboard, and can report back while the primary call continues.
  5. Call ends: queued cards can be flushed so the user reviews pending drafts and actions before leaving the flow.

The demo is designed so silence also reads correctly. If there is no specific memory, contradiction, commitment, or tool trigger, Chief should visibly keep listening without doing anything.


What this repo covers

Desktop App

The frontend is an Electron app with three surfaces:

  • A frameless always-on-top orb that shows whispers and action cards.
  • A Main Call dashboard with live transcript, memory status, and whispers.
  • A Side Call dashboard for outbound AgentPhone calls placed while the primary call continues.

Electron owns mic capture, global shortcuts, tray state, and the local WebSocket server on :8765. Python is spawned as the backend worker and sends UI events back through the bridge.

Relevant files:

  • electron/main.js: app lifecycle, WebSocket server, tray, shortcuts, Python process, orb/dashboard windows.
  • electron/orb.js: mic capture, VAD, whisper/action card rendering, approval buttons.
  • electron/MainCallPanel.jsx and electron/LiveCallPanel.jsx: primary call and side-call dashboards.
  • backend/bridge.py: persistent Python-to-Electron event bridge.

Ears

The renderer captures microphone audio with getUserMedia, runs a simple energy-based VAD in an AudioWorklet, and ships completed utterances to Python as base64 PCM16. The backend wraps each utterance as WAV and transcribes it with ElevenLabs Scribe.
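The "wrap each utterance as WAV" step can be done with the Python standard library alone. The sketch below is an illustration under assumptions: the function name `pcm16_b64_to_wav` is hypothetical, and the 16 kHz mono defaults are guesses, since the source does not state the capture sample rate or channel count.

```python
import base64
import io
import wave


def pcm16_b64_to_wav(b64_pcm: str, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Decode base64 PCM16 samples and wrap them in a WAV container.

    The sample rate and channel count are assumptions; use whatever the
    renderer actually captured.
    """
    pcm = base64.b64decode(b64_pcm)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # PCM16 means 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()
```

The returned bytes can then be handed to whatever STT client the backend uses as an in-memory audio file.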

Voice identity is handled locally with a lightweight NumPy voiceprint. After enrollment, each utterance is labeled as the user, counterparty, or unknown so the action agent knows whose commitments it is allowed to act on.

There is also a typed/demo fallback path: Alt+Space injects a scripted utterance directly into the pipeline, which keeps the demo recoverable if live audio or STT fails.

The current live path is utterance-level rather than continuous partial STT: VAD decides when a turn is done, then Scribe transcribes that clip. Typed partial/final hooks still exist so the decision loop can be tested without microphone hardware.
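To make the utterance-level behavior concrete, here is a rough sketch of energy-based turn segmentation. In the repo this logic lives in the JavaScript renderer; it is shown in Python only for illustration. The `threshold` and `hangover_frames` values and both function names are assumptions, not the repo's tuned settings.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_utterances(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,   # assumed energy threshold
    hangover_frames: int = 5,  # assumed number of quiet frames that ends a turn
) -> List[List[Sequence[float]]]:
    """Group frames into utterances; a run of quiet frames closes the turn."""
    utterances: List[List[Sequence[float]]] = []
    current: List[Sequence[float]] = []
    quiet = 0
    for frame in frames:
        if rms(frame) >= threshold:
            current.append(frame)
            quiet = 0
        elif current:
            current.append(frame)  # keep trailing silence inside the clip
            quiet += 1
            if quiet >= hangover_frames:
                utterances.append(current)
                current, quiet = [], 0
        # Quiet frames before any speech are dropped.
    if current:
        utterances.append(current)  # flush a turn cut off by end of stream
    return utterances
```

Each returned utterance corresponds to one clip that would be sent for transcription, which is why the decision loop only sees finalized turns.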

Relevant files:

  • electron/audio-worklet.js: 100 ms audio frames and RMS calculation.
  • electron/orb.js: utterance buffering, silence thresholding, PCM16 conversion.
  • backend/ears/stt_elevenlabs.py: Scribe wrapper.
  • backend/ears/hooks.py: filters noise, short clips, and sound-effect annotations before routing to the decision loop.
  • backend/voice_id.py: local voice enrollment and speaker classification.
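The kind of filtering attributed to backend/ears/hooks.py can be sketched as follows. This is a hypothetical illustration: the function name, the duration and length cutoffs, and the assumption that sound-effect annotations arrive as parenthesized or bracketed text such as "(laughter)" are not taken from the repo.

```python
import re
from typing import Optional

# Assumed annotation shape: "(laughter)" or "[music]".
_ANNOTATION = re.compile(r"[\(\[][^\)\]]*[\)\]]")


def clean_transcript(
    text: str,
    duration_s: float,
    min_duration_s: float = 0.4,  # assumed cutoff for "short clips"
    min_chars: int = 2,           # assumed cutoff for leftover noise
) -> Optional[str]:
    """Return cleaned text, or None when the clip should not reach the decision loop."""
    if duration_s < min_duration_s:
        return None
    stripped = _ANNOTATION.sub(" ", text)
    stripped = " ".join(stripped.split())  # collapse whitespace left by removals
    if len(stripped) < min_chars:
        return None
    return stripped
```

Returning None for rejected clips keeps the caller simple: only a non-None result is routed onward as a finalized utterance.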

Memory

Chief uses two memory layers:

  • Static seeded past-call facts for the demo contradiction set.
  • Supermemory-backed durable claims and profiles for longitudinal memory across sessions.

At conversation start, the backend resolves the counterparty, hydrates their claims from Supermemory, and freezes a briefing snapshot. The decision agent reads that immutable snapshot during the call. New claims are extracted off-path and written after the conversation ends, which avoids poisoning the hot path with mid-call writes.

Moss provides low-latency semantic retrieval over the past-call store. Retrieval is prefetched in the background on each utterance and the result is stored in shared state. The decision call reads the latest available facts from that state without waiting on a network query.
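A minimal sketch of that prefetch pattern, assuming a lock-protected cache and a daemon thread per query. The `FactCache` class and the injected `search` callable are hypothetical stand-ins; no real Moss SDK call is shown. Only the name `get_latest_facts` comes from this README.

```python
import threading
from typing import Callable, List, Sequence


class FactCache:
    """Shared state written by background prefetch and read by the hot path."""

    def __init__(self, static_facts: Sequence[str]):
        self._static = list(static_facts)
        self._latest = list(static_facts)  # start from the static fallback
        self._lock = threading.Lock()

    def get_latest_facts(self) -> List[str]:
        """Non-blocking read: returns whatever the last prefetch published."""
        with self._lock:
            return list(self._latest)

    def prefetch(self, utterance: str, search: Callable[[str], Sequence[str]]) -> threading.Thread:
        """Run `search` (a stand-in for the retrieval client) off the hot path."""

        def worker() -> None:
            try:
                facts = list(search(utterance))
            except Exception:
                facts = list(self._static)  # retrieval failed: use static facts
            with self._lock:
                self._latest = facts

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread
```

The decision call only ever touches `get_latest_facts()`, so a slow or failed retrieval costs freshness, never latency.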

The memory system is deliberately read-live / write-after:

  • Conversation start hydrates a frozen briefing for the current counterparty.
  • The hot path reads only frozen claims or the latest prefetched retrieval result.
  • Claim extraction runs after utterances and buffers structured claims.
  • Conversation end consolidates buffered claims and a session summary back to Supermemory.
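The read-live / write-after lifecycle in the list above could look roughly like this. It is a sketch under assumptions: `Briefing`, `ConversationMemory`, and the plain dict used as the durable store are hypothetical, and stand in for the Supermemory-backed code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Briefing:
    """Immutable snapshot of what was known when the conversation started."""
    counterparty: str
    claims: Tuple[str, ...]


class ConversationMemory:
    def __init__(self, store: Dict[str, List[str]]):
        self._store = store  # stand-in for the durable memory backend
        self.briefing: Optional[Briefing] = None
        self._buffer: List[str] = []

    def start(self, counterparty: str) -> None:
        # Hydrate once, then freeze: a tuple copy cannot be mutated later.
        self.briefing = Briefing(counterparty, tuple(self._store.get(counterparty, [])))
        self._buffer = []

    def record_claim(self, claim: str) -> None:
        # Buffered off-path; the hot path never reads this list.
        self._buffer.append(claim)

    def end(self) -> None:
        if self.briefing is None:
            return
        self._store.setdefault(self.briefing.counterparty, []).extend(self._buffer)
        self._buffer = []
```

Because the briefing is copied into a frozen object at `start()`, claims recorded mid-call are invisible to the decision agent until the next conversation.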

When Moss or Supermemory credentials are absent, Chief falls back to static seeded facts so the contradiction demo and tests remain deterministic.

Relevant files:

  • backend/memory/past_calls.py: seeded demo facts.
  • backend/memory/retrieval.py: Moss index creation, background prefetch, static fallback.
  • backend/memory/conversation.py: counterparty resolution, frozen briefing, start/end lifecycle.
  • backend/memory/claims.py: claim model, alias resolution, claim store.
  • backend/memory/extractor.py: off-path claim extraction.
  • backend/memory/supermemory_store.py: durable Supermemory facade with timeouts and sentinels.
  • backend/memory/cache.py: prompt-cached memory block construction for the decision agent.

The Decision

The decision agent is the hot path. It runs Claude Haiku with a low token budget and a hard timeout, reading:

  • The current belief paragraph.
  • The frozen memory snapshot or latest Moss facts.
  • The new finalized utterance.

It returns JSON only: either silent or a one-sentence whisper. Failures degrade to silence unless a narrow deterministic fallback catches the known demo contradiction.

The intended behavior is conservative: Chief should be silent most of the time and only interrupt for specific, actionable facts or contradictions.

The hot-path order is:

  1. Read latest memory facts from in-process state.
  2. Read the current belief paragraph.
  3. Call Haiku with max_tokens=80.
  4. Parse tolerant JSON.
  5. Emit a whisper only when the model returns {"action": "whisper"}.
  6. On timeout or model error, consult a narrow fallback, otherwise stay silent.
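The six steps above can be sketched as a single function. This is an illustration, not the code in backend/decision/agent.py: `call_model` and `fallback` are injected stand-ins (the real model call, with its max_tokens=80 budget, would sit behind `call_model`), the 1.5 second default timeout is an assumption, and the `"text"` field is a hypothetical name for the whisper sentence.

```python
import json
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence

_pool = ThreadPoolExecutor(max_workers=2)


def parse_tolerant(raw: str) -> dict:
    """Pull the first {...} span out of model output; anything else means silent."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return {"action": "silent"}
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {"action": "silent"}
    return data if isinstance(data, dict) else {"action": "silent"}


def decide(
    utterance: str,
    belief: str,
    facts: Sequence[str],
    call_model: Callable[[str], str],  # stand-in for the model call
    fallback: Callable[[str, Sequence[str]], Optional[str]],
    timeout_s: float = 1.5,  # assumed value; the source only says "hard timeout"
) -> Optional[str]:
    """Return a one-sentence whisper, or None for silence."""
    prompt = "BELIEF:\n%s\n\nFACTS:\n%s\n\nUTTERANCE:\n%s" % (belief, "\n".join(facts), utterance)
    future = _pool.submit(call_model, prompt)
    try:
        decision = parse_tolerant(future.result(timeout=timeout_s))
    except Exception:
        # Timeout or model error. The worker thread may still finish later;
        # its result is simply ignored.
        return fallback(utterance, facts)
    if decision.get("action") == "whisper" and decision.get("text"):
        return str(decision["text"])
    return None
```

Every path that is not an explicit, well-formed whisper collapses to None, which matches the policy that silence is the default output.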

Relevant files:

  • backend/decision/loop.py: orchestrates background tasks and the decision call for each finalized utterance.
  • backend/decision/agent.py: Haiku call, timeout, JSON parsing, fallback behavior.
  • backend/decision/prompt.py: conservative whisper policy.
  • backend/decision/fallback.py: deterministic fallback for the scripted contradiction.
  • backend/decision/trace.py: decision trace logging for tuning.

The Hands

The action layer runs after each finalized utterance, but never blocks the whisper path. A Claude Sonnet agent chooses at most one tool call for the user's own commitments:

  • Draft or send email.
  • Check or block calendar time.
  • Pull a document.
  • Place an outbound side call.

Actions follow an approval hierarchy. Auto actions can execute immediately, draft actions surface a card, and irreversible actions require explicit approval. When real credentials are missing, most integrations fall back to card-only behavior so the demo still runs.
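One way to express that hierarchy is a tier lookup in front of every tool call. The sketch below is an assumption-heavy illustration: the `Approval` enum, the `dispatch` function, and the tiers chosen for check_calendar, pull_document, and block_calendar are hypothetical. Only the approval gating of send_email and call_person, and draft_email surfacing as a card, come from this README.

```python
from enum import Enum
from typing import Any, Callable, Dict


class Approval(Enum):
    AUTO = "auto"        # low risk: execute immediately
    DRAFT = "draft"      # surface a card, no side effect yet
    APPROVE = "approve"  # irreversible: wait for explicit approval


TOOL_TIERS: Dict[str, Approval] = {
    "check_calendar": Approval.AUTO,     # assumed tier
    "pull_document": Approval.AUTO,      # assumed tier
    "block_calendar": Approval.DRAFT,    # assumed tier
    "draft_email": Approval.DRAFT,
    "send_email": Approval.APPROVE,
    "call_person": Approval.APPROVE,
}


def dispatch(
    tool: str,
    args: Dict[str, Any],
    execute: Callable[[str, Dict[str, Any]], Any],
    approved: bool = False,
) -> Dict[str, Any]:
    """Route one tool call through the approval tiers."""
    # Unknown tools get the strictest tier so mistakes fail safe.
    tier = TOOL_TIERS.get(tool, Approval.APPROVE)
    if tier is Approval.AUTO or (tier is Approval.APPROVE and approved):
        return {"status": "executed", "result": execute(tool, args)}
    if tier is Approval.DRAFT:
        return {"status": "card", "tool": tool, "args": args}
    return {"status": "needs_approval", "tool": tool, "args": args}
```

Defaulting unknown tools to the approval tier keeps a newly added tool from executing silently before someone has classified it.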

There is also a fast fact-check path: Haiku generates a web query, Brave Search returns snippets, and Haiku decides whether the result is worth whispering.

The action tools are intentionally scoped:

  • draft_email and send_email: AgentMail-backed, with send gated by approval.
  • check_calendar and block_calendar: Google Calendar-backed when OAuth is configured, otherwise stubbed.
  • pull_document: Google Drive search when OAuth is configured, otherwise stubbed.
  • call_person: AgentPhone-backed side call, always approval-gated.

The action agent receives explicit speaker context. If the counterparty says "I'll send it," Chief does not draft an email from the user's account. If the user says "I'll send it," Chief can prepare the draft card.
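A small sketch of how speaker context might gate the action agent. The source only says the agent receives explicit speaker context; the hard gate shown here, the `plan_action` function, and the label strings are assumptions for illustration.

```python
from typing import Callable, Optional

USER, COUNTERPARTY, UNKNOWN = "user", "counterparty", "unknown"


def plan_action(
    utterance: str,
    speaker: str,
    planner: Callable[[str], Optional[str]],  # stand-in for the tool-routing model call
) -> Optional[str]:
    """Only the user's own commitments are eligible for an action."""
    if speaker != USER:
        # Counterparty or unknown speaker: never act on their behalf.
        return None
    # Pass the label along so the planner also sees who was speaking.
    return planner("[speaker=%s] %s" % (speaker, utterance))
```

Treating an unknown speaker like the counterparty is the conservative choice: a misclassified voice produces a missed draft rather than an email sent on someone else's commitment.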

Relevant files:

  • backend/action/agent.py: Sonnet tool routing, approval behavior, fact-check path.
  • backend/action/email_agentmail.py: AgentMail draft/send facade.
  • backend/action/phone_agentphone.py: AgentPhone side-call facade and call-state tracking.
  • backend/action/calendar_google.py: Google Calendar check/block facade.
  • backend/action/drive_google.py: Google Drive document search facade.
  • backend/action/search.py: Brave Search and Browser Use deep-search helpers.
  • backend/action/hangup.py: queued action flush at call end.

Sponsor tracks used

| Sponsor track | Where it is used | How Chief uses it |
| --- | --- | --- |
| Moss | Memory retrieval | backend/memory/retrieval.py creates a past_calls index from seeded call facts, loads it at app boot, and queries it with top_k=2 / alpha=0.6. Retrieval is prefetched in a daemon thread and exposed through get_latest_facts() so the decision agent never waits on Moss. If the SDK or keys are missing, the same API returns the static fact set. |
| AgentMail | Hands / email | backend/action/email_agentmail.py is the real side-effect layer behind draft_email and send_email. It lazily resolves an inbox, creates drafts in the background, and only sends after an approval event from the orb. Every network call is timeout-wrapped; missing keys fall back to card-only behavior. |
| Browser Use | Hands / web automation | backend/action/search.py includes search_deep() for Browser Use cloud sessions. This is the slower browser-automation track for structured web workflows such as CRM updates, company research, or pulling information from pages that require navigation. It is intentionally kept off the decision path because runs can take 5-15 seconds. |
| AgentPhone | Hands / outbound side calls | backend/action/phone_agentphone.py powers call_person. It resolves or creates a Chief phone agent, optionally pins an existing AGENTPHONE_AGENT_ID, places approved outbound side calls, streams call status to the Side Call dashboard, fetches transcripts, and whispers a short post-call summary. Without a key, it runs a simulated call-state flow for demos. |
| Supermemory | Long-term memory | backend/memory/supermemory_store.py stores claims, profiles, and session summaries with Chief-specific tags. conversation.py hydrates claims at conversation start into an immutable briefing, and flushes extracted claims back at conversation end. Supermemory is never called directly on the decision hot path. |

Stack

| Layer | Technology |
| --- | --- |
| Desktop shell | Electron |
| Backend loop | Python, WebSockets, threads |
| Speech-to-text | ElevenLabs Scribe |
| Voice ID | NumPy voiceprint |
| Belief updater | Claude Haiku |
| Decision agent | Claude Haiku |
| Action agent | Claude Sonnet |
| Semantic retrieval | Moss |
| Durable memory | Supermemory |
| Email | AgentMail |
| Web automation | Browser Use |
| Outbound calls | AgentPhone |
| Calendar / Drive | Google APIs |

Getting started

git clone https://github.com/bliu8/overhearing-agents
cd overhearing-agents
cp .env.example .env
npm install
pip install -r requirements.txt
npm run dev

Fill in the API keys you want to exercise in .env. The app is built to degrade gracefully: missing optional keys disable the real side effect but keep the UI and demo path working.
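The degrade-gracefully behavior can be pictured as a per-integration check on the environment. This is a hypothetical sketch: `integration_mode` is not a function from the repo, and key names such as `AGENTMAIL_API_KEY` are placeholders; check .env.example for the real variable names.

```python
import os
from typing import Mapping, Optional


def integration_mode(env_var: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return "live" when the key is set and non-blank, otherwise "stub"."""
    env = os.environ if environ is None else environ
    return "live" if env.get(env_var, "").strip() else "stub"


# Example with a placeholder key name (an assumption, not the repo's variable):
# mode = integration_mode("AGENTMAIL_API_KEY")
# "stub" means: keep the card UI, skip the real side effect.
```

Passing `environ` explicitly makes the check easy to exercise in tests without touching the real process environment.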


Tests

pytest

The test suite covers decision behavior, memory retrieval and caching, Supermemory persistence adapters, action-tool facades, Google Calendar / Drive fallbacks, conversation priming, and hang-up flushing.


License

MIT

About

An overhearing agent that proactively acts for you. Integrated with AgentPhone, AgentMail, Browser Use, Google Calendar, and Supermemory. Placed third overall out of 100+ teams at the YC Call My Agent Hackathon.


Languages

  • Python 70.1%
  • JavaScript 23.1%
  • CSS 4.6%
  • HTML 2.2%