# Strategic Architecture & Systems Planning: PodcastIQ — Real-Time Research Co-Pilot for Podcast and Live Audio Hosts
**Principal Architect:** Michael Carter / Mobius Labs

---

## 1. Executive Summary & Scope of Vision

I designed PodcastIQ as a local-first, bring-your-own-keys research assistant for podcast hosts and live audio broadcasters (including Twitter/X Spaces and equivalent live formats). The system listens to a show in real time, transcribes the conversation with speaker diarization, and lets the host trigger purpose-built AI agents via hotkey to surface facts, counterarguments, follow-up questions, and guest research in the seconds between a claim being made and the moment a host has to respond to it.

The core mandate was to collapse the research gap that defines most long-form interview shows. Hosts either prepare exhaustively for every possible thread and still miss live claims, or they react on instinct and let falsehoods and weak premises stand unchallenged. PodcastIQ is the intervention layer that gives a host working at the pace of live conversation the same research depth they would have had with a producer and a three-day prep window.

The output of this concept is a running reference implementation scoped to Windows for v0.1, an architecture designed to remain provider-agnostic across the transcription, search, and LLM layers, a locked decision log, and a build pipeline that runs autonomously through a phased task plan.

---

## 2. Core Strategic Thesis

I established PodcastIQ around one central framing: live conversation is already an AI-native workload, and the host's cognitive load is the bottleneck. The incumbent tools in podcasting are production-centric. They optimize editing, publishing, and distribution. None of them help during the show itself, which is the moment where the quality of the conversation is actually determined.

The second thesis: hosts should not have to choose between control and capability. Cloud-native assistant products force a trade-off in which sensitive pre-show research, private dossiers, and live transcript streams pass through platform-owned infrastructure. PodcastIQ is designed to run on the host's own machine, with the host's own API keys, and with every artifact (transcripts, dossiers, knowledge base) stored locally by default. The host keeps control of their research surface while still getting the capability of modern agent tooling.

The third thesis, implicit in the architecture: the interview is the knowledge artifact. Every episode the host runs should compound into a personal research asset, not evaporate into a publishing pipeline. PodcastIQ treats each session as an ingest event into a long-lived knowledge base the host owns.

---

## 3. Systems Planning & Methodologies

I designed PodcastIQ across five integrated layers, each scoped to run locally on a single host workstation.

**Capture Layer:** Continuous audio capture with voice-activity detection, segmenting the live stream into utterances before they are sent downstream. This layer is explicitly decoupled from the rest of the system so that transcription providers and local audio pipelines can be swapped without touching agent or display code.
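
A minimal sketch of the segmentation step, assuming a plain energy threshold stands in for voice-activity detection (the decision log's actual VAD is not named here). The names `segment_utterances`, `energy_threshold`, and `max_silence_frames` are hypothetical and chosen for illustration only.

```python
from typing import Iterable, Iterator, List

Frame = List[float]  # one fixed-size frame of PCM samples in [-1.0, 1.0]


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of a frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def segment_utterances(
    frames: Iterable[Frame],
    energy_threshold: float = 0.01,   # assumed value, tune per microphone
    max_silence_frames: int = 2,      # assumed value, silence that ends an utterance
) -> Iterator[List[Frame]]:
    """Yield utterances: runs of voiced frames closed by sustained silence."""
    current: List[Frame] = []
    silence = 0
    for frame in frames:
        if frame_energy(frame) >= energy_threshold:
            current.append(frame)
            silence = 0
        elif current:
            # Short pauses are tolerated; the silent frames themselves are not stored.
            silence += 1
            if silence >= max_silence_frames:
                yield current
                current, silence = [], 0
    if current:
        yield current  # flush the utterance still open when the stream ends
```

Because the function only consumes an iterable of frames and yields lists of frames, anything downstream sees utterances and never the audio device, which is the decoupling this layer is meant to provide.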

**Transcription Layer:** Streaming speech-to-text with speaker diarization. The default is a cloud streaming provider chosen on the basis of latency and diarization quality, with the provider interface deliberately abstracted so alternative engines (including local models as they mature) can slot in without a rewrite.
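
One way the abstracted provider interface could look, sketched with a structural `Protocol`. The names `TranscriptionProvider`, `TranscriptSegment`, and `FakeProvider` are assumptions for illustration, not the project's actual types, and the fake engine assumes 16 kHz, 16-bit mono audio.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Protocol


@dataclass(frozen=True)
class TranscriptSegment:
    speaker: str    # diarization label, e.g. "S1"
    text: str
    start_s: float
    end_s: float


class TranscriptionProvider(Protocol):
    """Anything with this method can be used as a transcription engine."""

    def stream(self, utterances: Iterable[bytes]) -> Iterator[TranscriptSegment]:
        ...


class FakeProvider:
    """Stand-in engine for tests; alternates two speaker labels."""

    def stream(self, utterances: Iterable[bytes]) -> Iterator[TranscriptSegment]:
        clock = 0.0
        for i, audio in enumerate(utterances):
            duration = len(audio) / 32000.0  # assumes 16 kHz, 16-bit mono
            label = f"S{i % 2 + 1}"
            yield TranscriptSegment(label, f"<{len(audio)} bytes>", clock, clock + duration)
            clock += duration


def transcribe(provider: TranscriptionProvider, utterances: Iterable[bytes]) -> List[TranscriptSegment]:
    """Caller code depends only on the protocol, never on a concrete engine."""
    return list(provider.stream(utterances))
```

A cloud streaming client or a local model wrapper would each implement `stream` and be passed to `transcribe` unchanged.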

**Agent Layer:** A micro-harness hosting four purpose-built agents — Fact-Check, Counterpoint, Follow-Up Question, and Guest Research — each triggered by a dedicated hotkey. Each agent is grounded: it receives a structured slice of recent transcript plus retrieval context from the knowledge base, issues search or retrieval tool calls as needed, and streams a constrained response back to the display surface within a locked latency budget. The harness is provider-agnostic across the underlying model so the same hotkey can be wired to a local model, a frontier cloud model, or a task-specific fast model depending on the host's preference.
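
The hotkey-to-agent dispatch with a latency budget can be sketched roughly as follows. This is an illustration under assumptions: the class `Harness`, the hotkey names, the budget value, and the placeholder agents are all hypothetical, and the sketch returns a complete string where the real harness streams its response.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# An agent takes a transcript slice and retrieval context, and returns an answer.
Agent = Callable[[List[str], List[str]], Awaitable[str]]


async def fact_check(transcript: List[str], context: List[str]) -> str:
    # Placeholder: a real agent would call a model and search tools here.
    return f"Check: {transcript[-1]} (sources: {len(context)})"


async def slow_agent(transcript: List[str], context: List[str]) -> str:
    await asyncio.sleep(1.0)  # simulates a provider that misses the budget
    return "too late"


class Harness:
    def __init__(self, budget_s: float, window: int = 3) -> None:
        self.budget_s = budget_s   # latency budget per trigger, in seconds
        self.window = window       # how many recent transcript lines to pass
        self.bindings: Dict[str, Agent] = {}

    def bind(self, hotkey: str, agent: Agent) -> None:
        self.bindings[hotkey] = agent

    async def trigger(self, hotkey: str, transcript: List[str], context: List[str]) -> str:
        agent = self.bindings[hotkey]
        recent = transcript[-self.window:]
        try:
            return await asyncio.wait_for(agent(recent, context), timeout=self.budget_s)
        except asyncio.TimeoutError:
            # Fail fast so a late answer never distracts the host.
            return "[no answer within latency budget]"
```

Binding `fact_check` to one key and a different coroutine to another is all the harness needs to know. Which model sits behind each coroutine is a separate concern.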

**Knowledge Layer:** An embedded vector store that ingests show transcripts, host notes, and pre-show dossiers, and serves retrieval requests to the agent layer. The data flow is bi-directional: the knowledge base serves retrieval to agents during a show, and live sessions feed back into it so that claims, guests, and recurring topics accumulate over time.
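
To make the ingest-and-retrieve loop concrete, here is a toy in-memory store. It assumes a bag-of-words vector and cosine similarity in place of a learned embedding model and a real embedded vector database. `KnowledgeBase`, `embed`, and `cosine` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy bag-of-words embedding; a real store would use a learned model."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class KnowledgeBase:
    def __init__(self) -> None:
        # Each entry: (source label, original text, embedding)
        self._docs: List[Tuple[str, str, Dict[str, float]]] = []

    def ingest(self, source: str, text: str) -> None:
        """Used both for pre-show material and for live session feedback."""
        self._docs.append((source, text, embed(text)))

    def retrieve(self, query: str, k: int = 2) -> List[Tuple[str, str]]:
        """Return the k most similar entries as (source, text) pairs."""
        q = embed(query)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[2]), reverse=True)
        return [(src, text) for src, text, _ in ranked[:k]]
```

The same `ingest` call serves a dossier written before the show and a claim captured during it, which is how the store accumulates across sessions.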

**Surface Layer:** Three coordinated display surfaces running off a single local server — a dashboard for configuration, key management, session launch, and KB browsing; a live session view with transcript plus agent panels for the host during the show; and an OBS browser source for hosts who stream, optimized for overlay composition. All three read from the same event stream so state stays consistent across surfaces.
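
A small sketch of how one event stream can keep several surfaces consistent, assuming an append-only log with replay for late subscribers. The `EventStream` class and the event kinds are illustrative assumptions. The transport the local server actually uses is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    seq: int       # position in the log, shared by every surface
    kind: str      # e.g. "transcript" or "agent_result"
    payload: str


class EventStream:
    def __init__(self) -> None:
        self._log: List[Event] = []
        self._subscribers: Dict[str, Callable[[Event], None]] = {}

    def subscribe(self, name: str, handler: Callable[[Event], None]) -> None:
        """Register a surface; replay past events so late joiners catch up."""
        for event in self._log:
            handler(event)
        self._subscribers[name] = handler

    def publish(self, kind: str, payload: str) -> Event:
        """Append to the log and fan out to every subscribed surface."""
        event = Event(len(self._log), kind, payload)
        self._log.append(event)
        for handler in self._subscribers.values():
            handler(event)
        return event
```

In this sketch an OBS overlay opened mid-show receives the same ordered events the dashboard and live view already hold, so none of the three can drift from the others.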

The host triggers the system with a hotkey, not a prompt. The architecture exists to make the time between "the host notices something" and "the host sees a grounded answer" as close to conversational reflex as possible.

---

## 4. Research & Documentation Strategy

I developed the PodcastIQ specification through a structured research process covering four areas:

- the competitive landscape of podcast production tooling, to confirm that the live-assistance surface is structurally underserved;
- the real-time transcription and diarization market, to identify latency-competitive options across cloud and local paths;
- the agent harness and local-LLM landscape, to validate that a provider-agnostic micro-harness could deliver grounded responses inside a live-show latency envelope;
- the knowledge-retrieval ecosystem at the embedded end of the stack.

The tech stack is locked against a decision log that records both the accepted path and the escape valves to use if any layer fails validation.

The build itself is executed against a phased task plan with an explicit Day-1 smoke test as a hard gate before any downstream work begins. Phases are built, tested, and committed autonomously with human oversight at phase boundaries.

---

## 5. Visionary Concepts & Key Innovations

**Hotkey-Triggered Grounded Agents as a Live Interface:** The host's interaction surface is a keystroke, not a chat window. The agent returns a constrained, grounded response against the live transcript and the host's knowledge base. This collapses the interaction into something a host can actually use on-air, rather than a conversational assistant that would itself consume attention during the show.

**Local-First, BYOK Architecture:** The system is designed to run on the host's own machine with the host's own provider keys. Transcripts, dossiers, and knowledge base artifacts live locally by default. This is a structural stance, not a feature flag: the decision log treats cloud dependency as an escape valve, not the default.

**Provider-Agnostic Agent Harness:** The agent harness abstracts the underlying LLM, search provider, and transcription provider. A host can route any agent to any supported model or provider without a code change. This preserves optionality as the frontier-model and local-model landscape continues to shift, and keeps per-host cost profiles controllable.
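
Routing "without a code change" suggests a configuration file. The sketch below assumes a JSON document with a default route and per-agent overrides. The schema, the agent keys, and the provider and model names are placeholders invented for illustration, not the project's real settings format.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Route:
    provider: str
    model: str


def load_routes(config_text: str, agents: Iterable[str]) -> Dict[str, Route]:
    """Resolve one route per agent: per-agent overrides on top of the default."""
    raw = json.loads(config_text)
    overrides = raw.get("agents", {})
    routes: Dict[str, Route] = {}
    for agent in agents:
        merged = {**raw["default"], **overrides.get(agent, {})}
        routes[agent] = Route(merged["provider"], merged["model"])
    return routes


# Hypothetical configuration: names below are placeholders, not real products.
CONFIG = """
{
  "default": {"provider": "local", "model": "small-local-model"},
  "agents": {
    "fact_check": {"provider": "cloud", "model": "frontier-model"},
    "follow_up": {"model": "fast-model"}
  }
}
"""
AGENTS = ["fact_check", "counterpoint", "follow_up", "guest_research"]
```

Calling `load_routes(CONFIG, AGENTS)` yields a route for all four agents. Editing the JSON is enough to move an agent to a different provider.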

**Episode-as-Knowledge-Asset Model:** Each session is an ingest event. The host's back catalogue of shows, guest research, and live claims compounds into a personal research asset they own, not a platform's data lake. The dossier builder and the knowledge base close this loop so that prep for future shows draws on the history of prior ones.

---

## 6. Summary of Strategic Impact

- **Live-assistance surface in a production-centric market:** PodcastIQ addresses the moment of the show itself, where no mature tooling currently operates. This is a defensible category framing rather than a feature extension of existing podcast platforms.
- **Sovereignty as product position:** Local-first storage and BYOK key management make the product credible to hosts who treat their pre-show research, guest relationships, and session archives as sensitive intellectual property.
- **Architecture designed to outlast the current model cycle:** Provider-agnostic agent and transcription layers mean that the system's value proposition does not depend on any single vendor's continued pricing, availability, or capability curve.

---

## 7. Current Status & Next Steps

Architecture and decision log are locked, and the phased build pipeline is active. The agent harness, dossier builder, and knowledge base retrieval path are implemented. The live session and dashboard surfaces are scaffolded, with their settings, dossier, KB browser, session rating, and agent composition views wired through. OBS browser source rendering is in progress. The remaining work concentrates on observability, tone learning, end-to-end error-recovery testing, and the release gate, with a demo video scoped as part of the v0.1 Done criteria. macOS and Linux ports are explicitly deferred to post-release community contribution.

The project ships under AGPL-3.0 with a DCO sign-off workflow, which preserves a commercial dual-license pathway while allowing the reference implementation to remain open for the host community.
