{"id":333,"date":"2026-05-13T15:40:05","date_gmt":"2026-05-13T23:40:05","guid":{"rendered":"https:\/\/rainier-it.com\/blog\/?p=333"},"modified":"2026-05-13T15:40:14","modified_gmt":"2026-05-13T23:40:14","slug":"inside-percival-mcp-architecture","status":"publish","type":"post","link":"https:\/\/rainier-it.com\/blog\/inside-percival-mcp-architecture\/","title":{"rendered":"Inside Percival: 28 MCP Servers, 213 Tools, and 7 Workers Powering Our In-House AI Operator"},"content":{"rendered":"\n<p>Most &#8220;AI features&#8221; you&#8217;ll see an MSP ship in 2026 are a chat-bubble around someone else&#8217;s hosted LLM. They look nice in a demo. They work great for &#8220;what&#8217;s the weather in Tacoma.&#8221; They&#8217;re useless the moment a customer asks &#8220;can you restart the docker container on the front-desk laptop&#8221; or &#8220;which of my backups failed last night.&#8221;<\/p>\n\n\n\n<p>Percival \u2014 the agent we run at <a href=\"https:\/\/agent.rainier-it.com\">agent.rainier-it.com<\/a> \u2014 is the opposite of that. It&#8217;s a self-hosted AI operator built around the <a href=\"https:\/\/modelcontextprotocol.io\/\" target=\"_blank\" rel=\"noopener\">Model Context Protocol<\/a>, wired into 28 separate tool servers, and tied to every system that runs Rainier IT. It opens tickets, restarts containers, checks backups, runs SSH commands on allowlisted hosts, drafts incident summaries to Discord, and writes its weekly digest itself. It costs about $40 a month to operate. And it does it all on a 4-core LXC with 4 GB of RAM.<\/p>\n\n\n\n<p>This post is the architectural tour I wish someone had written for me before I started building it.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">\ud83e\udd16 Inside Percival<\/h1>\n\n\n\n<p>Percival is named after the Arthurian knight who went looking for the Grail and came back with answers. 
Ours just goes looking for &#8220;which of these 47 things is on fire right now.&#8221; Same vibe.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83c\udfd7\ufe0f The stack at a glance<\/h2>\n\n\n\n<p>The whole thing is one Python FastAPI app, one Postgres database, and a small herd of subprocess MCP servers. No vector databases other than pgvector. No LangChain. No agent framework. Vanilla JS frontend with no build pipeline.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Layer<\/th><th>What<\/th><\/tr><\/thead><tbody><tr><td>Frontend<\/td><td>Vanilla JS + WebSocket chat UI<\/td><\/tr><tr><td>Backend<\/td><td>FastAPI + asyncio + Anthropic SDK<\/td><\/tr><tr><td>Tool runtime<\/td><td>28 MCP servers, stdio-spawned per session<\/td><\/tr><tr><td>LLM proxy<\/td><td>LiteLLM on odus:4000 \u2192 Anthropic + Ollama<\/td><\/tr><tr><td>Local LLM<\/td><td>Ollama on wodin GPU host (GTX 1660 SUPER)<\/td><\/tr><tr><td>Memory<\/td><td>pgvector in Postgres, 768-dim, <code>nomic-embed-text<\/code><\/td><\/tr><tr><td>Background workers<\/td><td>7 asyncio tasks under one scheduler<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Total host: one Proxmox LXC, 4 GB RAM, 40 GB disk, Ubuntu 24.04. The GPU box that does embeddings is a separate machine that idles most of the day.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udd0c The 28 MCP Servers (213 Tools)<\/h2>\n\n\n\n<p>This is the part of the architecture that earns its keep. Every external system Percival talks to gets its own MCP server: a small Python process that exposes typed tools over stdio. The agent spawns whichever it needs per session, and tears them down when the session ends.<\/p>\n\n\n\n<p>Here&#8217;s the full fleet, grouped by function. 
Tool counts are real \u2014 I just grepped them out of the source.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Infrastructure &amp; ops (7 servers, 74 tools)<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Server<\/th><th>Tools<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td style=\"white-space:nowrap\"><code>proxmox<\/code><\/td><td style=\"white-space:nowrap\"><strong>24<\/strong><\/td><td>VMs, LXCs, nodes, snapshots, create\/destroy\/start\/stop<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>pbs<\/code><\/td><td style=\"white-space:nowrap\"><strong>16<\/strong><\/td><td>Proxmox Backup Server \u2014 datastores, snapshots, verify\/prune\/GC, restore-list<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>docker<\/code><\/td><td>9<\/td><td>Fleet container ops across 6 hosts (allowlisted, SSH key injection)<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>nginx<\/code><\/td><td>9<\/td><td>Site configs, cert status, reload on spark<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>monitoring<\/code><\/td><td>8<\/td><td>Prometheus, Loki, Grafana composite reads<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>logs<\/code><\/td><td>5<\/td><td>Tail nginx\/docker\/syslog via SSH<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>ssh-exec<\/code><\/td><td>3<\/td><td>Raw shell on allowlisted hosts (approval-gated)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This is the bulk of the day-to-day load. 
The <code>proxmox<\/code> and <code>pbs<\/code> servers alone are why I built Percival in the first place \u2014 every other monitoring dashboard knows <em>something<\/em> about backups, but only PBS knows whether the snapshot from last night actually verified.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">RMM, security, and patching (4 servers, 38 tools)<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Server<\/th><th>Tools<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td style=\"white-space:nowrap\"><code>huntress<\/code><\/td><td style=\"white-space:nowrap\"><strong>14<\/strong><\/td><td>EDR agents, incidents, malware, isolation actions<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>action1<\/code><\/td><td style=\"white-space:nowrap\"><strong>10<\/strong><\/td><td>Patch compliance, endpoint inventory, scripts (OAuth2)<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>ansible<\/code><\/td><td>8<\/td><td>Run playbooks against inventory groups<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>trmm<\/code><\/td><td>6<\/td><td>TacticalRMM agents, alerts, commands<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Huntress + Action1 + TacticalRMM is the same stack we deploy at every client. 
Having all three in one chat window \u2014 <em>&#8220;what&#8217;s currently isolated, what&#8217;s behind on patches, and what just paged&#8221;<\/em> \u2014 is genuinely surprising the first time you use it.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Identity, comms, and tickets (6 servers, 31 tools)<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Server<\/th><th>Tools<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td style=\"white-space:nowrap\"><code>nextcloud<\/code><\/td><td>5<\/td><td>WebDAV search + OCS user management<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>authentik<\/code><\/td><td>6<\/td><td>Users, groups, sessions, audit events<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>email<\/code><\/td><td>4<\/td><td>Gmail IMAP\/SMTP<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>discord<\/code><\/td><td>4<\/td><td>Channels, search, post, attachments<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>zammad<\/code><\/td><td>6<\/td><td>Ticket list, comment, state changes<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>onboarding<\/code><\/td><td>6<\/td><td>Multi-service client provisioning<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The <code>onboarding<\/code> server is special \u2014 it&#8217;s a <em>composite<\/em> MCP server that drives 5 others (Authentik, Email, Nextcloud, Invoice Ninja, TRMM) to spin up a new MSP client end-to-end. 
One tool call from chat, one workflow logged to Postgres, one branded welcome email at the end.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Business apps &amp; content (5 servers, 37 tools)<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Server<\/th><th>Tools<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td style=\"white-space:nowrap\"><code>wordpress<\/code><\/td><td>9<\/td><td>Posts, pages, comments, media (used to publish this post, actually)<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>gdrive<\/code><\/td><td>9<\/td><td>Drive file ops (OAuth2)<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>wikijs<\/code><\/td><td>6<\/td><td>Dual endpoint \u2014 internal wiki + client KB (GraphQL)<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>invoiceninja<\/code><\/td><td>5<\/td><td>Invoices and clients<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>cloudflare<\/code><\/td><td>8<\/td><td>Zones, DNS, cache purge, firewall rules<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">DevOps &amp; observability (4 servers, 19 tools)<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Server<\/th><th>Tools<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td style=\"white-space:nowrap\"><code>jenkins<\/code><\/td><td>6<\/td><td>Jobs, builds, queue<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>grafana<\/code><\/td><td>7<\/td><td>Dashboards, datasources, alerts<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>uptime-kuma<\/code><\/td><td>4<\/td><td>Monitor status (read from Kuma&#8217;s SQLite database, since its API is socket.io-only)<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>litellm-stats<\/code><\/td><td>2<\/td><td>LLM spend telemetry<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The Uptime Kuma integration is an honest hack. Kuma&#8217;s &#8220;API&#8221; is a socket.io endpoint that won&#8217;t talk to anything that isn&#8217;t its own dashboard. 
We SSH into the Kuma container, query its SQLite directly, and call it a day. It works.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Plumbing (2 servers, 14 tools)<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Server<\/th><th>Tools<\/th><th>Purpose<\/th><\/tr><\/thead><tbody><tr><td style=\"white-space:nowrap\"><code>memory<\/code><\/td><td>9<\/td><td>Semantic search (pgvector + Ollama embeddings) over long-term memory<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>net<\/code><\/td><td>5<\/td><td>ping, nslookup, iperf, traceroute<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>Total: 28 MCP servers, 213 tools.<\/strong> Plus a <code>shared\/<\/code> directory of base classes (not a server, just a library) that handles things like API-token auth and rate-limiting.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83e\udde9 What an MCP server actually looks like<\/h2>\n\n\n\n<p>Every server follows the same shape. 
Here&#8217;s an abridged version of the structure (a real example, not pseudocode; the <code>zammad_get<\/code> HTTP helper and five of the six tools are trimmed out):<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly># percival\/mcp-servers\/zammad\/server.py\nimport asyncio, json, os\nfrom mcp.server import Server\nfrom mcp.server.stdio import stdio_server\nimport mcp.types as types\n\nserver = Server(\"zammad\")\n\n@server.list_tools()\nasync def list_tools() -> list&#91;types.Tool&#93;:\n    return [\n        types.Tool(\n            name=\"list_tickets\",\n            description=\"List Zammad tickets, optionally filtered by state or group.\",\n            inputSchema={\n                \"type\": \"object\",\n                \"properties\": {\n                    \"state\":  {\"type\": \"string\", \"enum\": &#91;\"open\", \"pending\", \"closed\"&#93;},\n                    \"group\":  
{\"type\": \"string\"},\n                    \"limit\":  {\"type\": \"integer\", \"default\": 25},\n                },\n            },\n        ),\n        # ...5 more tools...\n    ]\n\n@server.call_tool()\nasync def call_tool(name: str, arguments: dict) -> list&#91;types.TextContent&#93;:\n    if name == \"list_tickets\":\n        result = await zammad_get(\"\/api\/v1\/tickets\/search\", params=arguments)\n        return &#91;types.TextContent(type=\"text\", text=json.dumps(result, indent=2))&#93;\n    raise ValueError(f\"Unknown tool: {name}\")\n\nasync def main():\n    async with stdio_server() as (r, w):\n        await server.run(r, w, server.create_initialization_options())\n\nif __name__ == \"__main__\":\n    asyncio.run(main())<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #616E88\"># percival\/mcp-servers\/zammad\/server.py<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> asyncio<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> json<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> os<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">from<\/span><span style=\"color: #D8DEE9FF\"> mcp<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">server 
<\/span><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> Server<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">from<\/span><span style=\"color: #D8DEE9FF\"> mcp<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">server<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">stdio <\/span><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> stdio_server<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> mcp<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">types <\/span><span style=\"color: #81A1C1\">as<\/span><span style=\"color: #D8DEE9FF\"> types<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">server <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">Server<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">zammad<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ECEFF4\">@<\/span><span style=\"color: #D08770\">server<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D08770\">list_tools<\/span><span style=\"color: #ECEFF4\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">async<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">def<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">list_tools<\/span><span style=\"color: #ECEFF4\">()<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">-&gt;<\/span><span style=\"color: #D8DEE9FF\"> list<\/span><span style=\"color: #ECEFF4\">&#91;<\/span><span 
style=\"color: #D8DEE9FF\">types<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">Tool<\/span><span style=\"color: #ECEFF4\">&#93;:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">return<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">[<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        types<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">Tool<\/span><span style=\"color: #ECEFF4\">(<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            <\/span><span style=\"color: #D8DEE9\">name<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">list_tickets<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            <\/span><span style=\"color: #D8DEE9\">description<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">List Zammad tickets, optionally filtered by state or group.<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            <\/span><span style=\"color: #D8DEE9\">inputSchema<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #ECEFF4\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">                <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">type<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">object<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span 
style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">                <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">properties<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">                    <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">state<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\">  <\/span><span style=\"color: #ECEFF4\">{<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">type<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">string<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">enum<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&#91;<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">open<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">pending<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">closed<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span 
style=\"color: #ECEFF4\">&#93;},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">                    <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">group<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\">  <\/span><span style=\"color: #ECEFF4\">{<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">type<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">string<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">                    <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">limit<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\">  <\/span><span style=\"color: #ECEFF4\">{<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">type<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">integer<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">default<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #B48EAD\">25<\/span><span style=\"color: #ECEFF4\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">                <\/span><span style=\"color: 
#ECEFF4\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            <\/span><span style=\"color: #ECEFF4\">},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #ECEFF4\">),<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #616E88\"># ...5 more tools...<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #ECEFF4\">]<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #ECEFF4\">@<\/span><span style=\"color: #D08770\">server<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D08770\">call_tool<\/span><span style=\"color: #ECEFF4\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">async<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">def<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">call_tool<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9\">name<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">str<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">arguments<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">dict<\/span><span style=\"color: #ECEFF4\">)<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">-&gt;<\/span><span style=\"color: #D8DEE9FF\"> list<\/span><span style=\"color: #ECEFF4\">&#91;<\/span><span style=\"color: #D8DEE9FF\">types<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">TextContent<\/span><span style=\"color: #ECEFF4\">&#93;:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span 
style=\"color: #81A1C1\">if<\/span><span style=\"color: #D8DEE9FF\"> name <\/span><span style=\"color: #81A1C1\">==<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">list_tickets<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        result <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">await<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">zammad_get<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">\/api\/v1\/tickets\/search<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">params<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\">arguments<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">return<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&#91;<\/span><span style=\"color: #D8DEE9FF\">types<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">TextContent<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9\">type<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">text<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">text<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\">json<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: 
#88C0D0\">dumps<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">result<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">indent<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #B48EAD\">2<\/span><span style=\"color: #ECEFF4\">))&#93;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">raise<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #8FBCBB\">ValueError<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #81A1C1\">f<\/span><span style=\"color: #A3BE8C\">&quot;Unknown tool: <\/span><span style=\"color: #EBCB8B\">{<\/span><span style=\"color: #D8DEE9FF\">name<\/span><span style=\"color: #EBCB8B\">}<\/span><span style=\"color: #A3BE8C\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">async<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">def<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">main<\/span><span style=\"color: #ECEFF4\">():<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">async<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">with<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">stdio_server<\/span><span style=\"color: #ECEFF4\">()<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">as<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">r<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> w<\/span><span style=\"color: #ECEFF4\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span 
style=\"color: #81A1C1\">await<\/span><span style=\"color: #D8DEE9FF\"> server<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">run<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">r<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> w<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> server<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">create_initialization_options<\/span><span style=\"color: #ECEFF4\">())<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">if<\/span><span style=\"color: #D8DEE9FF\"> __name__ <\/span><span style=\"color: #81A1C1\">==<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">__main__<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    asyncio<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">run<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #88C0D0\">main<\/span><span style=\"color: #ECEFF4\">())<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>The agent talks to all 28 of these via JSON-RPC over stdio. Each server is its own subprocess, its own venv, its own credentials. If one of them crashes \u2014 say, the Zammad container goes down \u2014 none of the other 27 notice. That fault-isolation is the single biggest reason MCP turned out to be a better abstraction than a monolithic &#8220;tools&#8221; module.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\u23f0 The 7 Background Workers<\/h2>\n\n\n\n<p>The MCP servers handle the <em>interactive<\/em> path \u2014 chat, tool use, the back-and-forth. 
Everything <em>autonomous<\/em> runs in 7 background workers under one asyncio scheduler. They all share the same FastAPI process, the same Postgres pool, and the same MCP client.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Worker<\/th><th>Cadence<\/th><th>What it does<\/th><\/tr><\/thead><tbody><tr><td style=\"white-space:nowrap\"><code>heartbeat<\/code><\/td><td style=\"white-space:nowrap\"><strong>every 60s<\/strong><\/td><td>Writes a liveness marker so I can confirm from the outside that the agent process is alive. Beyond that marker it only logs; no other side effects.<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>trmm_poller<\/code><\/td><td style=\"white-space:nowrap\"><strong>every 5min<\/strong><\/td><td>Pulls <code>\/alerts\/<\/code> from TacticalRMM, dedupes against the <code>events<\/code> table, and writes the new ones with severity mapping (<code>error\u2192critical<\/code>, <code>warning\u2192warning<\/code>).<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>uptime_poller<\/code><\/td><td style=\"white-space:nowrap\"><strong>every 2min<\/strong><\/td><td>Queries Uptime Kuma&#8217;s SQLite via SSH; tracks monitor state in <code>SystemState<\/code>; writes monitor-down events.<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>discord_notifier<\/code><\/td><td style=\"white-space:nowrap\"><strong>every 30s<\/strong><\/td><td>Watches <code>events<\/code> for new critical\/warning rows and posts one Discord webhook embed per row. 
Tracks a high-water mark so restarts don&#8217;t double-send.<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>incident_responder<\/code><\/td><td style=\"white-space:nowrap\"><strong>every 45s<\/strong><\/td><td>On critical events, gathers the \u00b130-minute context window, queries Kuma + Zammad + recent fixes via MCP, calls Haiku for a remediation summary, and posts a rich Discord embed with a deep-link button to the chat UI (&#8220;Resolve via Percival&#8221;).<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>haiku_monitor<\/code><\/td><td style=\"white-space:nowrap\"><strong>every 1hr<\/strong><\/td><td>Polls LiteLLM&#8217;s spend API for <code>haiku-last-resort<\/code> invocations. If the counter has gone up, the primary LLMs are down and Percival has fallen back, so it posts an alert. Critically, this worker <strong>does not use the LLM<\/strong>: it&#8217;s the canary that watches for LLM failure, so it can&#8217;t depend on one.<\/td><\/tr><tr><td style=\"white-space:nowrap\"><code>weekly_digest<\/code><\/td><td style=\"white-space:nowrap\"><strong>Mon 09:00 PT<\/strong><\/td><td>Queries Kuma + Zammad + Invoice Ninja + nginx certs + docker fleet via MCP, hands the JSON to Haiku for executive-summary formatting, and posts a Monday-morning Discord embed.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The <code>incident_responder<\/code> is my favorite. 
When something goes critical at 2 AM, you get a Discord push that already has:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The alert (from TRMM, Kuma, or Defender)<\/li>\n\n\n\n<li>Three to five related events from the same \u00b130-minute window<\/li>\n\n\n\n<li>The current monitor status<\/li>\n\n\n\n<li>A two-paragraph, Haiku-drafted take: &#8220;here&#8217;s what we think happened and here&#8217;s the next move&#8221;<\/li>\n\n\n\n<li>A <strong>Resolve via Percival<\/strong> button that opens the chat UI preloaded with the incident context<\/li>\n<\/ul>\n\n\n\n<p>If I&#8217;m asleep, the on-call human gets the same embed and the same button. If I&#8217;m awake, I open the chat, it has already loaded the context, and I&#8217;m one prompt away from &#8220;OK, restart that container.&#8221; That&#8217;s the agent paying for itself.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83e\udde0 The Model Story<\/h2>\n\n\n\n<p>This is where most &#8220;AI agent&#8221; architectures get expensive, slow, or unreliable. 
Here&#8217;s how we keep all three in check.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Primary chat: <strong>Anthropic Haiku<\/strong> via LiteLLM<\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly># backend\/app\/agent\/core.py\nfrom openai import AsyncOpenAI\n\nclient = AsyncOpenAI(\n    base_url=settings.llm_base_url,     # (LiteLLM)\n    api_key=settings.anthropic_api_key, # legacy name \u2014 doubles as the LiteLLM master key\n)\n\nMODEL_HEAVY = settings.model_heavy      # \"claude-haiku\"      (in production)\nMODEL_LIGHT = settings.model_light      # \"local-fast\"        (qwen2.5:7b)\n\nasync def chat_turn(messages, tools):\n    response = await client.chat.completions.create(\n        model=MODEL_HEAVY,\n        messages=messages,\n        tools=tools,\n        max_tokens=4096,\n    )\n    return 
response.choices&#91;0&#93;.message<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #616E88\"># backend\/app\/agent\/core.py<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">from<\/span><span style=\"color: #D8DEE9FF\"> openai <\/span><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> AsyncOpenAI<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">client <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">AsyncOpenAI<\/span><span style=\"color: #ECEFF4\">(<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #D8DEE9\">base_url<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\">settings<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">llm_base_url<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\">     <\/span><span style=\"color: #616E88\"># (LiteLLM)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #D8DEE9\">api_key<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: 
#D8DEE9FF\">settings<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">anthropic_api_key<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #616E88\"># legacy name \u2014 doubles as the LiteLLM master key<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">MODEL_HEAVY <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> settings<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">model_heavy      <\/span><span style=\"color: #616E88\"># &quot;claude-haiku&quot;      (in production)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">MODEL_LIGHT <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> settings<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">model_light      <\/span><span style=\"color: #616E88\"># &quot;local-fast&quot;        (qwen2.5:7b)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">async<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">def<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">chat_turn<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9\">messages<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">tools<\/span><span style=\"color: #ECEFF4\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    response <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">await<\/span><span style=\"color: #D8DEE9FF\"> client<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">chat<\/span><span 
style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">completions<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">create<\/span><span style=\"color: #ECEFF4\">(<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #D8DEE9\">model<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\">MODEL_HEAVY<\/span><span style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #D8DEE9\">messages<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\">messages<\/span><span style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #D8DEE9\">tools<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\">tools<\/span><span style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #D8DEE9\">max_tokens<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #B48EAD\">4096<\/span><span style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">return<\/span><span style=\"color: #D8DEE9FF\"> response<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">choices<\/span><span style=\"color: #ECEFF4\">&#91;<\/span><span style=\"color: #B48EAD\">0<\/span><span style=\"color: #ECEFF4\">&#93;.<\/span><span style=\"color: #D8DEE9FF\">message<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Every interactive turn hits Claude <strong>Haiku<\/strong> by default. 
It&#8217;s fast enough to keep the chat UI feeling instant, smart enough to drive a 213-tool catalog reliably, and cheap enough that an average user session costs cents. We retired Opus from the chat path in April: at our usage shape (lots of small tool calls, not long-form reasoning), we were paying 20\u00d7 for output that was noticeably, but not dramatically, better.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">LiteLLM proxy on odus:4000<\/h3>\n\n\n\n<p>Everything LLM-shaped goes through one <a href=\"https:\/\/github.com\/BerriAI\/litellm\" target=\"_blank\" rel=\"noopener\">LiteLLM<\/a> instance on odus. It does four jobs:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>OpenAI-compatible API in front of every model<\/strong> (Anthropic, Ollama, Groq), so the agent code only has to speak one shape.<\/li>\n\n\n\n<li><strong>Per-key budgets and rate limits<\/strong>: Percival&#8217;s key has its own monthly cap, separate from the chatbot&#8217;s.<\/li>\n\n\n\n<li><strong>Spend telemetry<\/strong>: the <code>litellm-stats<\/code> MCP server and the <code>haiku_monitor<\/code> worker both read from it.<\/li>\n\n\n\n<li><strong>Routing fallbacks<\/strong>: if Anthropic returns a 5xx, LiteLLM automatically reroutes to a <code>haiku-last-resort<\/code> model that lives on the same Anthropic endpoint under a different alias, so spend telemetry can prove a fallback occurred.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Local fallback: <strong>Ollama on wodin<\/strong><\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly># Ubuntu 24.04 VM, GTX 1660 SUPER passthrough\n$ curl &lt;dedicated VM > | jq '.models[].name'\n\"qwen2.5:7b\"           # MODEL_LIGHT for low-stakes drafts\n\"llama3.2:3b\"          # fast token generation, low quality\n\"nomic-embed-text\"     # 768-dim embeddings for memory<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" 
stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #616E88\"># Ubuntu 24.04 VM, GTX 1660 SUPER passthrough<\/span><\/span>\n<span class=\"line\"><span style=\"color: #88C0D0\">$<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #A3BE8C\">curl<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">&lt;<\/span><span style=\"color: #A3BE8C\">dedicated<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #A3BE8C\">VM<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">&gt;<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">|<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">jq<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">.models[].name<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #88C0D0\">&quot;qwen2.5:7b&quot;<\/span><span style=\"color: #D8DEE9FF\">           <\/span><span style=\"color: #616E88\"># MODEL_LIGHT for low-stakes drafts<\/span><\/span>\n<span class=\"line\"><span style=\"color: #88C0D0\">&quot;llama3.2:3b&quot;<\/span><span style=\"color: #D8DEE9FF\">          <\/span><span style=\"color: #616E88\"># fast token generation, low quality<\/span><\/span>\n<span class=\"line\"><span style=\"color: 
&quot;nomic-embed-text&quot;">
#88C0D0\">&quot;nomic-embed-text&quot;<\/span><span style=\"color: #D8DEE9FF\">     <\/span><span style=\"color: #616E88\"># 768-dim embeddings for memory<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>The honest story on local LLMs in 2026: a GTX 1660 SUPER is not enough hardware to replace Anthropic on the chat path. We tried. <code>qwen2.5:7b<\/code> is shockingly good for its size, but it&#8217;s noticeably worse at long-context tool-use loops, and its tokens-per-second on a 6 GB card is about half of what a conversational feel needs.<\/p>\n\n\n\n<p>Where local LLMs <em>do<\/em> earn their keep:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Embeddings<\/strong> for the memory store (<code>nomic-embed-text<\/code>: milliseconds of GPU time per call, at no cost)<\/li>\n\n\n\n<li><strong>Background summarization<\/strong> when latency doesn&#8217;t matter<\/li>\n\n\n\n<li><strong>The emergency fallback path<\/strong> if Anthropic is down for an extended period<\/li>\n<\/ul>\n\n\n\n<p>We&#8217;ll revisit running the primary model locally when the GPU upgrade happens. 
The architecture&#8217;s already there.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Memory: <strong>pgvector + 768-dim embeddings<\/strong><\/h3>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly># backend\/app\/agent\/memory.py\nasync def embed(text: str) -> list&#91;float&#93; | None:\n    try:\n        resp = await client.embeddings.create(\n            model=\"local-embed\",       # \u2192 wodin nomic-embed-text\n            input=text,\n        )\n        return resp.data&#91;0&#93;.embedding\n    except Exception:\n        return None                    # silent fail \u2014 chat continues without RAG<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" 
stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #616E88\"># backend\/app\/agent\/memory.py<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">async<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">def<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">embed<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9\">text<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">str<\/span><span style=\"color: #ECEFF4\">)<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">-&gt;<\/span><span style=\"color: #D8DEE9FF\"> list<\/span><span style=\"color: #ECEFF4\">&#91;<\/span><span style=\"color: #88C0D0\">float<\/span><span style=\"color: #ECEFF4\">&#93;<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">|<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">None<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">try<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        resp <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">await<\/span><span style=\"color: #D8DEE9FF\"> 
client<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">embeddings<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">create<\/span><span style=\"color: #ECEFF4\">(<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            <\/span><span style=\"color: #D8DEE9\">model<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">local-embed<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\">       <\/span><span style=\"color: #616E88\"># \u2192 wodin nomic-embed-text<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            <\/span><span style=\"color: #D8DEE9\">input<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\">text<\/span><span style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">return<\/span><span style=\"color: #D8DEE9FF\"> resp<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">data<\/span><span style=\"color: #ECEFF4\">&#91;<\/span><span style=\"color: #B48EAD\">0<\/span><span style=\"color: #ECEFF4\">&#93;.<\/span><span style=\"color: #D8DEE9FF\">embedding<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">except<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #8FBCBB\">Exception<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">return<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">None<\/span><span style=\"color: #D8DEE9FF\">  
                  <\/span><span style=\"color: #616E88\"># silent fail \u2014 chat continues without RAG<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>Every long-term memory the agent writes (project state, infra notes, user preferences) gets embedded and stored as a Postgres row with a <code>vector(768)<\/code> column. On the next chat turn, the top-5 cosine-similar memories get prepended to the context. If embeddings fail \u2014 Ollama down, wodin off \u2014 chat continues without RAG. Memory is a nice-to-have, not a load-bearing component.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Prompt caching (the cheap trick that matters)<\/h3>\n\n\n\n<p>The tool-list system prompt for 28 servers is roughly 25k tokens. Sending that on every turn at Anthropic&#8217;s input rate would be a real problem. Two <code>cache_control<\/code> markers fix it:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea 
class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly># Tool definitions block \u2014 gets cached\ntools_msg = [\n    *all_tools,\n    {**all_tools&#91;-1&#93;, \"cache_control\": {\"type\": \"ephemeral\"}},\n]\n\n# System prompt (Percival persona) \u2014 also cached\nsystem = &#91;\n    {\"type\": \"text\", \"text\": persona_prompt,\n     \"cache_control\": {\"type\": \"ephemeral\"}},\n&#93;<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #616E88\"># Tool definitions block \u2014 gets cached<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">tools_msg <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">[<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">*<\/span><span style=\"color: #D8DEE9FF\">all_tools<\/span><span style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #ECEFF4\">{<\/span><span style=\"color: #81A1C1\">**<\/span><span style=\"color: #D8DEE9FF\">all_tools<\/span><span style=\"color: #ECEFF4\">&#91;<\/span><span style=\"color: #81A1C1\">-<\/span><span style=\"color: 
#B48EAD\">1<\/span><span style=\"color: #ECEFF4\">&#93;,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">cache_control<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">{<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">type<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">ephemeral<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">}},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ECEFF4\">]<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\"># System prompt (Percival persona) \u2014 also cached<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">system <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&#91;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #ECEFF4\">{<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">type<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">text<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">text<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> persona_prompt<\/span><span style=\"color: 
,<\/span><\/span>">
#ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">     <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">cache_control<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">{<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">type<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">ephemeral<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">}},<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ECEFF4\">&#93;<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>The cache entry lives for 5 minutes, and every hit resets the timer. Turns inside that window pay <strong>10% of the input rate<\/strong> for the cached tokens; the turn that first writes the cache pays a 25% premium on them instead. Real-world hit rate on a busy session is north of 90%. This single change cut Percival&#8217;s Anthropic bill roughly in half.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udd10 Approval gates \u2014 three levels<\/h2>\n\n\n\n<p>Letting an LLM call <code>rm -rf \/<\/code> on production is bad. Not letting it call <em>anything<\/em> on production makes it useless. 
The middle path is a classifier on every tool invocation:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly># backend\/app\/agent\/approval.py\nclass ApprovalLevel(Enum):\n    AUTO     = \"auto\"     # safe reads, allowlisted SSH on a tame host\n    CONFIRM  = \"confirm\"  # writes, restarts, anything user-visible\n    DENIED   = \"denied\"   # destructive ops on critical systems\n\ndef classify(tool_name: str, args: dict) -> ApprovalLevel:\n    if tool_name in READ_ONLY_TOOLS:\n        return ApprovalLevel.AUTO\n    if tool_name == \"ssh_exec\" and args&#91;\"host\"&#93; in CRITICAL_HOSTS:\n        if any(k in args&#91;\"command\"&#93; for k in DESTRUCTIVE_KEYWORDS):\n            return ApprovalLevel.DENIED\n        return ApprovalLevel.CONFIRM\n    if tool_name in WRITE_TOOLS:\n        return 
ApprovalLevel.CONFIRM\n    return ApprovalLevel.AUTO<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #616E88\"># backend\/app\/agent\/approval.py<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">class<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #8FBCBB\">ApprovalLevel<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #8FBCBB; font-weight: bold\">Enum<\/span><span style=\"color: #ECEFF4\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    AUTO     <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">auto<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #D8DEE9FF\">     <\/span><span style=\"color: #616E88\"># safe reads, allowlisted SSH on a tame host<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    CONFIRM  <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">confirm<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #D8DEE9FF\">  <\/span><span style=\"color: #616E88\"># 
writes, restarts, anything user-visible<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    DENIED   <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">denied<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #D8DEE9FF\">   <\/span><span style=\"color: #616E88\"># destructive ops on critical systems<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">def<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">classify<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9\">tool_name<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">str<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">args<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">dict<\/span><span style=\"color: #ECEFF4\">)<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">-&gt;<\/span><span style=\"color: #D8DEE9FF\"> ApprovalLevel<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">if<\/span><span style=\"color: #D8DEE9FF\"> tool_name <\/span><span style=\"color: #81A1C1\">in<\/span><span style=\"color: #D8DEE9FF\"> READ_ONLY_TOOLS<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">return<\/span><span style=\"color: #D8DEE9FF\"> ApprovalLevel<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">AUTO<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span 
style=\"color: #81A1C1\">if<\/span><span style=\"color: #D8DEE9FF\"> tool_name <\/span><span style=\"color: #81A1C1\">==<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">ssh_exec<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">and<\/span><span style=\"color: #D8DEE9FF\"> args<\/span><span style=\"color: #ECEFF4\">&#91;<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">host<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">&#93;<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">in<\/span><span style=\"color: #D8DEE9FF\"> CRITICAL_HOSTS<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">if<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">any<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">k <\/span><span style=\"color: #81A1C1\">in<\/span><span style=\"color: #D8DEE9FF\"> args<\/span><span style=\"color: #ECEFF4\">&#91;<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">command<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">&#93;<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">for<\/span><span style=\"color: #D8DEE9FF\"> k <\/span><span style=\"color: #81A1C1\">in<\/span><span style=\"color: #D8DEE9FF\"> DESTRUCTIVE_KEYWORDS<\/span><span style=\"color: #ECEFF4\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            <\/span><span style=\"color: #81A1C1\">return<\/span><span style=\"color: #D8DEE9FF\"> ApprovalLevel<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">DENIED<\/span><\/span>\n<span 
class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">return<\/span><span style=\"color: #D8DEE9FF\"> ApprovalLevel<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">CONFIRM<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">if<\/span><span style=\"color: #D8DEE9FF\"> tool_name <\/span><span style=\"color: #81A1C1\">in<\/span><span style=\"color: #D8DEE9FF\"> WRITE_TOOLS<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">return<\/span><span style=\"color: #D8DEE9FF\"> ApprovalLevel<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">CONFIRM<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">return<\/span><span style=\"color: #D8DEE9FF\"> ApprovalLevel<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">AUTO<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>When <code>classify()<\/code> returns <code>CONFIRM<\/code> for a tool call, the WebSocket pipeline pauses the agent and streams a typed message to the UI:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" 
fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>await ws.send_json({\n    \"type\": \"approval_required\",\n    \"tool\": tool_name,\n    \"input\": tool_args,\n    \"id\": invocation_id,\n})\napproval = await ws.receive_json()  # blocks until user clicks Approve or Deny\nif approval&#91;\"decision\"&#93; != \"approve\":\n    return ToolResult(error=\"User denied execution\")<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #81A1C1\">await<\/span><span style=\"color: #D8DEE9FF\"> ws<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">send_json<\/span><span style=\"color: #ECEFF4\">({<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">type<\/span><span style=\"color: 
#ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">approval_required<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">tool<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> tool_name<\/span><span style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">input<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> tool_args<\/span><span style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">id<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> invocation_id<\/span><span style=\"color: #ECEFF4\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ECEFF4\">})<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">approval <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">await<\/span><span style=\"color: #D8DEE9FF\"> ws<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">receive_json<\/span><span style=\"color: #ECEFF4\">()<\/span><span style=\"color: #D8DEE9FF\">  <\/span><span style=\"color: #616E88\"># blocks until user clicks Approve or Deny<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">if<\/span><span style=\"color: #D8DEE9FF\"> 
approval<\/span><span style=\"color: #ECEFF4\">&#91;<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">decision<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">&#93;<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">!=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">approve<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">return<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">ToolResult<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9\">error<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">User denied execution<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>In practice the UI renders a card with the tool name, the arguments, and two buttons. The agent literally cannot proceed until I tap one. <code>DENIED<\/code> short-circuits before the tool ever runs and writes an audit log entry \u2014 that&#8217;s the layer that stops the LLM from getting cute with <code>docker rm -f<\/code> on the production stack.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udce1 WebSocket streaming, not chunked HTTP<\/h2>\n\n\n\n<p>The frontend is one long-lived WebSocket per session. 
Every event the agent emits is a typed JSON message:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><pre class=\"code-block-pro-copy-button-pre\" aria-hidden=\"true\"><textarea class=\"code-block-pro-copy-button-textarea\" tabindex=\"-1\" aria-hidden=\"true\" readonly>ws.onmessage = (event) => {\n    const msg = JSON.parse(event.data);\n    switch (msg.type) {\n        case \"token\":             appendToCurrentMessage(msg.text); break;\n        case \"tool_start\":        renderToolCard(msg.tool, msg.input); break;\n        case \"tool_result\":       attachResultToToolCard(msg.id, msg.output); break;\n        case \"approval_required\": showApprovalCard(msg); break;\n        case \"done\":              markTurnComplete(); break;\n    }\n};<\/textarea><\/pre><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" 
stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #D8DEE9\">ws<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">onmessage<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9\">event<\/span><span style=\"color: #ECEFF4\">)<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">=&gt;<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">const<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">msg<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">JSON<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">parse<\/span><span style=\"color: #D8DEE9FF\">(<\/span><span style=\"color: #D8DEE9\">event<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9\">data<\/span><span style=\"color: #D8DEE9FF\">)<\/span><span style=\"color: #81A1C1\">;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">switch<\/span><span style=\"color: #D8DEE9FF\"> (<\/span><span style=\"color: #D8DEE9\">msg<\/span><span 
style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9\">type<\/span><span style=\"color: #D8DEE9FF\">) <\/span><span style=\"color: #ECEFF4\">{<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">case<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">token<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\">             <\/span><span style=\"color: #88C0D0\">appendToCurrentMessage<\/span><span style=\"color: #D8DEE9FF\">(<\/span><span style=\"color: #D8DEE9\">msg<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9\">text<\/span><span style=\"color: #D8DEE9FF\">)<\/span><span style=\"color: #81A1C1\">;<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">break;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">case<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">tool_start<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #88C0D0\">renderToolCard<\/span><span style=\"color: #D8DEE9FF\">(<\/span><span style=\"color: #D8DEE9\">msg<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9\">tool<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">msg<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9\">input<\/span><span style=\"color: #D8DEE9FF\">)<\/span><span style=\"color: #81A1C1\">;<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">break;<\/span><\/span>\n<span class=\"line\"><span style=\"color: 
#D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">case<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">tool_result<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\">       <\/span><span style=\"color: #88C0D0\">attachResultToToolCard<\/span><span style=\"color: #D8DEE9FF\">(<\/span><span style=\"color: #D8DEE9\">msg<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9\">id<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">msg<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9\">output<\/span><span style=\"color: #D8DEE9FF\">)<\/span><span style=\"color: #81A1C1\">;<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">break;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">case<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">approval_required<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">showApprovalCard<\/span><span style=\"color: #D8DEE9FF\">(<\/span><span style=\"color: #D8DEE9\">msg<\/span><span style=\"color: #D8DEE9FF\">)<\/span><span style=\"color: #81A1C1\">;<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">break;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">case<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">done<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span 
style=\"color: #D8DEE9FF\">              <\/span><span style=\"color: #88C0D0\">markTurnComplete<\/span><span style=\"color: #D8DEE9FF\">()<\/span><span style=\"color: #81A1C1\">;<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">break;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #ECEFF4\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #ECEFF4\">}<\/span><span style=\"color: #81A1C1\">;<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>No SSE, no long-poll, no chunked transfer. One bidirectional pipe. The approval flow falls out of this naturally \u2014 the same channel that streams tokens also streams approval requests and consumes the user&#8217;s decision. The entire frontend is roughly 300 lines of vanilla JavaScript.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udcca Five things I learned building this<\/h2>\n\n\n\n<p><strong>1. MCP is the right abstraction.<\/strong> Before MCP, every tool was a function in a giant <code>tools.py<\/code> and adding one meant a redeploy. Now every integration is a small server I can develop, test, and restart independently. When the Zammad container went down last month, the 14 other tools the agent was using at the time didn&#8217;t even notice. That kind of fault isolation is hard to get any other way.<\/p>\n\n\n\n<p><strong>2. Haiku is criminally underrated as an agent driver.<\/strong> It&#8217;s not as smart as Sonnet or Opus. It does not need to be. For an agent loop where the LLM&#8217;s job is &#8220;look at this tool output and pick the next thing to call,&#8221; Haiku gets it right 95+% of the time at 1\/20th the cost of Opus and 1\/4 the latency of Sonnet. Reserve Sonnet\/Opus for the things that actually need them.<\/p>\n\n\n\n<p><strong>3. 
Prompt caching pays the bills.<\/strong> Two lines of <code>cache_control<\/code> cut our LLM bill roughly in half. If your agent has a stable, long system prompt \u2014 which every tool-use agent does \u2014 and you&#8217;re not caching, you are setting money on fire.<\/p>\n\n\n\n<p><strong>4. Local LLMs aren&#8217;t ready to be primary on a $300 GPU.<\/strong> They&#8217;re great at embeddings, fine for background summarization, and perfectly acceptable as the emergency fallback when the cloud LLM is down. They are not, in May 2026, a drop-in replacement for the cloud on the interactive chat path. They will be soon. Plan the architecture for that day, but don&#8217;t pretend you&#8217;ve already crossed the line.<\/p>\n\n\n\n<p><strong>5. The interesting part of an agent is the <em>workers<\/em>, not the chat.<\/strong> The chat is the demo; it&#8217;s what gets the screenshots. But the things that actually save time at 2 AM are the autonomous loops: the incident-responder that hands me a pre-summarized Discord embed, the weekly digest that nobody had to write, the spend monitor that flags fallback events before the bill arrives. If you&#8217;re building an agent, budget half your engineering for the workers and the alerting around them. 
They&#8217;re where the leverage is.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">\ud83d\udee3\ufe0f What&#8217;s next<\/h2>\n\n\n\n<p>A few things on the roadmap that I think will be technically interesting to write up:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Multi-tenant Percival for clients<\/strong> \u2014 same agent, scoped MCP servers per client, OIDC-gated chat sessions<\/li>\n\n\n\n<li><strong>Embeddings on a GPU that can actually run a 70B model<\/strong> \u2014 when the 5060 Ti lands, we revisit the local-primary path<\/li>\n\n\n\n<li><strong>MCP server marketplace<\/strong> \u2014 sharing the 28 servers under MIT so other MSPs can wire them up to their own stacks<\/li>\n<\/ul>\n\n\n\n<p>If you&#8217;re an MSP, IT director, or just a self-hosting nerd who wants to compare notes \u2014 drop me a line at <a href=\"mailto:christopher@rainier-it.com\">christopher@rainier-it.com<\/a>. I love this stuff.<\/p>\n\n\n\n<p>Thanks, and may your tool calls always succeed on the first try.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A deep technical tour of Percival, our self-hosted AI operator \u2014 28 MCP servers (213 tools) wired to every system at Rainier IT, 7 autonomous background workers, Claude Haiku over a LiteLLM proxy with Ollama fallback on a GPU LXC, pgvector memory, prompt caching, three-tier approval gates, and WebSocket 
streaming.<\/p>\n","protected":false},"author":1,"featured_media":335,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,5,7],"tags":[],"class_list":["post-333","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-chatbots","category-cloud-infrastructure","category-it-management"],"_links":{"self":[{"href":"https:\/\/rainier-it.com\/blog\/wp-json\/wp\/v2\/posts\/333","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rainier-it.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rainier-it.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rainier-it.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rainier-it.com\/blog\/wp-json\/wp\/v2\/comments?post=333"}],"version-history":[{"count":7,"href":"https:\/\/rainier-it.com\/blog\/wp-json\/wp\/v2\/posts\/333\/revisions"}],"predecessor-version":[{"id":341,"href":"https:\/\/rainier-it.com\/blog\/wp-json\/wp\/v2\/posts\/333\/revisions\/341"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rainier-it.com\/blog\/wp-json\/wp\/v2\/media\/335"}],"wp:attachment":[{"href":"https:\/\/rainier-it.com\/blog\/wp-json\/wp\/v2\/media?parent=333"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rainier-it.com\/blog\/wp-json\/wp\/v2\/categories?post=333"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rainier-it.com\/blog\/wp-json\/wp\/v2\/tags?post=333"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}