Swarm agents in production

The single-agent paradigm — one big model, one chat thread, one human supervisor — is the entry-level move. It is fine for prototypes. It is not how we run production work at IMPT.io, and after watching it scale poorly across two dozen teams that were earnest about wanting it to scale, I'd argue it is not how anyone should run production work in 2026.

The pattern that works is a swarm: a small set of cooperating agents, each with a narrow remit, coordinated by a planner and watched by an auditor. The good news is that the swarm pattern is now boring engineering — the components are well understood, the failure modes are well understood, the operational practices are well understood. The bad news is that most teams that say they have moved to multi-agent have actually just added a router in front of a single agent and renamed the room. That is not a swarm.

What an actual swarm looks like

Let me walk through the morning briefing swarm we run at IMPT.io. It runs on a cron at 06:30 Irish time every morning and, by 07:30, produces a single email to me with the day's operating picture. That email is the only operational thing I read in the morning. The swarm has eleven agents.

Planner — reads yesterday's reflections, decides which sections of the briefing matter today, and dispatches.
Booking-data agent — queries Mongo for last 24 hours of hotel bookings, dedupes, summarises.
Stripe agent — queries Stripe for last 24 hours of payouts, reconciles against bookings, flags discrepancies.
Marketing-channels agent — pulls Meta + Google ads + organic SEO impressions, computes deltas, flags anomalies.
Inbox agent — reads mike@impt.io overnight, classifies, ranks by importance, surfaces what needs eyes.
Calendar agent — reads today's calendar, pulls context for each meeting, drafts a one-line briefing per slot.
Press & mentions agent — checks for new mentions of IMPT or Mike English across the web, summarises.
Competitor agent — pulls last 24 hours of meaningful moves from a watch list of twelve competitor sites.
Open-tasks agent — reads the “owed by Mike” list from yesterday's reflections and bumps anything stale.
Composer — takes the eight worker outputs, drafts the email in a fixed structure, and submits the draft to the auditor for approval.
Auditor — checks the draft for hallucinated numbers, missing sections, tone drift. Has veto authority.

That swarm runs every day. It costs less than a coffee in compute. It saves three hours of senior-operator time every morning. None of the workers above are large; only the planner and auditor use a frontier model. Most of the workers are small fine-tuned models (we mix Claude 4.5 / Sonnet, the Chinese open-source stack, and a couple of in-house tunes) chosen for cost and latency on their specific task.
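The flow described above (planner dispatches, workers report, composer drafts, auditor approves or vetoes) can be sketched roughly as follows. This is a minimal illustration, not our production code: `run_briefing`, `AuditResult`, and the callable signatures are all hypothetical names, and each agent is reduced to a plain function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only.
Worker = Callable[[], str]


@dataclass
class AuditResult:
    approved: bool
    reasons: List[str]


def run_briefing(
    plan: Callable[[List[str]], List[str]],
    workers: Dict[str, Worker],
    compose: Callable[[Dict[str, str]], str],
    audit: Callable[[str, Dict[str, str]], AuditResult],
) -> str:
    """Planner picks sections, workers fill them, composer drafts, auditor may veto."""
    sections = plan(list(workers))                 # planner decides what matters today
    outputs = {name: workers[name]() for name in sections}  # dispatch to workers
    draft = compose(outputs)                       # fixed-structure draft
    result = audit(draft, outputs)
    if not result.approved:                        # veto: the draft is never returned
        raise RuntimeError("auditor veto: " + "; ".join(result.reasons))
    return draft
```

In a real run the workers would execute concurrently and a vetoed draft would go back to the composer rather than raise; the sketch keeps only the control flow.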

The patterns that work

Planner / worker / auditor

This is the core triangle. A planner that decides, workers that do, an auditor that checks. The auditor must have veto authority and must use a different model than the planner — otherwise it is essentially the planner agreeing with itself.
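Two of the auditor's rules lend themselves to mechanical checks: it must not share a model with the planner, and it should reject numbers that no worker produced. The sketch below assumes worker outputs are plain strings and treats any numeric token in the draft that is absent from every worker output as ungrounded. `SwarmModels`, `ungrounded_numbers`, and `approve` are hypothetical names, and a real auditor would do far more than string matching.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Matches integers and simple grouped/decimal numbers such as 42 or 1,250.50.
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


@dataclass(frozen=True)
class SwarmModels:
    planner: str
    auditor: str

    def __post_init__(self) -> None:
        # Refuse a configuration where the auditor would just agree with itself.
        if self.planner == self.auditor:
            raise ValueError("auditor must use a different model than the planner")


def ungrounded_numbers(draft: str, worker_outputs: Dict[str, str]) -> List[str]:
    """Return numeric tokens in the draft that appear in no worker output."""
    known = set()
    for text in worker_outputs.values():
        known.update(_NUMBER.findall(text))
    return [n for n in _NUMBER.findall(draft) if n not in known]


def approve(draft: str, worker_outputs: Dict[str, str]) -> bool:
    """Veto (return False) if any number in the draft cannot be traced to a worker."""
    return not ungrounded_numbers(draft, worker_outputs)
```

The check is deliberately strict: a reformatted figure (say, a rounded total) would be flagged, which is the safer failure direction for an auditor.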

Specialisation over generalisation

It is tempting to give every worker the same big model and let prompting do the specialisation. Don't. A small fine-tuned model running a narrow task — “classify inbound email by importance,” “extract structured fields from a Stripe payout” — is faster, cheaper, more reliable, and easier to evaluate than a general-purpose worker doing the same thing.
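"Easier to evaluate" is the part that pays off most quickly: a narrow task has a narrow output space, so a small labelled set and a plain accuracy score go a long way. A minimal sketch, assuming a classifier that maps an email's text to an importance label; `accuracy` and `keyword_classifier` are hypothetical, and the keyword stub merely stands in for a fine-tuned model.

```python
from typing import Callable, Iterable, Tuple


def accuracy(classify: Callable[[str], str], labelled: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (text, expected_label) pairs the classifier gets right."""
    pairs = list(labelled)
    if not pairs:
        raise ValueError("need at least one labelled example")
    correct = sum(1 for text, label in pairs if classify(text) == label)
    return correct / len(pairs)


def keyword_classifier(text: str) -> str:
    """Stand-in for a small fine-tuned model: 'high' if the text says urgent."""
    return "high" if "urgent" in text.lower() else "low"


examples = [
    ("URGENT: payout failed", "high"),
    ("Weekly newsletter", "low"),
    ("Invoice overdue", "high"),   # the stub misses this one
]
score = accuracy(keyword_classifier, examples)
```

The same harness can be pointed at any candidate model for the task, which makes swapping workers a measurable decision rather than a matter of taste.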

Reflections, not just logs

After every swarm run, a reflection agent writes a structured paragraph: what the swarm did, what worked, what surprised it, what should change. Those reflections feed back into the planner. This is the line between “we ran the same swarm again today” and “the swarm got slightly smarter overnight.”
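One way to keep reflections structured rather than free-form is to give them a fixed record shape and hand the planner only the most recent few. The sketch below is an assumption about how that might look, not a description of our files: the `Reflection` fields mirror the four questions above, and `planner_context` and its `limit` parameter are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Reflection:
    run_date: str    # ISO date, e.g. "2026-01-03", so string order is date order
    did: str         # what the swarm did
    worked: str      # what worked
    surprised: str   # what surprised it
    change: str      # what should change


def planner_context(reflections: List[Reflection], limit: int = 3) -> str:
    """Serialise the most recent reflections, oldest first, one JSON object per line."""
    recent = sorted(reflections, key=lambda r: r.run_date)[-limit:]
    return "\n".join(json.dumps(asdict(r)) for r in recent)
```

The returned string would be prepended to the planner's prompt on the next run, which is what closes the loop between yesterday's run and today's plan.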

Tool surface as the security boundary

Authorisation lives at the tool layer, not the agent layer. An agent that can't talk to Stripe can't talk to Stripe — full stop, regardless of how cleverly it asks. Tool-level allow-lists are the only durable form of agent governance.
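A tool-level allow-list can be as simple as a gateway that every tool call must pass through. The sketch below assumes tools are plain callables registered by name; `ToolGateway`, `ToolNotAllowed`, and the agent and tool names in the usage are hypothetical.

```python
from typing import Any, Callable, Dict, FrozenSet


class ToolNotAllowed(PermissionError):
    """Raised when an agent calls a tool that is not on its allow-list."""


class ToolGateway:
    def __init__(
        self,
        tools: Dict[str, Callable[..., Any]],
        allow_list: Dict[str, FrozenSet[str]],
    ) -> None:
        self._tools = tools
        self._allow = allow_list  # agent name -> tool names it may call

    def call(self, agent: str, tool: str, **kwargs: Any) -> Any:
        # The check happens here, outside the agent, so no prompt can bypass it.
        if tool not in self._allow.get(agent, frozenset()):
            raise ToolNotAllowed(f"{agent} may not call {tool}")
        return self._tools[tool](**kwargs)


gateway = ToolGateway(
    tools={"stripe.list_payouts": lambda hours: f"payouts for last {hours}h"},
    allow_list={"stripe_agent": frozenset({"stripe.list_payouts"})},
)
```

An agent with no entry in the allow-list gets an empty set, so the default is deny.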

Dry runs everywhere

Every tool with an irreversible side effect (sending email, charging cards, posting to social) has a dry-run mode that produces the exact same return shape but does not actually do the thing. Swarms run in dry-run mode by default during development and during certain ops windows. In our experience, this single pattern eliminates roughly ninety percent of swarm incidents.
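A dry-run tool might look like the sketch below: the same function, the same return type, and a flag that decides whether the side effect happens. `send_email`, `SendResult`, and the injected `transport` callable are hypothetical; the point is that callers cannot tell the two modes apart by shape, only by the `dry_run` field.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SendResult:
    recipient: str
    subject: str
    delivered: bool
    dry_run: bool


def send_email(
    recipient: str,
    subject: str,
    body: str,
    *,
    dry_run: bool = True,  # safe by default; real sends must be requested explicitly
    transport: Optional[Callable[[str, str, str], None]] = None,
) -> SendResult:
    """Send an email, or return an identically shaped result without sending."""
    if dry_run:
        return SendResult(recipient, subject, delivered=False, dry_run=True)
    if transport is None:
        raise ValueError("a transport is required when dry_run is False")
    transport(recipient, subject, body)  # the irreversible side effect
    return SendResult(recipient, subject, delivered=True, dry_run=False)
```

Making `dry_run=True` the default means a forgotten argument produces a harmless no-op rather than a sent email.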

The patterns that don't work

Adversarial debate as the main loop

Two agents arguing with each other looks impressive in demos and falls apart in production. The latency is bad, the cost is high, and the convergence properties are unstable. Use one decisive agent and one auditor. Save debate for genuinely contested decisions.

Letting agents write to the state layer freely

Every write to canonical state goes through a structured tool with validation and audit. Agents do not get raw database access. The cost of forgetting this rule is high.
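As an illustration of a structured write path, the sketch below exposes one narrow operation, validates its input, and records who changed what. `StateStore`, `set_task_status`, and the status values are hypothetical, and an in-memory dict stands in for the real database.

```python
from datetime import datetime, timezone
from typing import Dict, List

ALLOWED_STATUSES = frozenset({"open", "stale", "done"})  # assumed values


class StateStore:
    """Agents get this narrow method, never a raw database handle."""

    def __init__(self) -> None:
        self._tasks: Dict[str, str] = {}
        self.audit_log: List[dict] = []

    def set_task_status(self, agent: str, task_id: str, status: str) -> None:
        # Validate before touching state so a bad write leaves no trace.
        if not task_id:
            raise ValueError("task_id must be non-empty")
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        previous = self._tasks.get(task_id)
        self._tasks[task_id] = status
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "task_id": task_id,
            "from": previous,
            "to": status,
        })

    def get_task_status(self, task_id: str) -> str:
        return self._tasks[task_id]
```

Because every write lands in `audit_log` with the calling agent's name, a bad state change can be traced to a specific agent and run.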

Making the planner the smartest model

It feels right and is wrong. The planner needs to be reliable, cheap, and predictable. The smartest model in your swarm should be the auditor, where the cost-per-call is amortised across many worker outputs and the value-per-correct-decision is highest.

Three case studies we'll work through in the workshop

Day 1 afternoon of the Clonmel workshop walks through three real swarms in detail with the messy bits left in: the IMPT.io morning briefing swarm above, our marketing-ops swarm (which writes, reviews, schedules and tracks every piece of marketing content we ship), and our customer-research swarm (which reads support emails, identifies patterns, and feeds product priorities). Each comes with the prompt files, the tool definitions, the failure modes we've hit, and the things we'd do differently if we were starting again today.
