Featured · AI engineering

Building reliable Claude agents: lessons from twelve production systems.

What we learned shipping multi-agent Claude systems for ops teams at scale. Evals, fall-backs, audit, and the patterns we now reach for by default.

By the Plutobee AI team · 12 min read

Latest

MTWTFS SHIP
Why we ship every Friday, and what changes when you commit to it.
Weekly demos changed how we estimate, how we cut scope, and how clients judge us. Here's the operating model.
mcp://plutobee/tools → list_tools() ✓ 12 tools registered → invoke("ship_to_prod") ✓ ok · 38ms
MCP servers in production: a checklist for going live.
Auth, rate limiting, audit, observability, kill switches. The non-fun parts you need before you ship.
Threat-modeling for AI features (without slowing the team).
A 60-minute exercise we now run at the start of every AI engagement. Outputs a one-page risk register.
Before After $8.4K $4.1K -51%
Cutting our Postgres bill in half without touching a query.
Storage class, retention policy and a few targeted indexes. Cost down 51%, p95 latency unchanged.
Designing for a global team without flattening voice.
A distributed swarm, three languages, one design system. How we keep the work coherent without making it bland.
SF MIA BER TLN IST
Follow-the-sun on-call: what actually works.
Six months of running 24/7 coverage across Miami, SF, Berlin, Tallinn and Istanbul. The handoff template included.
eval_factuality eval_refusals eval_grounding eval_tone eval_latency PR
Evals are the new tests. Here's how we run them.
Why we treat AI evaluation suites like our integration tests, and how we make them cheap to run on every PR.
$0 trust verify everything
Zero-trust on a startup budget.
What you actually need before Series B. Open source where possible, paid where it matters.
TS interface Plutobee { stack: 'TypeScript' } const ship = () => production;
Why we standardized on TypeScript across the studio.
Faster onboarding, fewer prod bugs, and the trade-offs we accepted to get there.

Notes by email, twice a month.

No fluff. A short brief on what we shipped, learned and broke.

Start a project