FIELD NOTES.
Engineering notes on AI agents, automation and the systems behind them — what works in production, what breaks, and what we would build differently.

Evals before vibes: measuring agent quality
You cannot improve what you do not measure, and "it feels smarter" is not a metric. How we build eval suites that catch regressions before users do.

The boring automation that pays for itself
The highest-ROI automation we ship is rarely glamorous: report generation, data syncs, handoffs between tools. Boring is a feature.

Internal tools your team will actually use
Most internal tools die of neglect, not bad code. The difference between a dashboard nobody opens and a tool the team fights to keep.

Case study: a reporting pipeline, from days to minutes
How a mid-size operations team replaced a three-day manual reporting cycle with an automated pipeline — and what it changed downstream.

Choosing an LLM stack: OpenAI, Claude or self-host
Provider choice is an engineering decision, not a brand preference. The trade-offs that actually matter when picking a model stack.

Onboarding a team to agent-assisted workflows
Tools change faster than habits. What actually works when you introduce AI agents into a team's daily workflow — beyond the kickoff demo.

Why we stay a small crew
Headcount is a cost, not a metric. On staying deliberately small while shipping like a bigger team — with systems doing the heavy lifting.
