April 10, 2026

AI Agents in the Real World

We've shipped agent systems to production. Here's what works, what doesn't, and what we'd do differently.

The hype around AI agents is intense: autonomous systems that reason, plan, and take action. The demos are impressive. The reality is more nuanced.

What agents are good at

Agents excel at well-defined tasks with clear success criteria. Customer support ticket routing. Document processing workflows. Research and data gathering. These are tasks where the action space is bounded and the tools are reliable.

What agents are bad at

Open-ended reasoning over long horizons. Multi-step workflows where errors compound. Anything where "close enough" isn't acceptable. Agents are probabilistic — they will make mistakes. The question is whether your system tolerates mistakes gracefully.

The tool problem

An agent is only as good as its tools. If your API returns inconsistent error messages, the agent will struggle. If your database schema confuses a human, it will confuse an agent. Investing in clean, well-documented tools pays off more than any amount of prompt engineering.

Lessons from production

  1. Start with deterministic fallbacks — if the agent fails, the system should degrade gracefully to a simpler, rule-based path.
  2. Human-in-the-loop by default — start with human approval for every action, then remove the guardrails one by one as confidence grows.
  3. Evaluation is everything — build regression tests for agent behavior. When you change a prompt, you need to know what broke.
  4. Observability — trace every decision. When an agent does the wrong thing, you need to understand why.
  5. Cost matters — agent loops can burn through tokens fast. Set budget limits and optimize for fewer, better tool calls.
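Lesson 2 in code form: gate every action behind human approval, with an allow-list that grows as specific action classes earn trust. This is a sketch under assumed names (`make_gate` and the `approve` callback are hypothetical; in production the callback might post to Slack or a review queue):

```python
from typing import Callable

def make_gate(approve: Callable[[str], bool], auto_approved: set[str]):
    """Build an approval gate: trusted actions pass, everything else asks a human."""
    def gate(action: str) -> bool:
        if action in auto_approved:
            return True  # guardrail removed for this action class after review
        return approve(action)  # human decision: Slack prompt, CLI input, etc.
    return gate
```

Starting with an empty `auto_approved` set means every action needs sign-off on day one; removing guardrails is then an explicit, auditable change to one set.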

The future is agentic. But the path to get there is engineering discipline, not magic prompts.