Introducing AOP

6 min read
AI Agents · Observability · Open Source · TypeScript

Autonomous agents are black boxes. You give them a goal, they run for a while, and either you get a result or you get a bill. What happened in between — the reasoning, the dead ends, the tool failures, the moments of uncertainty — is invisible. You're paying for decisions you can't see.

I built the Agent Observability Protocol (AOP) to fix that. It's an open protocol that lets any agent, built on any LLM, narrate its own thinking in real time. Two lines of code, and you can watch every thought, decision, and tool call as it happens.

The Problem

When an agent burns $4 and returns garbage, you have no idea where it went wrong. Was it the initial search strategy? A bad tool call? A misinterpreted result? Did it loop three times on the same query? You can't tell, because the only output you get is the final answer.

Traditional logging doesn't help. You can log tool calls — "called web_search at 10:04:32" — but that tells you what happened, not why. The interesting part isn't that the agent searched Google. The interesting part is that it searched Google because it decided Reddit was unreliable, and it made that decision because the previous result had conflicting data, and it flagged that conflict as medium-uncertainty but proceeded anyway.

That chain of reasoning is what you need to debug, optimize, and trust an autonomous agent. And no existing tool captures it.

The Design

AOP is a protocol, not a product. It defines a set of structured JSON events that agents emit over HTTP. There are eleven event types organized into three tiers.

Lifecycle events report that the agent exists and is running. Session started, heartbeat, session ended. Every agent emits at least these.

Cognition events capture the agent's internal reasoning. What it's thinking, what goal it's pursuing, what options it considered and why it chose one over the others, and where it's uncertain. These are the events that make AOP different from everything else — they expose why the agent does things.

Operation events record actions in the world. Tool calls with timing and results, memory reads and writes, spawning child agents, external API calls.

The protocol is deliberately minimal. Eleven event types cover the full surface area of what an autonomous agent does. Custom event types are supported via namespacing for anything domain-specific.
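As a sketch of what an event in the cognition tier might look like on the wire — field names here are illustrative, not the normative schema:

```typescript
// Illustrative shape of an AOP event. Field names (seq, payload, etc.)
// are assumptions for this sketch, not the protocol's normative schema.
interface AOPEvent {
  type: string                      // e.g. 'cognition.decision', 'operation.tool_call'
  session_id: string
  seq: number                       // monotonic sequence number within the session
  timestamp: string                 // ISO 8601
  payload: Record<string, unknown>  // event-type-specific content
}

// A decision event capturing not just what was chosen, but why
const event: AOPEvent = {
  type: 'cognition.decision',
  session_id: 'sess_1a2b3c',
  seq: 17,
  timestamp: new Date().toISOString(),
  payload: {
    chosen: 'web_search',
    rejected: ['reddit_search'],
    reason: 'previous Reddit results had conflicting data',
    uncertainty: 'medium',
  },
}
```

The `payload` carries the tier-specific content; everything else is the common envelope that lets the collector order and group events.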

Agent-Native Observability

The core design decision is that the agent emits its own events. Not a wrapper that intercepts tool calls from outside. Not a proxy that sits between the agent and the LLM. The agent itself — driven by the LLM — generates the content of every thought, decision, and uncertainty event.

This means the content of aop.thought() isn't a string a developer typed. It's the LLM's actual reasoning output, wired into an AOP event. The developer writes the wiring once. After that, the agent narrates its own behavior.

const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  tools: myTools,
  messages,
})

// The LLM's reasoning becomes the thought event
const text = response.content.find(b => b.type === 'text')
if (text?.type === 'text') {
  await aop.thought(text.text, { confidence: 'high' })
}

The second design decision is zero coupling. Every event is fire-and-forget — a POST request with a 500ms timeout. If the collector is down, the event is silently dropped and the agent continues. Observability must never crash, slow, or otherwise affect the observed system. This isn't negotiable.
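The delivery rule can be sketched in a few lines, assuming a plain fetch-based transport (the SDK's internals may differ):

```typescript
// Sketch of fire-and-forget delivery: POST with a hard 500ms timeout,
// swallowing every failure. Returns true if the collector acknowledged
// the event, false if it was dropped.
async function emitEvent(endpoint: string, event: object): Promise<boolean> {
  try {
    const res = await fetch(endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(event),
      // Abort after 500ms so a slow collector can never stall the agent
      signal: AbortSignal.timeout(500),
    })
    return res.ok
  } catch {
    // Collector down, DNS failure, timeout: drop silently, never throw
    return false
  }
}
```

From the agent's point of view, a dropped event and a delivered one are indistinguishable — which is exactly the point.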

What You Get

The collector is a single command:

npx @useaop/collector start

It starts a local server that receives events, stores them in SQLite, and serves a real-time dashboard. The dashboard shows every session — active and completed — with a live event feed that streams thoughts, tool calls, and alerts as they happen.

The anomaly detection runs automatically. If the same tool is called three times with similar inputs within a minute, the collector flags a loop. If the agent expresses low confidence three times in a row, it flags a confidence drop. If three consecutive tool calls fail, it flags an error cascade. These alerts appear inline in the dashboard — not as intrusive notifications, but as context in the feed where they happened.

Cost tracking is built into the event model. Every tool call can carry a token_spend_delta, and every session ends with a total_cost_usd. The dashboard shows per-session and per-tool spend so you know exactly where your budget is going.
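Per-tool aggregation is then a straightforward fold over tool-call events. A sketch, treating `token_spend_delta` as a generic numeric spend unit (the schema determines whether that's tokens or dollars):

```typescript
// Sketch of per-tool spend aggregation from token_spend_delta fields.
// The unit (tokens vs. USD) is whatever the event schema defines;
// this just sums the deltas per tool.
interface ToolCallEvent {
  tool: string
  token_spend_delta: number
}

function spendByTool(events: ToolCallEvent[]): Map<string, number> {
  const totals = new Map<string, number>()
  for (const e of events) {
    totals.set(e.tool, (totals.get(e.tool) ?? 0) + e.token_spend_delta)
  }
  return totals
}
```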

Multi-Agent Support

Modern agent architectures often involve multiple agents — an orchestrator that spawns researchers, writers, and reviewers. AOP handles this with a simple parent-child model. Every session has a parent_session_id. When an orchestrator spawns a child, it emits an operation.agent_spawn event, and the child's session links back to the parent.

The SDK makes this ergonomic. agentSpawn() returns a pre-configured child client with the parent relationship already set:

const orchestrator = new AOPClient({ agentId: 'orchestrator' })
await orchestrator.sessionStarted({ goal: 'Research and write report' })

const researcher = await orchestrator.agentSpawn('researcher', 'Gather data')
await researcher.sessionStarted({ goal: 'Gather data' })
// researcher.parent_session_id is automatically set

The collector stores the full tree, and the dashboard lets you navigate from parent to child and back.

The Stack

The TypeScript SDK has zero dependencies. It exports a single class — AOPClient — with typed methods for every event. It handles session ID generation (cryptographically secure), sequence numbering, and the fire-and-forget delivery with automatic timeout. It supports bearer token auth for production deployments and warns if you use unencrypted HTTP for a remote endpoint.

The collector runs on Fastify with SQLite via better-sqlite3. It exposes a REST API for sessions and events, an SSE endpoint for real-time streaming, and serves the bundled dashboard as static files. The entire thing — API, database, dashboard, anomaly detection — is a single process on a single port.

The dashboard is a Next.js static export bundled into the collector's npm package. When you run npx @useaop/collector start, the dashboard is already there at /dashboard/. No separate install, no second process. It connects to the collector via SSE on the same origin, so there are no CORS issues and no extra configuration.
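An SSE stream is just text over HTTP, so consuming it outside the dashboard doesn't require a browser `EventSource`. A sketch of parsing the stream's `data:` frames into events — the payload shape is an assumption for illustration:

```typescript
// Sketch of SSE parsing: split the raw stream on blank lines (the SSE
// frame delimiter) and JSON-parse each frame's `data:` payload.
function parseSSE(chunk: string): unknown[] {
  const events: unknown[] = []
  for (const frame of chunk.split('\n\n')) {
    const data = frame
      .split('\n')
      .filter(line => line.startsWith('data:'))
      .map(line => line.slice(5).trim())
      .join('\n')
    if (data) events.push(JSON.parse(data))
  }
  return events
}
```

This is what same-origin streaming buys you: any script on the page can subscribe to the live feed with no auth handshake or proxy in between.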

What's Next

AOP currently has a TypeScript SDK. A Python SDK is the obvious next step — most agent frameworks are Python-first, and without pip install useaop, adoption is limited to the TypeScript ecosystem.

Framework integrations are the bigger unlock. A LangChain callback handler that auto-instruments every chain and tool call without manual AOP calls. A CrewAI plugin that captures task delegation and agent reasoning automatically. The goal is zero-config observability — install the integration package, pass it to your framework, and every agent session is visible.

Longer term, a hosted collector would remove the self-hosting requirement for teams that want shared dashboards, retention policies, and access controls. But the local-first model stays as the default. Your agent data should stay on your machine unless you choose otherwise.

Try It

npm install @useaop/sdk
npx @useaop/collector start

Two lines in your agent. One command for the collector. Open the dashboard and watch your agents think.

useaop.dev · GitHub · Docs