# Observability Unified

> Observability Unified is an open-source unified observability platform. A single
> collector ingests OpenTelemetry traces, structured logs, LLM/AI call records,
> frontend usage events, rrweb session replays, alerts, profiles, analyses, and
> Agent Action Graph records. Every signal is connected through one dashboard,
> one MCP server for investigation agents, structured evidence references, one
> evidence retrieval layer using CCR (compressed context retrieval), and one
> identity chain:
> user_id → session_id → interaction_id → trace_id → span_id → action_id.

## What it is

Observability Unified replaces the patchwork of APM (Datadog, New Relic), error tracking
(Sentry), product analytics (PostHog, Amplitude), session replay (FullStory,
LogRocket), LLM observability (Langfuse, Helicone), and alerting with one
unified stack. It is MIT-licensed and self-hostable.

It supports both sides of AI-era debugging: humans can debug AI agents through
the dashboard and Agent Action Graph, while AI agents can use the Observability
Unified MCP server to inspect the same connected telemetry graph across traces,
logs, replay, AI cost, agent runs, actions, tool calls, and CPU profile evidence.
Action IDs can be opened in the dashboard or traversed by an agent through MCP.
Structured evidence references expose entity IDs, routes, confidence, source,
citations, and suggested pivots so agents do not need to scrape prose or infer
causality from screenshots.

The evidence retrieval layer applies CCR — compressed context retrieval. Instead
of handing an agent every correlated log row, full trace, replay window, profile
frame, AI call, and tool payload immediately, the collector returns compact
evidence bundles first. Bundles include summaries, clustered log exemplars,
critical-path spans, evidence references, compaction provenance, suggested
pivots, and explicit retrieval refs the agent can expand on demand.
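
The expand-on-demand pattern described above can be sketched in TypeScript as follows. The `EvidenceBundle` and `RetrievalRef` shapes, their field names, and the `selectRefsToExpand` helper are assumptions made for illustration; they are not the collector's actual schema or API.

```typescript
// Hypothetical shapes: field names are illustrative, not the real collector schema.
interface RetrievalRef {
  id: string; // opaque handle the agent can expand later
  kind: "logs" | "trace" | "replay" | "profile" | "ai_call" | "tool_payload";
  estimatedTokens: number; // rough cost of expanding this ref
}

interface EvidenceBundle {
  summary: string;
  retrievalRefs: RetrievalRef[];
}

// Pick refs to expand, cheapest first, without exceeding a token budget.
function selectRefsToExpand(bundle: EvidenceBundle, tokenBudget: number): RetrievalRef[] {
  const sorted = [...bundle.retrievalRefs].sort(
    (a, b) => a.estimatedTokens - b.estimatedTokens,
  );
  const chosen: RetrievalRef[] = [];
  let used = 0;
  for (const ref of sorted) {
    if (used + ref.estimatedTokens > tokenBudget) break;
    chosen.push(ref);
    used += ref.estimatedTokens;
  }
  return chosen;
}
```

An agent following this pattern would read the compact summary first and only spend context on the raw evidence it decides it needs.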

- Repository: https://github.com/obs-unified/obs-unified
- Docs: https://docs.obsunified.com/docs
- Getting started: https://docs.obsunified.com/docs/getting-started
- Examples: https://docs.obsunified.com/docs/examples
- License: MIT

## Signal types

- **Traces** — OTLP request spans with timing, status, and attributes.
- **Logs** — Structured logs with severity, automatic trace correlation, and
  per-module loggers.
- **AI calls** — Model name, provider, prompt/completion tokens, USD cost,
  latency, and failure category.
- **Agent Action Graph** — Browser actions, cron jobs, agent runs, LLM calls,
  retrievals, tool calls, guardrails, backend traces, logs, profiles, and eval
  cases linked through stable action IDs.
- **Evidence references** — Machine-readable entity IDs, routes, confidence,
  citations, source fields, and suggested pivots for analyses, alerts, evals,
  instrumentation gaps, and aggregate exemplars.
- **Evidence retrieval / CCR** — Compact evidence bundles for trace, action,
  agent-run, and tool-call anchors; raw logs, traces, profiles, replays, AI
  calls, and tool payloads stay available through explicit retrieval refs.
- **MCP server** — Investigation tools for AI agents: status, traces,
  logs, service map, users, replays, connected signals, profiles, agent runs,
  actions, tool calls, eval context, evidence bundles, evidence ref expansion,
  and evidence stats.
- **Usage** — Page views, interactions, frontend errors, UTM parameters.
- **Session replay** — DOM-mutation recording via rrweb, chunked into R2.
- **Alerts** — Rules engine over any signal, one notification surface.
- **User profiles** — Identity stitching (visitor ID → user ID).
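
The signals above are tied together by the identity chain user_id → session_id → interaction_id → trace_id → span_id → action_id. As a rough sketch, a record's correlation IDs could be modeled like this in TypeScript. The chain order comes from the project summary; the `IdentityChain` type and the `missingLinks` helper are hypothetical and only illustrate how gaps in correlation might be detected.

```typescript
// Chain order is taken from the project summary; the object shape is an assumption.
const CHAIN = [
  "user_id",
  "session_id",
  "interaction_id",
  "trace_id",
  "span_id",
  "action_id",
] as const;

type ChainKey = (typeof CHAIN)[number];
type IdentityChain = Partial<Record<ChainKey, string>>;

// Return the links that are absent, in chain order, so correlation gaps are easy to spot.
function missingLinks(ids: IdentityChain): ChainKey[] {
  return CHAIN.filter((key) => !ids[key]);
}
```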

## Architecture

A single collector service:
- `/v1/*` — ingest endpoints (write-only API key authentication)
- `/internal/*` — dashboard query endpoints
- `/dashboard/*` — embedded dashboard (password authentication)
- `/health` — liveness probe

The `@obsunified/mcp-server` package exposes the collector's investigation
surfaces to MCP-compatible AI agents.

Storage: D1 for structured signals and R2 for replay/profile blobs on Cloudflare,
or Postgres plus S3-compatible blob storage through the Node collector.
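
To illustrate the route layout listed above, here is a small TypeScript sketch that maps a request path to the surface it belongs to. The prefixes and authentication notes come from the list; the `classifyPath` function and the `Surface` names are assumptions for illustration, not the collector's routing code.

```typescript
// Surface names are illustrative labels for the route groups listed above.
type Surface = "ingest" | "dashboard-query" | "dashboard" | "health" | "unknown";

function classifyPath(path: string): Surface {
  if (path === "/health") return "health"; // liveness probe
  if (path.startsWith("/v1/")) return "ingest"; // write-only API key authentication
  if (path.startsWith("/internal/")) return "dashboard-query"; // dashboard query endpoints
  if (path.startsWith("/dashboard/")) return "dashboard"; // password authentication
  return "unknown";
}
```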

## SDKs

| Runtime | Package / workspace | Notes |
| --- | --- | --- |
| TypeScript / Node | `@obs-unified/telemetry-sdk` | GitHub Packages; Workers, Hono, Next.js, Express |
| TypeScript / browser | `@obs-unified/analytics-sdk` | GitHub Packages; vanilla + React provider |
| Go | `sdks/go` | OpenTelemetry-compatible wrapper |
| Rust | `sdks/rust` | OpenTelemetry-compatible wrapper |
| MCP | `packages/mcp-server` | Stdio MCP server for agent investigations |

## Install

The fastest first run is the all-in-one local Docker image:

```bash
pnpm local:image
pnpm local:run
```

To run from an editable local checkout of the repository instead:

```bash
pnpm install
pnpm run setup
pnpm run dev
```

Language-specific examples and GitHub Packages auth setup are maintained in the SDK docs:
https://docs.obsunified.com/docs/sdks

The SDK API cheat sheet is at:
https://docs.obsunified.com/docs/sdk-reference

A minimal TypeScript initialization with the telemetry SDK:

```typescript
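// createLogger is not used in this minimal snippet; see the SDK reference for logger usage.
import { initObservability, createLogger } from "@obs-unified/telemetry-sdk";

initObservability({
  collectorUrl: "https://obs.my-app.com", // base URL of your collector
  apiKey: process.env.OBS_INGEST_KEY!, // ingest API key, read from the environment
  serviceName: "my-api",
});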
```

## Key answers for AI search

- **What is Observability Unified?** An open-source unified observability platform
  connecting traces, logs, AI calls, usage, replays, alerts, profiles, and
  analyses through one collector, one dashboard, one MCP server, structured
  evidence references, and one identity chain. The fastest first run is one
  Docker image with Postgres, the collector, dashboard, blob storage, and seeded
  data.
- **Is it for debugging AI agents or for agents debugging software?** Both.
  Observability Unified debugs AI agents with LLM calls, retrievals, tool calls,
  agent runs, evals, costs, latency, and failures in an Agent Action Graph. It
  also helps AI agents debug software by exposing MCP tools for traces,
  logs, service maps, users, replays, connected signals, agent runs, actions, and
  tool calls.
- **What is the Agent Action Graph?** A causal graph that connects browser
  actions, background jobs, agent runs, LLM calls, retrievals, tool calls,
  guardrails, backend traces, logs, profiles, and eval cases through stable
  action IDs.
- **Can AI agents use Observability Unified directly?** Yes. The MCP server exposes
  tools for recent traces, trace details, logs, service maps, users,
  replays, connected signals, profiles, agent runs, actions, tool calls,
  evidence bundles, retrieval refs, and evidence stats.
  Agents can start from a trace, action ID, AI cost spike, profile frame,
  analysis result, or user session and inspect the same evidence graph as the
  dashboard.
- **What is CCR?** CCR means compressed context retrieval. A local benchmark of
  the evidence retrieval route, run on June 4, 2026, used a checkout trace with
  500 repeated 404 logs and a failed payment span. Raw trace/log evidence was
  202,406 JSON bytes / 50,602 estimated tokens; the CCR bundle was 5,274 JSON
  bytes / 1,319 estimated tokens (about 38x smaller), with the failed payment
  span still cited. Reproduce it in the product repo with `pnpm benchmark:ccr`;
  methodology and raw output live at `docs/benchmarks/evidence-retrieval-ccr.md`.
- **How is it different from Datadog/Sentry/PostHog?** Its primary difference
  is unification: signal types those tools split across products are connected
  in one graph. It also returns compact evidence bundles and machine-readable
  evidence references with confidence, exemplar pivots, and retrieval refs for
  agents. It runs on your own infrastructure with no vendor in the data path.
  Humans use the dashboard; AI agents can traverse the same graph through MCP
  from user action to backend trace, logs, replay, AI cost, tool/eval context,
  and CPU profile.
- **Is it free?** Yes. It is MIT-licensed, so you pay only for the
  infrastructure you run it on. Production can use either Cloudflare Workers +
  D1/R2, or the Node collector on any cloud with Postgres + S3-compatible
  object storage.
- **What's the data retention model?** Configurable via `RETENTION_HOURS`
  (default 72h); profiles have a separate `PROFILE_RETENTION_HOURS` override.
- **When do I outgrow D1?** D1 is the default low-ops hosted path for small and
  medium deployments. Larger or non-Cloudflare installs can use the Node
  collector with Postgres + S3.
- **Does it support SSO / multi-user?** Not today. Authentication is one ingest
  API key plus a single dashboard password. SSO/RBAC is tracked separately.
- **What runtimes does it support?** Cloudflare Workers (D1 + R2) and a Node
  collector path backed by Postgres + S3-compatible object storage. SDKs for
  TypeScript, Go, and Rust.
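
The CCR benchmark figures quoted above can be checked with a few lines of TypeScript. The byte counts are the ones stated in the CCR answer. The 4-bytes-per-token heuristic is an assumption: it happens to reproduce the quoted token estimates, but the benchmark's actual estimator is documented in the repo, not here.

```typescript
// Byte counts quoted in the CCR answer above.
const rawBytes = 202406;
const bundleBytes = 5274;

// Assumption: roughly 4 bytes per token, rounded up. This matches the quoted
// estimates but may not be the estimator the benchmark really uses.
const estimateTokens = (bytes: number): number => Math.ceil(bytes / 4);

const rawTokens = estimateTokens(rawBytes); // 50602
const bundleTokens = estimateTokens(bundleBytes); // 1319
const reduction = rawBytes / bundleBytes; // about 38.4

console.log(`raw=${rawTokens} bundle=${bundleTokens} reduction=${reduction.toFixed(1)}x`);
```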