Observability & OpenTelemetry

Logs, metrics, and traces — the three pillars of system observability

Analytics for Systems

Web analytics focuses on user behavior. Observability is the parallel discipline focused on system behavior — understanding what is happening inside your servers, databases, and services from their external outputs.

The Three Pillars

                       ┌───────────────┐
                       │ Observability │
                       └──────┬────────┘
            ┌─────────────────┼─────────────────┐
            ▼                 ▼                 ▼
     ┌──────────┐     ┌──────────┐      ┌──────────┐
     │   Logs   │     │ Metrics  │      │  Traces  │
     │          │     │          │      │          │
     │ What     │     │ How is   │      │ Where    │
     │ happened │     │ it doing │      │ did the  │
     │          │     │          │      │ request  │
     │ Discrete │     │ Numeric  │      │ go       │
     │ events   │     │ time-    │      │          │
     │ with     │     │ series   │      │ End-to-  │
     │ context  │     │ data     │      │ end path │
     └──────────┘     └──────────┘      └──────────┘
  • Logs — Discrete events with context: "User 33 requested /api/items at 14:03:07 and got a 500 error because the database connection pool was exhausted." Logs answer what happened.
  • Metrics — Numeric time-series data: request rate, error rate, response time percentiles, CPU utilization. Metrics answer how is the system doing.
  • Traces — The end-to-end path of a single request through multiple services: browser → load balancer → web server → application → database → cache. Traces answer where did the request go and where did it slow down.
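
To make the distinction concrete, the sketch below shows a single request producing all three signals. It uses only the Python standard library and is not an OpenTelemetry API: the `Span` class, the `handle_request` function, and the in-memory `logs`, `request_count`, and `spans` stores are hypothetical names chosen for illustration.

```python
import json
import time
import uuid
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One timed step of a request; spans sharing a trace_id form one trace."""
    trace_id: str
    name: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    duration_ms: float = 0.0


# Hypothetical in-memory stand-ins for a log file, a metrics store, and a trace store.
logs: List[str] = []
request_count: Counter = Counter()
spans: List[Span] = []


def handle_request(user_id: int, path: str, status: int, reason: str) -> None:
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()

    # Trace: a parent span for the web server and a child span for the query.
    server_span = Span(trace_id, "web_server")
    db_span = Span(trace_id, "database_query", parent_id=server_span.span_id)

    # No real work happens in this sketch, so the durations are close to zero.
    elapsed_ms = (time.perf_counter() - start) * 1000
    db_span.duration_ms = elapsed_ms
    server_span.duration_ms = elapsed_ms
    spans.extend([server_span, db_span])

    # Metric: a numeric counter, aggregated over many requests.
    request_count[(path, status)] += 1

    # Log: one discrete event with its context, linked to the trace by ID.
    logs.append(json.dumps({
        "user_id": user_id,
        "path": path,
        "status": status,
        "reason": reason,
        "trace_id": trace_id,
    }))


handle_request(33, "/api/items", 500, "database connection pool exhausted")
```

The log entry records what happened to this one request, the counter only knows that one more 500 occurred on `/api/items`, and the two spans record which components the request passed through. Putting the `trace_id` in the log entry is one common way to connect a log line to its trace.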

Analytics is observability for user behavior; observability is analytics for software and system behavior. The two disciplines share tools, techniques, and infrastructure, and a problem seen in one often has its cause in the other: a slow page load shows up in analytics as a user experience problem, while observability traces it to a slow query caused by application code or the database.

OpenTelemetry (OTel)

OpenTelemetry is a vendor-neutral open standard for telemetry data: traces, metrics, and logs. It provides APIs and SDKs for most major languages, so you can instrument your code once and send the data to any compatible backend, such as Jaeger, Prometheus, Grafana, Datadog, or your own storage.

The value of a standard is interoperability. Without OTel, switching from one monitoring vendor to another means re-instrumenting your entire codebase. With OTel, the instrumentation stays in place and you typically change only the exporter configuration.
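
The idea of instrumenting once and choosing the backend through configuration can be sketched in a few lines. This is a simplified illustration using only the Python standard library, not the OpenTelemetry SDK: `ConsoleExporter`, `InMemoryExporter`, `build_exporter`, and the `"exporter"` config key are hypothetical names.

```python
from typing import Dict, List


class ConsoleExporter:
    """Hypothetical backend that prints each telemetry record."""

    def export(self, record: dict) -> None:
        print(record)


class InMemoryExporter:
    """Hypothetical backend that keeps records in a list."""

    def __init__(self) -> None:
        self.records: List[dict] = []

    def export(self, record: dict) -> None:
        self.records.append(record)


# Registry mapping a configuration value to a backend implementation.
EXPORTERS: Dict[str, type] = {
    "console": ConsoleExporter,
    "memory": InMemoryExporter,
}


def build_exporter(config: dict):
    """Pick the backend named in the configuration."""
    return EXPORTERS[config["exporter"]]()


def checkout(exporter, order_id: int) -> None:
    # The instrumentation depends only on the export() interface,
    # so it does not change when the backend does.
    exporter.export({"name": "checkout", "order_id": order_id})


# Switching backends means editing this value, not the checkout() code.
config = {"exporter": "memory"}
exporter = build_exporter(config)
checkout(exporter, 42)
```

Changing `config` to `{"exporter": "console"}` sends the same record to a different destination while `checkout` stays untouched. The OpenTelemetry SDKs apply the same separation with their own exporter components, which are considerably more capable than this sketch.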