Distributed Request Tracing with OpenTelemetry
Instrument a Node.js service with OpenTelemetry, send traces to Jaeger or Tempo, and see exactly which service caused a latency spike. No vendor lock-in, portable signals.
Before you begin
- A Node.js application (Express or similar)
- A Kubernetes cluster with kubectl access
- Helm 3 installed
- Basic understanding of microservices
When a request takes 2 seconds and spans five microservices, logs tell you something went wrong. Traces tell you exactly which service, which database call, and which line of code caused the delay.
OpenTelemetry is the vendor-neutral standard for generating traces, metrics, and logs. This tutorial instruments a Node.js service, deploys Jaeger, and shows you how to read a trace.
How Distributed Tracing Works
A trace is a collection of spans. Each span represents one unit of work — an HTTP request, a database query, a function call. Spans form a tree: a root span (the incoming request) has child spans (downstream calls).
Every span carries a trace-id that's the same across all services. When service A calls service B, it passes the trace-id in a header (traceparent). Service B creates a child span under the same trace. After the request, all spans are collected and reassembled into the tree.
```
HTTP request → Service A              [root span, 450ms]
├─ DB query                           [child, 20ms]
└─ gRPC call → Service B              [child, 380ms]
   └─ HTTP call → Service C           [child, 350ms]  ← this is slow
      └─ DB query                     [child, 340ms]  ← this is the root cause
```
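The `traceparent` header mentioned above follows the W3C Trace Context format: four dash-separated fields, `{version}-{trace-id}-{parent-span-id}-{trace-flags}`. The SDK parses and propagates this for you; the sketch below only illustrates what the header carries (simplified — it skips some spec rules, such as rejecting all-zero IDs):

```typescript
// Sketch: split a W3C traceparent header into its four fields.
// Example header: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
interface TraceContext {
  version: string;
  traceId: string;   // 32 hex chars, identical for every span in the trace
  spanId: string;    // 16 hex chars, the caller's span
  sampled: boolean;  // lowest bit of trace-flags
}

function parseTraceparent(header: string): TraceContext | null {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```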
Part 1: Deploy Jaeger
```shell
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update

# All-in-one Jaeger (development — in-memory storage)
helm install jaeger jaegertracing/jaeger \
  --namespace observability \
  --create-namespace \
  --set allInOne.enabled=true \
  --set collector.enabled=false \
  --set query.enabled=false \
  --set agent.enabled=false \
  --set storage.type=memory

kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=jaeger \
  -n observability --timeout=60s
```

For production, use Elasticsearch or Cassandra as the storage backend. The in-memory all-in-one is fine for development and this tutorial.
Access the Jaeger UI:
```shell
kubectl port-forward svc/jaeger-query 16686:16686 -n observability
```

Open http://localhost:16686.
The OTLP endpoint (where your services send traces) is:
- `http://jaeger-collector.observability.svc.cluster.local:4318` (HTTP)
- `jaeger-collector.observability.svc.cluster.local:4317` (gRPC)
Part 2: Instrument a Node.js Service
Step 1: Install Dependencies
```shell
npm install \
  @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions
```

Step 2: Create the Instrumentation File
Create src/instrumentation.ts — this must load before your application code:
```typescript
// src/instrumentation.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';

const exporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318/v1/traces',
});

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: process.env.OTEL_SERVICE_NAME ?? 'my-service',
    [SEMRESATTRS_SERVICE_VERSION]: process.env.APP_VERSION ?? '1.0.0',
  }),
  traceExporter: exporter,
  instrumentations: [
    getNodeAutoInstrumentations({
      // Auto-instruments: http, express, pg, redis, grpc, and more
      '@opentelemetry/instrumentation-fs': { enabled: false }, // noisy, skip
    }),
  ],
});

sdk.start();

// Graceful shutdown
process.on('SIGTERM', () => {
  sdk.shutdown().then(() => process.exit(0));
});
```

Step 3: Load Instrumentation Before Your App
Update your start command in package.json:
```json
{
  "scripts": {
    "start": "node --require ./dist/instrumentation.js dist/index.js"
  }
}
```

Or with ts-node:

```json
{
  "scripts": {
    "start": "ts-node --require ./src/instrumentation.ts src/index.ts"
  }
}
```

Auto-instrumentations hook into Node.js's module system at startup. They patch http, express, pg, ioredis, and other popular libraries automatically — you get traces for database queries and HTTP calls without writing a single span manually.
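Conceptually, this patching is function wrapping: the instrumentation replaces a library function with a version that records timing and errors, then calls through. A heavily simplified, self-contained sketch of the idea (not the real SDK internals, and `recorded` stands in for the exporter):

```typescript
// Illustrative only: wrap an async function so each call records a span-like
// entry. The real auto-instrumentation patches http, pg, etc. the same way,
// but exports real spans instead of pushing to an array.
interface FakeSpan {
  name: string;
  durationMs: number;
  error?: string;
}

const recorded: FakeSpan[] = [];

function instrument<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const span: FakeSpan = { name, durationMs: 0 };
    const start = Date.now();
    try {
      return await fn(...args); // call through to the original function
    } catch (err) {
      span.error = String(err);
      throw err;
    } finally {
      span.durationMs = Date.now() - start;
      recorded.push(span); // real SDK: export to the collector
    }
  };
}

// Usage: replace a function with its wrapped version (hypothetical example)
const fetchUser = instrument('db.query', async (id: number) => ({ id, name: 'ada' }));
```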
Step 4: Add Custom Spans for Business Logic
Auto-instrumentation covers infrastructure calls. For your business logic, add manual spans:
```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('my-service');

async function processOrder(orderId: string): Promise<Order> {
  return tracer.startActiveSpan('processOrder', async (span) => {
    span.setAttribute('order.id', orderId);
    span.setAttribute('order.source', 'api');

    try {
      const order = await db.orders.findById(orderId);
      span.setAttribute('order.status', order.status);

      if (order.status === 'pending') {
        await tracer.startActiveSpan('validatePayment', async (paymentSpan) => {
          try {
            paymentSpan.setAttribute('payment.method', order.paymentMethod);
            await paymentService.validate(order.payment);
          } finally {
            paymentSpan.end(); // end the child span even if validation throws
          }
        });
      }

      span.setStatus({ code: SpanStatusCode.OK });
      return order;
    } catch (error) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
}
```

Step 5: Deploy with Environment Variables
In your Kubernetes Deployment:
```yaml
env:
  - name: OTEL_SERVICE_NAME
    value: "my-service"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://jaeger-collector.observability.svc.cluster.local:4318/v1/traces"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"
  - name: NODE_ENV
    value: "production"
```

Part 3: Propagate Traces Across Services
Trace context propagates automatically when you make HTTP calls using Node.js's built-in http module or fetch — the auto-instrumentation injects the traceparent header.
If you're using a custom HTTP client, inject the header manually:
```typescript
import { context, propagation } from '@opentelemetry/api';

async function callServiceB(data: unknown) {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
  };

  // Inject current trace context into the request headers
  propagation.inject(context.active(), headers);

  const response = await fetch('http://service-b/endpoint', {
    method: 'POST',
    headers,
    body: JSON.stringify(data),
  });

  return response.json();
}
```

Service B must also be instrumented and extract the context from the incoming request — auto-instrumentation handles this automatically on the receiving end.
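What extraction does on the receiving side can be sketched without the SDK: service B keeps the caller's trace-id (so its spans join the same trace), records the caller's span-id as its parent, and mints a fresh span-id of its own. A conceptual sketch with a hypothetical helper name (the real work is done by `propagation.extract`):

```typescript
import { randomBytes } from 'node:crypto';

interface SpanContext {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
}

// Conceptual: build service B's span context from the incoming traceparent.
// Header format: {version}-{trace-id}-{parent-span-id}-{flags}
function childContextFrom(traceparent: string): SpanContext {
  const [, traceId, parentSpanId] = traceparent.split('-');
  return {
    traceId,                                // unchanged: same trace across services
    spanId: randomBytes(8).toString('hex'), // new: this service's own span
    parentSpanId,                           // links back to the caller's span
  };
}
```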
Part 4: Read a Trace in Jaeger
Open the Jaeger UI at http://localhost:16686.
- Select your service from the Service dropdown
- Click Find Traces
- Click any trace to open the waterfall view
The waterfall view shows:
- Total duration at the top (e.g., 450ms)
- Spans stacked by start time, width proportional to duration
- Each span shows service name, operation name, duration
- Click a span to see its attributes (HTTP status, DB query, error details)
Look for:
- The widest spans (most time spent)
- Gaps between spans (time in transit or waiting)
- Red spans (errors)
- Spans with a `db.statement` attribute (the actual SQL query that ran)
Part 5: Add Tempo as a Backend (Grafana Stack)
If you're already running the kube-prometheus-stack, replace Jaeger with Grafana Tempo and view traces inside Grafana:
```shell
helm repo add grafana https://grafana.github.io/helm-charts

helm install tempo grafana/tempo \
  --namespace observability \
  --set tempo.storage.trace.backend=local

# Update OTEL endpoint to point to Tempo
kubectl set env deployment/my-app \
  OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo.observability.svc.cluster.local:4318/v1/traces
```

In Grafana, add a Tempo data source (URL: http://tempo:3100). Then open Explore, select the Tempo data source, and paste a trace-id to jump directly to the trace.
Sampling Strategy
In production, don't send 100% of traces — it's expensive. Configure head-based sampling in the SDK:
```typescript
import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  sampler: new TraceIdRatioBasedSampler(0.1), // Sample 10% of traces
  // ... rest of config
});
```

Or use tail-based sampling in the OpenTelemetry Collector (keeps traces with errors at 100%):
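A useful property of trace-id ratio sampling is that the decision is derived deterministically from the trace-id itself, so every service in the request path makes the same keep/drop choice without coordination. A simplified sketch of the idea (the SDK's exact mapping differs):

```typescript
// Simplified: decide sampling from the trace-id, not from Math.random(),
// so the same trace gets the same decision in every service.
function shouldSample(traceId: string, ratio: number): boolean {
  // Interpret the low 8 hex digits of the trace-id as a number in [0, 2^32)
  const bucket = parseInt(traceId.slice(-8), 16);
  return bucket < ratio * 0x100000000;
}
```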
```yaml
# otel-collector-config.yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-requests
        type: latency
        latency: { threshold_ms: 1000 }
      - name: small-percentage
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
```

This samples 100% of errors and slow requests, and 5% of everything else, keeping the signal-to-noise ratio high.
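The decision logic these policies encode can be expressed in a few lines: after the collector has buffered all spans of a trace, it keeps the trace if any span errored or exceeded the latency threshold, and otherwise keeps a small random percentage. A plain-TypeScript sketch of that rule (the field names here are illustrative, not the collector's internal types):

```typescript
interface SpanInfo {
  durationMs: number;
  status: 'OK' | 'ERROR' | 'UNSET';
}

// Mirror of the tail-sampling policies: errors 100%, slow traces 100%,
// everything else at samplePercentage. rng is injectable for testing.
function keepTrace(
  spans: SpanInfo[],
  latencyThresholdMs = 1000,
  samplePercentage = 5,
  rng: () => number = Math.random,
): boolean {
  if (spans.some((s) => s.status === 'ERROR')) return true;                // errors: keep all
  if (spans.some((s) => s.durationMs >= latencyThresholdMs)) return true;  // slow: keep all
  return rng() * 100 < samplePercentage;                                   // rest: e.g. 5%
}
```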
Official References
- OpenTelemetry Documentation — Official OTel docs: concepts, SDKs, collectors, and instrumentation guides for all languages
- OpenTelemetry Collector — The vendor-agnostic collector: receivers, processors, exporters, and deployment patterns
- OpenTelemetry JavaScript SDK — Official Node.js/TypeScript instrumentation SDK with auto-instrumentation packages
- Jaeger Documentation — Jaeger setup, architecture, sampling strategies, and storage backends
- W3C Trace Context — The standard that defines the `traceparent` and `tracestate` headers used for context propagation
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.