Distributed Request Tracing with OpenTelemetry
Instrument a Node.js service with OpenTelemetry, send traces to Jaeger or Tempo, and see exactly which service caused a latency spike. No vendor lock-in, portable signals.
Before you begin
- A Node.js application (Express or similar)
- A Kubernetes cluster with kubectl access
- Helm 3 installed
- Basic understanding of microservices
When a request takes 2 seconds and spans five microservices, logs tell you something went wrong. Traces tell you exactly which service, which database call, and which line of code caused the delay.
OpenTelemetry is the vendor-neutral standard for generating traces, metrics, and logs. This tutorial instruments a Node.js service, deploys Jaeger, and shows you how to read a trace.
How Distributed Tracing Works
A trace is a collection of spans. Each span represents one unit of work — an HTTP request, a database query, a function call. Spans form a tree: a root span (the incoming request) has child spans (downstream calls).
Every span carries a trace-id that's the same across all services. When service A calls service B, it passes the trace-id in a header (traceparent). Service B creates a child span under the same trace. After the request, all spans are collected and reassembled into the tree.
```
HTTP request → Service A              [root span, 450ms]
├─ DB query                           [child, 20ms]
└─ gRPC call → Service B              [child, 380ms]
   └─ HTTP call → Service C           [child, 350ms]  ← this is slow
      └─ DB query                     [child, 340ms]  ← this is the root cause
```
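The `traceparent` header mentioned above follows the W3C Trace Context format: four dash-separated fields, `{version}-{trace-id}-{parent-span-id}-{trace-flags}`. The SDK parses and propagates this for you; the sketch below only illustrates what the header carries (simplified — it skips some spec rules, such as rejecting all-zero IDs):

```typescript
// Sketch: split a W3C traceparent header into its four fields.
// Example header: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
interface TraceContext {
  version: string;
  traceId: string;   // 32 hex chars, identical for every span in the trace
  spanId: string;    // 16 hex chars, the caller's span
  sampled: boolean;  // lowest bit of trace-flags
}

function parseTraceparent(header: string): TraceContext | null {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  return { version, traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```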
Part 1: Deploy Jaeger
```shell
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update

# All-in-one Jaeger (development — in-memory storage)
helm install jaeger jaegertracing/jaeger \
  --namespace observability \
  --create-namespace \
  --set allInOne.enabled=true \
  --set collector.enabled=false \
  --set query.enabled=false \
  --set agent.enabled=false \
  --set storage.type=memory

kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=jaeger \
  -n observability --timeout=60s
```

For production, use Elasticsearch or Cassandra as the storage backend. The in-memory all-in-one is fine for development and this tutorial.
Access the Jaeger UI:
```shell
kubectl port-forward svc/jaeger-query 16686:16686 -n observability
```

Open http://localhost:16686.
The OTLP endpoint (where your services send traces) is:
- `http://jaeger-collector.observability.svc.cluster.local:4318` (HTTP)
- `jaeger-collector.observability.svc.cluster.local:4317` (gRPC)
Part 2: Instrument a Node.js Service
Step 1: Install Dependencies
```shell
npm install \
  @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions
```

Step 2: Create the Instrumentation File
Create src/instrumentation.ts — this must load before your application code:
```typescript
// src/instrumentation.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';

const exporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318/v1/traces',
});

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: process.env.OTEL_SERVICE_NAME ?? 'my-service',
    [SEMRESATTRS_SERVICE_VERSION]: process.env.APP_VERSION ?? '1.0.0',
  }),
  traceExporter: exporter,
  instrumentations: [
    getNodeAutoInstrumentations({
      // Auto-instruments: http, express, pg, redis, grpc, and more
      '@opentelemetry/instrumentation-fs': { enabled: false }, // noisy, skip
    }),
  ],
});

sdk.start();

// Graceful shutdown
process.on('SIGTERM', () => {
  sdk.shutdown().then(() => process.exit(0));
});
```

Step 3: Load Instrumentation Before Your App
Update your start command in package.json:
```json
{
  "scripts": {
    "start": "node --require ./dist/instrumentation.js dist/index.js"
  }
}
```

Or with ts-node:

```json
{
  "scripts": {
    "start": "ts-node --require ./src/instrumentation.ts src/index.ts"
  }
}
```

Auto-instrumentations hook into Node.js's module system at startup. They patch http, express, pg, ioredis, and other popular libraries automatically — you get traces for database queries and HTTP calls without writing a single span manually.
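Conceptually, this patching is function wrapping: the instrumentation replaces a library function with a version that records timing and errors, then calls through. A heavily simplified, self-contained sketch of the idea (not the real SDK internals, and `recorded` stands in for the exporter):

```typescript
// Illustrative only: wrap an async function so each call records a span-like
// entry. The real auto-instrumentation patches http, pg, etc. the same way,
// but exports real spans instead of pushing to an array.
interface FakeSpan {
  name: string;
  durationMs: number;
  error?: string;
}

const recorded: FakeSpan[] = [];

function instrument<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const span: FakeSpan = { name, durationMs: 0 };
    const start = Date.now();
    try {
      return await fn(...args); // call through to the original function
    } catch (err) {
      span.error = String(err);
      throw err;
    } finally {
      span.durationMs = Date.now() - start;
      recorded.push(span); // real SDK: export to the collector
    }
  };
}

// Usage: replace a function with its wrapped version (hypothetical example)
const fetchUser = instrument('db.query', async (id: number) => ({ id, name: 'ada' }));
```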
Step 4: Add Custom Spans for Business Logic
Auto-instrumentation covers infrastructure calls. For your business logic, add manual spans:
```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('my-service');

async function processOrder(orderId: string): Promise<Order> {
  return tracer.startActiveSpan('processOrder', async (span) => {
    span.setAttribute('order.id', orderId);
    span.setAttribute('order.source', 'api');

    try {
      const order = await db.orders.findById(orderId);
      span.setAttribute('order.status', order.status);

      if (order.status === 'pending') {
        await tracer.startActiveSpan('validatePayment', async (paymentSpan) => {
          try {
            paymentSpan.setAttribute('payment.method', order.paymentMethod);
            await paymentService.validate(order.payment);
          } finally {
            paymentSpan.end(); // end the child span even if validation throws
          }
        });
      }

      span.setStatus({ code: SpanStatusCode.OK });
      return order;
    } catch (error) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
}
```

Step 5: Deploy with Environment Variables
In your Kubernetes Deployment:
```yaml
env:
  - name: OTEL_SERVICE_NAME
    value: "my-service"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://jaeger-collector.observability.svc.cluster.local:4318/v1/traces"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"
  - name: NODE_ENV
    value: "production"
```

Part 3: Propagate Traces Across Services
Trace context propagates automatically when you make HTTP calls using Node.js's built-in http module or fetch — the auto-instrumentation injects the traceparent header.
If you're using a custom HTTP client, inject the header manually:
```typescript
import { context, propagation } from '@opentelemetry/api';

async function callServiceB(data: unknown) {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
  };

  // Inject current trace context into the request headers
  propagation.inject(context.active(), headers);

  const response = await fetch('http://service-b/endpoint', {
    method: 'POST',
    headers,
    body: JSON.stringify(data),
  });

  return response.json();
}
```

Service B must also be instrumented and extract the context from the incoming request — auto-instrumentation handles this automatically on the receiving end.
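What extraction does on the receiving side can be sketched without the SDK: service B keeps the caller's trace-id (so its spans join the same trace), records the caller's span-id as its parent, and mints a fresh span-id of its own. A conceptual sketch with a hypothetical helper name (the real work is done by `propagation.extract`):

```typescript
import { randomBytes } from 'node:crypto';

interface SpanContext {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
}

// Conceptual: build service B's span context from the incoming traceparent.
// Header format: {version}-{trace-id}-{parent-span-id}-{flags}
function childContextFrom(traceparent: string): SpanContext {
  const [, traceId, parentSpanId] = traceparent.split('-');
  return {
    traceId,                                // unchanged: same trace across services
    spanId: randomBytes(8).toString('hex'), // new: this service's own span
    parentSpanId,                           // links back to the caller's span
  };
}
```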
Part 4: Read a Trace in Jaeger
Open the Jaeger UI at http://localhost:16686.
- Select your service from the Service dropdown
- Click Find Traces
- Click any trace to open the waterfall view
The waterfall view shows:
- Total duration at the top (e.g., 450ms)
- Spans stacked by start time, width proportional to duration
- Each span shows service name, operation name, duration
- Click a span to see its attributes (HTTP status, DB query, error details)
Look for:
- The widest spans (most time spent)
- Gaps between spans (time in transit or waiting)
- Red spans (errors)
- Spans with a `db.statement` attribute (the actual SQL query that ran)
Part 5: Add Tempo as a Backend (Grafana Stack)
If you're already running the kube-prometheus-stack, replace Jaeger with Grafana Tempo and view traces inside Grafana:
```shell
helm repo add grafana https://grafana.github.io/helm-charts

helm install tempo grafana/tempo \
  --namespace observability \
  --set tempo.storage.trace.backend=local

# Update OTEL endpoint to point to Tempo
kubectl set env deployment/my-app \
  OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo.observability.svc.cluster.local:4318/v1/traces
```

In Grafana, add a Tempo data source (URL: http://tempo:3100). Then open Explore, select the Tempo data source, and paste a trace-id to jump directly to the trace.
Sampling Strategy
In production, don't send 100% of traces — it's expensive. Configure head-based sampling in the SDK:
```typescript
import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  sampler: new TraceIdRatioBasedSampler(0.1), // Sample 10% of traces
  // ... rest of config
});
```

Or use tail-based sampling in the OpenTelemetry Collector (keeps traces with errors at 100%):
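A useful property of trace-id ratio sampling is that the decision is derived deterministically from the trace-id itself, so every service in the request path makes the same keep/drop choice without coordination. A simplified sketch of the idea (the SDK's exact mapping differs):

```typescript
// Simplified: decide sampling from the trace-id, not from Math.random(),
// so the same trace gets the same decision in every service.
function shouldSample(traceId: string, ratio: number): boolean {
  // Interpret the low 8 hex digits of the trace-id as a number in [0, 2^32)
  const bucket = parseInt(traceId.slice(-8), 16);
  return bucket < ratio * 0x100000000;
}
```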
```yaml
# otel-collector-config.yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-requests
        type: latency
        latency: { threshold_ms: 1000 }
      - name: small-percentage
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
```

This samples 100% of errors and slow requests, and 5% of everything else, keeping the signal-to-noise ratio high.
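The decision logic these policies encode can be expressed in a few lines: after the collector has buffered all spans of a trace, it keeps the trace if any span errored or exceeded the latency threshold, and otherwise keeps a small random percentage. A plain-TypeScript sketch of that rule (the field names here are illustrative, not the collector's internal types):

```typescript
interface SpanInfo {
  durationMs: number;
  status: 'OK' | 'ERROR' | 'UNSET';
}

// Mirror of the tail-sampling policies: errors 100%, slow traces 100%,
// everything else at samplePercentage. rng is injectable for testing.
function keepTrace(
  spans: SpanInfo[],
  latencyThresholdMs = 1000,
  samplePercentage = 5,
  rng: () => number = Math.random,
): boolean {
  if (spans.some((s) => s.status === 'ERROR')) return true;                // errors: keep all
  if (spans.some((s) => s.durationMs >= latencyThresholdMs)) return true;  // slow: keep all
  return rng() * 100 < samplePercentage;                                   // rest: e.g. 5%
}
```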
Official References
- OpenTelemetry Documentation — Official OTel docs: concepts, SDKs, collectors, and instrumentation guides for all languages
- OpenTelemetry Collector — The vendor-agnostic collector: receivers, processors, exporters, and deployment patterns
- OpenTelemetry JavaScript SDK — Official Node.js/TypeScript instrumentation SDK with auto-instrumentation packages
- Jaeger Documentation — Jaeger setup, architecture, sampling strategies, and storage backends
- W3C Trace Context — The standard that defines the `traceparent` and `tracestate` headers used for context propagation
We built Podscape to simplify Kubernetes workflows like this — logs, events, and cluster state in one interface, without switching tools.
Struggling with this in production?
We help teams fix these exact issues. Our engineers have deployed these patterns across production environments at scale.