eBPF for Platform Engineers: Cilium, Hubble, and Tetragon Without the Hype
eBPF lets you run sandboxed programs in the Linux kernel without writing kernel modules — here's what that actually means for your networking, observability, and security stack.

Every few years a piece of infrastructure technology gets declared revolutionary before most teams have a reason to care about it. eBPF is one of those — and unlike most, the hype is at least partially justified.
If you're already comparing Cilium vs Calico on AKS, you've run into eBPF whether you realized it or not. But "eBPF is amazing" isn't actionable. Let me tell you what it actually is, what Cilium and Tetragon do with it, and whether you should care right now.
What eBPF Actually Is
eBPF stands for "extended Berkeley Packet Filter." The name is historical baggage — modern eBPF has little to do with packet filtering. What it actually is: a way to run sandboxed programs inside the Linux kernel without writing kernel modules.
Kernel modules are dangerous. They run with full kernel privileges, a bug in one can panic the entire system, and they have to be built against each kernel version. eBPF programs are different: the kernel's verifier checks every program before it's allowed to run, ensuring no unbounded loops, no invalid memory access, no crash paths. If your program doesn't pass verification, it doesn't load.
Once loaded, eBPF programs attach to hooks — network events, system calls, kernel functions, tracepoints. They run in response to those events with near-zero overhead because they execute in kernel space, bypassing the cost of context-switching to userspace.
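You can see this machinery on any modern Linux box. Assuming bpftool is installed (it ships with the kernel tools packages on most distros):
# List every eBPF program currently loaded in the kernel,
# with its type, attach point, and verified instruction count
sudo bpftool prog show

# List the BPF maps those programs use for state and configuration
sudo bpftool map show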
From a platform engineering perspective, eBPF enables three things that previously required either heavyweight agents or kernel modules:
- Network data plane — intercept and process packets at kernel level without iptables
- Observability — collect syscall, network, and process telemetry with minimal overhead
- Security enforcement — block syscalls, network connections, and process executions in kernel space
Why kube-proxy Is a Problem Worth Solving
The default Kubernetes network implementation uses kube-proxy, which programs iptables rules on every node to implement Service routing. At small scale, this is fine. At large scale, it becomes a genuine operational problem.
iptables was not designed for high-frequency dynamic updates. Rules are evaluated linearly, so per-connection lookup cost grows with the total rule count, and every Service adds rules for each of its endpoints. With 1000 Services and 5 pods each, a connection can traverse 5000+ rules. kube-proxy also watches the API server and rewrites the full iptables ruleset on every change, which can cause connection drops under churn.
The numbers I've seen in production: clusters with 500+ Services start showing measurable latency from iptables rule evaluation. At 2000+ Services, kube-proxy churning the ruleset becomes a CPU and connection-drop problem.
eBPF-based networking eliminates this entirely. Instead of iptables rules, packet processing happens via BPF maps — O(1) hash table lookups in kernel space. Adding a Service doesn't add rules to evaluate; it adds an entry to a map.
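You can inspect those maps directly. Inside any Cilium agent pod, the bundled cilium CLI dumps the Service load-balancing table (the exact command and output format vary slightly by version):
# Dump the eBPF load-balancing map: each frontend -> backend entry
# here replaces what would otherwise be a chain of iptables rules
kubectl -n kube-system exec ds/cilium -- cilium bpf lb list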
Cilium: Replacing kube-proxy and Extending NetworkPolicy
Cilium is a CNI plugin built on eBPF. It does three things: pod networking, Service load balancing (replacing kube-proxy), and network policy enforcement.
Installation
Installing Cilium with kube-proxy replacement on EKS:
helm repo add cilium https://helm.cilium.io/
helm repo update

helm install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=<API_SERVER_ENDPOINT> \
  --set k8sServicePort=443 \
  --set ipam.mode=eni \
  --set eni.enabled=true \
  --set egressMasqueradeInterfaces=eth0

The kubeProxyReplacement=strict flag tells Cilium it must fully replace kube-proxy; it will fail to start if it can't. Don't use partial in production; you'll end up with a split brain between iptables and BPF. (Note: Cilium 1.14+ deprecates strict and partial in favor of kubeProxyReplacement=true.)
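Once the install settles (and kube-proxy is out of the way; see the EKS note below), it's worth verifying the data path before moving workloads. Assuming the cilium CLI is installed on your workstation:
# Wait until the agent, operator, and cluster health checks all report OK
cilium status --wait

# Run the built-in end-to-end connectivity test suite
# (deploys test pods into the cluster; takes several minutes)
cilium connectivity test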
If you're on EKS, make sure to disable kube-proxy before installing Cilium:
kubectl -n kube-system patch daemonset kube-proxy \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-existing":"true"}}}}}'

L7 Network Policy
Standard Kubernetes NetworkPolicy operates at L3/L4 — IP addresses and ports. Cilium extends this to L7. You can write policies that allow HTTP GET to /api/v1/users but deny POST, or allow DNS queries only to specific domains.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-api-read-only
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: GET
          path: /api/v1/.*

This is something you simply cannot do with standard NetworkPolicy. The enforcement happens in the eBPF data plane; no sidecar proxy required.
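The DNS case mentioned above works the same way. Here's a sketch (the app label and domain are illustrative) that lets pods resolve and reach only *.github.com:
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-github-only
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: backend
  egress:
  # Allow DNS lookups via kube-dns, but only for *.github.com
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*.github.com"
  # Allow HTTPS only to addresses resolved from those names
  - toFQDNs:
    - matchPattern: "*.github.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP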
Hubble: Network Observability Without the Overhead
One of the chronic problems with Kubernetes networking is visibility. When a connection fails, you often have no idea whether the packet was dropped at the CNI level, rejected by a NetworkPolicy, or never sent at all.
Hubble is Cilium's observability layer. It hooks into the same eBPF programs that handle packet forwarding and extracts flow data — source/destination, protocol, verdict (forwarded/dropped), and the policy rule that made the decision.
Enable it during Cilium installation:
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,http}"

Then use the CLI:
# Install hubble CLI
HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
curl -L --fail --remote-name-all \
  "https://github.com/cilium/hubble/releases/download/${HUBBLE_VERSION}/hubble-linux-amd64.tar.gz"
tar xzvf hubble-linux-amd64.tar.gz
sudo mv hubble /usr/local/bin

# Port-forward relay
kubectl port-forward -n kube-system svc/hubble-relay 4245:80 &

# Watch flows
hubble observe --namespace production --verdict DROPPED
hubble observe --namespace production --pod backend-7d9f5b4c8-xk2mn

The --verdict DROPPED filter is immediately useful for debugging NetworkPolicy issues. Instead of adding temporary allow-all policies and hoping the problem goes away, you can see exactly which policy dropped the packet.
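A few other filters I reach for regularly (flag names may vary by Hubble version; check hubble observe --help on yours):
# Only L7 HTTP flows (requires HTTP visibility to be enabled)
hubble observe --namespace production --protocol http

# Last 50 flows for one app, as JSON for scripting
hubble observe --namespace production --label app=backend --last 50 -o json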
Hubble UI provides a service map — a real-time graph of which pods are talking to which, with flow rates and error rates. This is generated from actual observed traffic, not service discovery metadata, so it reflects what's actually happening.
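Reaching the UI is another port-forward (the cilium CLI's cilium hubble ui subcommand does the same thing):
kubectl port-forward -n kube-system svc/hubble-ui 12000:80
# then open http://localhost:12000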
Tetragon: Runtime Security Without Sidecars
Tetragon is the security enforcement piece (which complements your Kubernetes RBAC strategy). It uses eBPF to observe and optionally block syscalls, process executions, file access, and network connections at the kernel level.
The key difference from something like Falco (which I respect and have used) is enforcement. Falco detects and alerts. Tetragon can block — a BPF program attached to the syscall hook can return EPERM before the operation completes. No process, no network call, no file write ever happens.
Install Tetragon:
helm repo add cilium https://helm.cilium.io/
helm install tetragon cilium/tetragon \
  --namespace kube-system \
  --set tetragon.enableProcessCred=true \
  --set tetragon.enableProcessNs=true

TracingPolicy: Detecting Crypto Mining
Here's a real-world TracingPolicy that detects and blocks crypto mining by watching for network connections to common mining pool ports:
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: block-crypto-mining
spec:
  kprobes:
  - call: tcp_connect
    syscall: false
    args:
    - index: 0
      type: sock
    selectors:
    - matchArgs:
      - index: 0
        operator: DPort
        values:
        # Common stratum mining-pool ports; tune this list for your environment
        - "3333"
        - "4444"
        - "14444"
      matchActions:
      - action: Signal
        argSig: 9

A more targeted policy, detecting shell execution inside containers (a common indicator of compromise):
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: detect-shell-exec
spec:
  kprobes:
  - call: sys_execve
    syscall: true
    args:
    - index: 0
      type: string
    selectors:
    - matchArgs:
      - index: 0
        operator: Postfix
        values:
        - /sh
        - /bash
        - /dash
      matchNamespaces:
      - namespace: Pid
        operator: NotIn
        values:
        - "host_ns"
      matchActions:
      - action: Post

This emits a security event (without blocking) whenever a shell is exec'd inside a container. You can escalate to action: Signal with argSig: 9 (SIGKILL) once you're confident in your policy.
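To see what these policies actually emit, stream events from the Tetragon DaemonSet with the bundled tetra CLI:
# Compact, human-readable event stream from the Tetragon agent
kubectl exec -n kube-system ds/tetragon -c tetragon -- tetra getevents -o compact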
Overhead Reality Check
The concern I hear most often: "eBPF programs run in kernel space on every packet/syscall — won't that tank performance?"
In practice, this kind of in-kernel collection is cheaper than comparable userspace agents. Cilium's own benchmarks show 5-10% network throughput overhead vs raw networking; iptables-based setups can lose 20-30% at scale. For Tetragon specifically, the overhead is proportional to how selective your TracingPolicies are. Watching every execve on every container is expensive. Watching only specific binaries in specific namespaces is nearly free; the sketch below shows what that selectivity looks like.
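A matchBinaries selector, for example, filters in kernel space before anything reaches userspace. A minimal sketch (binary paths are illustrative; this matches exec calls made by processes running those binaries):
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: selective-exec-watch
spec:
  kprobes:
  - call: sys_execve
    syscall: true
    selectors:
    # In-kernel filter: only processes running these binaries
    # generate events; everything else is dropped before it
    # ever reaches userspace
    - matchBinaries:
      - operator: In
        values:
        - /bin/bash
        - /bin/sh
      matchActions:
      - action: Post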
Migrating from Flannel or Calico
I've done this migration twice — once from Flannel, once from Calico. The honest answer is: there is no in-place migration. You're replacing the CNI, which means replacing the overlay network all pods run on.
The approach I recommend:
Option 1: Blue-green node groups (preferred for EKS)
- Provision a new node group with Cilium-compatible configuration
- Cordon all old nodes: kubectl cordon <node>
- Drain pods to new nodes: kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
- Evicted pods are rescheduled onto the new node group, where Cilium is running
- Terminate old nodes
Expect 20-40 minutes per node if your drains are slow. The upside: it's zero-downtime, provided your workloads have proper PodDisruptionBudgets, as shown below.
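If you don't already have PDBs in place, a minimal one looks like this (the name and labels are illustrative):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: backend-pdb
  namespace: production
spec:
  # Keep at least 2 replicas running through any voluntary
  # disruption, including node drains during the migration
  minAvailable: 2
  selector:
    matchLabels:
      app: backend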
Option 2: Full cluster replacement
Provision a parallel cluster with Cilium, migrate workloads via GitOps, cut DNS over. More work, but a cleaner outcome and a useful exercise for your disaster recovery process.
What you can't do: install Cilium alongside Flannel and expect pods to communicate. Two CNIs fighting over the same network namespace is a bad time.
Should You Do This Now?
Here's my honest take: if you're running fewer than 200 Services and you don't have specific network security requirements, you probably don't need Cilium today. kube-proxy works fine at small scale, and adding Cilium introduces operational complexity.
The cases where I'd push for Cilium:
- You're hitting kube-proxy scaling problems (500+ Services, high churn)
- You need L7 network policy (HTTP method filtering, gRPC service filtering)
- You want runtime security enforcement without deploying sidecar agents to every pod
- You're building a multi-tenant platform where network isolation guarantees matter
Hubble is genuinely useful in its own right, but it isn't a standalone add-on: you need Cilium as the CNI to get it.
The technology is production-ready. Cilium is CNCF Graduated. Tetragon is CNCF Incubating. Major cloud providers and large-scale operators run this in production. The question isn't whether it works — it's whether the complexity overhead is justified for your current scale.
Frequently Asked Questions
Does eBPF require a specific Linux kernel version?
Yes. Most modern Cilium/Tetragon features require at least Linux kernel 4.19, with 5.10+ being the recommended baseline for advanced features like BPF-to-BPF calls and BTF (BPF Type Format) support. Most managed Kubernetes services (EKS, GKE, AKS) currently ship with eBPF-compatible kernels.
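A quick way to check a node directly (BTF-enabled kernels expose /sys/kernel/btf/vmlinux):
# Kernel version: 4.19 minimum, 5.10+ recommended
uname -r

# BTF availability: this file exists on kernels built with CONFIG_DEBUG_INFO_BTF
ls /sys/kernel/btf/vmlinux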
Can I use eBPF for observability without changing my CNI?
Yes! Tools like Pixie or Parca use eBPF to collect observability data (distributed tracing, profiling) without requiring you to replace your existing networking plugin (like Flannel or Calico). However, you won't get the networking performance benefits of Cilium.
Is eBPF safe to run in production?
eBPF is designed to be safer than traditional kernel modules because every program is verified before it's allowed to run. The verifier ensures the program cannot crash the kernel or access unauthorized memory. Major tech companies like Meta, Google, and Netflix have been running eBPF at massive scale for years.
How does Tetragon differ from Falco?
Falco primarily works by monitoring syscalls from userspace. Tetragon runs its enforcement logic directly in the kernel using eBPF. This allows Tetragon to not just detect, but actually block malicious activity (like an unauthorized shell execution) before it even starts.
Running Cilium in production and hitting edge cases with your network policies or Tetragon enforcement? Talk to us at Coding Protocols. We help platform teams implement eBPF-based networking and security that actually holds up under production load.


