Overview#

In the Kubernetes world, networking is often the most complex layer to manage. While many beginners start with simple solutions like Flannel, production environments demand something more robust, performant, and secure. This is where Calico comes in.

Calico is a high-performance, open-source networking and network security solution for containers, virtual machines, and native host-based workloads. It is widely considered the industry standard for clusters that require native performance and “Zero-Trust” security.

Background & Motivation#

Most CNI plugins (like Flannel) rely on Overlay Networks (such as VXLAN or IPIP). These work by “wrapping” Pod traffic inside another packet to move it across the cluster. While easy to set up, they introduce several problems:

  1. Overhead: Wrapping and unwrapping packets consumes CPU cycles and increases latency.
  2. Complexity: Debugging is harder because standard tools like tcpdump see the “tunnel” rather than the actual Pod traffic.
  3. Security Gap: Simple overlays provide connectivity but offer no built-in way to block traffic between Pods (Network Policies).

Calico was designed to solve these issues by treating every server as a smart router, using the same protocols that power the global Internet.

Architecture & Components#

Calico’s architecture is modular and highly symmetric. Every node in the cluster runs the same set of components to manage its local environment and communicate with its peers.

System Architecture#

      KUBERNETES DATASTORE (etcd / K8s API Server)
      ┌────────────────────────────────────────┐
      │  Desired State: Pod IPs, NetworkPolicies │
      └──────┬──────────────────────────┬──────┘
             │                          │
      (Watches State)            (Watches State)
             │                          │
  ┌──────────▼────────────────┐  ┌──────▼────────────────────┐
  │         NODE 1            │  │           NODE 2          │
  │  ┌─────────────────────┐  │  │  ┌─────────────────────┐  │
  │  │       FELIX         │  │  │  │       FELIX         │  │
  │  └──────────┬──────────┘  │  │  └──────────┬──────────┘  │
  │             │ (Writes)    │  │             │ (Writes)    │
  │  ┌──────────▼──────────┐  │  │  ┌──────────▼──────────┐  │
  │  │ Linux Kernel        │  │  │  │ Linux Kernel        │  │
  │  │ [Routing Table]     │  │  │  │ [Routing Table]     │  │
  │  │ [iptables/eBPF]     │  │  │  │ [iptables/eBPF]     │  │
  │  └──────────▲──────────┘  │  │  └──────────▲──────────┘  │
  │             │ (Updates)   │  │             │ (Updates)   │
  │  ┌──────────┴──────────┐  │  │  ┌──────────┴──────────┐  │
  │  │       BIRD          │◄─┼──┼► │       BIRD          │  │
  │  └─────────────────────┘  │  │  └─────────────────────┘  │
  │      (BGP Client)         │  │      (BGP Client)         │
  │                           │  │                           │
  └───────┬───────────┬───────┘  └───────┬───────────┬───────┘
          │           │                  │           │
   ───────┴───────────┴─── PHYSICAL L2/L3 NETWORK ───┴───────

Key Components#

Component Role Description
Felix Node Agent The “brain” on each node. It programs routes and ACLs into the kernel.
BIRD BGP Daemon Distributes routing information between nodes using BGP.
Typha Fanout Proxy (Optional) Sits between Felix and etcd to handle large-scale clusters (100+ nodes).
Datastore Source of Truth Usually the Kubernetes API server (backed by etcd).
CNI Plugin Integration The standard K8s interface used to request IPs and set up interfaces.

How It Works (Step-by-Step)#

1. The Control Plane: Sharing the Map#

When a new Pod is scheduled on Node 2:

  1. IPAM: The Calico CNI plugin assigns an IP (e.g., 10.244.2.5) to the Pod.
  2. Local Setup: Felix detects the new Pod and writes a local route into the Node 2 kernel: “To reach 10.244.2.5, use the virtual interface for this Pod.”
  3. Advertisement: BIRD on Node 2 sends a BGP message to all its peers: “I am Node 2, and I have a route for 10.244.2.5.”
  4. Convergence: Every other node in the cluster updates its routing table: “To reach 10.244.2.5, send it to the IP of Node 2.”

2. The Data Plane: Moving the Packet#

When Pod A on Node 1 wants to talk to Pod B (10.244.2.5):

  1. The packet leaves Pod A and hits the Node 1 Kernel.
  2. The kernel looks at its Routing Table and sees the entry: 10.244.2.5 via <Node 2 IP>.
  3. The packet is sent directly over the physical network (Ethernet/IP) to Node 2. There is no encapsulation.
  4. Node 2 receives the packet, looks at its routing table, and delivers it to Pod B’s interface.

The Modern Edge: eBPF & nftables Data Planes#

Calico historically used iptables for security and routing, but it has evolved into a multi-dataplane solution. As of 2026, two modern alternatives have taken center stage:

  • eBPF Data Plane: Uses custom bytecode hooks to bypass the legacy networking stack. It offers extremely low latency and native K8s Service handling (replacing kube-proxy).
  • nftables Data Plane (GA late 2025): The modern successor to iptables. It offers faster rule processing, atomic updates, and a cleaner configuration model, making it the preferred choice for clusters where eBPF is not a requirement but scale is critical.

Key Benefits of Modern Data Planes:

  • Performance: Massive reduction in “per-packet” CPU cost.
  • Native Service Handling: No more kube-proxy bottlenecks.
  • Source IP Preservation: Preserves the original client IP for external traffic.
  • Direct Server Return (DSR): Return traffic bypasses the ingress node, saving bandwidth and latency.

The Future: AI-Driven Operations (2026)#

The biggest shift in 2026 is the move from “managing configs” to “conversing with the network.”

  • Calico AI Assistant: Platform engineers can now troubleshoot using natural language. For example: “Why is traffic between service A and service B blocked?” The assistant analyzes flow logs and policies to provide plain-English answers and suggest fixes.
  • Native WAF & Gateway API: Calico has unified North-South (Ingress) and East-West security. Its Ingress Gateway now includes a native Web Application Firewall (WAF) and fully implements the Kubernetes Gateway API, eliminating the need for separate ingress controllers.

Production Scenarios#

  1. High-Performance Trading/Finance: By peering directly with Top-of-Rack (ToR) physical switches, Calico allows Pods to have “native” visibility on the corporate network with zero latency overhead.
  2. Compliance-Heavy E-commerce: Using Calico’s advanced Network Policies, companies can enforce PCI-DSS requirements by ensuring that only specific “Backend” pods can ever reach the “Database” pods.
  3. Enterprise Auditing: Calico Egress Gateways allow cluster administrators to force all traffic leaving for a specific external service (like a legacy mainframe or a 3rd party API) to exit through a fixed, known IP address for firewall auditing.

Hands-On: Basic NetworkPolicy#

Here is a typical production example. We want to protect our database so that only the backend application can talk to it.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: protect-database
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: database  # Target the database pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend  # ONLY allow pods with this label
    ports:
    - protocol: TCP
      port: 5432       # Only allow DB traffic

References#