Anycast + BGP: Fast Failover Without DNS
Background & Motivation
When a disaster strikes, SREs usually shift traffic by updating DNS records to point to a healthy endpoint. However, this can be slow because DNS results are cached at multiple layers:
| Layer | Description |
|---|---|
| Browser cache | Browsers cache DNS results (Chrome: up to 60s) |
| OS cache | OS-level DNS cache (e.g. systemd-resolved, dnsmasq) |
| DNS Resolver cache | ISP or corporate resolvers cache based on TTL |
| Application cache | Some apps/libraries cache DNS independently (e.g. the JVM caches successful lookups for 30s by default, or indefinitely when a security manager is installed) |
Even after updating the DNS record, clients continue using the stale cached IP until the TTL expires. If the TTL was set to 3600s (1 hour), it could take up to 1 hour for all traffic to shift.
We need a mechanism that doesn’t change the IP address — only changes where the traffic is routed.
What Is Anycast
Anycast is not a device — it’s an IP addressing strategy. You assign the same IP to multiple machines in different data centers. That’s it.
| | Unicast | Anycast |
|---|---|---|
| Concept | 1 IP → 1 destination | 1 IP → many destinations |
| Who decides routing | IP is unique, only one place to go | BGP routers pick the nearest |
| Setup | Assign IP to one server | Assign same IP to servers in multiple DCs |
The “implementation” is literally just configuration — each DC’s router announces the same IP prefix:
# Tokyo DC router config
router bgp 13335
network 1.2.3.4/32 ← announce this IP
# London DC router config
router bgp 13335
network 1.2.3.4/32 ← announce the SAME IP
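The moment two routers announce the same IP prefix via BGP, you have anycast. The /32 here is for illustration: on the public internet, networks generally filter IPv4 prefixes longer than /24, so in practice the anycast IP lives inside an announced /24.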
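To make the definition concrete, here is a minimal Python sketch that treats announcements as `(site, prefix)` pairs and reports which prefixes are anycast, meaning announced from more than one site. The site names and the `192.0.2.0/24` prefix are made up for illustration; only `1.2.3.4/32` comes from the example above.

```python
from collections import defaultdict

# Hypothetical announcements as (site, prefix) pairs.
ANNOUNCEMENTS = [
    ("tokyo", "1.2.3.4/32"),
    ("london", "1.2.3.4/32"),
    ("tokyo", "192.0.2.0/24"),  # made-up prefix announced from one site only
]


def anycast_prefixes(announcements):
    """Return {prefix: sorted list of sites} for prefixes announced from 2+ sites."""
    sites_by_prefix = defaultdict(set)
    for site, prefix in announcements:
        sites_by_prefix[prefix].add(site)
    return {
        prefix: sorted(sites)
        for prefix, sites in sites_by_prefix.items()
        if len(sites) > 1
    }


print(anycast_prefixes(ANNOUNCEMENTS))
```

Nothing about the address itself marks it as anycast; the property only shows up when you look at who announces it.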
What Is BGP
BGP (Border Gateway Protocol) is the routing protocol that glues the internet together. Each ISP, cloud provider, or large company is an Autonomous System (AS) with a unique AS Number (ASN).
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Google │ │ Comcast │ │ Cloudflare │
│ ASN: 15169 │◄──BGP──►│ ASN: 7922 │◄──BGP──►│ ASN: 13335 │
└──────────────┘ └──────────────┘ └──────────────┘
ASN vs Router ID
| Term | Scope | Example |
|---|---|---|
| ASN (AS Number) | Per organization | Google = 15169, Cloudflare = 13335 |
| Router ID | Per router within an AS | Usually set to the router’s loopback IP (e.g. 10.0.1.1) |
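| ASN range | 1-64495 = public, 64496-64511 = documentation, 64512-65534 = private (16-bit space) | Similar to public vs private IPs |
ASNs are allocated by IANA to the regional internet registries (ARIN, RIPE NCC, APNIC, LACNIC, AFRINIC), which assign them to organizations.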
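As a quick reference, the sketch below classifies an ASN by range. The boundaries are the ones I understand the RFCs to define (private use starts at 64512 per RFC 6996, documentation ranges per RFC 5398), and it also covers the 32-bit space. The function name and labels are my own, and "public" here only means "in the assignable range", not "actually assigned".

```python
def classify_asn(asn: int) -> str:
    """Classify an AS number as public, private, documentation, or reserved."""
    if not 0 <= asn <= 4294967295:
        raise ValueError(f"ASN out of 32-bit range: {asn}")
    # 0 and the last value of each space are reserved; 23456 is AS_TRANS.
    if asn in (0, 23456, 65535, 4294967295):
        return "reserved"
    # Documentation ranges (16-bit and 32-bit).
    if 64496 <= asn <= 64511 or 65536 <= asn <= 65551:
        return "documentation"
    # Private-use ranges (16-bit and 32-bit).
    if 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294:
        return "private"
    # Block held back by IANA between the documentation and public 32-bit ranges.
    if 65552 <= asn <= 131071:
        return "reserved"
    return "public"


for asn in (15169, 13335, 65001, 64500):
    print(asn, classify_asn(asn))
```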
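How BGP Routes Traffic
BGP routers exchange route advertisements to build a routing table. When several routes cover the same prefix, each router picks one best path. Local policy (such as local preference) is applied first; when that is equal, the shortest AS path wins: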
Internet router's BGP table:
Prefix Next-Hop AS Path Metric Preferred
─────────────────────────────────────────────────────────────
1.2.3.4/32 203.0.1.1 13335 10 ✓ (shortest)
1.2.3.4/32 198.51.1.1 7922 13335 50
1.2.3.4/32 192.0.1.1 3356 13335 30
All three routes originate from the same ASN and cover the same prefix, but they differ in next-hop IP and AS path length, and BGP picks the best one.
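The selection over the table above can be sketched in a few lines of Python. This is a deliberately simplified model: it compares AS path length, then the metric, then the next-hop string as an arbitrary tie-break. The real BGP decision process has more steps (local preference, origin, eBGP vs iBGP, and so on), and the `Route` class is a name invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    as_path: tuple  # ASNs in the order they appear in the path
    metric: int


# The three routes from the table above.
TABLE = [
    Route("1.2.3.4/32", "203.0.1.1", (13335,), 10),
    Route("1.2.3.4/32", "198.51.1.1", (7922, 13335), 50),
    Route("1.2.3.4/32", "192.0.1.1", (3356, 13335), 30),
]


def best_path(routes):
    """Simplified selection: shortest AS path, then lowest metric, then next-hop."""
    return min(routes, key=lambda r: (len(r.as_path), r.metric, r.next_hop))


print(best_path(TABLE).next_hop)

# If the direct route disappears, the two remaining routes tie on path length
# and the lower metric decides.
remaining = [r for r in TABLE if r.next_hop != "203.0.1.1"]
print(best_path(remaining).next_hop)
```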
How They Work Together
Anycast is the idea (assign same IP to multiple locations), BGP is the engine (advertises routes and picks the shortest path).
Organization: Cloudflare (ASN 13335)
┌─────────────────────────────────────────────────────────────────────────┐
│ Tokyo DC                 London DC                Virginia DC           │
│ Router ID: 10.0.1.1      Router ID: 10.0.2.1      Router ID: 10.0.3.1   │
│ Next-hop: 203.0.1.1      Next-hop: 198.51.1.1     Next-hop: 192.0.1.1   │
│ Announces: 1.2.3.4/32    Announces: 1.2.3.4/32    Announces: 1.2.3.4/32 │
└─────────────────────────────────────────────────────────────────────────┘
Each DC announces the same IP. Internet routers see multiple paths and pick the nearest one, where "nearest" means the best BGP path, which is not always the geographically closest DC.
Where Anycast Sits in the Load Balancing Stack
Anycast + BGP works between the client and the L4 LB — it’s the network-level routing that decides which data center receives the traffic before any load balancer sees the packet.
Client
│
│ ① DNS resolves to Anycast IP (e.g. 1.2.3.4)
▼
┌─────────────────┐
│ DNS (GSLB) │ ← Returns the same Anycast IP (not per-region IPs)
└────────┬────────┘
│
│ ② BGP routes packet to nearest DC announcing 1.2.3.4
│
▼
┌── BGP/Anycast ──┐
│ Network Layer │ ← Anycast works HERE
│ (Internet │ Routers pick the nearest DC based on
│ Routers) │ BGP shortest AS path
└───────┬──────────┘
│
▼ (packet arrives at nearest DC)
┌─────────────────┐
│ L4 LB │ ← The Anycast IP is the VIP of L4 LB
│ (NLB / LVS) │ Distributes TCP connections across L7 LBs
└────────┬────────┘
▼
┌─────────────────┐
│ L7 LB │ ← Path routing, rate limiting, canary, headers
│ (Nginx / Envoy) │ TLS termination, request manipulation
└────────┬────────┘
▼
┌─────────────────┐
│ App Servers │
│ (RS pool) │
└─────────────────┘
Key insight: the Anycast IP is the L4 LB’s VIP. Every DC has an L4 LB listening on 1.2.3.4, and BGP decides which DC gets the packet.
With vs Without Anycast
| | Without Anycast | With Anycast |
|---|---|---|
| DNS returns | Different IPs per region (1.1.1.1 for US, 2.2.2.2 for EU) | Same IP everywhere (1.2.3.4) |
| DC selection | DNS (GSLB) picks the DC | BGP picks the DC |
| Failover | Update DNS → wait for TTL | BGP withdraws route → ~30-90s |
| GSLB role | Critical — decides DC routing | Optional — can layer on top for finer control |
In practice, big providers often combine both: Anycast for fast failover + GSLB for fine-grained traffic shaping (e.g. weighted routing, canary by region).
Failover Flow
Normal state:
Tokyo ──announces 1.2.3.4──► BGP peers
London ──announces 1.2.3.4──► BGP peers
Tokyo goes down:
Tokyo ──withdraws 1.2.3.4──► BGP peers (or peers detect link failure)
London ──still announces────► BGP peers
BGP convergence: ~30-90 seconds
No DNS record changes. No TTL waiting. The IP stays the same — only the route changes.
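In practice the withdrawal is often driven by a health check running next to the BGP speaker. The sketch below is one hypothetical way to turn check results into announce/withdraw decisions, with `rise`/`fall` thresholds so a single flapping check does not withdraw the route. The class name and thresholds are assumptions, and the command strings are modeled on the text commands ExaBGP accepts from helper processes; check your own BGP speaker's documentation before relying on that format.

```python
class AnycastAnnouncer:
    """Turns health-check results into announce/withdraw commands (illustrative)."""

    def __init__(self, prefix, fall=3, rise=2):
        self.prefix = prefix
        self.fall = fall        # consecutive failures needed before withdrawing
        self.rise = rise        # consecutive successes needed before announcing
        self.announced = False
        self._streak = 0        # consecutive results that disagree with current state

    def observe(self, healthy):
        """Feed one health-check result; return a command string or None."""
        if healthy == self.announced:
            # Result agrees with the current state: reset the streak.
            self._streak = 0
            return None
        self._streak += 1
        needed = self.rise if healthy else self.fall
        if self._streak < needed:
            return None
        self._streak = 0
        self.announced = healthy
        verb = "announce" if healthy else "withdraw"
        # Assumed command format, in the style of ExaBGP's text API.
        return f"{verb} route {self.prefix} next-hop self"


announcer = AnycastAnnouncer("1.2.3.4/32")
checks = [True, True, False, True, False, False, False]
commands = [announcer.observe(ok) for ok in checks]
for ok, cmd in zip(checks, commands):
    print(ok, cmd)
```

The isolated failure in the middle is ignored; only three failures in a row trigger the withdraw, after which peers converge on the remaining sites.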
Anycast + BGP vs DNS Failover
| | Anycast + BGP | DNS Failover |
|---|---|---|
| Failover speed | ~30-90s (BGP convergence) | Minutes to hours (TTL dependent) |
| Client change needed | None | Must re-resolve DNS |
| IP address | Stays the same | Changes to new IP |
| Granularity | Network-level (per packet) | Per DNS query |
| Limitation | Stateful connections (TCP) may break on route change | Cached entries serve stale IPs |
Caveats & Solutions
BGP route changes can shift traffic to a different DC mid-connection, breaking TCP sessions: the packets still carry the same src/dst IPs and ports, but the new DC has no state for that connection and typically resets it.
| Solution | How it helps |
|---|---|
| ECMP pinning | Consistent hashing on flow (src IP + dst IP + ports) to keep flows on the same path |
| Connection draining | Gracefully drain existing connections before withdrawing the BGP route |
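| QUIC / HTTP3 | Connections are identified by a connection ID rather than the IP/port 4-tuple, so they survive client address changes; surviving a move to another DC still requires the load balancers to route on that ID back to the server holding the connection state |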
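The flow-hashing idea behind ECMP pinning can be illustrated in Python. Real ECMP hashing happens in router hardware, so this is only a model of the principle. It uses rendezvous (highest-random-weight) hashing as one form of consistent hashing, which is my choice for the example rather than something the table prescribes; the backend names and the sample flow are made up.

```python
import hashlib


def _score(flow, backend):
    """Deterministic pseudo-random score for a (flow, backend) pair."""
    key = "|".join(map(str, flow)) + "#" + backend
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def pick_backend(flow, backends):
    """Rendezvous hashing: the backend with the highest score for this flow wins."""
    return max(backends, key=lambda b: _score(flow, b))


# Flow 5-tuple: src IP, src port, dst IP, dst port, protocol (sample values).
flow = ("198.51.100.7", 51512, "1.2.3.4", 443, "tcp")
backends = ["lb-a", "lb-b", "lb-c"]

chosen = pick_backend(flow, backends)
print(chosen)

# Removing a backend the flow was NOT using leaves the flow where it was.
others = [b for b in backends if b != chosen]
survivors = [chosen, others[0]]
print(pick_backend(flow, survivors) == chosen)
```

Because every packet of a flow hashes to the same value, the flow stays pinned, and only flows that were on a removed backend have to move.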