In modern cloud-native architectures, managing ingress traffic into private clusters usually forces a frustrating trade-off. Opening ports (like 80 or 443) wide to the public internet exposes you to brute-force attempts and vulnerability scans. Hardcoding authorized static IPs in firewall rules is brittle and manual, and it does not keep up as services and their IPs come and go.

What if your Kubernetes cluster could dynamically declare which ingress IPs are allowed through your border firewalls in real-time?

In this post, we'll build a BGP-driven dynamic firewall. We will combine Cilium BGP, FRRouting (FRR), ExaBGP, and nftables on Alpine Linux to inject and withdraw authorized IP addresses in the kernel's packet-filtering sets as routes come and go, with no firewall service reloads and no drift between what Kubernetes advertises and what the firewall allows.


1. The Architectural Blueprint

The entire system works as an automated pipeline. When a new LoadBalancer Service is created in Kubernetes, Cilium advertises its IP over BGP, and the route propagates hop by hop down to the core firewall shortly afterwards (how quickly depends on the BGP advertisement timers along the path).

graph TD
    classDef k8s fill:#326ce5,stroke:#fff,stroke-width:2px,color:#fff;
    classDef routing fill:#243a5e,stroke:#fff,stroke-width:2px,color:#fff;
    classDef scripts fill:#f07f34,stroke:#fff,stroke-width:2px,color:#fff;
    classDef kernel fill:#7f1d1d,stroke:#fff,stroke-width:2px,color:#fff;

    K_SVC["Kubernetes LoadBalancer <br/> Service Event"]:::k8s --> K_CIL["Cilium BGP Control Plane <br/> (attaches Community 64512:80)"]:::k8s
    
    K_CIL -- "eBGP Advertisement <br/> (23.189.216.x / 2620:cd:a000::x)" --> E_FRR["Edge Router (FRR) <br/> 172.20.0.28"]:::routing
    E_FRR -- "iBGP over CGNAT wg0 Tunnel <br/> (MP-BGP: IPv4 & IPv6 Families)" --> C_FRR["Core Router (FRR) <br/> 100.101.102.1"]:::routing
    C_FRR -- "Route Reflection <br/> (Local loopback peering)" --> EXA["ExaBGP Daemon <br/> (127.0.0.1 / ::1)"]:::routing
    
    EXA -- "Standard Output <br/> (JSON Event Stream)" --> PY["nftables-updater.py <br/> (Child Subprocess as Root)"]:::scripts
    PY --> DEC{{"JSON Event Type?"}}:::scripts
    
    DEC -- "ANNOUNCE" --> SET_ADD["nft add element <br/> (web_allowed_v4 / v6)"]:::kernel
    DEC -- "WITHDRAW" --> SET_DEL["nft delete element <br/> (web_allowed_v4 / v6)"]:::kernel
    
    SET_ADD --> KERN_MEM["Kernel Routing & <br/> Nftables Memory Synced"]:::kernel
    SET_DEL --> KERN_MEM

    subgraph "Kubernetes Cluster"
        K_SVC
        K_CIL
    end
    
    subgraph "Border & Transit"
        E_FRR
    end
    
    subgraph "Core Router (AS40205)"
        C_FRR
        EXA
        PY
        DEC
        SET_ADD
        SET_DEL
        KERN_MEM
    end

2. IPv6 over WireGuard: Why MP-BGP

One of the most elegant parts of this design is how it handles IPv6 routing across virtual tunnel interfaces like WireGuard (wg0).

The Challenge

In FRR, a native IPv6 BGP session over a shared link (e.g. peering ULA addresses fd00::1 <-> fd00::2) expects an IPv6 link-local address (fe80::/10) on the interface, because RFC 2545 has peers on a common subnet advertise a link-local next hop alongside the global one. WireGuard interfaces do not auto-generate link-local addresses, since they are layer-3 tunnels without a MAC address. Manually adding link-local addresses to WireGuard tunnels can cause routing conflicts with the primary Ethernet tables or SDN overlays. Without them, native IPv6 BGP sessions fail to come up cleanly or repeatedly drop, and FRR logs:

Interface: wg0 does not have a v6 LL address associated with it, waiting until one is created for it
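When this message appears, it helps to confirm what the kernel has actually assigned to the interface. The sketch below parses the Linux /proc/net/if_inet6 format (one address per line: 32 hex digits, ifindex, prefix length, scope, flags, interface name) and lists any fe80::/10 addresses on a given interface. The SAMPLE text is made up for illustration; on a live host you would read the real file instead.

```python
import ipaddress


def link_local_addresses(if_inet6_text: str, ifname: str) -> list:
    """Return the fe80::/10 addresses assigned to ifname, given /proc/net/if_inet6 content."""
    found = []
    for line in if_inet6_text.splitlines():
        fields = line.split()
        # Expected fields: address, ifindex, prefixlen, scope, flags, device name
        if len(fields) != 6 or fields[5] != ifname:
            continue
        addr = ipaddress.IPv6Address(int(fields[0], 16))
        if addr.is_link_local:
            found.append(addr)
    return found


# Illustrative content only; on a real host use open("/proc/net/if_inet6").read()
SAMPLE = """\
fe80000000000000021122fffe334455 02 40 20 80     eth0
fd000000000000000000000000000001 05 40 00 80      wg0
"""

if __name__ == "__main__":
    for dev in ("eth0", "wg0"):
        print(dev, link_local_addresses(SAMPLE, dev) or "no link-local address")
```

An empty result for wg0 matches the situation FRR is complaining about.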

The Solution: Multiprotocol BGP (MP-BGP)

Instead of native IPv6 peering, we establish a single BGP TCP session over the conflict-free IPv4 WireGuard tunnel addresses (100.101.102.1 <-> 100.101.102.2).

Within this single session, we activate both ipv4 unicast and ipv6 unicast address families.

  • Because the TCP session runs over IPv4, FRR does not need an IPv6 link-local address on wg0 to bring it up.
  • With next-hop-self configured, FRR uses the global ULA IPv6 address of the advertising router (fd00::1 / fd00::2) as the next hop. Because these addresses are configured directly on wg0, the kernel resolves them through the connected wg0 route, without involving any link-local address.
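The next-hop argument can be checked with Python's ipaddress module. This sketch assumes wg0 carries 100.101.102.1/24 and fd00::1/64; the /64 length is an assumption, since the post only names the addresses. It shows that the edge router's next hops fall inside a prefix connected on wg0, so resolving them never requires a link-local address.

```python
import ipaddress

# Assumed wg0 addressing on the core router (the IPv6 prefix length is illustrative)
WG0_INTERFACES = [
    ipaddress.ip_interface("100.101.102.1/24"),
    ipaddress.ip_interface("fd00::1/64"),
]


def connected_via_wg0(next_hop: str) -> bool:
    """True if next_hop lies in a prefix directly connected on wg0 (same address family)."""
    nh = ipaddress.ip_address(next_hop)
    return any(nh.version == iface.version and nh in iface.network for iface in WG0_INTERFACES)


if __name__ == "__main__":
    for nh in ("100.101.102.2", "fd00::2", "fe80::2"):
        print(nh, "connected on wg0:", connected_via_wg0(nh),
              "link-local:", ipaddress.ip_address(nh).is_link_local)
```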

3. Step-by-Step Implementation

A. Kubernetes / Cilium: Setting the BGP Community

We configure Cilium's BGP control plane to attach BGP community 64512:80 to the advertised LoadBalancer IPs of every Service labeled bgp-advertiser: "true":

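apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
  name: advertise-service-cidr
  namespace: kube-system
  # labels: must match the advertisements selector of your CiliumBGPPeerConfig (not shown)
spec:
  advertisements:
    - advertisementType: "Service"
      service:
        addresses:
          - LoadBalancerIP
      # The Service selector belongs to each advertisement entry, not to spec
      selector:
        matchLabels:
          bgp-advertiser: "true"
      attributes:
        communities:
          standard:
            - "64512:80"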
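A standard BGP community (RFC 1997) is a single 32-bit value, conventionally written as two 16-bit halves, ASN:value. 64512 is the first private-use ASN (RFC 6996 reserves 64512 to 65534), which makes it a safe tag for internal signalling. A small conversion sketch is handy when a tool prints the raw integer instead of the ASN:value form:

```python
def community_to_int(community: str) -> int:
    """Encode 'ASN:value' as the 32-bit integer carried on the wire."""
    asn, value = (int(part) for part in community.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError(f"not a standard community: {community}")
    return (asn << 16) | value


def int_to_community(raw: int) -> str:
    """Decode a 32-bit community back to 'ASN:value' notation."""
    return f"{raw >> 16}:{raw & 0xFFFF}"


if __name__ == "__main__":
    raw = community_to_int("64512:80")
    print(raw, hex(raw), int_to_community(raw))
```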

B. Core Router FRRouting (/etc/frr/frr.conf)

The Core router receives routes from the Edge router (100.101.102.2) over the WireGuard link, which uses CGNAT-range (100.64.0.0/10) addresses, and reflects them to the local loopback ExaBGP sessions:

router bgp 40205
 bgp router-id 144.202.70.226
 no bgp default ipv4-unicast
 bgp graceful-restart
 bgp bestpath as-path multipath-relax
 
 ! Peering over WireGuard using CGNAT IPs
 neighbor 100.101.102.2 remote-as 40205
 neighbor 100.101.102.2 description FRR iBGP V4
 
 ! Local Loopback Peering for ExaBGP
 neighbor 127.0.0.1 remote-as 40205
 neighbor 127.0.0.1 description Local ExaBGP
 neighbor 127.0.0.1 passive
 neighbor 127.0.0.1 update-source lo
 neighbor ::1 remote-as 40205
 neighbor ::1 description Local ExaBGP IPv6
 neighbor ::1 passive
 neighbor ::1 update-source lo

 address-family ipv4 unicast
  neighbor 100.101.102.2 activate
  neighbor 100.101.102.2 next-hop-self
  neighbor 127.0.0.1 activate
  neighbor 127.0.0.1 route-reflector-client
 exit-address-family

 address-family ipv6 unicast
  neighbor 100.101.102.2 activate
  neighbor 100.101.102.2 next-hop-self
  neighbor ::1 activate
  neighbor ::1 route-reflector-client
 exit-address-family
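The route-reflector-client lines are what make this work. By default, BGP does not re-advertise routes learned from one iBGP peer to another iBGP peer; route reflection (RFC 4456) relaxes that rule for designated clients. The sketch below models those rules in simplified form (it ignores ORIGINATOR_ID/CLUSTER_LIST loop prevention and routing policy), using hypothetical Peer objects for the Edge router and ExaBGP:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    address: str
    ibgp: bool              # same AS as the core router (40205)
    rr_client: bool = False


def may_advertise(learned_from: Peer, to: Peer) -> bool:
    """Simplified iBGP split-horizon and route-reflection rules (RFC 4456)."""
    if learned_from == to:
        return False  # simplification: never echo a route back to its source
    if not learned_from.ibgp or not to.ibgp:
        return True   # eBGP-learned routes, or advertisements towards eBGP peers
    # iBGP -> iBGP is only allowed when reflection applies to one side
    return learned_from.rr_client or to.rr_client


EDGE = Peer("100.101.102.2", ibgp=True)
EXABGP = Peer("127.0.0.1", ibgp=True, rr_client=True)
EXABGP_NO_RR = Peer("127.0.0.1", ibgp=True)

if __name__ == "__main__":
    print(may_advertise(EDGE, EXABGP))        # True: reflected to the client
    print(may_advertise(EDGE, EXABGP_NO_RR))  # False: blocked by iBGP split horizon
```

Without route-reflector-client, the routes learned from the Edge router would never reach ExaBGP, even though both sessions are up.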

C. ExaBGP Local Peering (/etc/exabgp/exabgp.conf)

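ExaBGP connects to the local FRR daemon on port 179, receives routing updates, and, with encoder json, writes each update as one line of JSON to the standard input of our Python parser. Note that ExaBGP by default drops privileges to the user named in exabgp.daemon.user (nobody by default) before spawning API processes; since the parser calls nft, make sure it ends up running with root privileges or CAP_NET_ADMIN.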

process nftables-updater {
    run /usr/bin/python3 /etc/exabgp/nftables-updater.py;
    encoder json;
}

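neighbor 127.0.0.1 {
    router-id 127.0.0.1;
    local-address 127.0.0.1;
    local-as 40205;
    peer-as 40205;
    connect 179;

    # Match the single address family FRR activates for this neighbor
    family {
        ipv4 unicast;
    }

    api {
        processes [ nftables-updater ];
        receive {
            parsed;
            update;
        }
    }
}

neighbor ::1 {
    router-id 127.0.0.1;
    local-address ::1;
    local-as 40205;
    peer-as 40205;
    connect 179;

    # Match the single address family FRR activates for this neighbor
    family {
        ipv6 unicast;
    }

    api {
        processes [ nftables-updater ];
        receive {
            parsed;
            update;
        }
    }
}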
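To exercise the updater without a live BGP session, you can feed it hand-built lines. The helper below emits JSON shaped like ExaBGP 4.x "update" messages, trimmed to the fields the updater reads (type, and neighbor.message.update with attribute, announce, and withdraw). Real messages carry more fields, so check the layout against the ExaBGP version you run; the next hops and prefixes are example values.

```python
import json

# Example next hops per family (assumed: the Edge router's wg0 addresses)
NEXT_HOPS = {"ipv4 unicast": "100.101.102.2", "ipv6 unicast": "fd00::2"}


def _family(prefix: str) -> str:
    return "ipv6 unicast" if ":" in prefix else "ipv4 unicast"


def exabgp_update(announce=(), withdraw=(), communities=()):
    """Build one ExaBGP-4.x-style update line, trimmed to the fields the updater reads."""
    update = {}
    if communities:
        # Standard communities are encoded as [asn, value] pairs
        update["attribute"] = {"community": [[int(x) for x in c.split(":")] for c in communities]}
    if announce:
        by_family = {}
        for prefix in announce:
            fam = _family(prefix)
            by_family.setdefault(fam, {}).setdefault(NEXT_HOPS[fam], []).append({"nlri": prefix})
        update["announce"] = by_family
    if withdraw:
        by_family = {}
        for prefix in withdraw:
            by_family.setdefault(_family(prefix), []).append({"nlri": prefix})
        update["withdraw"] = by_family
    return json.dumps({"type": "update", "neighbor": {"message": {"update": update}}})


if __name__ == "__main__":
    # Announce one IPv4 and one IPv6 LoadBalancer IP tagged 64512:80, then withdraw the IPv4 one
    print(exabgp_update(announce=["23.189.216.10/32"], communities=["64512:80"]))
    print(exabgp_update(announce=["2620:cd:a000::10/128"], communities=["64512:80"]))
    print(exabgp_update(withdraw=["23.189.216.10/32"]))
```

Saved as a hypothetical fake_events.py, it could be piped into the updater on a lab machine (python3 fake_events.py | python3 /etc/exabgp/nftables-updater.py). Keep it off production routers: the updater flushes both sets when it starts.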

D. The Heart of the Operations: nftables-updater.py

The lightweight Python script reads ExaBGP updates from standard input (sys.stdin). When a route carries community 64512:80, it calls the nft CLI (which updates the kernel's nftables sets over netlink) to add or remove the advertised address from the matching set:

#!/usr/bin/env python3
"""Mirror BGP routes tagged with known communities into nftables sets.

ExaBGP (encoder json) writes one JSON message per line to this process's stdin.
"""
import sys
import json
import subprocess
import logging

LOG_FILE = "/var/log/exabgp-nftables.log"
NFTABLES_TABLE = ["inet", "core_fw"]  # address family and table name, as separate nft arguments

# Community Mapping Schema: standard community -> target set per address family
COMM_MAP = {
    "64512:80": {
        "v4_set": "web_allowed_v4",
        "v6_set": "web_allowed_v6"
    }
}

logging.basicConfig(filename=LOG_FILE, level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s')

# prefix -> names of the nftables sets it is currently installed in
active_routes = {}

def run_nft_command(args):
    cmd = ["nft"] + args
    try:
        subprocess.run(cmd, capture_output=True, text=True, check=True)
        return True
    except subprocess.CalledProcessError as e:
        logging.error(f"Failed to execute: {' '.join(cmd)}. Error: {e.stderr.strip()}")
        return False

def flush_all_sets():
    logging.info("Flushing sets on startup for clean state...")
    for mapping in COMM_MAP.values():
        run_nft_command(["flush", "set", *NFTABLES_TABLE, mapping["v4_set"]])
        run_nft_command(["flush", "set", *NFTABLES_TABLE, mapping["v6_set"]])

def sync_prefix(prefix, wanted):
    """Make the set memberships of `prefix` match `wanted` (a set of set names).

    The full prefix (e.g. 23.189.216.10/32) is used as the element; the interval
    sets accept it, so non-host prefixes are matched correctly as well.
    """
    current = active_routes.get(prefix, set())
    for t_set in current - wanted:
        logging.info(f"Removing {prefix} from set {t_set}")
        run_nft_command(["delete", "element", *NFTABLES_TABLE, t_set, "{", prefix, "}"])
    installed = current & wanted
    for t_set in wanted - current:
        logging.info(f"Adding {prefix} to set {t_set}")
        if run_nft_command(["add", "element", *NFTABLES_TABLE, t_set, "{", prefix, "}"]):
            installed.add(t_set)
    if installed:
        active_routes[prefix] = installed
    else:
        active_routes.pop(prefix, None)

flush_all_sets()

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue

    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        continue

    if event.get("type") != "update":
        continue

    update = event.get("neighbor", {}).get("message", {}).get("update", {})
    attributes = update.get("attribute", {})

    # ExaBGP 4.x encodes standard communities as [asn, value] pairs
    communities = [f"{c[0]}:{c[1]}" if isinstance(c, list) else c for c in attributes.get("community", [])]

    # 1. Parse Announcements. A re-announcement replaces the previous route, so a
    #    prefix that no longer carries a mapped community is removed (implicit withdraw).
    for family, peers in update.get("announce", {}).items():
        for next_hop, routes in peers.items():
            for route in routes:
                prefix = route.get("nlri")
                if not prefix:
                    continue
                set_key = "v6_set" if ":" in prefix else "v4_set"
                wanted = {COMM_MAP[comm][set_key] for comm in communities if comm in COMM_MAP}
                sync_prefix(prefix, wanted)

    # 2. Parse Withdrawals
    for family, routes in update.get("withdraw", {}).items():
        for route in routes:
            prefix = route.get("nlri")
            if prefix:
                sync_prefix(prefix, set())

E. Firewall Core Ruleset (/etc/nftables.conf)

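Rather than rewriting raw IP rules, nftables checks incoming packets against our dynamic sets @web_allowed_v4 and @web_allowed_v6. Set lookups are performed by dedicated kernel data structures rather than a linear walk of per-IP rules, so they stay fast as the sets grow, and adding or removing an element never requires a ruleset reload: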

table inet core_fw {
	set web_allowed_v4 {
		type ipv4_addr
		flags interval
	}

	set web_allowed_v6 {
		type ipv6_addr
		flags interval
	}

	chain forward {
		type filter hook forward priority filter - 10; policy accept;
		ct state established,related accept
		
		# Allow traffic matching the dynamic BGP-authorized sets
		iifname "eth0" oifname "wg0" ip daddr @web_allowed_v4 tcp dport { 80, 443 } accept
		iifname "eth0" oifname "wg0" ip6 daddr @web_allowed_v6 tcp dport { 80, 443 } accept
		
		# Default Deny for other forward traffic to wg0
		iifname "eth0" oifname "wg0" drop
	}
}
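To make the chain's logic concrete, here is a pure-Python model of how the forward chain evaluates a packet, rule by rule. It is an illustration of the ruleset's semantics, not nftables itself; the set contents are example addresses taken from the ranges in the diagram.

```python
import ipaddress

# Example contents of the dynamic sets, as the updater might have installed them
WEB_ALLOWED = {
    4: [ipaddress.ip_network("23.189.216.10/32")],
    6: [ipaddress.ip_network("2620:cd:a000::10/128")],
}
WEB_PORTS = {80, 443}


def forward_verdict(iifname, oifname, daddr, proto, dport, established=False):
    """Evaluate the forward chain's rules in order for one packet (simplified model)."""
    # ct state established,related accept
    if established:
        return "accept"
    if iifname == "eth0" and oifname == "wg0":
        dst = ipaddress.ip_address(daddr)
        in_set = any(dst in net for net in WEB_ALLOWED[dst.version])
        # ip / ip6 daddr @web_allowed_vX tcp dport { 80, 443 } accept
        if in_set and proto == "tcp" and dport in WEB_PORTS:
            return "accept"
        # iifname "eth0" oifname "wg0" drop
        return "drop"
    # Everything else falls through to the chain policy (accept)
    return "accept"


if __name__ == "__main__":
    print(forward_verdict("eth0", "wg0", "23.189.216.10", "tcp", 443))
    print(forward_verdict("eth0", "wg0", "23.189.216.11", "tcp", 443))
```

Withdrawing a route removes its element, so the same packet flips from the set-match rule to the default drop without any rule being edited.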

4. Self-Healing Sequence & Crash Resilience

A common concern with stateful dynamic firewalls is: what happens if a daemon crashes, a script halts, or the router restarts?

Our architecture is self-healing because the updater treats the loopback BGP session as the single source of truth and rebuilds everything else from it:

  1. When ExaBGP restarts, it respawns the Python updater, which flushes both sets (web_allowed_v4 and web_allowed_v6) so that no stale entries survive.
  2. ExaBGP establishes the iBGP sessions to the local FRRouting daemon (127.0.0.1 / ::1).
  3. Once a session is established, FRR sends ExaBGP its full set of routes (all active announcements with their communities), as any BGP speaker does for a new peer. Marking ExaBGP as a route-reflector client is what allows FRR to pass along the routes it learned over iBGP from the Edge router.
  4. The Python script parses this dump and repopulates the kernel sets. Until that finishes, new connections to the advertised services hit the default-deny rule, while already established flows continue through the ct state rule.
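During the full-table dump in step 3, the updater starts one nft process per prefix. One possible refinement (not part of the script above) is to batch additions into a single nft -f - invocation, which nftables commits as one transaction. A sketch, assuming the same table and set names as the ruleset:

```python
import subprocess

TABLE = "inet core_fw"  # same family/table as the ruleset


def build_batch(elements_by_set):
    """Render one 'add element' line per non-empty set, e.g. for the post-restart dump."""
    lines = []
    for set_name, prefixes in sorted(elements_by_set.items()):
        if prefixes:
            lines.append(f"add element {TABLE} {set_name} {{ {', '.join(sorted(prefixes))} }}")
    return "\n".join(lines) + "\n"


def apply_batch(batch: str) -> bool:
    """Feed the batch to nft on stdin; nft commits the whole input as one transaction."""
    result = subprocess.run(["nft", "-f", "-"], input=batch, text=True, capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    batch = build_batch({
        "web_allowed_v4": ["23.189.216.11/32", "23.189.216.10/32"],
        "web_allowed_v6": ["2620:cd:a000::10/128"],
    })
    print(batch, end="")  # inspect first; call apply_batch(batch) as root to load it
```

Deciding when the dump is complete (for example, by waiting for BGP End-of-RIB) is left open in this sketch; the per-prefix approach in the script avoids that question at the cost of more nft processes.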
sequenceDiagram
    autonumber
    participant Kernel as Nftables (Kernel Memory)
    participant Python as nftables-updater.py
    participant Exa as ExaBGP Daemon
    participant FRR as Core FRR bgpd

    Note over Kernel, FRR: ExaBGP Daemon restarts or crashes (Service Restored)
    Exa->>Python: Spawns python process as root
    activate Python
    Note over Python: Initialization Phase
    Python->>Kernel: Executes 'nft flush set inet core_fw web_allowed_v4'
    Python->>Kernel: Executes 'nft flush set inet core_fw web_allowed_v6'
    Note over Kernel: Kernel sets are cleared (Prevents Stale IP Drift)
    
    Exa->>FRR: Initiates loopback iBGP session (127.0.0.1 / ::1)
    activate FRR
    FRR-->>Exa: BGP Session Established
    Note over FRR: Session up: full table advertised to the RR client
    FRR->>Exa: Sends complete BGP UPDATE table (routes + communities)
    deactivate FRR
    
    loop For each prefix in UPDATE table
        Exa->>Python: Sends JSON UPDATE event
        Python->>Python: Extracts community (matches 64512:80)
        Python->>Kernel: Executes 'nft add element inet core_fw <set> { IP }'
    end
    Note over Kernel: Dynamic Sets are 100% repopulated and synchronized!
    deactivate Python

Conclusion

By mapping Kubernetes announcements to border firewall rules using standard routing protocols (iBGP/MP-BGP) and dynamic sets (nftables), we've built a system that is robust, denies by default, and is light on resources.

Our move to a dedicated CGNAT-range WireGuard link (100.101.102.0/24) and a Multiprotocol BGP design has eliminated the link-local and routing metric conflicts we saw on our core interfaces. We now have a border firewall that adapts to our Kubernetes ingress footprint in near real time, filtering with fast set lookups and no ruleset reloads.