
5.1.1 Networking

The networking layer provides pod-to-pod connectivity, load balancing, traffic routing, and DNS resolution. Cilium replaces kube-proxy and serves as the CNI, with Gateway API handling external traffic routing. CoreDNS provides custom internal DNS resolution.

How to use this page

Each component has an Install section showing the Flux HelmRelease, a Configuration section with Helm values, and a Verify section to confirm it is working.

All code blocks are labelled with their file path in the repository. Select your target environment (AWS or Bare Metal) in any tab group — the choice syncs across the entire page.

  • Using the existing rciis-devops repository: All files already exist. Skip the mkdir and git add/git commit commands — they are for users building a new repository. Simply review the files, edit values for your environment, and push.
  • Building a new repository from scratch: Follow the mkdir, file creation, and git commands in order.
  • No Git access: Expand the "Alternative: Helm CLI" block under each Install section.

Cilium

Cilium is the CNI (Container Network Interface) plugin that provides pod networking, kube-proxy replacement, WireGuard encryption, Gateway API support, and Hubble observability. It runs as a DaemonSet on every node.

On Bare Metal, Cilium also provides L2 load balancer IP announcements (replacing MetalLB). On AWS, load balancing is handled by the AWS Load Balancer Controller with NLBs.

Install

The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches (shown in the Configuration section).

Create the base directory and file:

mkdir -p flux/infra/base
Field Value Explanation
chart cilium The Helm chart name from the Cilium registry
version 1.19.0 Pinned chart version — update this to upgrade Cilium
sourceRef.name cilium References a HelmRepository CR pointing to https://helm.cilium.io
targetNamespace kube-system Cilium must run in kube-system for CNI integration
crds: CreateReplace Automatically installs and updates Cilium CRDs
remediation.retries 3 Flux retries up to 3 times if the install or upgrade fails

Save the following as flux/infra/base/cilium.yaml:

flux/infra/base/cilium.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cilium
  namespace: flux-system
spec:
  targetNamespace: kube-system
  interval: 30m
  chart:
    spec:
      chart: cilium
      version: "1.19.0"
      sourceRef:
        kind: HelmRepository
        name: cilium
        namespace: flux-system
  releaseName: cilium
  install:
    createNamespace: true
    crds: CreateReplace
    remediation:
      retries: 3
  upgrade:
    crds: CreateReplace
    remediation:
      retries: 3
Alternative: Helm CLI

If you do not have Git access, install Cilium directly:

helm repo add cilium https://helm.cilium.io
helm repo update
helm upgrade --install cilium cilium/cilium \
  --namespace kube-system \
  --version 1.19.0 \
  -f values.yaml

Configuration

The environment patch overrides the base HelmRelease with cluster-specific settings. The values file controls how Cilium behaves. Select your environment and deployment size below.

Create the environment overlay directory:

mkdir -p flux/infra/aws/cilium
mkdir -p flux/infra/baremetal/cilium

Environment Patch

The patch file sets the cluster identity, network device, and load balancing strategy. These differ fundamentally between AWS and Bare Metal.

Save the following as the patch file for your environment:

On AWS, Cilium uses tunnel mode (Geneve) for pod networking and delegates load balancing to AWS NLBs via annotations. Gateway API resources are used for HTTP routing with NLB annotations for internet-facing access.

flux/infra/aws/cilium/patch.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cilium
spec:
  # Adopt the existing Helm release installed during bootstrap
  storageNamespace: kube-system
  values:
    cluster:
      name: rciis-aws
      id: 2
    devices: eth0
    routingMode: tunnel
    tunnelProtocol: geneve
    ingressController:
      enabled: true
    hubble:
      tls:
        auto:
          method: certmanager
          certManagerIssuerRef:
            group: cert-manager.io
            kind: Issuer
            name: cilium-ca-issuer
    clustermesh:
      apiserver:
        replicas: 1
        service:
          type: LoadBalancer
          annotations:
            service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
            service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
Setting Value Why
storageNamespace: kube-system Adopts the Helm release created during helmfile bootstrap
routingMode: tunnel Alternative to the AWS VPC CNI — encapsulates pod traffic in Geneve tunnels
ingressController.enabled true Required for operator RBAC permissions
hubble.tls.auto.method certmanager Uses cert-manager to issue Hubble relay TLS certificates

On Bare Metal, Cilium uses native routing and provides L2 load balancing by announcing LoadBalancer IPs directly on the local network via ARP. There is no cloud load balancer — Cilium itself is the load balancer.

flux/infra/baremetal/cilium/patch.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cilium
spec:
  values:
    cluster:
      name: rciis-kenya
      id: 1
    devices: eth0
    ingressController:
      enabled: false
    l2announcements:
      enabled: true
      interface: eth0
      leaseDuration: 15s
      leaseRenewDeadline: 5s
      leaseRetryPeriod: 1s
    clustermesh:
      apiserver:
        service:
          type: LoadBalancer
          loadBalancerIP: "192.168.30.42"
Setting Value Why
l2announcements.enabled true Cilium responds to ARP requests for LoadBalancer IPs on the local network
l2announcements.interface eth0 The network interface used for ARP announcements — change to match your NIC
loadBalancerIP 192.168.30.42 Static IP for the ClusterMesh API server — must be in the L2 pool range
ingressController.enabled false Cilium Ingress controller is disabled — Gateway API is used instead


Helm Values

The values file controls Cilium's core features. Save the following as the values file for your environment and deployment size:

flux/infra/aws/cilium/values.yaml
# Cilium — AWS HA configuration

cluster:
  name: rciis-aws

devices: eth0

serviceMesh:
  enabled: true

envoyConfig:
  enabled: true

loadBalancer:
  l7:
    backend: envoy

encryption:
  enabled: true
  type: wireguard
  nodeEncryption: true

extraConfig:
  node-encryption-opt-out-labels: ""

ipam:
  mode: kubernetes

hubble:
  ui:
    enabled: true
  relay:
    enabled: true

kubeProxyReplacement: true

securityContext:
  capabilities:
    ciliumAgent:
      - CHOWN
      - KILL
      - NET_ADMIN
      - NET_RAW
      - IPC_LOCK
      - SYS_ADMIN
      - SYS_RESOURCE
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
    cleanCiliumState:
      - NET_ADMIN
      - SYS_ADMIN
      - SYS_RESOURCE

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 1Gi

cgroup:
  autoMount:
    enabled: false
  hostRoot: /sys/fs/cgroup

k8sServiceHost: localhost
k8sServicePort: 7445

gatewayAPI:
  enabled: true
  enableAlpn: true
  enableAppProtocol: true
flux/infra/aws/cilium/values.yaml
# Cilium — AWS Non-HA configuration

cluster:
  name: rciis-aws

devices: eth0

serviceMesh:
  enabled: false

envoyConfig:
  enabled: false

encryption:
  enabled: false

ipam:
  mode: kubernetes

hubble:
  ui:
    enabled: false
  relay:
    enabled: true

kubeProxyReplacement: true

securityContext:
  capabilities:
    ciliumAgent:
      - CHOWN
      - KILL
      - NET_ADMIN
      - NET_RAW
      - IPC_LOCK
      - SYS_ADMIN
      - SYS_RESOURCE
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
    cleanCiliumState:
      - NET_ADMIN
      - SYS_ADMIN
      - SYS_RESOURCE

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

cgroup:
  autoMount:
    enabled: false
  hostRoot: /sys/fs/cgroup

k8sServiceHost: localhost
k8sServicePort: 7445

gatewayAPI:
  enabled: false
flux/infra/baremetal/cilium/values.yaml
# Cilium — Bare Metal HA configuration

cluster:
  name: rciis-kenya

devices: eth0

l2announcements:
  enabled: true
  interface: eth0
  leaseDuration: 15s
  leaseRenewDeadline: 5s
  leaseRetryPeriod: 1s

serviceMesh:
  enabled: true

envoyConfig:
  enabled: true

ingressController:
  enabled: true

loadBalancer:
  l7:
    backend: envoy

encryption:
  enabled: true
  type: wireguard
  nodeEncryption: true

extraConfig:
  node-encryption-opt-out-labels: ""

ipam:
  mode: kubernetes

hubble:
  ui:
    enabled: true
  relay:
    enabled: true

kubeProxyReplacement: true

securityContext:
  capabilities:
    ciliumAgent:
      - CHOWN
      - KILL
      - NET_ADMIN
      - NET_RAW
      - IPC_LOCK
      - SYS_ADMIN
      - SYS_RESOURCE
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
    cleanCiliumState:
      - NET_ADMIN
      - SYS_ADMIN
      - SYS_RESOURCE

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 1Gi

cgroup:
  autoMount:
    enabled: false
  hostRoot: /sys/fs/cgroup

k8sServiceHost: localhost
k8sServicePort: 7445

gatewayAPI:
  enabled: true
  enableAlpn: true
  enableAppProtocol: true
flux/infra/baremetal/cilium/values.yaml
# Cilium — Bare Metal Non-HA configuration

cluster:
  name: rciis-kenya

devices: eth0

l2announcements:
  enabled: true
  interface: eth0
  leaseDuration: 15s
  leaseRenewDeadline: 5s
  leaseRetryPeriod: 1s

serviceMesh:
  enabled: false

envoyConfig:
  enabled: false

ingressController:
  enabled: false

encryption:
  enabled: false

ipam:
  mode: kubernetes

hubble:
  ui:
    enabled: false
  relay:
    enabled: true

kubeProxyReplacement: true

securityContext:
  capabilities:
    ciliumAgent:
      - CHOWN
      - KILL
      - NET_ADMIN
      - NET_RAW
      - IPC_LOCK
      - SYS_ADMIN
      - SYS_RESOURCE
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
    cleanCiliumState:
      - NET_ADMIN
      - SYS_ADMIN
      - SYS_RESOURCE

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

cgroup:
  autoMount:
    enabled: false
  hostRoot: /sys/fs/cgroup

k8sServiceHost: localhost
k8sServicePort: 7445

gatewayAPI:
  enabled: false

Key settings (all environments):

Setting HA Non-HA Why
encryption.enabled true (WireGuard) false Encrypts all pod traffic — adds CPU overhead
hubble.ui.enabled true false Flow visibility UI — costs memory
gatewayAPI.enabled true false Gateway API replaces Ingress — requires CRDs and Envoy
k8sServiceHost / k8sServicePort localhost:7445 localhost:7445 Talos-specific — KubePrism per-node API proxy
cgroup.autoMount false false Talos-specific — Talos manages cgroup mounts

Extra Manifests

Save the following additional manifests for your environment:

On AWS, Cilium requires a cert-manager Issuer for Hubble relay TLS certificates. Save this as flux/infra/aws/cilium/hubble-issuer.yaml:

flux/infra/aws/cilium/hubble-issuer.yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: cilium-ca-issuer
  namespace: kube-system
spec:
  ca:
    secretName: cilium-ca-tls

Info

This Issuer is referenced in the AWS patch via hubble.tls.auto.certManagerIssuerRef. The cilium-ca-tls Secret is generated during bootstrap and contains the Cilium CA certificate.

On Bare Metal, Cilium requires an L2 IP pool and announcement policy so it can assign and advertise LoadBalancer IPs on the local network. Save this as flux/infra/baremetal/cilium/l2-pool.yaml:

flux/infra/baremetal/cilium/l2-pool.yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: baremetal-pool
spec:
  blocks:
    - start: "192.168.30.40"
      stop: "192.168.30.49"
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: baremetal-l2-policy
spec:
  loadBalancerIPs: true
  interfaces:
    - eth0

IP pool sizing

The pool provides 10 IPs (192.168.30.40–49) for LoadBalancer services. Adjust the range based on your network allocation. Each service of type LoadBalancer consumes one IP from this pool.
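The pool is consumed implicitly: any Service of type LoadBalancer receives the next free IP from the pool, which Cilium then announces via ARP. A minimal sketch (the echo-server name and namespace are illustrative, not part of this repository):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: echo-server        # hypothetical example Service
  namespace: default
spec:
  type: LoadBalancer       # Cilium assigns a free IP from baremetal-pool
  selector:
    app: echo-server
  ports:
    - port: 80
      targetPort: 8080
```

Once applied, kubectl get svc echo-server shows an EXTERNAL-IP from the 192.168.30.40–49 range.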


Commit and Deploy

Once all files are in place, commit and push to trigger Flux deployment:

git add flux/infra/base/cilium.yaml \
        flux/infra/aws/cilium/
git commit -m "feat(cilium): add Cilium CNI for AWS environment"
git push
git add flux/infra/base/cilium.yaml \
        flux/infra/baremetal/cilium/
git commit -m "feat(cilium): add Cilium CNI for bare metal environment"
git push

Flux will detect the new commit and begin deploying Cilium. To trigger an immediate sync instead of waiting for the next poll interval:

flux reconcile kustomization infra-cilium -n flux-system --with-source

Verify

After Cilium is deployed, confirm it is working:

cilium status
cilium status | grep KubeProxyReplacement
cilium connectivity test
# Verify the NLB was created for the Gateway
kubectl get gateway -n kube-system
kubectl get svc -n kube-system -l io.cilium.gateway/owning-gateway
# Verify L2 IP pool and announcement policy
kubectl get CiliumLoadBalancerIPPool
kubectl get CiliumL2AnnouncementPolicy
# Check Hubble flows (HA only)
hubble observe --last 10

Flux Operations

This component is managed by Flux as HelmRelease cilium and Kustomization infra-cilium.

Check whether the HelmRelease and Kustomization are in a Ready state:

flux get helmrelease cilium -n flux-system
flux get kustomization infra-cilium -n flux-system

Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:

flux reconcile kustomization infra-cilium -n flux-system --with-source

Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:

flux reconcile helmrelease cilium -n flux-system

View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:

flux logs --kind=HelmRelease --name=cilium -n flux-system

Recovering a stalled HelmRelease

If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry automatically. Suspend and resume to clear the failure counter, then reconcile:

flux suspend helmrelease cilium -n flux-system
flux resume helmrelease cilium -n flux-system
flux reconcile kustomization infra-cilium -n flux-system

Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved. See Maintenance — Recovering Stalled Resources for details.

Next: Continue to ClusterMesh below.


ClusterMesh

ClusterMesh connects two Cilium-managed Kubernetes clusters so that pods, services, and identities are shared across sites. This enables cross-cluster service discovery, network policy enforcement, and failover. Each cluster runs a clustermesh-apiserver with an embedded etcd that exposes its state to the other cluster via KVStoreMesh.

The Multi-Cluster Services API (MCS-API) provides cross-cluster service discovery via the clusterset.local DNS domain. Per-cluster named services allow replication tools (MirrorMaker 2, CNPG Pub/Sub, SQL Server Merge Replication) to target a specific cluster's service by name.
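With the MCS-API CRDs installed, a service is exported by creating a ServiceExport object in the owning cluster; consumers in either cluster then resolve it under clusterset.local. A sketch using a hypothetical kafka Service (the name and namespace are illustrative):

```yaml
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: kafka            # must match the name of an existing Service
  namespace: messaging
```

Pods in either cluster can then reach it as kafka.messaging.svc.clusterset.local.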

Prerequisites

Before configuring ClusterMesh, the following must already be deployed on both clusters:

  • Cilium CNI (the Cilium section above)
  • cert-manager with a working cilium-ca-issuer Issuer
  • CoreDNS v1.12.2 or later (for MCS-API clusterset.local support)
  • Network connectivity between cluster endpoints (see Firewall Requirements below)

Cluster Identity

Each cluster must have a unique cluster.id and cluster.name in its Cilium patch. If two clusters share the same ID, ClusterMesh will silently fail.

Architecture

  Bare Metal Cluster (id=1)                     AWS Cloud Cluster (id=2)
 +------------------------------+             +------------------------------+
 |  clustermesh-apiserver       |             |  clustermesh-apiserver       |
 |  +---------+  +------------+ |             | +---------+  +------------+  |
 |  |  etcd   |  | kvstoremesh|----(TLS)------->|  etcd   |  | kvstoremesh|  |
 |  +---------+  +------------+ |             | +---------+  +------------+  |
 |       ^              |       |             |      ^              |        |
 |       | local-cert   |       |             |      | local-cert   |        |
 |  +----+----+         |       |             | +----+----+         |        |
 |  | agents  |  <------+       |             | | agents  |  <------+        |
 |  +---------+                 |             | +---------+                  |
 +------------------------------+             +------------------------------+
   LB: 192.168.30.42:2379                       LB: <NLB-hostname>:2379
   Public: 197.245.173.242:2379

Cilium agents connect to the local etcd via local-cert. The kvstoremesh container connects to the remote etcd via the connection secret, caching remote state locally.

Firewall Requirements

ClusterMesh requires two network planes between clusters:

Plane Purpose Protocol Port
Control plane etcd state synchronisation (KVStoreMesh) TCP 2379
Data plane Pod-to-pod traffic via Geneve tunnel UDP 6081

Both ports must be open bidirectionally between all nodes on both clusters.

The AWS security group for the cluster nodes must allow:

Direction Protocol Port Source/Destination Purpose
Inbound TCP 2379 Bare Metal public IP ClusterMesh etcd (control plane)
Inbound UDP 6081 Bare Metal public IP Geneve tunnel (data plane)
Outbound TCP 2379 Bare Metal public IP ClusterMesh etcd (control plane)
Outbound UDP 6081 Bare Metal public IP Geneve tunnel (data plane)

The router/firewall must port-forward the following to the cluster nodes:

Direction Protocol Port Forward to Purpose
Port forward TCP 2379 ClusterMesh LB IP (e.g. 192.168.30.42) ClusterMesh etcd (control plane)
Port forward UDP 6081 All worker node IPs Geneve tunnel (data plane)


Data plane connectivity

Without UDP 6081 open between clusters, DNS resolution and service discovery will work (control plane), but actual TCP connections to remote pods will time out. This is the most common cause of "DNS resolves but connection fails" issues.
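A quick way to tell the two failure modes apart is to probe the control-plane port from a machine on the other side. The sketch below uses bash's built-in /dev/tcp redirection; the endpoint IP is illustrative. Note this only covers TCP 2379: UDP 6081 (Geneve) cannot be probed this way, so use cilium connectivity test to exercise the data plane.

```shell
#!/usr/bin/env bash
# Probe a TCP port using bash's /dev/tcp redirection.
# Covers the control plane only (TCP 2379); UDP 6081 (Geneve) cannot be
# checked like this - run `cilium connectivity test` for the data plane.
check_tcp() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Example (replace with your remote ClusterMesh endpoint):
# check_tcp 197.245.173.242 2379
check_tcp 127.0.0.1 1   # a port that is almost certainly closed
```

If the probe reports closed against the remote endpoint, fix the firewall or port-forwarding rules before debugging certificates.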

Step 1: Shared Cilium CA

Both clusters must use the same Cilium CA so that certificates issued on one cluster are trusted by the other. The CA is stored as a SOPS-encrypted Kubernetes Secret.

Generate the CA once and distribute the same secret to both clusters:

# Generate a self-signed CA (only do this once)
openssl req -x509 -newkey rsa:2048 -keyout cilium-ca.key -out cilium-ca.crt \
  -days 1095 -nodes -subj "/CN=Cilium CA"

Create a Kubernetes Secret manifest and encrypt it with SOPS:

flux/infra/<environment>/cilium/secrets/cilium-ca.yaml (before encryption)
apiVersion: v1
kind: Secret
metadata:
  name: cilium-ca
  namespace: kube-system
type: Opaque
stringData:
  ca.crt: |
    <contents of cilium-ca.crt>
  ca.key: |
    <contents of cilium-ca.key>
Then encrypt it in place:

sops --encrypt --in-place flux/infra/<environment>/cilium/secrets/cilium-ca.yaml

Same CA on both clusters

Copy the identical encrypted secret to both flux/infra/aws/cilium/secrets/cilium-ca.yaml and flux/infra/proxmox/cilium/secrets/cilium-ca.yaml. If the CAs differ, all cross-cluster TLS connections will fail.
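To confirm both clusters really share one CA, compare SHA-256 fingerprints of the ca.crt held by each cluster. The sketch below generates a local stand-in CA so it is self-contained; in practice, populate the two files by decoding ca.crt from each cluster's cilium-ca secret.

```shell
#!/usr/bin/env bash
# Stand-in: generate one CA and copy it, simulating two clusters that
# share the same CA. In practice populate these files with
#   kubectl get secret cilium-ca -n kube-system \
#     -o jsonpath='{.data.ca\.crt}' | base64 -d
# run against each cluster.
openssl req -x509 -newkey rsa:2048 -keyout ca.key -out aws-ca.crt \
  -days 1095 -nodes -subj "/CN=Cilium CA" 2>/dev/null
cp aws-ca.crt baremetal-ca.crt

fp_aws=$(openssl x509 -in aws-ca.crt -noout -fingerprint -sha256)
fp_bm=$(openssl x509 -in baremetal-ca.crt -noout -fingerprint -sha256)

if [ "$fp_aws" = "$fp_bm" ]; then
  echo "CA fingerprints match"
else
  echo "CA MISMATCH: cross-cluster TLS will fail"
fi
```

Run the fingerprint command against the real ca.crt from each cluster; any difference means the secrets diverged and must be redistributed.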

Step 2: cert-manager Issuer

Each cluster needs a cert-manager Issuer that references the shared CA. This is already created as part of the Cilium deployment, but verify it exists:

flux/infra/<environment>/cilium/cilium-ca-issuer.yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: cilium-ca-issuer
  namespace: kube-system
spec:
  ca:
    secretName: cilium-ca

Step 3: Base HelmRelease Configuration

The base flux/infra/base/cilium.yaml contains the ClusterMesh and MCS-API configuration. The key sections are:

Setting Value Purpose
clustermesh.useAPIServer true Deploys the clustermesh-apiserver with etcd
clustermesh.config.enabled true Enables ClusterMesh configuration
clustermesh.enableEndpointSliceSynchronization true Synchronises EndpointSlices across clusters for headless services
clustermesh.mcsapi.enabled true Enables Multi-Cluster Services API
clustermesh.mcsapi.installCRDs true Auto-installs ServiceExport/ServiceImport CRDs
clustermesh.apiserver.tls.auto.method certmanager Uses cert-manager to issue TLS certificates
clustermesh.apiserver.tls.auto.certManagerIssuerRef cilium-ca-issuer References the shared CA Issuer
clustermesh.apiserver.tls.auto.certValidityDuration 1095 Certificate validity in days (3 years)
ingressController.enabled true Required for operator RBAC — grants permissions to list Ingress/IngressClass resources that the MCS-API controllers need for cache sync

The base HelmRelease also includes postRenderers that patch the cert-manager Certificate resources generated by Cilium. This is required because the Cilium Helm chart does not set extendedKeyUsage on the generated certificates, which causes etcd to reject client connections.

The postRenderers add:

Certificate Usage Added
clustermesh-apiserver-server-cert server auth
clustermesh-apiserver-admin-cert client auth
clustermesh-apiserver-remote-cert client auth
clustermesh-apiserver-local-cert client auth
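For reference, the shape of such a postRenderer in the HelmRelease spec is roughly the following. This is a sketch of one patch only, not the full base file (which patches all four certificates); the usages list shown assumes cert-manager's default usages plus the added one:

```yaml
postRenderers:
  - kustomize:
      patches:
        - target:
            kind: Certificate
            name: clustermesh-apiserver-remote-cert
          patch: |
            apiVersion: cert-manager.io/v1
            kind: Certificate
            metadata:
              name: clustermesh-apiserver-remote-cert
            spec:
              usages:
                - digital signature
                - key encipherment
                - client auth
```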

Note

These postRenderers are already in the base HelmRelease. No action is needed unless you are building a new repository from scratch.

Step 4: Environment Patch — KVStoreMesh and Server Cert SANs

Each cluster's patch file must enable kvstoremesh and add the cluster's external endpoint to the server certificate SANs. Without the external endpoint in the SAN, the remote kvstoremesh client will reject the server certificate during TLS verification.

How to find the NLB hostname:

kubectl get svc clustermesh-apiserver -n kube-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'

Add the following to flux/infra/aws/cilium/patch.yaml under clustermesh.apiserver:

flux/infra/aws/cilium/patch.yaml (clustermesh section)
clustermesh:
  apiserver:
    replicas: 1
    kvstoremesh:
      enabled: true
    tls:
      server:
        extraDnsNames:
          - "<NLB-HOSTNAME>"  # (1)!
    service:
      type: LoadBalancer
      enableSessionAffinity: "Never"
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
        service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
  1. Replace <NLB-HOSTNAME> with the actual NLB hostname from the command above. Example: a9973cf3b08d84af9b2e3581f7d8f8fa-6e8c2e15c6c300cd.elb.af-south-1.amazonaws.com
Setting Where to get the value
extraDnsNames kubectl get svc clustermesh-apiserver -n kube-system -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'

How to find the public IP:

The public IP is the external IP address through which the bare-metal cluster's ClusterMesh endpoint is reachable from the internet (typically via port forwarding on the router). The LoadBalancer IP is the internal L2 IP assigned by Cilium.

Add the following to flux/infra/proxmox/cilium/patch.yaml under clustermesh.apiserver:

flux/infra/proxmox/cilium/patch.yaml (clustermesh section)
clustermesh:
  apiserver:
    replicas: 1
    kvstoremesh:
      enabled: true
    tls:
      server:
        extraIpAddresses:
          - "<LOADBALANCER-IP>"  # (1)!
          - "<PUBLIC-IP>"        # (2)!
    service:
      type: LoadBalancer
      loadBalancerIP: "<LOADBALANCER-IP>"
  1. The Cilium L2 LoadBalancer IP (e.g. 192.168.30.42). Must be within the L2 IP pool range.
  2. The public IP address that remote clusters connect to (e.g. 197.245.173.242). This is the IP of the router/firewall that port-forwards TCP 2379 to the LoadBalancer IP.
Setting Where to get the value
extraIpAddresses[0] The loadBalancerIP value from the service configuration
extraIpAddresses[1] Your public IP — run curl -s ifconfig.me from the bare-metal network


Step 5: Connection Secrets

Each cluster needs two SOPS-encrypted secrets that contain the connection configuration for reaching the other cluster:

Secret Used by Contains
cilium-clustermesh Cilium agents (fallback) Remote cluster endpoint, client cert, key, CA
cilium-kvstoremesh KVStoreMesh container Same content as cilium-clustermesh

Both secrets have the same structure. They contain:

  • Endpoint configuration — the remote cluster's etcd URL
  • Client certificate and key — the remote cluster's clustermesh-apiserver-remote-cert
  • CA certificate — the shared Cilium CA

Extracting Values from the Remote Cluster

Run the following commands on the remote cluster to extract the values you need for the local cluster's connection secrets:

# Get the remote cluster's endpoint
# On AWS:
echo "https://$(kubectl get svc clustermesh-apiserver -n kube-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'):2379"

# On Bare Metal:
echo "https://<PUBLIC-IP>:2379"  # The public IP with port forwarding to the LB IP
# Get the remote cluster's client certificate, key, and CA
kubectl get secret clustermesh-apiserver-remote-cert -n kube-system \
  -o jsonpath='{.data.tls\.crt}' | base64 -d    # -> remote cert (rciis-<name>.crt)

kubectl get secret clustermesh-apiserver-remote-cert -n kube-system \
  -o jsonpath='{.data.tls\.key}' | base64 -d    # -> remote key (rciis-<name>.key)

kubectl get secret clustermesh-apiserver-remote-cert -n kube-system \
  -o jsonpath='{.data.ca\.crt}' | base64 -d     # -> CA cert (rciis-<name>-ca.crt)

Creating the Secret

Use the extracted values to create the connection secret. The example below shows the AWS cluster's secret connecting to the Bare Metal cluster:

flux/infra/aws/cilium/secrets/cilium-clustermesh.yaml (before encryption)
apiVersion: v1
kind: Secret
metadata:
  name: cilium-clustermesh
  namespace: kube-system
type: Opaque
stringData:
  rciis-proxmox: |                              # (1)!
    endpoints:
      - https://197.245.173.242:2379            # (2)!
    trusted-ca-file: /var/lib/cilium/clustermesh/rciis-proxmox-ca.crt
    cert-file: /var/lib/cilium/clustermesh/rciis-proxmox.crt
    key-file: /var/lib/cilium/clustermesh/rciis-proxmox.key
  rciis-proxmox-ca.crt: |                      # (3)!
    -----BEGIN CERTIFICATE-----
    <CA certificate from remote cluster>
    -----END CERTIFICATE-----
  rciis-proxmox.crt: |                         # (4)!
    -----BEGIN CERTIFICATE-----
    <Client certificate from remote cluster>
    -----END CERTIFICATE-----
  rciis-proxmox.key: |                         # (5)!
    -----BEGIN RSA PRIVATE KEY-----
    <Client key from remote cluster>
    -----END RSA PRIVATE KEY-----
  1. The key name must match the remote cluster's cluster.name from its Cilium patch.
  2. The remote cluster's ClusterMesh endpoint URL.
  3. The shared Cilium CA certificate — extracted from the remote cluster's clustermesh-apiserver-remote-cert secret (ca.crt field).
  4. The remote cluster's client certificate — extracted from clustermesh-apiserver-remote-cert (tls.crt field).
  5. The remote cluster's client key — extracted from clustermesh-apiserver-remote-cert (tls.key field).
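The PEM blocks must be indented to sit under their YAML keys, which is easy to get wrong when pasting by hand. A hedged helper (not part of the repository) that assembles the file from the extracted certs; the placeholder PEM files below stand in for the real ones:

```shell
# Placeholder PEM files -- replace with the real files extracted above
for f in rciis-proxmox-ca.crt rciis-proxmox.crt rciis-proxmox.key; do
  printf -- '-----BEGIN PLACEHOLDER-----\n...\n-----END PLACEHOLDER-----\n' > "$f"
done

NAME=rciis-proxmox
indent4() { sed 's/^/    /' "$1"; }   # block scalar content needs 4-space indent

cat > cilium-clustermesh.yaml <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: cilium-clustermesh
  namespace: kube-system
type: Opaque
stringData:
  ${NAME}: |
    endpoints:
      - https://197.245.173.242:2379
    trusted-ca-file: /var/lib/cilium/clustermesh/${NAME}-ca.crt
    cert-file: /var/lib/cilium/clustermesh/${NAME}.crt
    key-file: /var/lib/cilium/clustermesh/${NAME}.key
  ${NAME}-ca.crt: |
$(indent4 "${NAME}-ca.crt")
  ${NAME}.crt: |
$(indent4 "${NAME}.crt")
  ${NAME}.key: |
$(indent4 "${NAME}.key")
EOF
```

The generated file matches the structure shown above and is ready for SOPS encryption.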

Create the cilium-kvstoremesh secret with the same content, replacing only the secret name:

# Copy the decrypted file and rename the secret
cp flux/infra/aws/cilium/secrets/cilium-clustermesh.yaml \
   flux/infra/aws/cilium/secrets/cilium-kvstoremesh.yaml

sed -i '' 's/cilium-clustermesh/cilium-kvstoremesh/' \
   flux/infra/aws/cilium/secrets/cilium-kvstoremesh.yaml

Encrypt both secrets with SOPS:

sops --encrypt --in-place flux/infra/aws/cilium/secrets/cilium-clustermesh.yaml
sops --encrypt --in-place flux/infra/aws/cilium/secrets/cilium-kvstoremesh.yaml

Repeat the process for the other cluster (Bare Metal connecting to AWS), using the AWS cluster's remote cert values and NLB endpoint.
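Before committing, it is worth checking that the client certificate actually chains to the CA placed in the same secret; a mismatch surfaces later as the tls: bad certificate error covered in Troubleshooting. A minimal sketch of the check, with throwaway certs standing in for the extracted files:

```shell
# Throwaway CA and client cert stand in for the files extracted in Step 5;
# with the real files, only the final `openssl verify` line is needed.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout demo-ca.key -out demo-ca.crt -subj "/CN=demo-ca"
openssl req -newkey rsa:2048 -nodes \
  -keyout rciis-proxmox.key -out rciis-proxmox.csr -subj "/CN=remote"
openssl x509 -req -days 1 -in rciis-proxmox.csr \
  -CA demo-ca.crt -CAkey demo-ca.key -CAcreateserial -out rciis-proxmox.crt

# The check: prints "<file>: OK" when the cert is signed by the CA
openssl verify -CAfile demo-ca.crt rciis-proxmox.crt
```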

Step 6: CoreDNS Configuration for MCS-API

CoreDNS must be configured with the clusterset.local domain and the multicluster plugin to resolve MCS-API service names. Add the following to your CoreDNS Corefile on both clusters:

CoreDNS Corefile (kubernetes block)
kubernetes cluster.local clusterset.local in-addr.arpa ip6.arpa {
    pods insecure
    multicluster clusterset.local
    fallthrough in-addr.arpa ip6.arpa
}

The two changes from a standard Corefile are:

  1. Add clusterset.local to the kubernetes plugin's zone list
  2. Add multicluster clusterset.local inside the kubernetes block

Note

The Cilium Helm value clustermesh.mcsapi.corednsAutoConfigure.enabled can auto-configure CoreDNS, but since CoreDNS is managed via GitOps (Flux), the auto-configure job's changes get overwritten on the next reconciliation. Configure CoreDNS manually in the git-managed Corefile instead.

Step 7: Per-Cluster Named Services for Replication

For cross-cluster replication (Kafka MirrorMaker 2, PostgreSQL CNPG Pub/Sub, SQL Server Merge Replication), each cluster creates uniquely named services that point to its local pods and exports them via ServiceExport. This allows the remote cluster to address a specific cluster's service by name.

Why per-cluster named services?

The MCS-API clusterset.local domain merges endpoints from all clusters that export a service with the same name. For replication, each subscriber/agent must connect to a specific cluster's instance — not load-balance across both. Using distinct service names (e.g. kafka-bootstrap-aws, kafka-bootstrap-proxmox) ensures deterministic routing.

Each cluster creates a Service and ServiceExport for each replication target:

flux/apps/aws/kafka/service-export-aws.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-bootstrap-aws
  namespace: rciis-prod
spec:
  type: ClusterIP
  selector:
    strimzi.io/broker-role: "true"
    strimzi.io/cluster: kafka-rciis-prod
    strimzi.io/kind: Kafka
    strimzi.io/name: kafka-rciis-prod-kafka
  ports:
    - name: tcp-scram
      port: 9092
      targetPort: 9092
    - name: tcp-plain
      port: 9093
      targetPort: 9093
---
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: kafka-bootstrap-aws
  namespace: rciis-prod
flux/apps/aws/nucleus/service-export-aws.yaml
---
# PostgreSQL ServiceExport (service created by CNPG managed services)
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: esb-postgres-aws
  namespace: rciis-prod
---
# SQL Server per-cluster service and export
apiVersion: v1
kind: Service
metadata:
  name: mssql-aws
  namespace: rciis-prod
spec:
  type: ClusterIP
  selector:
    app: mssql
  ports:
    - name: tcp
      port: 1433
      targetPort: 1433
---
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: mssql-aws
  namespace: rciis-prod
flux/apps/proxmox/kafka/service-export-proxmox.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-bootstrap-proxmox
  namespace: rciis-prod
spec:
  type: ClusterIP
  selector:
    strimzi.io/broker-role: "true"
    strimzi.io/cluster: kafka-rciis-prod
    strimzi.io/kind: Kafka
    strimzi.io/name: kafka-rciis-prod-kafka
  ports:
    - name: tcp-scram
      port: 9092
      targetPort: 9092
    - name: tcp-plain
      port: 9093
      targetPort: 9093
---
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: kafka-bootstrap-proxmox
  namespace: rciis-prod
flux/apps/proxmox/nucleus/service-export-proxmox.yaml
---
# PostgreSQL ServiceExport (service created by CNPG managed services)
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: esb-postgres-proxmox
  namespace: rciis-prod
---
# SQL Server per-cluster service and export
apiVersion: v1
kind: Service
metadata:
  name: mssql-proxmox
  namespace: rciis-prod
spec:
  type: ClusterIP
  selector:
    app: mssql
  ports:
    - name: tcp
      port: 1433
      targetPort: 1433
---
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: mssql-proxmox
  namespace: rciis-prod

For PostgreSQL, the per-cluster service is created via the CNPG Cluster CRD's managed.services.additional section:

pg-instance.yaml (managed services section)
managed:
  services:
    additional:
      - selectorType: rw
        serviceTemplate:
          metadata:
            name: esb-postgres-<cluster-name>  # e.g. esb-postgres-aws
          spec:
            type: ClusterIP

After deployment, each service is accessible from the remote cluster via MCS-API DNS:

From AWS, reach Bare Metal:

Service DNS name
Kafka kafka-bootstrap-proxmox.rciis-prod.svc.clusterset.local:9092
PostgreSQL esb-postgres-proxmox.rciis-prod.svc.clusterset.local:5432
SQL Server mssql-proxmox.rciis-prod.svc.clusterset.local:1433

From Bare Metal, reach AWS:

Service DNS name
Kafka kafka-bootstrap-aws.rciis-prod.svc.clusterset.local:9092
PostgreSQL esb-postgres-aws.rciis-prod.svc.clusterset.local:5432
SQL Server mssql-aws.rciis-prod.svc.clusterset.local:1433

Step 8: Kustomization

Ensure all resources are listed in each environment's kustomizations.

Cilium kustomization (connection secrets):

flux/infra/aws/cilium/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/cilium.yaml
  - secrets/cilium-ca.yaml
  - secrets/cilium-clustermesh.yaml
  - secrets/cilium-kvstoremesh.yaml
  - cilium-ca-issuer.yaml
patches:
  - path: patch.yaml
    target:
      kind: HelmRelease
      name: cilium
flux/infra/proxmox/cilium/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/cilium.yaml
  - l2-pool.yaml
  - secrets/cilium-ca.yaml
  - secrets/cilium-clustermesh.yaml
  - secrets/cilium-kvstoremesh.yaml
  - cilium-ca-issuer.yaml
patches:
  - path: patch.yaml
    target:
      kind: HelmRelease
      name: cilium

Application kustomizations (per-cluster services and exports): add the service-export-<cluster>.yaml file to each cluster's kafka/kustomization.yaml and nucleus/kustomization.yaml.
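As a sketch of that change, the AWS Kafka kustomization gains a single entry (existing resource names are assumed, not taken from the repository):

flux/apps/aws/kafka/kustomization.yaml (sketch)
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - kafka-cluster.yaml        # existing resource (name assumed)
  - service-export-aws.yaml   # adds the per-cluster Service + ServiceExport
```

Repeat with the matching service-export-<cluster>.yaml name in each of the other three kustomizations.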

Step 9: Deploy and Verify

Commit all changes and push:

git add flux/infra/aws/cilium/ flux/infra/proxmox/cilium/ \
        flux/apps/aws/kafka/ flux/apps/proxmox/kafka/ \
        flux/apps/aws/nucleus/ flux/apps/proxmox/nucleus/
git commit -m "feat(clustermesh): configure ClusterMesh with MCS-API"
git push

Trigger reconciliation on both clusters:

flux reconcile kustomization infra-cilium -n flux-system --with-source

After the HelmRelease upgrade completes, delete the existing cert secrets so cert-manager re-issues them with the new SANs:

kubectl delete secret -n kube-system \
  clustermesh-apiserver-server-cert \
  clustermesh-apiserver-remote-cert \
  clustermesh-apiserver-admin-cert \
  clustermesh-apiserver-local-cert

Warning

Run this on both clusters. cert-manager will re-issue the certificates immediately. Then restart the clustermesh-apiserver and Cilium agents:

kubectl rollout restart deployment clustermesh-apiserver -n kube-system
kubectl rollout restart daemonset cilium -n kube-system

Verify ClusterMesh Control Plane

cilium clustermesh status

Expected output:

✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
  - <endpoint-ip>:2379
✅ Deployment clustermesh-apiserver is ready
ℹ️  KVStoreMesh is enabled

✅ All N nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
✅ All 1 KVStoreMesh replicas are connected to all clusters [min:1 / avg:1.0 / max:1]

🔌 Cluster Connections:
  - rciis-<remote>: N/N configured, N/N connected - KVStoreMesh: 1/1 configured, 1/1 connected

Run from both clusters to confirm bidirectional connectivity.

Verify MCS-API Service Discovery

Check that ServiceExports and ServiceImports exist:

kubectl get serviceexports -n rciis-prod
kubectl get serviceimports -n rciis-prod

Test DNS resolution from a pod in the cluster:

# From AWS, resolve Bare Metal's Kafka
kubectl exec -n rciis-prod deployment/health-aggregator -- \
  nslookup kafka-bootstrap-proxmox.rciis-prod.svc.clusterset.local

# From Bare Metal, resolve AWS's PostgreSQL
kubectl exec -n rciis-prod deployment/health-aggregator -- \
  nslookup esb-postgres-aws.rciis-prod.svc.clusterset.local

Verify Data Plane Connectivity

Test TCP connectivity to the remote service:

# From AWS, test TCP to Bare Metal's Kafka (port 9092)
kubectl exec -n rciis-prod deployment/health-aggregator -- \
  nc -vz -w 5 kafka-bootstrap-proxmox.rciis-prod.svc.clusterset.local 9092

# From Bare Metal, test TCP to AWS's SQL Server (port 1433)
kubectl exec -n rciis-prod deployment/health-aggregator -- \
  nc -vz -w 5 mssql-aws.rciis-prod.svc.clusterset.local 1433

If DNS resolves but TCP times out, check the Firewall Requirements section — UDP 6081 (Geneve tunnel) must be open between clusters.

Updating Connection Secrets After Certificate Rotation

When cert-manager rotates the clustermesh-apiserver-remote-cert on a cluster, the connection secrets on the other cluster become stale. To update them:

  1. Extract the new cert, key, and CA from the cluster that rotated (Step 5 commands)
  2. Decrypt the SOPS secret on the other cluster: sops --decrypt cilium-clustermesh.yaml
  3. Replace the cert, key, and CA values
  4. Re-encrypt: sops --encrypt --in-place cilium-clustermesh.yaml
  5. Repeat for cilium-kvstoremesh.yaml
  6. Commit, push, and reconcile
  7. Restart clustermesh-apiserver and cilium DaemonSet on the other cluster

Troubleshooting

Symptom Cause Fix
cilium clustermesh status shows 0/1 connected Connection secret has wrong cert or endpoint Re-extract certs from remote cluster (Step 5)
tls: bad certificate in etcd logs Server cert SAN missing external endpoint Add extraIpAddresses/extraDnsNames (Step 4), delete cert secret, restart
DNS resolves but TCP times out Geneve tunnel port blocked Open UDP 6081 between clusters
ServiceImport not created cilium-operator cache sync failed Restart cilium-operator; verify ingressController.enabled: true
cilium-dbg status shows 0 MCS-API service exports Expected behaviour — agent does not populate this counter Use cilium clustermesh status or check kubectl get serviceimports instead
ServiceExport exists but no ServiceImport Matching Service does not exist in the same namespace Create the Service first, then the ServiceExport

Flux Operations

ClusterMesh is managed as part of the Cilium HelmRelease. Use the same Flux commands documented in the Cilium Flux Operations section above.

Next: Continue to CoreDNS below.


CoreDNS

CoreDNS provides custom DNS resolution within the cluster. It is deployed by Talos as part of the Kubernetes bootstrap — only the Corefile ConfigMap needs to be customised. Custom host entries allow in-cluster services to resolve external-facing domain names to internal IPs, avoiding hairpin routing through external DNS.

Install

CoreDNS is already running after Talos bootstrap. There is no Helm chart — the customisation is a plain Kubernetes ConfigMap managed by Flux as a Kustomization (not a HelmRelease).

Create the directory and Kustomization file:

mkdir -p flux/infra/aws/coredns

The directory will contain:

flux/infra/aws/coredns/
├── kustomization.yaml    # References coredns.yaml
└── coredns.yaml          # The ConfigMap with the custom Corefile
mkdir -p flux/infra/baremetal/coredns

The directory will contain:

flux/infra/baremetal/coredns/
├── kustomization.yaml    # References coredns.yaml
└── coredns.yaml          # The ConfigMap with the custom Corefile

Save the following kustomization.yaml — it tells Flux to apply the ConfigMap in the same directory:

kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - coredns.yaml
Alternative: kubectl

If you do not have Git access, apply the ConfigMap directly:

kubectl apply -f coredns.yaml

CoreDNS watches for ConfigMap changes and reloads automatically (via the reload plugin in the Corefile).

Configuration

The hosts block maps domain names to IP addresses. Pods resolving these names will get the internal IP directly instead of going through external DNS. Update the entries to match your environment and save as coredns.yaml in the directory created above:

On AWS, DNS is primarily handled by Route 53 and services are accessed via NLB hostnames. The hosts block is typically empty unless ClusterMesh or other internal routing is needed.

flux/infra/aws/coredns/coredns.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    k8s-app: kube-dns
data:
  Corefile: |
    .:53 {
        errors {
        }
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 1.1.1.1 8.8.8.8
        cache 30
        loop
        reload
        loadbalance
        hosts {
            # AWS environment — DNS resolved via Route53 / NLB
            # Add custom DNS entries here as needed
            fallthrough
        }
    }

On Bare Metal, services are exposed via Cilium L2 IPs on the local network. The hosts block maps public-facing domain names to these internal L2 IPs so that pods can reach services without leaving the cluster.

flux/infra/baremetal/coredns/coredns.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    k8s-app: kube-dns
data:
  Corefile: |
    .:53 {
        errors {
        }
        log {
            class error
        }
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 1.1.1.1 8.8.8.8
        cache 30
        loop
        reload
        loadbalance
        hosts {
            192.168.30.41 s3.rciis.africa
            192.168.30.41 ceph.rciis.africa
            192.168.30.41 keycloak.rciis.africa
            192.168.30.41 auth.rciis.africa
            192.168.30.41 gateway.rciis.africa
            # Add custom DNS entries here as needed
            fallthrough
        }
    }

IP address mapping

All entries point to the same L2 LoadBalancer IP (192.168.30.41) because the Gateway API handles routing based on the Host header. Adjust the IP to match your Cilium L2 pool allocation.
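The Host-header routing can be exercised from any machine with curl's --resolve flag, which pins a hostname to an IP in the same way the hosts block does. A sketch using a local Python web server as a hypothetical stand-in for the gateway:

```shell
# Hypothetical local stand-in for the 192.168.30.41 gateway
python3 -m http.server 8080 --bind 127.0.0.1 >/dev/null 2>&1 &
SERVER_PID=$!
sleep 1

# Pin gateway.rciis.africa to the stand-in IP, as the CoreDNS hosts entry does;
# against the real cluster: curl --resolve gateway.rciis.africa:443:192.168.30.41 ...
curl -s -o /dev/null -w '%{http_code}\n' \
  --resolve gateway.rciis.africa:8080:127.0.0.1 \
  http://gateway.rciis.africa:8080/

kill $SERVER_PID
```

A 200 response confirms the request reached the server under the pinned name; against the real gateway, routing then depends on the Host header matching the configured route.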


Key Corefile settings:

Setting Value Why
forward . 1.1.1.1 8.8.8.8 Upstream DNS External resolution uses Cloudflare and Google DNS
hosts { ... } Custom entries Maps domains to internal IPs so pods skip external DNS
prometheus :9153 Metrics port Exposes DNS metrics for Prometheus scraping
cache 30 30-second TTL Reduces upstream queries — increase for low-traffic clusters
log { class error } Error logging (Bare Metal) Captures DNS failures without excessive log volume

Commit and Deploy

Commit the CoreDNS configuration and push to trigger Flux deployment:

git add flux/infra/aws/coredns/
git commit -m "feat(coredns): add custom DNS entries for AWS environment"
git push
git add flux/infra/baremetal/coredns/
git commit -m "feat(coredns): add custom DNS entries for bare metal environment"
git push

Trigger an immediate sync:

flux reconcile kustomization infra-coredns -n flux-system --with-source

Verify

# Check CoreDNS pods are running
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Verify cluster DNS works
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup kubernetes.default.svc.cluster.local
# Verify custom host entries resolve to the L2 IP
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup s3.rciis.africa

Flux Operations

CoreDNS is managed by Flux as Kustomization infra-coredns (there is no HelmRelease — it is a plain manifest).

Check whether the Kustomization is in a Ready state:

flux get kustomization infra-coredns -n flux-system

Trigger an immediate sync — use after pushing changes to the Corefile:

flux reconcile kustomization infra-coredns -n flux-system --with-source

View the applied ConfigMap — confirm your DNS entries are live:

kubectl get configmap coredns -n kube-system -o yaml

Next Steps

Networking is now configured. Proceed to 5.1.2 Certificates to set up cert-manager and automatic TLS certificate provisioning.