3.3 Set Up Load Balancing¶
The Kubernetes API (port 6443) and Talos API (port 50000) must be reachable via a stable endpoint that distributes traffic across control plane nodes. This requires a Layer 4 (TCP) load balancer or virtual IP with health checks.
Requirements¶
| Listener | Backend Port | Health Check | Protocol | Backend Nodes |
|---|---|---|---|---|
| Kubernetes API | 6443 | TCP connect on 6443 | TCP | All control plane nodes |
| Talos API | 50000 | TCP connect on 50000 | TCP | All control plane nodes |
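Both listeners use plain TCP connect checks. For a quick manual probe of the same kind the load balancer performs, here is a minimal sketch in Python (host and port placeholders are assumptions, substitute your own endpoint):

```python
import socket

def tcp_healthy(host: str, port: int, timeout: float = 5.0) -> bool:
    """Perform the same kind of check the load balancer does: a plain
    TCP connect with a timeout. Returns True if the port accepts the
    connection, False on refusal or timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Once the endpoint is up, `tcp_healthy("<endpoint>", 6443)` and `tcp_healthy("<endpoint>", 50000)` should both return True.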
The load balancer endpoint becomes the cluster_endpoint that all tools (kubectl, talosctl, worker nodes) use to reach the control plane.
The load balancer module (terraform/modules/aws/loadbalancer) creates a Network Load Balancer (NLB) that provides access to the Kubernetes API and Talos API on the control plane nodes. This is deployed as part of terraform apply via terraform/cluster/aws/main.tf.
Step 1: Configure Load Balancer Variables¶
Open terraform/cluster/envs/aws.tfvars and set the NLB parameters:
nlb_internal = false # true = internal, false = internet-facing
enable_deletion_protection = false
enable_cross_zone_load_balancing = false # Single AZ, not needed
Internal NLB (nlb_internal = true) -- requires VPN or bastion host to reach the control plane from outside the VPC. Use this for production.
Internet-facing NLB (nlb_internal = false) -- the NLB gets a public DNS name. Combine with allowed_admin_cidrs in the security group to restrict access. Use this for demo/testing:
nlb_internal = false
enable_deletion_protection = false
enable_cross_zone_load_balancing = false
allowed_admin_cidrs = [
  "196.45.28.20/32",
]

Health Check Tuning¶
The loadbalancer module exposes these variables for tuning how quickly failed nodes are detected (defaults shown):
variable "health_check_interval" {
  description = "NLB health check interval in seconds"
  type        = number
  default     = 10
}

variable "health_check_timeout" {
  description = "NLB health check timeout in seconds"
  type        = number
  default     = 5
}

variable "healthy_threshold" {
  description = "Consecutive successful checks before marking target healthy"
  type        = number
  default     = 2
}

variable "unhealthy_threshold" {
  description = "Consecutive failed checks before marking target unhealthy"
  type        = number
  default     = 2
}
Step 2: Understand the Module¶
The module is at terraform/modules/aws/loadbalancer/. The root module (main.tf) passes in the VPC ID and public subnet IDs from the network module.
Network Load Balancer¶
resource "aws_lb" "kubernetes_api" {
  name                             = "${var.environment}-talos-api-nlb"
  internal                         = var.internal
  load_balancer_type               = "network"
  subnets                          = var.subnet_ids
  enable_deletion_protection       = var.enable_deletion_protection
  enable_cross_zone_load_balancing = var.enable_cross_zone_load_balancing
}
Key points:
- internal -- toggles between internal and internet-facing based on var.internal (mapped from nlb_internal in the root module)
- subnets -- placed in public subnets (from module.network.public_subnet_ids)
Target Groups and Listeners¶
The loadbalancer module creates two target groups and listeners:
| Listener Port | Target Port | Protocol | Target Group | Purpose |
|---|---|---|---|---|
| 6443 | 6443 | TCP | <env>-talos-api-tg | Kubernetes API |
| 50000 | 50000 | TCP | <env>-talos-apid-tg | Talos API (control plane) |
The root module adds a third listener for worker Talos API access:
| Listener Port | Target Port | Protocol | Target Group | Purpose |
|---|---|---|---|---|
| 50001 | 50000 | TCP | <env>-talos-wk-apid-tg | Talos API (workers) |
All target groups use:
- preserve_client_ip = true -- the NLB preserves the original source IP
- deregistration_delay = 30 -- allows in-flight requests to complete before removing a target
- TCP health checks on the respective service port
Target Group Attachments¶
Target group attachments are defined in the root module (terraform/cluster/aws/main.tf) as separate resources, so that EC2 instances can be replaced without affecting the NLB:
resource "aws_lb_target_group_attachment" "cp_kubernetes_api" {
  count            = var.control_plane_count
  target_group_arn = module.loadbalancer.kubernetes_api_target_group_arn
  target_id        = module.compute.control_plane_instance_ids[count.index]
  port             = 6443
}

resource "aws_lb_target_group_attachment" "cp_talos_api" {
  count            = var.control_plane_count
  target_group_arn = module.loadbalancer.talos_api_target_group_arn
  target_id        = module.compute.control_plane_instance_ids[count.index]
  port             = 50000
}
Step 3: Module Outputs¶
The loadbalancer module exports:
| Output | Description |
|---|---|
| load_balancer_dns_name | NLB DNS name for external access |
| load_balancer_arn | NLB ARN (used for additional listeners in the root module) |
| kubernetes_api_endpoint | https://<nlb-dns>:6443 |
| talos_api_endpoint | <nlb-dns>:50000 |
| kubernetes_api_target_group_arn | For target group attachments |
| talos_api_target_group_arn | For target group attachments |
These are surfaced as root module outputs and used when configuring talosctl and kubectl after deployment:
# Get the NLB DNS name
terraform output nlb_dns_name
# Configure talosctl to use the NLB endpoint
talosctl config endpoint $(terraform output -raw nlb_dns_name)
# The kubeconfig will use the NLB DNS as the server URL
# https://<nlb-dns>:6443
terraform output kubernetes_api_endpoint
Customisation Summary¶
| What to Change | Where | Variable |
|---|---|---|
| Internal vs internet-facing | aws.tfvars | nlb_internal |
| Deletion protection | aws.tfvars | enable_deletion_protection |
| Cross-zone load balancing | aws.tfvars | enable_cross_zone_load_balancing |
| Health check interval | aws.tfvars | health_check_interval |
| Health check timeout | aws.tfvars | health_check_timeout |
| Healthy/unhealthy thresholds | aws.tfvars | healthy_threshold, unhealthy_threshold |
Warning
The deregistration delay (30s) and client IP preservation are hardcoded in the loadbalancer module. To change these, edit terraform/modules/aws/loadbalancer/main.tf directly.
For bare metal, use HAProxy + Keepalived for a highly available load balancer, or a single HAProxy instance for simpler setups.
Option 1: HAProxy + Keepalived (Recommended for HA)¶
Deploy HAProxy on two dedicated servers (or VMs) with Keepalived managing a floating VIP. If one HAProxy fails, the VIP moves to the other.
HAProxy Configuration¶
# /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    maxconn 4096

defaults
    mode tcp
    timeout connect 10s
    timeout client 30s
    timeout server 30s
    option tcp-check

frontend kubernetes_api
    bind *:6443
    default_backend kubernetes_api_backend

backend kubernetes_api_backend
    balance roundrobin
    option tcp-check
    server cp-01 192.168.30.31:6443 check
    server cp-02 192.168.30.32:6443 check
    server cp-03 192.168.30.33:6443 check

frontend talos_api
    bind *:50000
    default_backend talos_api_backend

backend talos_api_backend
    balance roundrobin
    option tcp-check
    server cp-01 192.168.30.31:50000 check
    server cp-02 192.168.30.32:50000 check
    server cp-03 192.168.30.33:50000 check
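Before (re)starting the service, it is worth validating the file; `haproxy -c` parses the configuration without starting the proxy (paths assume the standard location used above):

```shell
# Validate the configuration, then reload without dropping connections.
haproxy -c -f /etc/haproxy/haproxy.cfg
sudo systemctl reload haproxy
```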
Keepalived Configuration¶
On each HAProxy server, configure Keepalived to manage the VIP:
# /etc/keepalived/keepalived.conf (on primary)
# The vrrp_script must be declared before the vrrp_instance that tracks it.
vrrp_script chk_haproxy {
    script "pidof haproxy"
    interval 2
    weight 20   # must exceed the priority gap (100 - 90) so a failed
                # check actually demotes the master
}

vrrp_instance VI_1 {
    state MASTER            # BACKUP on secondary
    interface eth0
    virtual_router_id 51
    priority 100            # 90 on secondary
    advert_int 1
    virtual_ipaddress {
        192.168.30.30/24
    }
    track_script {
        chk_haproxy
    }
}
The VIP (192.168.30.30) becomes your cluster endpoint:
talosctl config endpoint 192.168.30.30
kubectl config set-cluster rciis --server=https://192.168.30.30:6443
Option 2: Single HAProxy (Non-HA)¶
For smaller environments, a single HAProxy instance works. Use the same configuration as above without Keepalived. The HAProxy server's IP becomes the cluster endpoint.
Option 3: DNS Round-Robin (Simplest)¶
For development or testing, create DNS A records pointing to all control plane IPs:
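As a sketch, using a hypothetical name k8s.example.internal and the control plane IPs from the examples above (name and TTL are placeholders):

```
k8s.example.internal.  300  IN  A  192.168.30.31
k8s.example.internal.  300  IN  A  192.168.30.32
k8s.example.internal.  300  IN  A  192.168.30.33
```

A short TTL limits how long clients keep resolving a node that has gone down.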
Warning
DNS round-robin provides no health checking. If a CP node goes down, clients may still be directed to it until DNS TTL expires.
For Proxmox deployments, Terraform only provisions VMs with static IPs via cloud-init. Talos configuration — including load balancing — is applied separately using talosctl.
The recommended approach is the Talos built-in VIP: a lightweight virtual IP that requires no external infrastructure.
Option 1: Talos Built-in VIP (Recommended)¶
Talos has native VIP support. When configured in the machine config, one control plane node holds the VIP at any time. If that node fails, another CP node takes over automatically via GARP.
Configure the VIP in your Talos machine config (applied via talosctl apply-config):
machine:
  network:
    interfaces:
      - interface: eth0
        addresses:
          - 192.168.30.31/24 # Node's own IP
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.30.1
        vip:
          ip: 192.168.30.30 # Shared VIP
Apply to each control plane node, changing the addresses field per node:
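A sketch of the apply step, assuming the per-node config files are named controlplane-N.yaml (the file names are placeholders; node IPs match the examples above):

```shell
# Apply the per-node config; only the addresses field differs between files.
talosctl apply-config --nodes 192.168.30.31 --file controlplane-1.yaml
talosctl apply-config --nodes 192.168.30.32 --file controlplane-2.yaml
talosctl apply-config --nodes 192.168.30.33 --file controlplane-3.yaml
```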
The VIP becomes the cluster endpoint used by all clients:
# talosctl uses the VIP
talosctl config endpoint 192.168.30.30
# kubectl uses the VIP
# https://192.168.30.30:6443
Both port 6443 (Kubernetes API) and port 50000 (Talos API) are available on the VIP. Traffic is forwarded to whichever CP node currently holds the VIP.
VIP Requirements¶
- The VIP must be an unused IP on the same subnet as the control plane nodes
- The VIP must not be assigned to any other device
- ARP must not be filtered on the network (GARP is used for failover)
- All control plane nodes must be on the same Layer 2 network
Option 2: Single Control Plane (No VIP)¶
For single control plane setups, use the node's IP directly as the cluster endpoint:
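The endpoint configuration mirrors the VIP case, just with the node's own address (the IP and cluster name below reuse the examples from earlier in this section):

```shell
# Point both clients at the single control plane node directly.
talosctl config endpoint 192.168.30.31
kubectl config set-cluster rciis --server=https://192.168.30.31:6443
```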
No VIP or load balancer is needed, but there is no failover.
Option 3: External Load Balancer¶
For environments where Talos VIP is not suitable (e.g., nodes on different L2 segments), use an external load balancer such as HAProxy on the Proxmox host or a network appliance. See the Bare Metal tab for HAProxy configuration examples.