
3.3 Set Up Load Balancing

The Kubernetes API (port 6443) and Talos API (port 50000) must be reachable via a stable endpoint that distributes traffic across control plane nodes. This requires a Layer 4 (TCP) load balancer or virtual IP with health checks.

Requirements

| Listener       | Backend Port | Health Check         | Protocol | Backend Nodes           |
|----------------|--------------|----------------------|----------|-------------------------|
| Kubernetes API | 6443         | TCP connect on 6443  | TCP      | All control plane nodes |
| Talos API      | 50000        | TCP connect on 50000 | TCP      | All control plane nodes |

The load balancer endpoint becomes the cluster_endpoint that all tools (kubectl, talosctl, worker nodes) use to reach the control plane.
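Both client-facing endpoints derive from that single stable host. A minimal sketch (the hostname is a placeholder; substitute your NLB DNS name or VIP):

```shell
# Placeholder host; substitute the NLB DNS name or VIP
LB_HOST="k8s.example.internal"

# Kubernetes API endpoint (kubectl, worker kubelets)
K8S_API="https://${LB_HOST}:6443"
# Talos API endpoint (talosctl defaults to port 50000)
TALOS_API="${LB_HOST}:50000"

echo "$K8S_API"
echo "$TALOS_API"
```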


The load balancer module (terraform/modules/aws/loadbalancer) creates a Network Load Balancer (NLB) that provides access to the Kubernetes API and Talos API on the control plane nodes. This is deployed as part of terraform apply via terraform/cluster/aws/main.tf.

Step 1: Configure Load Balancer Variables

Open terraform/cluster/envs/aws.tfvars and set the NLB parameters:

terraform/cluster/envs/aws.tfvars
nlb_internal                     = false  # true = internal, false = internet-facing
enable_deletion_protection       = false
enable_cross_zone_load_balancing = false   # Single AZ, not needed

Internal NLB (nlb_internal = true) -- requires VPN or bastion host to reach the control plane from outside the VPC. Use this for production.

Internet-facing NLB (nlb_internal = false) -- the NLB gets a public DNS name. Combine with allowed_admin_cidrs in the security group to restrict access. Use this for demo/testing:

terraform/cluster/envs/aws.tfvars
nlb_internal                     = false
enable_deletion_protection       = false
enable_cross_zone_load_balancing = false

allowed_admin_cidrs = [
  "196.45.28.20/32",
]
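allowed_admin_cidrs expects CIDR notation, so a single workstation is a /32. A small sketch that formats an address as a tfvars entry (the IP is the placeholder from the example above; in practice you would discover it with something like curl -s https://checkip.amazonaws.com):

```shell
# Placeholder admin IP (matches the example above); in practice:
#   MY_IP="$(curl -s https://checkip.amazonaws.com)"
MY_IP="196.45.28.20"

echo "allowed_admin_cidrs = [\"${MY_IP}/32\"]"
```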

Health Check Tuning

terraform/cluster/aws/variables.tf
variable "health_check_interval" {
  description = "NLB health check interval in seconds"
  type        = number
  default     = 10
}

variable "health_check_timeout" {
  description = "NLB health check timeout in seconds"
  type        = number
  default     = 5
}

variable "healthy_threshold" {
  description = "Consecutive successful checks before marking target healthy"
  type        = number
  default     = 2
}

variable "unhealthy_threshold" {
  description = "Consecutive failed checks before marking target unhealthy"
  type        = number
  default     = 2
}
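With the defaults above, failure detection takes roughly health_check_interval × unhealthy_threshold, and reinstatement roughly interval × healthy_threshold. A quick sketch with the default values:

```shell
# Defaults from variables.tf above
interval=10
healthy_threshold=2
unhealthy_threshold=2

# Worst-case time before the NLB stops routing to a failed node
failover=$(( interval * unhealthy_threshold ))
# Time before a recovered node receives traffic again
recovery=$(( interval * healthy_threshold ))

echo "mark unhealthy after: ${failover}s"
echo "mark healthy after:   ${recovery}s"
```

So a crashed control plane node keeps receiving new connections for up to ~20 seconds with the defaults; lower the interval if that window matters for your environment.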

Step 2: Understand the Module

The module is at terraform/modules/aws/loadbalancer/. The root module (main.tf) passes in the VPC ID and public subnet IDs from the network module.

Network Load Balancer

terraform/modules/aws/loadbalancer/main.tf
resource "aws_lb" "kubernetes_api" {
  name               = "${var.environment}-talos-api-nlb"
  internal           = var.internal
  load_balancer_type = "network"
  subnets            = var.subnet_ids

  enable_deletion_protection       = var.enable_deletion_protection
  enable_cross_zone_load_balancing = var.enable_cross_zone_load_balancing
}

Key points:

  • internal -- toggles between internal and internet-facing based on var.internal (mapped from nlb_internal in the root module)
  • subnets -- placed in public subnets (from module.network.public_subnet_ids)

Target Groups and Listeners

The loadbalancer module creates two target groups and listeners:

| Listener Port | Target Port | Protocol | Target Group        | Purpose                   |
|---------------|-------------|----------|---------------------|---------------------------|
| 6443          | 6443        | TCP      | <env>-talos-api-tg  | Kubernetes API            |
| 50000         | 50000       | TCP      | <env>-talos-apid-tg | Talos API (control plane) |

The root module adds a third listener for worker Talos API access:

| Listener Port | Target Port | Protocol | Target Group           | Purpose             |
|---------------|-------------|----------|------------------------|---------------------|
| 50001         | 50000       | TCP      | <env>-talos-wk-apid-tg | Talos API (workers) |

All target groups use:

  • preserve_client_ip = true -- the NLB preserves the original source IP
  • deregistration_delay = 30 -- allows in-flight requests to complete before removing a target
  • TCP health checks on the respective service port

Target Group Attachments

Target group attachments are defined in the root module (terraform/cluster/aws/main.tf) as separate resources, so that EC2 instances can be replaced without affecting the NLB:

terraform/cluster/aws/main.tf
resource "aws_lb_target_group_attachment" "cp_kubernetes_api" {
  count            = var.control_plane_count
  target_group_arn = module.loadbalancer.kubernetes_api_target_group_arn
  target_id        = module.compute.control_plane_instance_ids[count.index]
  port             = 6443
}

resource "aws_lb_target_group_attachment" "cp_talos_api" {
  count            = var.control_plane_count
  target_group_arn = module.loadbalancer.talos_api_target_group_arn
  target_id        = module.compute.control_plane_instance_ids[count.index]
  port             = 50000
}

Step 3: Module Outputs

The loadbalancer module exports:

| Output                          | Description                                            |
|---------------------------------|--------------------------------------------------------|
| load_balancer_dns_name          | NLB DNS name for external access                       |
| load_balancer_arn               | NLB ARN (used for additional listeners in root module) |
| kubernetes_api_endpoint         | https://<nlb-dns>:6443                                 |
| talos_api_endpoint              | <nlb-dns>:50000                                        |
| kubernetes_api_target_group_arn | For target group attachments                           |
| talos_api_target_group_arn      | For target group attachments                           |

These are surfaced as root module outputs and used when configuring talosctl and kubectl after deployment:

# Get the NLB DNS name
terraform output nlb_dns_name

# Configure talosctl to use the NLB endpoint
talosctl config endpoint $(terraform output -raw nlb_dns_name)

# The kubeconfig will use the NLB DNS as the server URL
# https://<nlb-dns>:6443
terraform output kubernetes_api_endpoint

Customisation Summary

| What to Change               | Where      | Variable                               |
|------------------------------|------------|----------------------------------------|
| Internal vs internet-facing  | aws.tfvars | nlb_internal                           |
| Deletion protection          | aws.tfvars | enable_deletion_protection             |
| Cross-zone load balancing    | aws.tfvars | enable_cross_zone_load_balancing       |
| Health check interval        | aws.tfvars | health_check_interval                  |
| Health check timeout         | aws.tfvars | health_check_timeout                   |
| Healthy/unhealthy thresholds | aws.tfvars | healthy_threshold, unhealthy_threshold |

Warning

The deregistration delay (30s) and client IP preservation are hardcoded in the loadbalancer module. To change these, edit terraform/modules/aws/loadbalancer/main.tf directly.

For bare metal, use HAProxy + Keepalived for a highly available load balancer, or a single HAProxy instance for simpler setups.

Option 1: HAProxy + Keepalived (HA)

Deploy HAProxy on two dedicated servers (or VMs), with Keepalived managing a floating virtual IP (VIP). If one HAProxy instance fails, the VIP moves to the other automatically.

HAProxy Configuration

# /etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    maxconn 4096

defaults
    mode tcp
    timeout connect 10s
    timeout client 1h      # long enough for kubectl watch/exec sessions
    timeout server 1h
    option tcp-check

frontend kubernetes_api
    bind *:6443
    default_backend kubernetes_api_backend

backend kubernetes_api_backend
    balance roundrobin
    option tcp-check
    server cp-01 192.168.30.31:6443 check
    server cp-02 192.168.30.32:6443 check
    server cp-03 192.168.30.33:6443 check

frontend talos_api
    bind *:50000
    default_backend talos_api_backend

backend talos_api_backend
    balance roundrobin
    option tcp-check
    server cp-01 192.168.30.31:50000 check
    server cp-02 192.168.30.32:50000 check
    server cp-03 192.168.30.33:50000 check
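Before (re)loading, the file can be syntax-checked. A sketch, assuming haproxy runs as a systemd service on the load balancer host:

```shell
# Validate the configuration, then reload without dropping existing connections
haproxy -c -f /etc/haproxy/haproxy.cfg && sudo systemctl reload haproxy
```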

Keepalived Configuration

On each HAProxy server, configure Keepalived to manage the VIP:

# /etc/keepalived/keepalived.conf (on primary)

vrrp_script chk_haproxy {
    script "pidof haproxy"    # succeeds while haproxy is running
    interval 2
    weight 20                 # must exceed the priority gap (100 - 90),
                              # or a node with a dead haproxy keeps the VIP
}

vrrp_instance VI_1 {
    state MASTER              # BACKUP on secondary
    interface eth0
    virtual_router_id 51
    priority 100              # 90 on secondary
    advert_int 1

    virtual_ipaddress {
        192.168.30.30/24
    }

    track_script {
        chk_haproxy
    }
}
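During a failover drill, the node that holds the VIP is the one whose interface lists 192.168.30.30. A sketch of the check, run here against canned output so the pattern is visible; on a real node, pipe `ip -4 addr show eth0` into the grep instead:

```shell
# Canned stand-in for `ip -4 addr show eth0` on the current MASTER
sample='inet 192.168.30.21/24 brd 192.168.30.255 scope global eth0
    inet 192.168.30.30/24 scope global secondary eth0'

if printf '%s\n' "$sample" | grep -q 'inet 192\.168\.30\.30/'; then
  echo "this node holds the VIP"
else
  echo "VIP is elsewhere"
fi
```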

The VIP (192.168.30.30) becomes your cluster endpoint:

talosctl config endpoint 192.168.30.30
kubectl config set-cluster rciis --server=https://192.168.30.30:6443

Option 2: Single HAProxy (Non-HA)

For smaller environments, a single HAProxy instance works. Use the same configuration as above without Keepalived. The HAProxy server's IP becomes the cluster endpoint.

Option 3: DNS Round-Robin (Simplest)

For development or testing, create DNS A records pointing to all control plane IPs:

k8s.rciis.local  A  192.168.30.31
k8s.rciis.local  A  192.168.30.32
k8s.rciis.local  A  192.168.30.33

Warning

DNS round-robin provides no health checking. If a CP node goes down, clients may still resolve to its IP; recovery requires removing the record and waiting out the DNS TTL on every client.

For Proxmox deployments, Terraform only provisions VMs with static IPs via cloud-init. Talos configuration — including load balancing — is applied separately using talosctl.

The recommended approach is Talos's built-in VIP: a lightweight virtual IP that requires no external infrastructure.

Option 1: Talos VIP (Recommended)

Talos has native VIP support. When configured in the machine config, exactly one control plane node holds the VIP at any time. If that node fails, another CP node takes over automatically, announcing the change via gratuitous ARP (GARP).

Configure the VIP in your Talos machine config (applied via talosctl apply-config):

machine:
  network:
    interfaces:
      - interface: eth0
        addresses:
          - 192.168.30.31/24   # Node's own IP
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.30.1
        vip:
          ip: 192.168.30.30    # Shared VIP

Apply to each control plane node, changing the addresses field per node:

talosctl apply-config --insecure --nodes 192.168.30.31 --file controlplane.yaml
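The same command runs once per node. A small loop sketch using the node IPs from the examples above (the commands are only echoed here so the loop can be inspected; drop the echo to run it for real, after adjusting each node's addresses field in its copy of controlplane.yaml):

```shell
# Control plane node IPs from the examples above
for node in 192.168.30.31 192.168.30.32 192.168.30.33; do
  echo talosctl apply-config --insecure --nodes "${node}" --file controlplane.yaml
done
```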

The VIP becomes the cluster endpoint used by Kubernetes clients:

# kubectl uses the VIP
# https://192.168.30.30:6443

Both port 6443 (Kubernetes API) and port 50000 (Talos API) answer on the VIP; requests land on whichever CP node currently holds it. Note, however, that Talos maintains the VIP only while etcd is healthy, so it can be unavailable during bootstrap, upgrades, or outages, which is exactly when talosctl is needed for recovery. Prefer listing the individual node IPs as talosctl endpoints:

talosctl config endpoint 192.168.30.31 192.168.30.32 192.168.30.33

VIP Requirements

  • The VIP must be an unused IP on the same subnet as the control plane nodes
  • The VIP must not be assigned to any other device
  • ARP must not be filtered on the network (GARP is used for failover)
  • All control plane nodes must be on the same Layer 2 network

Option 2: Single Control Plane (No VIP)

For single control plane setups, use the node's IP directly as the cluster endpoint:

talosctl config endpoint 192.168.30.31

No VIP or load balancer is needed, but there is no failover.

Option 3: External Load Balancer

For environments where Talos VIP is not suitable (e.g., nodes on different L2 segments), use an external load balancer such as HAProxy on the Proxmox host or a network appliance. See the Bare Metal tab for HAProxy configuration examples.