3.2 Set Up Network Fabric

The network fabric provides connectivity between all Kubernetes nodes, outbound internet access for image pulls, and a stable endpoint for the Kubernetes API. The requirements are:

  • All nodes must be able to reach each other on the required ports (see Firewall Rules)
  • Control plane nodes need a stable endpoint (VIP or load balancer) for the Kubernetes API
  • All nodes need outbound internet access (directly or via NAT/proxy)
  • DNS resolution must work for both internal and external domains

The network module (terraform/modules/aws/network) creates the full VPC networking layer for the cluster. You do not run this module independently -- it is composed into terraform/cluster/aws/main.tf and deployed as part of the full terraform apply. This section explains what the module creates, how it works, and where to customise it.
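
For orientation, the composition in terraform/cluster/aws/main.tf looks roughly like the following sketch. The argument names mirror the variables documented below, but the actual file is authoritative and may pass additional or differently named arguments:

```hcl
# Sketch only: how the network module is typically composed into the
# root configuration. Check terraform/cluster/aws/main.tf for the
# real wiring.
module "network" {
  source = "../../modules/aws/network"

  environment          = var.environment
  vpc_cidr             = var.vpc_cidr
  availability_zones   = var.availability_zones
  public_subnet_cidrs  = var.public_subnet_cidrs
  private_subnet_cidrs = var.private_subnet_cidrs
  enable_nat_gateway   = var.enable_nat_gateway
  single_nat_gateway   = var.single_nat_gateway
  tags                 = var.tags
}
```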

Architecture

Internet
┌───┴───┐
│  IGW  │
└───┬───┘
┌───┴──────────────────────────────────┐
│  Public Subnets (one per AZ)         │
│  - NAT Gateways                      │
│  - Network Load Balancer             │
│  - map_public_ip_on_launch = true    │
└───┬──────────────────────────────────┘
    │ (NAT)
┌───┴──────────────────────────────────┐
│  Private Subnets (one per AZ)        │
│  - Control Plane EC2 instances       │
│  - Worker EC2 instances              │
│  - map_public_ip_on_launch = false   │
└──────────────────────────────────────┘

Step 1: Configure Network Variables

Open terraform/cluster/envs/aws.tfvars and set the network parameters.

VPC CIDR

The vpc_cidr defines the overall address space for the VPC. All subnets must fall within this range:

terraform/cluster/aws/variables.tf
variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

Change this if 10.0.0.0/16 conflicts with your existing networks. The demo environment uses 10.2.0.0/16:

terraform/cluster/envs/aws.tfvars
vpc_cidr = "10.2.0.0/16"

Availability Zones and Subnets

You must define one public and one private subnet per availability zone. These three lists must have the same length.

Single AZ (demo / cost saving), as used by the demo environment:

terraform/cluster/envs/aws.tfvars
availability_zones   = ["af-south-1a"]
public_subnet_cidrs  = ["10.2.1.0/24"]
private_subnet_cidrs = ["10.2.11.0/24"]

Multi-AZ (production / HA):

availability_zones   = ["af-south-1a", "af-south-1b", "af-south-1c"]
public_subnet_cidrs  = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
private_subnet_cidrs = ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"]
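
The module may not enforce the equal-length requirement itself. With Terraform 1.9+ (which allows cross-variable references in validation blocks), a guard like this sketch would catch a mismatch at plan time; it is shown for public_subnet_cidrs, and the same pattern applies to the private list:

```hcl
# Hypothetical guard: fail `terraform plan` early if the subnet list
# length does not match the AZ list length (requires Terraform 1.9+
# for cross-variable validation).
variable "public_subnet_cidrs" {
  description = "Public subnet CIDR blocks, one per availability zone"
  type        = list(string)

  validation {
    condition     = length(var.public_subnet_cidrs) == length(var.availability_zones)
    error_message = "public_subnet_cidrs must have one entry per availability zone."
  }
}
```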

NAT Gateway Strategy

NAT gateways give private subnet nodes outbound internet access. Choose a strategy:

terraform/cluster/envs/aws.tfvars
enable_nat_gateway = true
single_nat_gateway = true   # false = one per AZ (HA)

| Setting                    | Behaviour                                          | Cost              |
|----------------------------|----------------------------------------------------|-------------------|
| single_nat_gateway = true  | One NAT gateway shared across all AZs              | ~$41.61/mo        |
| single_nat_gateway = false | One NAT gateway per AZ (HA -- survives AZ failure) | ~$41.61/mo per AZ |

Step 2: Understand the Module

The module is at terraform/modules/aws/network/. Here is what it creates.

VPC

terraform/modules/aws/network/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.tags, {
    Name = "${var.environment}-talos-vpc"
  })
}
  • cidr_block -- reads from var.vpc_cidr
  • enable_dns_hostnames / enable_dns_support -- required for internal DNS resolution within the VPC
  • The VPC is tagged with kubernetes.io/cluster/<name> = "owned" when cluster_name is set, enabling AWS CCM and LB Controller resource discovery
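
The conditional cluster tag described above can be expressed as a merge() over an optional map. A sketch of that expression (the module's actual code may differ):

```hcl
# Sketch: add the CCM/LB Controller discovery tag only when
# cluster_name is set; otherwise merge in nothing extra.
tags = merge(
  var.tags,
  { Name = "${var.environment}-talos-vpc" },
  var.cluster_name != "" ? {
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
  } : {}
)
```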

Internet Gateway

terraform/modules/aws/network/main.tf
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

The internet gateway is attached directly to the VPC via the vpc_id argument; no separate attachment resource is needed.

Public Subnets

One public subnet is created per availability zone. These host the NAT gateways and NLB:

terraform/modules/aws/network/main.tf
resource "aws_subnet" "public" {
  count                   = length(var.public_subnet_cidrs)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet_cidrs[count.index]
  availability_zone       = var.availability_zones[count.index % length(var.availability_zones)]
  map_public_ip_on_launch = true

  tags = merge(var.tags, {
    Name                        = "${var.environment}-talos-public-${var.availability_zones[count.index]}"
    Type                        = "public"
    "kubernetes.io/role/elb"    = "1"
  })
}

Key points:

  • The count meta-argument iterates over var.public_subnet_cidrs, creating one subnet per entry
  • count.index selects the matching AZ from var.availability_zones using modulo
  • map_public_ip_on_launch = true -- instances in public subnets get public IPs
  • The kubernetes.io/role/elb tag tells the AWS LB Controller which subnets to use for public load balancers

Private Subnets

Private subnets host the Kubernetes nodes. They follow the same pattern but with map_public_ip_on_launch = false:

terraform/modules/aws/network/main.tf
resource "aws_subnet" "private" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index % length(var.availability_zones)]

  tags = merge(var.tags, {
    Name                                = "${var.environment}-talos-private-${var.availability_zones[count.index]}"
    Type                                = "private"
    "kubernetes.io/role/internal-elb"   = "1"
  })
}
  • kubernetes.io/role/internal-elb tag tells the AWS LB Controller which subnets to use for internal load balancers

Elastic IPs and NAT Gateways

The number of Elastic IPs and NAT gateways depends on the single_nat_gateway setting:

terraform/modules/aws/network/main.tf
resource "aws_eip" "nat" {
  count  = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.availability_zones)) : 0
  domain = "vpc"
  depends_on = [aws_internet_gateway.main]
}

resource "aws_nat_gateway" "main" {
  count         = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.availability_zones)) : 0
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  depends_on    = [aws_internet_gateway.main]
}

Each NAT gateway is placed in its corresponding public subnet and receives a dedicated Elastic IP.

Route Tables

Two types of route tables are created:

Public route table -- routes internet traffic through the IGW:

terraform/modules/aws/network/main.tf
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

Private route table(s) -- routes internet traffic through the NAT gateway. When single_nat_gateway = true, one route table is shared across all private subnets; otherwise, one is created per AZ. If NAT is disabled entirely, a single route table with no default route is still created, so private subnets keep a route table association but have no internet path:

terraform/modules/aws/network/main.tf
resource "aws_route_table" "private" {
  count  = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.availability_zones)) : 1
  vpc_id = aws_vpc.main.id

  dynamic "route" {
    for_each = var.enable_nat_gateway ? [1] : []
    content {
      cidr_block     = "0.0.0.0/0"
      nat_gateway_id = var.single_nat_gateway ? aws_nat_gateway.main[0].id : aws_nat_gateway.main[count.index].id
    }
  }
}
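
The module also has to associate each subnet with its route table. A sketch of what those associations look like (resource names are illustrative):

```hcl
# Sketch: all public subnets share the single public route table;
# each private subnet uses its own table unless a single NAT gateway
# (and thus a single private route table) is in use.
resource "aws_route_table_association" "public" {
  count          = length(aws_subnet.public)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = length(aws_subnet.private)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = var.single_nat_gateway ? aws_route_table.private[0].id : aws_route_table.private[count.index].id
}
```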

Security Groups

The network module also creates security groups in security_groups.tf. Two security groups are created -- one for control plane nodes and one for workers -- with rules for:

| Rule           | Port(s)     | Source           | Notes                              |
|----------------|-------------|------------------|------------------------------------|
| Kubernetes API | 6443        | VPC CIDR         | Control plane SG                   |
| Talos API      | 50000       | VPC CIDR         | Both SGs                           |
| etcd peer      | 2379-2380   | CP SG (self)     | Control plane SG                   |
| Kubelet API    | 10250       | CP SG, self, VPC | Both SGs                           |
| Cilium GENEVE  | 6081/udp    | CP + Worker SGs  | Conditional on cni_type = "cilium" |
| Cilium health  | 4240        | CP + Worker SGs  | Conditional on cni_type = "cilium" |
| NodePort range | 30000-32767 | Worker SG, VPC   | Conditional on enable_nodeport     |
| ICMP           | all         | VPC CIDR         | Both SGs                           |
| All egress     | all         | 0.0.0.0/0        | Both SGs                           |

The root module (terraform/cluster/aws/main.tf) adds additional security group rules for:

  • Admin access: K8s API (6443) and Talos API (50000) from allowed_admin_cidrs
  • Cilium ClusterMesh: ports 2379, 4240, 4244, 8472/udp, ICMP from clustermesh_peer_cidrs
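
As an illustration, an admin-access rule in the root module might look like this sketch (the real resource names and structure in terraform/cluster/aws/main.tf may differ):

```hcl
# Sketch: allow the Kubernetes API only from operator networks
# listed in allowed_admin_cidrs.
resource "aws_security_group_rule" "admin_k8s_api" {
  type              = "ingress"
  from_port         = 6443
  to_port           = 6443
  protocol          = "tcp"
  cidr_blocks       = var.allowed_admin_cidrs
  security_group_id = module.network.control_plane_security_group_id
}
```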

Step 3: Module Outputs

The network module exports references that other modules consume. Terraform resolves these automatically via module references in main.tf:

| Output                          | Consumed By                                       |
|---------------------------------|---------------------------------------------------|
| vpc_id                          | compute, loadbalancer modules                     |
| vpc_cidr                        | Security group rules (VPC-wide CIDR rules)        |
| public_subnet_ids               | loadbalancer module (NLB placement)               |
| private_subnet_ids              | compute module (EC2 instance placement)           |
| control_plane_security_group_id | compute module, admin access rules in root module |
| worker_security_group_id        | compute module, admin access rules in root module |
| nat_gateway_public_ips          | Cluster output (for firewall whitelisting)        |
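
Consumption is plain module references. For example, the root module's wiring of network outputs into the compute module might look like this sketch (argument names are illustrative):

```hcl
# Sketch: the root module passes network outputs into compute.
module "compute" {
  source = "../../modules/aws/compute"

  vpc_id                          = module.network.vpc_id
  private_subnet_ids              = module.network.private_subnet_ids
  control_plane_security_group_id = module.network.control_plane_security_group_id
  worker_security_group_id        = module.network.worker_security_group_id
}
```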

Step 4: Deploy

The network module is deployed as part of the full infrastructure:

cd terraform/cluster/aws
terraform init
terraform plan  -var-file=../envs/aws.tfvars
terraform apply -var-file=../envs/aws.tfvars

Terraform resolves all cross-module references and deploys resources in dependency order. Network resources (VPC, subnets, IGW, NAT, routes, security groups) are created before compute and load balancer resources that depend on them.

Customisation Summary

| What to Change            | Where      | Variable                                                      |
|---------------------------|------------|---------------------------------------------------------------|
| VPC address space         | aws.tfvars | vpc_cidr                                                      |
| Number of AZs             | aws.tfvars | availability_zones, public_subnet_cidrs, private_subnet_cidrs |
| NAT gateway strategy      | aws.tfvars | single_nat_gateway                                            |
| Disable NAT entirely      | aws.tfvars | enable_nat_gateway = false                                    |
| CNI type (Cilium/Calico)  | aws.tfvars | cni_type                                                      |
| NodePort access           | aws.tfvars | enable_nodeport                                               |
| Admin IP restrictions     | aws.tfvars | allowed_admin_cidrs                                           |
| ClusterMesh peer CIDRs    | aws.tfvars | clustermesh_peer_cidrs                                        |

The network fabric for bare metal is your physical and logical network infrastructure — switches, VLANs, routers, and cabling.

Step 0: Rack Servers & Cable Network Interfaces

Before configuring switches and VLANs, physically install and cable all servers.

Rack installation checklist:

  • [ ] Mount servers in assigned rack units (document U positions)
  • [ ] Label each server on the front and rear panels with its hostname
  • [ ] Connect redundant power supplies to separate PDUs (A + B feeds)
  • [ ] Verify power LEDs on all servers before cabling network

Cable the 4 NICs per node:

| NIC          | Speed  | Connect To            | Port Type                 | Purpose                               |
|--------------|--------|-----------------------|---------------------------|---------------------------------------|
| NIC 1 (eth0) | 25 GbE | Core switch           | VLAN trunk (VLAN 10 + 30) | Production traffic (primary)          |
| NIC 2 (eth1) | 25 GbE | Core switch           | VLAN trunk (VLAN 10 + 30) | Production traffic (secondary / bond) |
| NIC 3 (MGMT) | 1 GbE  | OOB management switch | Access                    | Out-of-band management                |
| NIC 4 (IPMI) | 1 GbE  | IPMI / BMC network    | Access                    | Baseboard management controller       |

Verify before proceeding

Check that link lights are active on all ports before moving on to switch configuration. Catching a missing link light now saves hours of debugging later.

Refer to the server inventory table in Provision Compute for hostname-to-IP mappings.

Network Architecture

Internet
┌───┴────────────────────────────┐
│  Router / Firewall             │
│  (NAT, firewall rules)         │
└───┬────────────────────────────┘
┌───┴────────────────────────────┐
│  Core Switch                   │
│  ├── VLAN 10: Management       │
│  │   (IPMI/BMC interfaces)     │
│  ├── VLAN 30: Kubernetes       │
│  │   (Talos node interfaces)   │
│  └── VLAN 1: Default           │
│      (Admin workstations)      │
└────────────────────────────────┘

VLAN Planning

| VLAN | Subnet          | Purpose                 | Nodes                    |
|------|-----------------|-------------------------|--------------------------|
| 10   | 192.168.10.0/24 | Management (IPMI/BMC)   | Server BMCs              |
| 30   | 192.168.30.0/24 | Kubernetes (data plane) | All Talos nodes          |
| 1    | 192.168.1.0/24  | Default (admin access)  | Workstations, PXE server |

Switch Configuration

  1. Create VLANs on your managed switch
  2. Configure trunk ports between switches to carry all VLANs
  3. Configure access ports for each server NIC:
    • BMC/IPMI NIC → VLAN 10 (management)
    • Data NIC → VLAN 30 (Kubernetes)
  4. Configure the gateway (router/firewall) with interfaces on each VLAN

IP Addressing

Assign static IPs to all Talos nodes. These will be configured in the Talos machine config:

| Hostname    | IP Address       | Gateway      | Role                                 |
|-------------|------------------|--------------|--------------------------------------|
| rciis-cp-01 | 192.168.30.31/24 | 192.168.30.1 | Control plane                        |
| rciis-cp-02 | 192.168.30.32/24 | 192.168.30.1 | Control plane                        |
| rciis-cp-03 | 192.168.30.33/24 | 192.168.30.1 | Control plane                        |
| rciis-wn-01 | 192.168.30.34/24 | 192.168.30.1 | Worker                               |
| rciis-wn-02 | 192.168.30.35/24 | 192.168.30.1 | Worker                               |
| rciis-wn-03 | 192.168.30.36/24 | 192.168.30.1 | Worker                               |
| VIP         | 192.168.30.30    | -            | Kubernetes API (kube-vip or HAProxy) |

Outbound Access

Talos nodes need outbound access to:

  • ghcr.io, docker.io, quay.io -- container images
  • factory.talos.dev -- Talos installer images
  • NTP servers -- time synchronisation

Configure NAT or a proxy on your gateway for the Kubernetes VLAN.

Proxmox networking for the Talos cluster uses Proxmox's Linux bridge (vmbr0) and optionally VLANs. The Terraform project configures each VM's network via cloud-init.

Network Architecture

Internet / Upstream Router
┌───┴────────────────────────────┐
│  Proxmox Host                  │
│  ├── vmbr0 (Linux bridge)      │
│  │   └── Physical NIC (eno1)   │
│  │                             │
│  ├── CP VMs (192.168.30.31-33) │
│  ├── Worker VMs (.34-.36)      │
│  └── VIP: 192.168.30.30        │
└────────────────────────────────┘

Bridge Configuration

The default Proxmox bridge vmbr0 is used. Verify it exists on your Proxmox node:

# On the Proxmox node
cat /etc/network/interfaces

You should see something like:

auto vmbr0
iface vmbr0 inet static
    address 192.168.30.225/24
    gateway 192.168.30.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

Terraform Network Configuration

The VM module configures networking via these variables in your .tfvars:

# Network bridge on Proxmox
network_bridge = "vmbr0"

# Optional VLAN tag (null = untagged)
# network_vlan_id = 30

# Gateway for all nodes
ipv4_gateway = "192.168.30.1"

# DNS resolvers
dns_servers = ["1.1.1.1", "192.168.10.17"]

# Static IPs (CIDR notation)
control_plane_ips = [
  "192.168.30.31/24",
  "192.168.30.32/24",
  "192.168.30.33/24"
]

worker_ips = [
  "192.168.30.34/24",
  "192.168.30.35/24",
  "192.168.30.36/24"
]

Cloud-init passes these IP addresses to each VM at creation time.
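
Internally, the VM module can turn these values into Proxmox's cloud-init ipconfig string (format ip=<addr>/<prefix>,gw=<gateway>). A sketch of that transformation, assuming the real module does something equivalent:

```hcl
# Sketch: build one "ip=...,gw=..." string per control plane node
# for Proxmox's cloud-init ipconfigN setting.
locals {
  control_plane_ipconfig = [
    for ip in var.control_plane_ips : "ip=${ip},gw=${var.ipv4_gateway}"
  ]
  # First entry would be "ip=192.168.30.31/24,gw=192.168.30.1"
  # with the example values above.
}
```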

VLAN Support

To place VMs on a specific VLAN, set network_vlan_id:

network_vlan_id = 30

The Proxmox bridge must be VLAN-aware for this to work. Enable it in /etc/network/interfaces:

auto vmbr0
iface vmbr0 inet manual
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
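
On the Terraform side, the VLAN tag typically ends up on the VM's network device. With the bpg/proxmox provider the relevant fragment might look like this sketch (the provider and attribute names are assumptions; adjust to whichever provider the VM module actually uses):

```hcl
# Sketch: tag the VM's NIC with the configured VLAN
# (bpg/proxmox provider assumed; a null vlan_id leaves
# traffic untagged).
resource "proxmox_virtual_environment_vm" "node" {
  # ... other VM settings elided ...

  network_device {
    bridge  = var.network_bridge
    vlan_id = var.network_vlan_id
  }
}
```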