Glossary

A categorised reference of technical terms, acronyms, and jargon used throughout this documentation. Terms are grouped by domain and sorted alphabetically within each category.

For in-depth explanations of observability and SRE concepts (SLIs, SLOs, percentiles, golden signals, etc.), see SRE & Observability Concepts.


Kubernetes & Containers

Term Definition
Admission controller A Kubernetes API server plugin that intercepts requests before objects are persisted. Mutating admission controllers modify objects; validating admission controllers reject non-compliant ones. Kyverno is the admission controller used in RCIIS.
Affinity / Anti-affinity Scheduling rules that attract (affinity) or repel (anti-affinity) pods relative to nodes or other pods. Used to spread replicas across failure domains.
ConfigMap A Kubernetes object for storing non-sensitive configuration data as key-value pairs. Mounted as files or environment variables in pods.
Container A lightweight, isolated process running from a container image. Containers share the host kernel but have their own filesystem, network, and process namespace.
containerd The container runtime used by Kubernetes (and Talos) to pull images, create containers, and manage their lifecycle. Replaced Docker as the default runtime.
Cordon Mark a node as unschedulable so no new pods are assigned to it. Existing pods continue running. Used before draining a node for maintenance.
CRD (Custom Resource Definition) An extension to the Kubernetes API that defines a new resource type (e.g., Cluster, Keycloak, CephCluster). Operators use CRDs to manage applications declaratively.
CR (Custom Resource) An instance of a CRD. For example, a Keycloak CR tells the Keycloak Operator to create a running Keycloak deployment.
DaemonSet A workload that runs exactly one pod on every (or selected) node. Used for node-level agents like Falco, Fluent Bit, and Cilium.
Deployment A workload that manages a set of identical pod replicas with rolling update support. The most common workload type for stateless services.
Drain Cordon a node and then evict all of its pods (respecting PodDisruptionBudgets). Used to safely remove a node from the cluster for maintenance.
etcd The distributed key-value store that backs the Kubernetes API. Stores all cluster state including Secrets, ConfigMaps, and resource definitions.
Eviction The process of terminating a pod, either by the kubelet (resource pressure), the scheduler (preemption), or an administrator (drain).
Helm A package manager for Kubernetes. Helm charts are templated YAML manifests with configurable values files.
Init container A container that runs to completion before the main application container starts. Used for setup tasks like loading PKCS#11 libraries.
kubelet The agent running on each node that manages pod lifecycle — starts containers, reports node status, and enforces resource limits.
kube-proxy The default network proxy on each node that implements Kubernetes Service load balancing. Replaced by Cilium eBPF in RCIIS.
Kustomize A template-free configuration management tool for Kubernetes that uses overlays and patches. Used with KSOPS for secret management.
Leader election A pattern where one replica in a group is elected as the active leader. Used by controllers (cert-manager, Velero) to prevent duplicate work.
Namespace A logical partition within a Kubernetes cluster for organising and isolating resources. Each platform tool deploys into its own namespace.
Node A machine (physical or virtual) in the Kubernetes cluster. Control plane nodes run etcd and the API server; worker nodes run application pods.
Operator (pattern) A Kubernetes controller that uses CRDs to manage the full lifecycle of a complex application — install, configure, upgrade, backup, and repair.
Pod The smallest deployable unit in Kubernetes — one or more containers sharing a network namespace and storage volumes.
PodDisruptionBudget (PDB) A policy that limits how many pods in a workload can be unavailable simultaneously during voluntary disruptions (drain, rolling update).
RBAC (Role-Based Access Control) Kubernetes authorisation system that grants permissions (verbs on resources) to users, groups, or service accounts via Roles and RoleBindings.
Rolling update A deployment strategy that replaces old pods with new ones incrementally, maintaining availability throughout the update.
Secret A Kubernetes object for storing sensitive data (passwords, tokens, certificates). Stored in etcd and optionally encrypted at rest via KMS.
ServiceAccount An identity for pods to authenticate with the Kubernetes API. Used for RBAC and workload identity.
Sidecar A secondary container running alongside the main container in the same pod. Used for logging, proxying, or HSM PKCS#11 access.
StatefulSet A workload for stateful applications (databases, message brokers) that provides stable network identities and persistent storage per replica.
Static pod A pod managed directly by the kubelet on a specific node, not by the Kubernetes API. Used for critical control plane components.
Toleration A pod property that allows it to schedule onto nodes with matching taints. Used with node taints to dedicate nodes for specific workloads.
Topology spread constraint A scheduling rule that distributes pods evenly across failure domains (zones, nodes). Used for HA deployments.
Webhook (mutating / validating) An HTTP callback that Kubernetes invokes during admission. Kyverno, cert-manager, and OPA/Gatekeeper all use admission webhooks.
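
The Toleration entry above describes pods scheduling onto nodes "with matching taints". The sketch below models that matching rule in Python as a rough illustration. The `Taint` and `Toleration` classes, the `dedicated=storage` taint, and the pod names are assumptions made for this example and are not part of the Kubernetes client API. Only the hard effects (`NoSchedule`, `NoExecute`) are considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" (default) or "Exists"
    value: str = ""
    effect: str = ""         # an empty effect matches any effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if this single toleration matches this single taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        # An empty key with Exists tolerates every taint.
        return toleration.key == "" or toleration.key == taint.key
    # Operator "Equal": key and value must both match.
    return toleration.key == taint.key and toleration.value == taint.value


def can_schedule(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """A pod fits only if every hard taint on the node is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


# Hypothetical node dedicated to storage workloads.
storage_node = [Taint(key="dedicated", value="storage")]
ceph_pod = [Toleration(key="dedicated", operator="Equal", value="storage")]
web_pod: list[Toleration] = []

print(can_schedule(ceph_pod, storage_node))  # True
print(can_schedule(web_pod, storage_node))   # False
```

A toleration only permits scheduling onto the tainted node; attracting the pod to that node still needs node affinity or a node selector.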

Networking

Term Definition
Anycast A routing method where the same IP address is announced from multiple locations. The network routes traffic to the nearest location. Cloudflare uses Anycast for global load balancing.
ARP / GARP Address Resolution Protocol maps IP addresses to MAC addresses. Gratuitous ARP (GARP) is an unsolicited broadcast used by kube-vip and MetalLB to announce VIP ownership.
BGP (Border Gateway Protocol) The routing protocol that connects autonomous systems on the internet. Used by MetalLB and Cilium for advertising service IPs to network routers.
CIDR Classless Inter-Domain Routing — a notation for IP address ranges (e.g., 10.0.0.0/16). Used to define pod networks, service networks, and subnet boundaries.
Cilium An eBPF-based CNI plugin that provides networking, observability (Hubble), load balancing, network policy, and WireGuard encryption. The primary CNI for RCIIS.
ClusterIP The default Kubernetes Service type — an internal-only virtual IP that load-balances traffic across pod endpoints. Not accessible from outside the cluster.
CNI (Container Network Interface) A specification and plugin system for configuring networking in Linux containers. Cilium is the CNI used in RCIIS.
CoreDNS The DNS server running inside Kubernetes that resolves Service names to ClusterIPs. Also used for custom DNS zones in RCIIS.
DNS (Domain Name System) The hierarchical naming system that translates human-readable domain names to IP addresses.
eBPF (extended Berkeley Packet Filter) A Linux kernel technology that allows running sandboxed programs in kernel space without modifying kernel code. Used by Cilium (networking), Falco, and Tracee (security).
Egress Outbound traffic from a pod or cluster to an external destination. Egress network policies control which external services pods can reach.
Gateway API A Kubernetes-native API for managing ingress, load balancing, and traffic routing. The successor to the Ingress resource, supported by Cilium.
GENEVE Generic Network Virtualization Encapsulation — a tunnelling protocol used by Cilium for pod-to-pod traffic across nodes.
Hubble The observability layer built into Cilium that provides network flow visibility, DNS query logging, and HTTP request tracing.
Ingress Inbound traffic from outside the cluster to services inside. Also refers to the Kubernetes Ingress resource that defines HTTP routing rules.
kube-vip A lightweight load balancer for the Kubernetes control plane that provides a virtual IP (VIP) for the API server using ARP or BGP.
L2 / L4 / L7 OSI model layers — Layer 2 (data link / MAC), Layer 4 (transport / TCP/UDP), Layer 7 (application / HTTP). Load balancers and firewalls operate at different layers.
Load balancer A component that distributes traffic across multiple backends. External LBs (AWS NLB, HAProxy) front the cluster; internal LBs (Cilium, kube-proxy) distribute within it.
MetalLB A bare-metal load balancer for Kubernetes that assigns external IPs to Services using ARP or BGP. An alternative to Cilium L2 announcements.
mTLS (Mutual TLS) TLS where both client and server authenticate each other with certificates. Used by the Talos API and etcd. Cilium's optional WireGuard encryption is a separate mechanism based on key pairs rather than TLS certificates.
NAT (Network Address Translation) Translates private IP addresses to public addresses. NAT gateways provide internet access for private-subnet nodes on AWS.
Network policy A Kubernetes resource that defines firewall rules for pod-to-pod and pod-to-external traffic. Cilium implements network policies using eBPF.
NodePort A Kubernetes Service type that exposes the service on a static port on every node. Rarely used directly — usually fronted by a load balancer.
PoP (Point of Presence) A physical location where a CDN or network provider has servers. Cloudflare has PoPs in 300+ cities for edge caching and traffic routing.
Subnet A subdivision of an IP network. VPCs are divided into subnets (public, private, management) to isolate traffic.
TLS (Transport Layer Security) Cryptographic protocol that provides encryption and authentication for network connections. Successor to SSL.
VIP (Virtual IP) A floating IP address that can move between hosts for failover. Used by kube-vip and Keepalived for control plane HA.
VLAN Virtual LAN — a logical network segment within a physical network. Used to isolate management, storage, and application traffic on bare metal.
VPC (Virtual Private Cloud) An isolated virtual network within a cloud provider (AWS). Contains subnets, route tables, security groups, and NAT gateways.
VPN (Virtual Private Network) An encrypted tunnel between two networks. Used for site-to-site connectivity between data centres or cloud regions.
WireGuard A modern VPN protocol used by Cilium for transparent pod-to-pod encryption across nodes.
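
To make the CIDR notation from the table above concrete, here is a small sketch using Python's standard `ipaddress` module. It takes the `10.0.0.0/16` range used as the example in the CIDR entry; the split into `/24` blocks and the sample addresses are assumptions chosen for illustration, not RCIIS settings.

```python
import ipaddress

# Example range from the CIDR entry: a /16 leaves 16 host bits.
pod_network = ipaddress.ip_network("10.0.0.0/16")
print(pod_network.num_addresses)  # 65536
print(pod_network.netmask)        # 255.255.0.0

# Carve the /16 into /24 blocks (an assumed per-node allocation size).
node_subnets = list(pod_network.subnets(new_prefix=24))
print(len(node_subnets))          # 256
print(node_subnets[1])            # 10.0.1.0/24

# Membership tests: does an address fall inside a range?
pod_ip = ipaddress.ip_address("10.0.1.23")
print(pod_ip in node_subnets[1])                        # True
print(ipaddress.ip_address("10.1.0.5") in pod_network)  # False
```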

Security & Compliance

Term Definition
Admission control The process of intercepting API requests before persistence to validate or modify them. See admission controller above.
CA (Certificate Authority) An entity that issues digital certificates. cert-manager automates CA operations in RCIIS; HSMs protect the CA signing key.
CIS Benchmark A set of security configuration recommendations published by the Center for Internet Security. The Kubernetes CIS Benchmark defines hardening rules for clusters.
Container escape An attack where a process breaks out of container isolation to access the host. Detected by Falco and Tracee.
cosign A tool from the Sigstore project for signing and verifying container images. RCIIS uses cosign in CI to sign images and Kyverno to verify signatures at admission.
CVE (Common Vulnerabilities and Exposures) A unique identifier for a publicly disclosed security vulnerability (e.g., CVE-2024-1234). Trivy scans for CVEs in container images.
DDoS (Distributed Denial of Service) An attack that overwhelms a service with traffic from many sources. Mitigated by Cloudflare WAF and rate limiting.
Defence in depth A security strategy that layers multiple independent controls so that a failure in one layer does not compromise the system.
DEK (Data Encryption Key) A key used to encrypt data directly. In KMS v2, the DEK is wrapped (encrypted) by a Key Encryption Key stored in the HSM.
Encryption at rest Protecting stored data by encrypting it on disk. Implemented via LUKS2 (Talos), KMS (AWS), or ZFS encryption (Proxmox).
Encryption in transit Protecting data moving across a network using TLS or mTLS. Talos manages all control plane TLS automatically.
Falco A CNCF runtime security tool that detects anomalous syscall behaviour in containers (shell execution, unexpected file access, network connections).
FIPS 140-2 / 140-3 A US government standard for cryptographic module validation. Level 3 requires physical tamper-resistance (HSMs).
Fulcio A Sigstore component that issues short-lived code-signing certificates tied to OIDC identity (e.g., GitHub Actions). Used in keyless signing workflows.
HSM (Hardware Security Module) A tamper-resistant hardware device that stores cryptographic keys and performs signing/encryption operations. Keys never leave the HSM. See Encryption & HSM Provisioning.
Image signing Cryptographically signing a container image to prove its authenticity and integrity. Verified by Kyverno at admission time.
ISO 27001 An international standard for information security management systems (ISMS). Defines controls for access, cryptography, operations, and development.
JWT (JSON Web Token) A compact, signed token used for authentication and authorisation. Keycloak issues JWTs; services verify them using the public key from the OIDC discovery endpoint.
KEK (Key Encryption Key) A key used to encrypt other keys (DEKs). In KMS v2, the KEK lives in the HSM and wraps DEKs that encrypt Kubernetes Secrets.
Key ceremony A formal, witnessed process for initialising an HSM and generating root cryptographic keys. Documented with photographs and signed logs.
KMS (Key Management Service) A service that manages cryptographic keys. AWS KMS provides cloud-managed keys; KMS v2 is a Kubernetes API for delegating Secret encryption to an external KMS.
Kyverno A Kubernetes-native policy engine that enforces admission policies, mutates resources, and generates reports — without requiring a separate policy language.
Lateral movement An attacker's technique of moving from one compromised component to another within the network. Prevented by network policies and namespace isolation.
LUKS2 Linux Unified Key Setup v2 — the standard for full-disk encryption on Linux. Talos uses LUKS2 for STATE and EPHEMERAL partition encryption.
M-of-N secret sharing A scheme where a secret (e.g., HSM SO PIN) is split into N shares, and any M shares are needed to reconstruct it. Used for HSM key custody.
OIDC (OpenID Connect) An identity layer on top of OAuth 2.0 that provides authentication. Keycloak is the OIDC provider for RCIIS; FluxCD dashboard and the Kubernetes API are OIDC clients.
PKCS#11 A standard API for communicating with cryptographic hardware (HSMs, smart cards). Applications use PKCS#11 to perform signing and encryption without accessing raw key material.
PKI (Public Key Infrastructure) The framework of certificates, CAs, and trust chains that enables TLS and code signing. Talos manages the Kubernetes PKI; cert-manager manages application PKI.
Privilege escalation An attack where a process gains higher permissions than intended. Detected by Falco and Tracee; prevented by Kyverno policies.
Rekor A Sigstore component that provides an immutable transparency log for software signing events. Used to verify that a signature was created at a specific time.
SAML Security Assertion Markup Language — an XML-based protocol for exchanging authentication and authorisation data. Used for enterprise SSO alongside OIDC.
SBOM (Software Bill of Materials) A list of all components (libraries, dependencies) in a software artifact. Used for supply chain transparency and vulnerability tracking.
Sigstore An open-source project for software signing, verification, and transparency. Includes cosign (signing), Fulcio (certificates), and Rekor (transparency log).
Supply chain attack An attack that compromises software through its dependencies, build tools, or distribution channels. Mitigated by image signing and admission control.
Syscall A system call — the interface between user-space programs and the kernel. Falco monitors syscalls to detect anomalous container behaviour.
Tracee An Aqua Security runtime security tool that uses eBPF for deep forensic capture of container and host events. Complements Falco.
Trivy An Aqua Security scanner for vulnerabilities (CVEs), misconfigurations, and secret leaks in container images, Kubernetes resources, and filesystems.
WAF (Web Application Firewall) A firewall that filters HTTP traffic based on rules (SQL injection, XSS, rate limiting). Cloudflare WAF protects RCIIS external endpoints.
X.509 The standard format for public key certificates used in TLS. cert-manager automates X.509 certificate issuance and renewal.
Zero trust A security model that assumes no implicit trust — every request must be authenticated, authorised, and encrypted regardless of network location.
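
The M-of-N secret sharing entry above can be illustrated with Shamir's scheme, one common way to build such a split. The glossary does not say which scheme the HSM tooling uses, so treat this as a sketch of the idea only. The function names, the prime, and the sample secret are assumptions for the example.

```python
import secrets

# A Mersenne prime; all arithmetic is done modulo this value.
PRIME = 2**127 - 1


def split_secret(secret: int, m: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares such that any m of them recover it."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to get the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical 3-of-5 split of a numeric secret.
shares = split_secret(secret=424242, m=3, n=5)
print(recover_secret(shares[:3]) == 424242)                         # True
print(recover_secret([shares[0], shares[2], shares[4]]) == 424242)  # True
```

Fewer than M shares reveal nothing useful about the secret, which is why custody can be spread across several people without any one of them holding the whole value.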

Storage

Term Definition
Block storage Storage presented as raw block devices (like a virtual disk). Ceph RBD and AWS EBS provide block storage for Kubernetes PVCs.
Ceph A distributed storage system that provides block (RBD), object (RGW), and file (CephFS) storage. Deployed via Rook-Ceph in RCIIS.
CephFS Ceph's POSIX-compliant distributed filesystem. Used for ReadWriteMany (RWX) workloads where multiple pods need shared access.
CSI (Container Storage Interface) A standard API for storage drivers in Kubernetes. Rook-Ceph implements CSI to provision PVCs from the Ceph cluster.
EBS (Elastic Block Store) AWS block storage service. EBS volumes are attached to EC2 instances and used for Kubernetes PVCs on AWS.
Erasure coding A storage redundancy technique that splits data into fragments with parity, using less space than full replication. Not currently used in RCIIS (replication is used instead).
gp3 An AWS EBS volume type providing baseline 3,000 IOPS and 125 MB/s throughput. The default volume type for RCIIS AWS deployments.
IOPS Input/Output Operations Per Second — a measure of storage performance. gp3 volumes provide 3,000 baseline IOPS, scalable to 16,000.
MGR (Ceph Manager) The Ceph manager daemon that provides monitoring, orchestration, and a management dashboard. Runs alongside MONs.
MON (Ceph Monitor) The Ceph monitor daemon that maintains cluster membership, state maps, and quorum. Requires an odd number (3 or 5) for consensus.
Object storage Storage accessed via HTTP APIs (S3, Swift). Ceph RGW provides S3-compatible object storage for backups and log archives.
OSD (Object Storage Daemon) A Ceph daemon that manages a physical or logical disk. Each OSD stores data, handles replication, and participates in recovery.
PV (PersistentVolume) A Kubernetes resource representing a piece of provisioned storage. Created dynamically by a StorageClass or statically by an administrator.
PVC (PersistentVolumeClaim) A request for storage by a pod. The PVC binds to a PV that satisfies its size and access mode requirements.
Quorum The minimum number of members that must agree for a distributed system to make progress. Ceph MONs and etcd require a majority quorum; PostgreSQL can optionally use quorum-based synchronous replication.
RBD (RADOS Block Device) Ceph's block storage interface. RBD provides thin-provisioned, snapshotable block devices for Kubernetes PVCs.
RGW (RADOS Gateway) Ceph's S3-compatible object storage gateway. Used for Velero backups, Loki log storage, and CNPG WAL archiving.
Rook A Kubernetes operator that automates deployment and management of Ceph clusters. Rook handles OSD provisioning, MON placement, and cluster health.
Snapshot Controller A Kubernetes controller that manages CSI VolumeSnapshots — point-in-time copies of PVCs used for backup and cloning.
StorageClass A Kubernetes resource that defines how PVCs are dynamically provisioned — which CSI driver, replication factor, and parameters to use.
Throughput The rate of data transfer (MB/s or GB/s). Relevant for streaming workloads, database WAL writes, and backup operations.
VolumeSnapshot A point-in-time copy of a PVC, backed by the CSI driver (e.g., Ceph RBD snapshot). Used for pre-upgrade backups and database cloning.
ZFS A combined filesystem and volume manager with built-in compression, snapshots, and encryption. Used on Proxmox hosts for VM storage.
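
The Quorum and MON entries above imply some simple arithmetic. The sketch below assumes a plain majority quorum, which is what Ceph MONs and etcd use, and shows why odd member counts are preferred. The function names are illustrative.

```python
def quorum_size(members: int) -> int:
    """Smallest strict majority of a group with `members` voters."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum_size(members)


for n in (3, 4, 5):
    print(n, quorum_size(n), failures_tolerated(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

Going from 3 to 4 members raises the quorum size without tolerating any additional failure, so the fourth member adds cost but no resilience.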

Observability & SRE

For detailed explanations with practical examples and PromQL queries, see SRE & Observability Concepts.

Term Definition
Alertmanager The Prometheus component that routes, deduplicates, groups, and silences alerts. Sends notifications to Slack, PagerDuty, email, or webhooks.
Availability The proportion of time a service is operational, expressed as a percentage ("nines"). See Availability & Nines.
Blackbox monitoring Monitoring a service from the outside — probing endpoints without knowledge of internal state. The Blackbox Exporter performs HTTP, TCP, and ICMP probes.
Burn rate The rate at which an error budget is being consumed. A burn rate of 1.0 means the budget is being used evenly; >1.0 means faster than sustainable. See Error Budgets.
Cardinality The number of unique time series in Prometheus. High cardinality (from unbounded label values) causes memory and performance issues. See Cardinality.
Chaos engineering The practice of deliberately injecting failures to verify system resilience. See Chaos Engineering.
Counter A Prometheus metric type that only increases (or resets to zero). Used for totals like request count, bytes transferred. Always use rate() on counters.
Dashboard A visual display of metrics and logs, typically in Grafana. Dashboards provide real-time operational visibility into platform health.
Error budget The amount of unreliability permitted by the SLO. Calculated as 100% - SLO target. See Error Budgets.
Game day A planned exercise where the team practices incident response against a simulated failure. See Game Days.
Gauge A Prometheus metric type that can go up or down. Used for current values like memory usage, temperature, active connections.
Golden signals The four key metrics for any service: latency, traffic, errors, saturation. See Four Golden Signals.
Grafana A visualisation platform for creating dashboards from Prometheus, Loki, and other data sources.
Histogram A Prometheus metric type that counts observations in configurable buckets. Used for latency distributions; quantiles are estimated from the buckets with histogram_quantile().
LogQL The query language for Grafana Loki. Uses label selectors and pipeline stages to filter and aggregate log lines. See Loki & LogQL.
Loki A log aggregation system by Grafana Labs. Indexes log metadata (labels) rather than full text, making it efficient for Kubernetes log storage.
MTBF (Mean Time Between Failures) The average time between consecutive failures. MTBF = MTTF + MTTR. See Incident Metrics.
MTTF (Mean Time to Failure) The average time a system runs before failing. See Incident Metrics.
MTTR (Mean Time to Recovery) The average time from incident detection to service restoration. The most actionable reliability metric. See Incident Metrics.
Observability The ability to understand a system's internal state from its external outputs (metrics, logs, traces). The combination of Prometheus, Loki, and Grafana provides observability for RCIIS.
On-call A rotation where designated team members are available to respond to incidents outside business hours.
P50 / P95 / P99 / P999 Latency percentiles: the latency at or below which 50%, 95%, 99%, or 99.9% of requests complete. See Latency Percentiles.
Post-mortem A blameless document written after an incident that describes the timeline, root cause, impact, and preventive action items. See Post-Mortems.
Prometheus An open-source monitoring system that collects metrics via a pull model, stores them in a time-series database, and evaluates alert rules.
PromQL The query language for Prometheus. Used in dashboards, alerts, and recording rules. See Prometheus & PromQL.
Recording rule A PromQL expression that is pre-computed at regular intervals and stored as a new time series. Improves query performance for dashboards and alerts.
RED method A monitoring framework for request-driven services: Rate, Errors, Duration. See RED Method.
Runbook A documented procedure linked to an alert that describes how to diagnose and resolve the issue. See Alerting Best Practices.
Scrape The process of Prometheus pulling metrics from a target's /metrics endpoint at a configured interval.
ServiceMonitor A Prometheus Operator CRD that defines how Prometheus should scrape metrics from a Kubernetes Service.
SLA (Service Level Agreement) A contractual commitment guaranteeing a minimum level of service reliability, with consequences for breach. See Service Levels.
SLI (Service Level Indicator) A quantitative measurement of service performance (e.g., error rate, latency). See Service Levels.
SLO (Service Level Objective) A target value for an SLI over a time window (e.g., "99.9% availability over 30 days"). See Service Levels.
SRE (Site Reliability Engineering) A discipline that applies software engineering practices to infrastructure and operations. Originated at Google. See SRE & Observability Concepts.
Toil Repetitive, manual, automatable operational work that does not provide enduring value. See Toil & Automation.
USE method A monitoring framework for infrastructure resources: Utilization, Saturation, Errors. See USE Method.
VPA (Vertical Pod Autoscaler) A Kubernetes component that recommends or automatically adjusts pod CPU and memory requests based on observed usage. Goldilocks uses VPA in recommendation mode.
Whitebox monitoring Monitoring a service from the inside — using internal metrics, logs, and instrumentation. Prometheus scraping application metrics is whitebox monitoring.
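
The Error budget and Burn rate entries above reduce to a little arithmetic, sketched here in Python. The 99.9% target over 30 days comes from the SLO example in the table; the observed error ratio of 0.2% and the function names are assumptions for illustration.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests (or time) allowed to fail: 100% - SLO target."""
    return 1.0 - slo_target


def budget_minutes(slo_target: float, window_days: int) -> float:
    """Error budget expressed as minutes of full downtime in the window."""
    return error_budget(slo_target) * window_days * 24 * 60


def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """1.0 means the budget lasts exactly the window; above 1.0 is too fast."""
    return observed_error_ratio / error_budget(slo_target)


# "99.9% availability over 30 days" allows about 43 minutes of downtime.
print(round(budget_minutes(0.999, 30), 1))  # 43.2

# An assumed 0.2% error ratio burns the 0.1% budget twice as fast as allowed.
print(round(burn_rate(0.002, 0.999), 2))    # 2.0
```

At a sustained burn rate of 2.0, a 30-day budget would be exhausted in roughly 15 days.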

GitOps & CI/CD

Term Definition
Age A modern file encryption tool used with SOPS for encrypting Kubernetes secrets. Simpler than GPG with smaller keys.
Kustomization (FluxCD) A FluxCD resource that defines a desired state (source repo + target cluster + reconciliation interval) and continuously reconciles the cluster to match.
HelmRelease A FluxCD resource that declaratively manages Helm chart installations with automated upgrades, rollbacks, and drift detection.
FluxCD A declarative GitOps continuous delivery tool for Kubernetes. Watches Git repositories and automatically syncs cluster state to match.
Argo Rollouts A Kubernetes controller for progressive delivery strategies — canary deployments, blue-green deployments, and analysis-driven rollbacks.
Blue-green deployment A release strategy that runs two identical environments (blue = current, green = new) and switches traffic atomically. Zero-downtime but requires double the resources.
Canary deployment A release strategy that routes a small percentage of traffic to the new version, gradually increasing if metrics are healthy. Detected regressions trigger automatic rollback.
CI/CD Continuous Integration (automated build + test) and Continuous Delivery/Deployment (automated release to environments).
Declarative A configuration style that describes the desired end state rather than the steps to reach it. Kubernetes manifests and FluxCD Kustomizations are declarative.
Drift When the actual state of the cluster diverges from the desired state defined in Git. FluxCD detects and can auto-correct drift via continuous reconciliation.
GitOps An operational model where Git is the single source of truth for infrastructure and application configuration. Changes are applied via pull requests and automated reconciliation.
Harbor An open-source container registry with vulnerability scanning, image signing, and replication. The RCIIS private registry for Helm charts and container images.
Helm chart A package of Kubernetes YAML templates with a values.yaml file for configuration. Charts are versioned and stored in registries (Harbor, OCI).
Idempotent An operation that produces the same result regardless of how many times it is applied. kubectl apply and Helm upgrades are idempotent.
KSOPS A Kustomize plugin that integrates SOPS decryption into the Kustomize build process. Enables encrypted secrets in GitOps workflows.
Multi-source Kustomization A FluxCD Kustomization that references multiple sources — typically a HelmRepository for the chart and a GitRepository for values overrides.
OCI (Open Container Initiative) A set of standards for container image formats and registries. Helm charts can be stored as OCI artifacts in registries like Harbor.
Progressive delivery A release strategy that gradually shifts traffic to a new version based on real-time metrics. Canary and blue-green are types of progressive delivery.
Reconciliation The process of comparing actual state to desired state and making corrections. FluxCD reconciles at a configurable interval (default 10 minutes).
Renovate An automated dependency update tool that creates pull requests when new versions of Helm charts, container images, or other dependencies are available.
Dependency ordering A FluxCD Kustomization feature that uses dependsOn to ensure resources are deployed in the correct order across environments.
Drift detection A FluxCD feature that automatically reverts manual changes to cluster resources, ensuring the Git-defined state is maintained when prune and force are enabled.
SOPS Secrets OPerationS — a tool for encrypting/decrypting files using Age, PGP, or KMS keys. Only values are encrypted; keys and metadata remain readable.
Reconciliation (FluxCD) The process of applying the desired state from Git to the cluster. Runs automatically at the configured interval, with options for pruning deleted resources.
DependsOn A deployment ordering mechanism in FluxCD Kustomizations. Resources with dependencies wait for their prerequisites to become ready before deploying.
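
The effect of the dependsOn ordering described above can be pictured as a topological sort of the dependency graph. The sketch below uses Python's standard `graphlib` module. It illustrates the resulting order only and is not how the FluxCD controllers are implemented; the Kustomization names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical Kustomization names mapped to the names in their dependsOn list.
depends_on = {
    "infrastructure": [],
    "cert-manager": ["infrastructure"],
    "keycloak": ["cert-manager"],
    "apps": ["cert-manager", "keycloak"],
}

# static_order() yields each item only after all of its prerequisites.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
# ['infrastructure', 'cert-manager', 'keycloak', 'apps']
```

If the graph contained a cycle (for example, two Kustomizations depending on each other), `static_order()` would raise `graphlib.CycleError`, mirroring the fact that circular dependencies can never become ready.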

Infrastructure & IaC

Term Definition
AMI (Amazon Machine Image) A pre-built virtual machine image for AWS EC2. The Talos AMI provides a pre-installed Talos Linux image for each AWS region.
BMC / IPMI Baseboard Management Controller / Intelligent Platform Management Interface — remote management hardware on servers for power control, console access, and monitoring without an OS.
Bare metal Physical servers without virtualisation. Provides maximum performance but requires manual hardware management, PXE booting, and physical access for maintenance.
Cloud-init A tool for initialising cloud instances on first boot — setting hostname, SSH keys, network config. Used by some Proxmox and cloud deployments; Talos uses its own machine config instead.
Control plane The set of Kubernetes components (API server, etcd, scheduler, controller manager) that manage cluster state. Runs on dedicated nodes in production.
Day-0 / Day-1 / Day-2 Lifecycle phases — Day-0 is planning and design, Day-1 is initial deployment, Day-2 is ongoing operations. See Reliability Engineering Practices.
EC2 Amazon Elastic Compute Cloud — virtual machine instances in AWS. RCIIS uses EC2 for control plane and worker nodes on AWS.
Terraform An Infrastructure as Code (IaC) tool by HashiCorp that provisions and manages cloud and on-premises infrastructure using declarative HCL configuration files. Used in RCIIS for provisioning AWS and Proxmox infrastructure.
HA (High Availability) A design approach that eliminates single points of failure through redundancy, replication, and automatic failover. Requires 3+ nodes.
IaC (Infrastructure as Code) Managing infrastructure through version-controlled configuration files rather than manual processes. Terraform (configured in HCL) and Helm are IaC tools.
iDRAC / iLO Dell (iDRAC) and HPE (iLO) implementations of BMC for remote server management. Accessed via web UI or CLI for power, console, and hardware monitoring.
Image Factory A Talos service that builds custom Talos Linux images with specific system extensions (e.g., iscsi-tools, qemu-guest-agent) baked in.
Immutable OS An operating system with a read-only root filesystem that cannot be modified at runtime. Talos Linux is immutable — configuration is applied declaratively via machine config.
iPXE / PXE Network boot protocols for loading an OS image from a server over the network. Used for bare metal Talos installations.
Kind Kubernetes in Docker — a tool for running local Kubernetes clusters using Docker containers as nodes. Used for development and testing.
Machine config The YAML configuration file that defines a Talos node's identity, networking, disk encryption, and cluster membership. Applied at boot or via talosctl apply-config.
NTP Network Time Protocol — synchronises system clocks across servers. Critical for TLS certificate validation, log correlation, and distributed consensus.
NVMe / SSD / HDD Storage device types: NVMe (fastest, PCIe-attached flash), SATA/SAS SSD (fast), and HDD (slowest, mechanical). NVMe is recommended for Ceph OSDs.
OOB (Out-of-Band) Management access to a server that is independent of the main operating system — typically via IPMI/BMC over a dedicated management network.
HCL (HashiCorp Configuration Language) The declarative configuration language used by Terraform for defining infrastructure resources. Supports variables, modules, and expressions.
Proxmox VE An open-source virtualisation platform based on KVM and LXC. Used as an alternative to cloud providers for running Kubernetes VMs on-premises.
RAID Redundant Array of Independent Disks — combines multiple disks for redundancy or performance. Hardware RAID is configured in BIOS/UEFI before OS installation.
Schematic A Talos configuration that defines which system extensions to include in a custom Talos image. Submitted to Image Factory to produce a downloadable image.
System extension A read-only overlay that adds functionality to Talos Linux (e.g., iscsi-tools, qemu-guest-agent, bpf for Falco). Baked into the image at build time.
Talos Linux A minimal, immutable, API-managed Linux distribution designed exclusively for running Kubernetes. There is no SSH, shell, or package manager — all management is via the Talos API.
talosctl The CLI for managing Talos Linux nodes — applying config, upgrading, reading logs, and accessing etcd. Authenticated via mTLS certificates.
Talhelper A helper tool that generates Talos machine configurations from a single talconfig.yaml definition, simplifying multi-node config management.

Worker node A Kubernetes node that runs application workloads (pods). It does not run control plane components, which run on dedicated control plane nodes.
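
The "3+ nodes" figure in the HA entry follows from majority quorum: a quorum-based store such as etcd accepts writes only while more than half of its members are reachable. The sketch below works through that arithmetic. The function names `quorum` and `fault_tolerance` are illustrative and do not belong to any tool used in RCIIS.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while a majority still remains."""
    return members - quorum(members)


# Compare common control plane sizes.
for n in (1, 2, 3, 4, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Three is the smallest size that survives a single failure. An even member count tolerates no more failures than the odd count just below it, which is why control planes are usually sized at three or five nodes.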


Domain & Acronyms

Term Definition
APISIX Apache APISIX — a cloud-native API gateway used as the primary ingress and traffic management layer for RCIIS application services.
CloudNativePG (CNPG) A Kubernetes operator for managing PostgreSQL clusters with automated failover, backup, and recovery. Used for Keycloak and RCIIS application databases.
DR (Disaster Recovery) The process and procedures for restoring service after a catastrophic failure. Includes backups (Velero, CNPG), geo-load balancing, and documented recovery runbooks.
EAC East African Community — the regional intergovernmental organisation for which the RCIIS platform is built. Partner states include Kenya, Tanzania, Uganda, Rwanda, Burundi, DRC, South Sudan, and Somalia.
ESB (Enterprise Service Bus) A middleware pattern for integrating applications via a central message broker. The RCIIS ESB handles customs data exchange between partner states.
Fluent Bit A lightweight log processor and forwarder deployed as a DaemonSet. Collects container logs and ships them to Loki.
Goldilocks A tool that runs VPA in recommendation mode and provides a dashboard showing right-sizing suggestions for pod resource requests and limits.
Keepalived A Linux daemon that provides VRRP-based failover for virtual IPs. Used alongside HAProxy for load balancer HA on bare metal.
HAProxy A high-performance TCP/HTTP load balancer. Used for Kubernetes API load balancing on bare metal and Proxmox deployments.
Kafka Apache Kafka — a distributed event streaming platform. Deployed via Strimzi in RCIIS for asynchronous customs data processing.
Partner state A member country of the EAC that participates in the RCIIS customs interconnectivity system.
PITR (Point-in-Time Recovery) Restoring a database to a specific moment by replaying WAL segments from a backup. CloudNativePG supports PITR for PostgreSQL.
RCIIS Regional Customs Interconnectivity Integration System, the platform whose deployment this documentation covers. It connects the customs systems of EAC partner states.
RPO (Recovery Point Objective) The maximum acceptable amount of data loss measured in time. An RPO of 1 hour means backups must be no more than 1 hour old.
RTO (Recovery Time Objective) The maximum acceptable time to restore service after a failure. An RTO of 30 minutes means the service must be back within 30 minutes.
SIEM (Security Information and Event Management) A system that aggregates and analyses security events from multiple sources. Falco and Tracee events can be forwarded to a SIEM.
Strimzi A Kubernetes operator for running Apache Kafka clusters. Manages brokers, topics, users, and connectors declaratively via CRDs.
Velero A Kubernetes backup and disaster recovery tool that backs up cluster resources and PersistentVolumes to S3-compatible storage.
WAL (Write-Ahead Log) A database transaction log where changes are written before being applied to data files. PostgreSQL WAL segments are archived by CNPG for continuous backup and PITR.
WCO SAFE Framework World Customs Organization Framework of Standards to Secure and Facilitate Global Trade — an international standard relevant to RCIIS compliance.
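
The RPO and RTO entries above can be read as two simple time comparisons. The following sketch assumes hypothetical timestamps and targets (a 1-hour RPO and a 30-minute RTO, matching the examples in those entries); the helper names `meets_rpo` and `meets_rto` are illustrative, not part of Velero or CloudNativePG.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True if the data written since the last backup fits within the RPO."""
    return failure - last_backup <= rpo


def meets_rto(failure: datetime, restored: datetime, rto: timedelta) -> bool:
    """True if service was restored within the RTO after the failure."""
    return restored - failure <= rto


# Hypothetical incident: failure at 12:00, last backup at 11:15, restored at 12:40.
failure = datetime(2024, 1, 1, 12, 0)
last_backup = datetime(2024, 1, 1, 11, 15)
restored = datetime(2024, 1, 1, 12, 40)

print(meets_rpo(last_backup, failure, timedelta(hours=1)))   # True: 45 minutes of data at risk
print(meets_rto(failure, restored, timedelta(minutes=30)))   # False: 40-minute outage
```

In this example the backup schedule satisfies the RPO, but the recovery took longer than the RTO allows, so the two objectives have to be checked separately.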