Glossary

A categorised reference of technical terms, acronyms, and jargon used throughout this documentation. Terms are grouped by domain and sorted alphabetically within each category.

For in-depth explanations of observability and SRE concepts (SLIs, SLOs, percentiles, golden signals, etc.), see SRE & Observability Concepts.

Kubernetes & Containers

Term	Definition
Admission controller	A Kubernetes API server plugin that intercepts requests before objects are persisted. Mutating admission controllers modify objects; validating admission controllers reject non-compliant ones. Kyverno is the admission controller used in RCIIS.
Affinity / Anti-affinity	Scheduling rules that attract (affinity) or repel (anti-affinity) pods relative to nodes or other pods. Used to spread replicas across failure domains.
ConfigMap	A Kubernetes object for storing non-sensitive configuration data as key-value pairs. Mounted as files or environment variables in pods.
Container	A lightweight, isolated process running from a container image. Containers share the host kernel but have their own filesystem, network, and process namespace.
containerd	The container runtime that Kubernetes nodes (including Talos) use to pull images, create containers, and manage their lifecycle. It replaced Docker Engine as the usual runtime once Kubernetes removed dockershim in v1.24.
Cordon	Mark a node as unschedulable so no new pods are assigned to it. Existing pods continue running. Used before draining a node for maintenance.
CRD (Custom Resource Definition)	An extension to the Kubernetes API that defines a new resource type (e.g., `Cluster`, `Keycloak`, `CephCluster`). Operators use CRDs to manage applications declaratively.
CR (Custom Resource)	An instance of a CRD. For example, a `Keycloak` CR tells the Keycloak Operator to create a running Keycloak deployment.
DaemonSet	A workload that runs exactly one pod on every (or selected) node. Used for node-level agents like Falco, Fluent Bit, and Cilium.
Deployment	A workload that manages a set of identical pod replicas with rolling update support. The most common workload type for stateless services.
Drain	Cordon a node and then evict its pods, respecting PodDisruptionBudgets. Used to safely empty a node before maintenance or removal from the cluster.
etcd	The distributed key-value store that backs the Kubernetes API. Stores all cluster state including Secrets, ConfigMaps, and resource definitions.
Eviction	The process of terminating a pod, either by the kubelet (resource pressure), the scheduler (preemption), or an administrator (drain).
Helm	A package manager for Kubernetes. Helm charts are templated YAML manifests with configurable values files.
Init container	A container that runs to completion before the main application container starts. Used for setup tasks like loading PKCS#11 libraries.
kubelet	The agent running on each node that manages pod lifecycle — starts containers, reports node status, and enforces resource limits.
kube-proxy	The default network proxy on each node that implements Kubernetes Service load balancing. Replaced by Cilium eBPF in RCIIS.
Kustomize	A template-free configuration management tool for Kubernetes that uses overlays and patches. Used with KSOPS for secret management.
Leader election	A pattern where one replica in a group is elected as the active leader. Used by controllers (cert-manager, Velero) to prevent duplicate work.
Namespace	A logical partition within a Kubernetes cluster for organising and isolating resources. Each platform tool deploys into its own namespace.
Node	A machine (physical or virtual) in the Kubernetes cluster. Control plane nodes run etcd and the API server; worker nodes run application pods.
Operator (pattern)	A Kubernetes controller that uses CRDs to manage the full lifecycle of a complex application — install, configure, upgrade, backup, and repair.
Pod	The smallest deployable unit in Kubernetes — one or more containers sharing a network namespace and storage volumes.
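PodDisruptionBudget (PDB)	A policy that limits how many pods in a workload can be unavailable simultaneously during voluntary disruptions such as node drains. Deployment rolling updates are paced by the rollout strategy (`maxUnavailable`, `maxSurge`) rather than by the PDB.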
RBAC (Role-Based Access Control)	Kubernetes authorisation system that grants permissions (verbs on resources) to users, groups, or service accounts via Roles and RoleBindings.
Rolling update	A deployment strategy that replaces old pods with new ones incrementally, maintaining availability throughout the update.
Secret	A Kubernetes object for storing sensitive data (passwords, tokens, certificates). Stored in etcd and optionally encrypted at rest via KMS.
ServiceAccount	An identity for pods to authenticate with the Kubernetes API. Used for RBAC and workload identity.
Sidecar	A secondary container running alongside the main container in the same pod. Used for logging, proxying, or HSM PKCS#11 access.
StatefulSet	A workload for stateful applications (databases, message brokers) that provides stable network identities and persistent storage per replica.
Static pod	A pod managed directly by the kubelet on a specific node, not by the Kubernetes API. Used for critical control plane components.
Toleration	A pod property that allows it to schedule onto nodes with matching taints. Used with node taints to dedicate nodes for specific workloads.
Topology spread constraint	A scheduling rule that distributes pods evenly across failure domains (zones, nodes). Used for HA deployments.
Webhook (mutating / validating)	An HTTP callback that Kubernetes invokes during admission. Kyverno, cert-manager, and OPA/Gatekeeper all use admission webhooks.
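
The interaction between Drain, Eviction, and PodDisruptionBudget can be illustrated with a small calculation. This is a minimal sketch, assuming a budget expressed as an absolute `minAvailable` count (percentages and `maxUnavailable` are not handled); the function names are illustrative and are not part of any Kubernetes client library.

```python
def disruptions_allowed(current_healthy: int, min_available: int) -> int:
    """Pods that may be evicted right now without violating the budget."""
    # Never negative: an already-violated budget simply allows zero evictions.
    return max(0, current_healthy - min_available)


def can_evict(current_healthy: int, min_available: int) -> bool:
    """True if one more voluntary eviction would still respect the budget."""
    return disruptions_allowed(current_healthy, min_available) > 0


if __name__ == "__main__":
    # Three healthy replicas with minAvailable=2: one pod may be evicted.
    print(disruptions_allowed(current_healthy=3, min_available=2))  # 1
    # After that eviction only two are healthy, so the next one is refused
    # until a replacement pod becomes ready.
    print(can_evict(current_healthy=2, min_available=2))  # False
```

This is why a drain can appear to stall: each eviction waits until the workload has enough healthy replicas again.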

Networking

Term	Definition
Anycast	A routing method where the same IP address is announced from multiple locations. The network routes traffic to the nearest location. Cloudflare uses Anycast for global load balancing.
ARP / GARP	Address Resolution Protocol maps IP addresses to MAC addresses. Gratuitous ARP (GARP) is an unsolicited broadcast used by kube-vip and MetalLB to announce VIP ownership.
BGP (Border Gateway Protocol)	The routing protocol that connects autonomous systems on the internet. Used by MetalLB and Cilium for advertising service IPs to network routers.
CIDR	Classless Inter-Domain Routing — a notation for IP address ranges (e.g., `10.0.0.0/16`). Used to define pod networks, service networks, and subnet boundaries.
Cilium	An eBPF-based CNI plugin that provides networking, observability (Hubble), load balancing, network policy, and WireGuard encryption. The primary CNI for RCIIS.
ClusterIP	The default Kubernetes Service type — an internal-only virtual IP that load-balances traffic across pod endpoints. Not accessible from outside the cluster.
CNI (Container Network Interface)	A specification and plugin system for configuring networking in Linux containers. Cilium is the CNI used in RCIIS.
CoreDNS	The DNS server running inside Kubernetes that resolves Service names to ClusterIPs. Also used for custom DNS zones in RCIIS.
DNS (Domain Name System)	The hierarchical naming system that translates human-readable domain names to IP addresses.
eBPF (extended Berkeley Packet Filter)	A Linux kernel technology that allows running sandboxed programs in kernel space without modifying kernel code. Used by Cilium (networking), Falco, and Tracee (security).
Egress	Outbound traffic from a pod or cluster to an external destination. Egress network policies control which external services pods can reach.
Gateway API	A Kubernetes-native API for managing ingress, load balancing, and traffic routing. The successor to the Ingress resource, supported by Cilium.
GENEVE	Generic Network Virtualization Encapsulation — a tunnelling protocol used by Cilium for pod-to-pod traffic across nodes.
Hubble	The observability layer built into Cilium that provides network flow visibility, DNS query logging, and HTTP request tracing.
Ingress	Inbound traffic from outside the cluster to services inside. Also refers to the Kubernetes Ingress resource that defines HTTP routing rules.
kube-vip	A lightweight load balancer for the Kubernetes control plane that provides a virtual IP (VIP) for the API server using ARP or BGP.
L2 / L4 / L7	OSI model layers — Layer 2 (data link / MAC), Layer 4 (transport / TCP/UDP), Layer 7 (application / HTTP). Load balancers and firewalls operate at different layers.
Load balancer	A component that distributes traffic across multiple backends. External LBs (AWS NLB, HAProxy) front the cluster; internal LBs (Cilium, kube-proxy) distribute within it.
MetalLB	A bare-metal load balancer for Kubernetes that assigns external IPs to Services using ARP or BGP. An alternative to Cilium L2 announcements.
mTLS (Mutual TLS)	TLS where both client and server authenticate each other with certificates. Used by the Talos API and etcd. Cilium's optional WireGuard encryption also authenticates both peers, but with public keys rather than TLS certificates.
NAT (Network Address Translation)	Translates private IP addresses to public addresses. NAT gateways provide internet access for private-subnet nodes on AWS.
Network policy	A Kubernetes resource that defines firewall rules for pod-to-pod and pod-to-external traffic. Cilium implements network policies using eBPF.
NodePort	A Kubernetes Service type that exposes the service on a static port on every node. Rarely used directly — usually fronted by a load balancer.
PoP (Point of Presence)	A physical location where a CDN or network provider has servers. Cloudflare has PoPs in 300+ cities for edge caching and traffic routing.
Subnet	A subdivision of an IP network. VPCs are divided into subnets (public, private, management) to isolate traffic.
TLS (Transport Layer Security)	Cryptographic protocol that provides encryption and authentication for network connections. Successor to SSL.
VIP (Virtual IP)	A floating IP address that can move between hosts for failover. Used by kube-vip and Keepalived for control plane HA.
VLAN	Virtual LAN — a logical network segment within a physical network. Used to isolate management, storage, and application traffic on bare metal.
VPC (Virtual Private Cloud)	An isolated virtual network within a cloud provider (AWS). Contains subnets, route tables, security groups, and NAT gateways.
VPN (Virtual Private Network)	An encrypted tunnel between two networks. Used for site-to-site connectivity between data centres or cloud regions.
WireGuard	A modern VPN protocol used by Cilium for transparent pod-to-pod encryption across nodes.
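
CIDR notation is easiest to understand by example. The sketch below uses Python's standard `ipaddress` module and the `10.0.0.0/16` range from the CIDR entry; treating it as a pod network carved into per-node `/24` blocks is an assumption made for illustration, not a statement about the RCIIS address plan.

```python
import ipaddress

# The example range from the glossary entry: a /16 leaves 16 host bits.
pod_network = ipaddress.ip_network("10.0.0.0/16")

print(pod_network.num_addresses)  # 65536
print(ipaddress.ip_address("10.0.42.7") in pod_network)  # True
print(ipaddress.ip_address("10.1.0.1") in pod_network)  # False

# Carve the /16 into /24 blocks (hypothetical per-node allocations).
node_subnets = list(pod_network.subnets(new_prefix=24))
print(len(node_subnets))  # 256
print(node_subnets[0], node_subnets[-1])  # 10.0.0.0/24 10.0.255.0/24
```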

Security & Compliance

Term	Definition
Admission control	The process of intercepting API requests before persistence to validate or modify them. See admission controller above.
CA (Certificate Authority)	An entity that issues digital certificates. cert-manager automates CA operations in RCIIS; HSMs protect the CA signing key.
CIS Benchmark	A set of security configuration recommendations published by the Center for Internet Security. The Kubernetes CIS Benchmark defines hardening rules for clusters.
Container escape	An attack where a process breaks out of container isolation to access the host. Detected by Falco and Tracee.
cosign	A tool from the Sigstore project for signing and verifying container images. RCIIS uses cosign in CI to sign images and Kyverno to verify signatures at admission.
CVE (Common Vulnerabilities and Exposures)	A unique identifier for a publicly disclosed security vulnerability (e.g., CVE-2024-1234). Trivy scans for CVEs in container images.
DDoS (Distributed Denial of Service)	An attack that overwhelms a service with traffic from many sources. Mitigated by Cloudflare WAF and rate limiting.
Defence in depth	A security strategy that layers multiple independent controls so that a failure in one layer does not compromise the system.
DEK (Data Encryption Key)	A key used to encrypt data directly. In KMS v2, the DEK is wrapped (encrypted) by a Key Encryption Key stored in the HSM.
Encryption at rest	Protecting stored data by encrypting it on disk. Implemented via LUKS2 (Talos), KMS (AWS), or ZFS encryption (Proxmox).
Encryption in transit	Protecting data moving across a network using TLS or mTLS. Talos manages all control plane TLS automatically.
Falco	A CNCF runtime security tool that detects anomalous syscall behaviour in containers (shell execution, unexpected file access, network connections).
FIPS 140-2 / 140-3	A US government standard for cryptographic module validation. Level 3 requires physical tamper-resistance (HSMs).
Fulcio	A Sigstore component that issues short-lived code-signing certificates tied to OIDC identity (e.g., GitHub Actions). Used in keyless signing workflows.
HSM (Hardware Security Module)	A tamper-resistant hardware device that stores cryptographic keys and performs signing/encryption operations. Keys never leave the HSM. See Encryption & HSM Provisioning.
Image signing	Cryptographically signing a container image to prove its authenticity and integrity. Verified by Kyverno at admission time.
ISO 27001	An international standard for information security management systems (ISMS). Defines controls for access, cryptography, operations, and development.
JWT (JSON Web Token)	A compact, signed token used for authentication and authorisation. Keycloak issues JWTs; services verify them using the public key from the OIDC discovery endpoint.
KEK (Key Encryption Key)	A key used to encrypt other keys (DEKs). In KMS v2, the KEK lives in the HSM and wraps DEKs that encrypt Kubernetes Secrets.
Key ceremony	A formal, witnessed process for initialising an HSM and generating root cryptographic keys. Documented with photographs and signed logs.
KMS (Key Management Service)	A service that manages cryptographic keys. AWS KMS provides cloud-managed keys; KMS v2 is a Kubernetes API for delegating Secret encryption to an external KMS.
Kyverno	A Kubernetes-native policy engine that enforces admission policies, mutates resources, and generates reports — without requiring a separate policy language.
Lateral movement	An attacker's technique of moving from one compromised component to another within the network. Prevented by network policies and namespace isolation.
LUKS2	Linux Unified Key Setup v2 — the standard for full-disk encryption on Linux. Talos uses LUKS2 for STATE and EPHEMERAL partition encryption.
M-of-N secret sharing	A scheme where a secret (e.g., HSM SO PIN) is split into N shares, and any M shares are needed to reconstruct it. Used for HSM key custody.
OIDC (OpenID Connect)	An identity layer on top of OAuth 2.0 that provides authentication. Keycloak is the OIDC provider for RCIIS; FluxCD dashboard and the Kubernetes API are OIDC clients.
PKCS#11	A standard API for communicating with cryptographic hardware (HSMs, smart cards). Applications use PKCS#11 to perform signing and encryption without accessing raw key material.
PKI (Public Key Infrastructure)	The framework of certificates, CAs, and trust chains that enables TLS and code signing. Talos manages the Kubernetes PKI; cert-manager manages application PKI.
Privilege escalation	An attack where a process gains higher permissions than intended. Detected by Falco and Tracee; prevented by Kyverno policies.
Rekor	A Sigstore component that provides an immutable transparency log for software signing events. Used to verify that a signature was created at a specific time.
SAML	Security Assertion Markup Language — an XML-based protocol for exchanging authentication and authorisation data. Used for enterprise SSO alongside OIDC.
SBOM (Software Bill of Materials)	A list of all components (libraries, dependencies) in a software artifact. Used for supply chain transparency and vulnerability tracking.
Sigstore	An open-source project for software signing, verification, and transparency. Includes cosign (signing), Fulcio (certificates), and Rekor (transparency log).
Supply chain attack	An attack that compromises software through its dependencies, build tools, or distribution channels. Mitigated by image signing and admission control.
Syscall	A system call — the interface between user-space programs and the kernel. Falco monitors syscalls to detect anomalous container behaviour.
Tracee	An Aqua Security runtime security tool that uses eBPF for deep forensic capture of container and host events. Complements Falco.
Trivy	An Aqua Security scanner for vulnerabilities (CVEs), misconfigurations, and secret leaks in container images, Kubernetes resources, and filesystems.
WAF (Web Application Firewall)	A firewall that filters HTTP traffic based on rules (SQL injection, XSS, rate limiting). Cloudflare WAF protects RCIIS external endpoints.
X.509	The standard format for public key certificates used in TLS. cert-manager automates X.509 certificate issuance and renewal.
Zero trust	A security model that assumes no implicit trust — every request must be authenticated, authorised, and encrypted regardless of network location.
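
The M-of-N secret sharing entry can be made concrete with Shamir's scheme, one common way to implement it. This documentation does not say which scheme the HSM tooling uses, so the code below is an illustrative sketch only: the prime, the function names, and the integer encoding of the secret are all assumptions, and it must not be used to protect real key material.

```python
import secrets

# Field modulus: a Mersenne prime. The secret must be smaller than this.
PRIME = 2**127 - 1


def split_secret(secret: int, m: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares so that any m of them reconstruct it."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = split_secret(123456789, m=3, n=5)
    # Any three of the five shares recover the secret.
    print(recover_secret(shares[:3]) == 123456789)  # True
    print(recover_secret([shares[0], shares[2], shares[4]]) == 123456789)  # True
```

With fewer than M shares the interpolation yields an unrelated value, which is what makes the scheme suitable for splitting custody among several key holders.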

Storage

Term	Definition
Block storage	Storage presented as raw block devices (like a virtual disk). Ceph RBD and AWS EBS provide block storage for Kubernetes PVCs.
Ceph	A distributed storage system that provides block (RBD), object (RGW), and file (CephFS) storage. Deployed via Rook-Ceph in RCIIS.
CephFS	Ceph's POSIX-compliant distributed filesystem. Used for ReadWriteMany (RWX) workloads where multiple pods need shared access.
CSI (Container Storage Interface)	A standard API for storage drivers in Kubernetes. Rook-Ceph implements CSI to provision PVCs from the Ceph cluster.
EBS (Elastic Block Store)	AWS block storage service. EBS volumes are attached to EC2 instances and used for Kubernetes PVCs on AWS.
Erasure coding	A storage redundancy technique that splits data into fragments with parity, using less space than full replication. Not currently used in RCIIS (replication is used instead).
gp3	An AWS EBS volume type providing baseline 3,000 IOPS and 125 MB/s throughput. The default volume type for RCIIS AWS deployments.
IOPS	Input/Output Operations Per Second — a measure of storage performance. gp3 volumes provide 3,000 baseline IOPS, scalable to 16,000.
MGR (Ceph Manager)	The Ceph manager daemon that provides monitoring, orchestration, and a management dashboard. Runs alongside MONs.
MON (Ceph Monitor)	The Ceph monitor daemon that maintains cluster membership, state maps, and quorum. Deploy an odd number (3 or 5), because an even count adds no extra failure tolerance.
Object storage	Storage accessed via HTTP APIs (S3, Swift). Ceph RGW provides S3-compatible object storage for backups and log archives.
OSD (Object Storage Daemon)	A Ceph daemon that manages a physical or logical disk. Each OSD stores data, handles replication, and participates in recovery.
PV (PersistentVolume)	A Kubernetes resource representing a piece of provisioned storage. Created dynamically by a StorageClass or statically by an administrator.
PVC (PersistentVolumeClaim)	A request for storage by a pod. The PVC binds to a PV that satisfies its size and access mode requirements.
Quorum	The minimum number of members that must agree for a distributed system to make progress. Ceph MONs and etcd require quorum, as does PostgreSQL when quorum-based synchronous replication is configured.
RBD (RADOS Block Device)	Ceph's block storage interface. RBD provides thin-provisioned, snapshotable block devices for Kubernetes PVCs.
RGW (RADOS Gateway)	Ceph's S3-compatible object storage gateway. Used for Velero backups, Loki log storage, and CNPG WAL archiving.
Rook	A Kubernetes operator that automates deployment and management of Ceph clusters. Rook handles OSD provisioning, MON placement, and cluster health.
Snapshot Controller	A Kubernetes controller that manages CSI VolumeSnapshots — point-in-time copies of PVCs used for backup and cloning.
StorageClass	A Kubernetes resource that defines how PVCs are dynamically provisioned — which CSI driver, replication factor, and parameters to use.
Throughput	The rate of data transfer (MB/s or GB/s). Relevant for streaming workloads, database WAL writes, and backup operations.
VolumeSnapshot	A point-in-time copy of a PVC, backed by the CSI driver (e.g., Ceph RBD snapshot). Used for pre-upgrade backups and database cloning.
ZFS	A combined filesystem and volume manager with built-in compression, snapshots, and encryption. Used on Proxmox hosts for VM storage.
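
The Quorum and MON entries rest on simple majority arithmetic. The following sketch assumes a plain majority quorum (more than half of the members) and uses illustrative function names.

```python
def quorum_size(members: int) -> int:
    """Smallest group that is a strict majority of `members`."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while a majority still remains."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
    # 1 1 0
    # 2 2 0
    # 3 2 1
    # 4 3 1
    # 5 3 2
```

Going from three members to four raises the quorum size without tolerating any additional failure, which is why odd member counts are the usual choice.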

Observability & SRE

For detailed explanations with practical examples and PromQL queries, see SRE & Observability Concepts.

Term	Definition
Alertmanager	The Prometheus component that routes, deduplicates, groups, and silences alerts. Sends notifications to Slack, PagerDuty, email, or webhooks.
Availability	The proportion of time a service is operational, expressed as a percentage ("nines"). See Availability & Nines.
Blackbox monitoring	Monitoring a service from the outside — probing endpoints without knowledge of internal state. The Blackbox Exporter performs HTTP, TCP, and ICMP probes.
Burn rate	The rate at which an error budget is being consumed. A burn rate of 1.0 means the budget is being used evenly; >1.0 means faster than sustainable. See Error Budgets.
Cardinality	The number of unique time series in Prometheus. High cardinality (from unbounded label values) causes memory and performance issues. See Cardinality.
Chaos engineering	The practice of deliberately injecting failures to verify system resilience. See Chaos Engineering.
Counter	A Prometheus metric type that only increases (or resets to zero). Used for totals like request count, bytes transferred. Always use `rate()` on counters.
Dashboard	A visual display of metrics and logs, typically in Grafana. Dashboards provide real-time operational visibility into platform health.
Error budget	The amount of unreliability permitted by the SLO. Calculated as `100% - SLO target`. See Error Budgets.
Game day	A planned exercise where the team practices incident response against a simulated failure. See Game Days.
Gauge	A Prometheus metric type that can go up or down. Used for current values like memory usage, temperature, active connections.
Golden signals	The four key metrics for any service: latency, traffic, errors, saturation. See Four Golden Signals.
Grafana	A visualisation platform for creating dashboards from Prometheus, Loki, and other data sources.
Histogram	A Prometheus metric type that counts observations in configurable buckets. Used for latency distributions; quantiles are estimated from the buckets with `histogram_quantile()`.
LogQL	The query language for Grafana Loki. Uses label selectors and pipeline stages to filter and aggregate log lines. See Loki & LogQL.
Loki	A log aggregation system by Grafana Labs. Indexes log metadata (labels) rather than full text, making it efficient for Kubernetes log storage.
MTBF (Mean Time Between Failures)	The average time between consecutive failures. `MTBF = MTTF + MTTR`. See Incident Metrics.
MTTF (Mean Time to Failure)	The average time a system runs before failing. See Incident Metrics.
MTTR (Mean Time to Recovery)	The average time from incident detection to service restoration. The most actionable reliability metric. See Incident Metrics.
Observability	The ability to understand a system's internal state from its external outputs (metrics, logs, traces). The combination of Prometheus, Loki, and Grafana provides observability for RCIIS.
On-call	A rotation where designated team members are available to respond to incidents outside business hours.
P50 / P95 / P99 / P999	Latency percentiles: the latency at or below which 50%, 95%, 99%, or 99.9% of requests complete. See Latency Percentiles.
Post-mortem	A blameless document written after an incident that describes the timeline, root cause, impact, and preventive action items. See Post-Mortems.
Prometheus	An open-source monitoring system that collects metrics via a pull model, stores them in a time-series database, and evaluates alert rules.
PromQL	The query language for Prometheus. Used in dashboards, alerts, and recording rules. See Prometheus & PromQL.
Recording rule	A PromQL expression that is pre-computed at regular intervals and stored as a new time series. Improves query performance for dashboards and alerts.
RED method	A monitoring framework for request-driven services: Rate, Errors, Duration. See RED Method.
Runbook	A documented procedure linked to an alert that describes how to diagnose and resolve the issue. See Alerting Best Practices.
Scrape	The process of Prometheus pulling metrics from a target's `/metrics` endpoint at a configured interval.
ServiceMonitor	A Prometheus Operator CRD that defines how Prometheus should scrape metrics from a Kubernetes Service.
SLA (Service Level Agreement)	A contractual commitment guaranteeing a minimum level of service reliability, with consequences for breach. See Service Levels.
SLI (Service Level Indicator)	A quantitative measurement of service performance (e.g., error rate, latency). See Service Levels.
SLO (Service Level Objective)	A target value for an SLI over a time window (e.g., "99.9% availability over 30 days"). See Service Levels.
SRE (Site Reliability Engineering)	A discipline that applies software engineering practices to infrastructure and operations. Originated at Google. See SRE & Observability Concepts.
Toil	Repetitive, manual, automatable operational work that does not provide enduring value. See Toil & Automation.
USE method	A monitoring framework for infrastructure resources: Utilization, Saturation, Errors. See USE Method.
VPA (Vertical Pod Autoscaler)	A Kubernetes component that recommends or automatically adjusts pod CPU and memory requests based on observed usage. Goldilocks uses VPA in recommendation mode.
Whitebox monitoring	Monitoring a service from the inside — using internal metrics, logs, and instrumentation. Prometheus scraping application metrics is whitebox monitoring.
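
The Error budget and Burn rate definitions translate directly into arithmetic. This is a minimal sketch with illustrative function names; it assumes the SLO is expressed as a fraction (0.999 for 99.9%) and that the burn rate is the observed error ratio divided by the budget.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests (or time) allowed to fail: 100% - SLO target."""
    return 1.0 - slo_target


def allowed_downtime_minutes(slo_target: float, window_days: int) -> float:
    """Error budget expressed as minutes of full outage in the window."""
    return error_budget(slo_target) * window_days * 24 * 60


def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """1.0 spends the budget exactly over the window; above 1.0 is too fast."""
    return observed_error_ratio / error_budget(slo_target)


if __name__ == "__main__":
    # "99.9% availability over 30 days" leaves about 43 minutes of downtime.
    print(round(allowed_downtime_minutes(0.999, 30), 2))  # 43.2
    # Failing 0.2% of requests against a 0.1% budget burns twice as fast.
    print(round(burn_rate(0.002, 0.999), 2))  # 2.0
```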

GitOps & CI/CD

Term	Definition
Age	A modern file encryption tool used with SOPS for encrypting Kubernetes secrets. Simpler than GPG with smaller keys.
Kustomization (FluxCD)	A FluxCD resource that defines a desired state (source repo + target cluster + reconciliation interval) and continuously reconciles the cluster to match.
HelmRelease	A FluxCD resource that declaratively manages Helm chart installations with automated upgrades, rollbacks, and drift detection.
FluxCD	A declarative GitOps continuous delivery tool for Kubernetes. Watches Git repositories and automatically syncs cluster state to match.
Argo Rollouts	A Kubernetes controller for progressive delivery strategies — canary deployments, blue-green deployments, and analysis-driven rollbacks.
Blue-green deployment	A release strategy that runs two identical environments (blue = current, green = new) and switches traffic atomically. Zero-downtime but requires double the resources.
Canary deployment	A release strategy that routes a small percentage of traffic to the new version, gradually increasing if metrics are healthy. Detected regressions trigger automatic rollback.
CI/CD	Continuous Integration (automated build + test) and Continuous Delivery/Deployment (automated release to environments).
Declarative	A configuration style that describes the desired end state rather than the steps to reach it. Kubernetes manifests and FluxCD Kustomizations are declarative.
Drift	When the actual state of the cluster diverges from the desired state defined in Git. FluxCD detects and can auto-correct drift via continuous reconciliation.
GitOps	An operational model where Git is the single source of truth for infrastructure and application configuration. Changes are applied via pull requests and automated reconciliation.
Harbor	An open-source container registry with vulnerability scanning, image signing, and replication. The RCIIS private registry for Helm charts and container images.
Helm chart	A package of Kubernetes YAML templates with a `values.yaml` file for configuration. Charts are versioned and stored in registries (Harbor, OCI).
Idempotent	An operation that produces the same result regardless of how many times it is applied. `kubectl apply` and Helm upgrades are idempotent.
KSOPS	A Kustomize plugin that integrates SOPS decryption into the Kustomize build process. Enables encrypted secrets in GitOps workflows.
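Multi-source Kustomization	A FluxCD deployment pattern that combines multiple sources, typically a HelmRepository for the chart and a GitRepository for values overrides. A single Flux Kustomization accepts one `sourceRef`, so the pattern pairs a HelmRelease with a Kustomization.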
OCI (Open Container Initiative)	A set of standards for container image formats and registries. Helm charts can be stored as OCI artifacts in registries like Harbor.
Progressive delivery	A release strategy that gradually shifts traffic to a new version based on real-time metrics. Canary and blue-green are types of progressive delivery.
Reconciliation	The process of comparing actual state to desired state and making corrections. FluxCD reconciles each resource at the interval set in its `spec.interval` field (for example, every 10 minutes).
Renovate	An automated dependency update tool that creates pull requests when new versions of Helm charts, container images, or other dependencies are available.
Dependency ordering	A FluxCD Kustomization feature that uses `dependsOn` to ensure resources are deployed in the correct order across environments.
Drift detection	A FluxCD feature that reverts manual changes to cluster resources so that the Git-defined state is maintained. Kustomizations correct drift on every reconciliation, and `prune: true` additionally removes resources that were deleted from Git.
SOPS	Secrets OPerationS — a tool for encrypting/decrypting files using Age, PGP, or KMS keys. Only values are encrypted; keys and metadata remain readable.
Reconciliation (FluxCD)	The process of applying the desired state from Git to the cluster. Runs automatically at the configured interval, with options for pruning deleted resources.
DependsOn	A deployment ordering mechanism in FluxCD Kustomizations. Resources with dependencies wait for their prerequisites to become ready before deploying.
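
The Declarative, Drift, Idempotent, and Reconciliation entries describe one loop seen from different angles. The toy model below represents cluster state as plain dictionaries to show how they fit together. It is a sketch of the general idea, not FluxCD's implementation, and the resource names are invented for the example.

```python
def reconcile(desired: dict, actual: dict, prune: bool = False) -> dict:
    """Return the state after one reconciliation pass.

    Resources that are missing or differ from `desired` are (re)applied.
    With prune=True, resources absent from `desired` are deleted.
    """
    result = dict(actual)
    for name, spec in desired.items():
        if result.get(name) != spec:  # missing or drifted
            result[name] = spec
    if prune:
        for name in list(result):
            if name not in desired:
                del result[name]
    return result


if __name__ == "__main__":
    desired = {"deploy/web": {"replicas": 3}, "svc/web": {"port": 80}}
    # Someone scaled the deployment by hand and left an old ConfigMap behind.
    actual = {"deploy/web": {"replicas": 5}, "cm/old": {"key": "v"}}

    once = reconcile(desired, actual, prune=True)
    print(once == desired)  # True: drift corrected, stale object pruned

    twice = reconcile(desired, once, prune=True)
    print(twice == once)  # True: applying again changes nothing (idempotent)

    kept = reconcile(desired, actual, prune=False)
    print("cm/old" in kept)  # True: without pruning the stale object remains
```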

Infrastructure & IaC

Term	Definition
AMI (Amazon Machine Image)	A pre-built virtual machine image for AWS EC2. The Talos AMI provides a pre-installed Talos Linux image for each AWS region.
BMC / IPMI	Baseboard Management Controller / Intelligent Platform Management Interface — remote management hardware on servers for power control, console access, and monitoring without an OS.
Bare metal	Physical servers without virtualisation. Provides maximum performance but requires manual hardware management, PXE booting, and physical access for maintenance.
Cloud-init	A tool for initialising cloud instances on first boot — setting hostname, SSH keys, network config. Used by some Proxmox and cloud deployments; Talos uses its own machine config instead.
Control plane	The set of Kubernetes components (API server, etcd, scheduler, controller manager) that manage cluster state. Runs on dedicated nodes in production.
Day-0 / Day-1 / Day-2	Lifecycle phases — Day-0 is planning and design, Day-1 is initial deployment, Day-2 is ongoing operations. See Reliability Engineering Practices.
EC2	Amazon Elastic Compute Cloud — virtual machine instances in AWS. RCIIS uses EC2 for control plane and worker nodes on AWS.
Terraform	An Infrastructure as Code (IaC) tool by HashiCorp that provisions and manages cloud and on-premises infrastructure using declarative HCL configuration files. Used in RCIIS for provisioning AWS and Proxmox infrastructure.
HA (High Availability)	A design approach that eliminates single points of failure through redundancy, replication, and automatic failover. Quorum-based components such as etcd need at least three members to survive the loss of one.
IaC (Infrastructure as Code)	Managing infrastructure through version-controlled configuration files rather than manual processes. Terraform (configured in HCL) and Helm are both IaC tools.
iDRAC / iLO	Dell (iDRAC) and HPE (iLO) implementations of BMC for remote server management. Accessed via web UI or CLI for power, console, and hardware monitoring.
Image Factory	A Talos service that builds custom Talos Linux images with specific system extensions (e.g., iscsi-tools, qemu-guest-agent) baked in.
Immutable OS	An operating system with a read-only root filesystem that cannot be modified at runtime. Talos Linux is immutable — configuration is applied declaratively via machine config.
iPXE / PXE	Network boot protocols for loading an OS image from a server over the network. Used for bare metal Talos installations.
Kind	Kubernetes in Docker — a tool for running local Kubernetes clusters using Docker containers as nodes. Used for development and testing.
Machine config	The YAML configuration file that defines a Talos node's identity, networking, disk encryption, and cluster membership. Applied at boot or via `talosctl apply-config`.
NTP	Network Time Protocol — synchronises system clocks across servers. Critical for TLS certificate validation, log correlation, and distributed consensus.
NVMe / SSD / HDD	Storage device types — NVMe (fastest, PCIe-attached), SSD (fast, SATA/SAS), HDD (slowest, mechanical). NVMe is recommended for Ceph OSDs.
OOB (Out-of-Band)	Management access to a server that is independent of the main operating system — typically via IPMI/BMC over a dedicated management network.
HCL (HashiCorp Configuration Language)	The declarative configuration language used by Terraform for defining infrastructure resources. Supports variables, modules, and expressions.
Proxmox VE	An open-source virtualisation platform based on KVM and LXC. Used as an alternative to cloud providers for running Kubernetes VMs on-premises.
RAID	Redundant Array of Independent Disks — combines multiple disks for redundancy or performance. Hardware RAID is configured in BIOS/UEFI before OS installation.
Schematic	A Talos configuration that defines which system extensions to include in a custom Talos image. Submitted to Image Factory to produce a downloadable image.
System extension	A read-only overlay that adds functionality to Talos Linux (e.g., `iscsi-tools`, `qemu-guest-agent`, `bpf` for Falco). Baked into the image at build time.
Talos Linux	A minimal, immutable, API-managed Linux distribution designed exclusively for running Kubernetes. There is no SSH, shell, or package manager — all management is via the Talos API.
talosctl	The CLI for managing Talos Linux nodes — applying config, upgrading, reading logs, and accessing etcd. Authenticated via mTLS certificates.
Talhelper	A helper tool that generates Talos machine configurations from a single `talconfig.yaml` definition, simplifying multi-node config management.
Worker node	A Kubernetes node that runs application workloads (pods). In clusters with dedicated control plane nodes, worker nodes do not run control plane components.
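
The "three or more nodes" guidance in the HA entry follows from majority-based consensus, which etcd uses. The sketch below works through that arithmetic; the function names are illustrative and not part of any RCIIS tooling.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} members: quorum={quorum(n)}, tolerates={fault_tolerance(n)}")
```

Two members tolerate no failures, and four tolerate only one (the same as three), which is why odd member counts starting at three are the usual choice for a control plane.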

Domain & Acronyms¶

Term	Definition
APISIX	Apache APISIX — a cloud-native API gateway used as the primary ingress and traffic management layer for RCIIS application services.
CloudNativePG (CNPG)	A Kubernetes operator for managing PostgreSQL clusters with automated failover, backup, and recovery. Used for Keycloak and RCIIS application databases.
DR (Disaster Recovery)	The process and procedures for restoring service after a catastrophic failure. Includes backups (Velero, CNPG), geo-load balancing, and documented recovery runbooks.
EAC	East African Community — the regional intergovernmental organisation for which the RCIIS platform is built. Partner states include Kenya, Tanzania, Uganda, Rwanda, Burundi, DRC, South Sudan, and Somalia.
ESB (Enterprise Service Bus)	A middleware pattern for integrating applications via a central message broker. The RCIIS ESB handles customs data exchange between partner states.
Fluent Bit	A lightweight log processor and forwarder deployed as a DaemonSet. Collects container logs and ships them to Loki.
Goldilocks	A tool that runs VPA in recommendation mode and provides a dashboard showing right-sizing suggestions for pod resource requests and limits.
HAProxy	A high-performance TCP/HTTP load balancer. Used for Kubernetes API load balancing on bare metal and Proxmox deployments.
Kafka	Apache Kafka — a distributed event streaming platform. Deployed via Strimzi in RCIIS for asynchronous customs data processing.
Keepalived	A Linux daemon that provides VRRP-based failover for virtual IPs. Used alongside HAProxy for load balancer HA on bare metal.
Partner state	A member country of the EAC that participates in the RCIIS customs interconnectivity system.
PITR (Point-in-Time Recovery)	Restoring a database to a specific moment by replaying WAL segments from a backup. CloudNativePG supports PITR for PostgreSQL.
RCIIS	Regional Customs Interconnectivity Integration System — the platform that this documentation describes how to deploy. Connects the customs systems of EAC partner states.
RPO (Recovery Point Objective)	The maximum acceptable amount of data loss measured in time. An RPO of 1 hour means backups must be no more than 1 hour old.
RTO (Recovery Time Objective)	The maximum acceptable time to restore service after a failure. An RTO of 30 minutes means the service must be back within 30 minutes.
SIEM (Security Information and Event Management)	A system that aggregates and analyses security events from multiple sources. Falco and Tracee events can be forwarded to a SIEM.
Strimzi	A Kubernetes operator for running Apache Kafka clusters. Manages brokers, topics, users, and connectors declaratively via CRDs.
Velero	A Kubernetes backup and disaster recovery tool that backs up cluster resources and PersistentVolumes to S3-compatible storage.
WAL (Write-Ahead Log)	A database transaction log where changes are written before being applied to data files. PostgreSQL WAL segments are archived by CNPG for continuous backup and PITR.
WCO SAFE Framework	World Customs Organization Framework of Standards to Secure and Facilitate Global Trade — an international standard relevant to RCIIS compliance.
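
The RPO and RTO entries can be read as two simple time comparisons. The following sketch illustrates them with hypothetical timestamps and function names; it is not taken from the RCIIS runbooks.

```python
from datetime import datetime, timedelta, timezone


def meets_rpo(last_backup: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True if the data written since the last backup spans no more than the RPO."""
    return failure - last_backup <= rpo


def meets_rto(failure: datetime, restored: datetime, rto: timedelta) -> bool:
    """True if service was restored within the RTO after the failure."""
    return restored - failure <= rto


if __name__ == "__main__":
    # Hypothetical incident timeline (UTC).
    last_backup = datetime(2024, 1, 1, 11, 15, tzinfo=timezone.utc)
    failure = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    restored = datetime(2024, 1, 1, 12, 40, tzinfo=timezone.utc)

    print(meets_rpo(last_backup, failure, timedelta(hours=1)))   # 45 min of data at risk
    print(meets_rto(failure, restored, timedelta(minutes=30)))   # 40 min to restore
```

In this timeline the 1-hour RPO is met (45 minutes elapsed since the last backup), while the 30-minute RTO is missed (restoration took 40 minutes). Continuous WAL archiving with PITR narrows the first gap because recovery can replay changes made after the last base backup.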