# Glossary
A categorised reference of technical terms, acronyms, and jargon used throughout this documentation. Terms are grouped by domain and sorted alphabetically within each category.
For in-depth explanations of observability and SRE concepts (SLIs, SLOs, percentiles, golden signals, etc.), see SRE & Observability Concepts.
## Kubernetes & Containers
| Term | Definition |
|---|---|
| Admission controller | A Kubernetes API server plugin that intercepts requests before objects are persisted. Mutating admission controllers modify objects; validating admission controllers reject non-compliant ones. RCIIS uses Kyverno, a policy engine that runs as a dynamic admission controller via admission webhooks. |
| Affinity / Anti-affinity | Scheduling rules that attract (affinity) or repel (anti-affinity) pods relative to nodes or other pods. Used to spread replicas across failure domains. |
| ConfigMap | A Kubernetes object for storing non-sensitive configuration data as key-value pairs. Mounted as files or environment variables in pods. |
| Container | A lightweight, isolated process running from a container image. Containers share the host kernel but have their own filesystem, network, and process namespaces. |
| containerd | The container runtime used by Kubernetes (and Talos) to pull images, create containers, and manage their lifecycle. Replaced Docker as the default runtime after Kubernetes removed dockershim in v1.24. |
| Cordon | Mark a node as unschedulable so no new pods are assigned to it. Existing pods continue running. Used before draining a node for maintenance. |
| CR (Custom Resource) | An instance of a CRD. For example, a Keycloak CR tells the Keycloak Operator to create a running Keycloak deployment. |
| CRD (Custom Resource Definition) | An extension to the Kubernetes API that defines a new resource type (e.g., Cluster, Keycloak, CephCluster). Operators use CRDs to manage applications declaratively. |
| DaemonSet | A workload that runs exactly one pod on every (or selected) node. Used for node-level agents like Falco, Fluent Bit, and Cilium. |
| Deployment | A workload that manages a set of identical pod replicas with rolling update support. The most common workload type for stateless services. |
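| Drain | Cordon a node, then evict its pods (respecting PodDisruptionBudgets). DaemonSet-managed pods are left in place; kubectl drain needs --ignore-daemonsets to proceed when they exist. Used to safely take a node out of service for maintenance. |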
| etcd | The distributed key-value store that backs the Kubernetes API. Stores all cluster state including Secrets, ConfigMaps, and resource definitions. |
| Eviction | The process of terminating a pod, either by the kubelet (resource pressure), the scheduler (preemption), or an administrator (drain). |
| Helm | A package manager for Kubernetes. Helm charts are templated YAML manifests with configurable values files. |
| Init container | A container that runs to completion before the main application container starts. Used for setup tasks like loading PKCS#11 libraries. |
| kubelet | The agent running on each node that manages pod lifecycle — starts containers, reports node status, and enforces resource limits. |
| kube-proxy | The default network proxy on each node that implements Kubernetes Service load balancing. Replaced by Cilium eBPF in RCIIS. |
| Kustomize | A template-free configuration management tool for Kubernetes that uses overlays and patches. Used with KSOPS for secret management. |
| Leader election | A pattern where one replica in a group is elected as the active leader. Used by controllers (cert-manager, Velero) to prevent duplicate work. |
| Namespace | A logical partition within a Kubernetes cluster for organising and isolating resources. Each platform tool deploys into its own namespace. |
| Node | A machine (physical or virtual) in the Kubernetes cluster. Control plane nodes run etcd and the API server; worker nodes run application pods. |
| Operator (pattern) | A Kubernetes controller that uses CRDs to manage the full lifecycle of a complex application — install, configure, upgrade, backup, and repair. |
| Pod | The smallest deployable unit in Kubernetes — one or more containers sharing a network namespace and storage volumes. |
| PodDisruptionBudget (PDB) | A policy that limits how many pods in a workload can be unavailable simultaneously during voluntary disruptions (drain, rolling update). |
| RBAC (Role-Based Access Control) | Kubernetes authorisation system that grants permissions (verbs on resources) to users, groups, or service accounts via Roles and RoleBindings (namespace-scoped) or ClusterRoles and ClusterRoleBindings (cluster-wide). |
| Rolling update | A deployment strategy that replaces old pods with new ones incrementally, maintaining availability throughout the update. |
| Secret | A Kubernetes object for storing sensitive data (passwords, tokens, certificates). Stored in etcd base64-encoded, which is not encryption; encryption at rest must be enabled separately, for example via a KMS provider. |
| ServiceAccount | An identity for pods to authenticate with the Kubernetes API. Used for RBAC and workload identity. |
| Sidecar | A secondary container running alongside the main container in the same pod. Used for logging, proxying, or HSM PKCS#11 access. |
| StatefulSet | A workload for stateful applications (databases, message brokers) that provides stable network identities and persistent storage per replica. |
| Static pod | A pod managed directly by the kubelet on a specific node, not by the Kubernetes API. Used for critical control plane components. |
| Toleration | A pod property that allows it to schedule onto nodes with matching taints. Used with node taints to dedicate nodes for specific workloads. |
| Topology spread constraint | A scheduling rule that distributes pods evenly across failure domains (zones, nodes). Used for HA deployments. |
| Webhook (mutating / validating) | An HTTP callback that Kubernetes invokes during admission. Kyverno, cert-manager, and OPA/Gatekeeper all use admission webhooks. |
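PodDisruptionBudget and Rolling update both accept either an absolute number or a percentage, and the rounding direction decides how many pods may be down at once. The Python sketch below mirrors the rounding rules as we understand them from upstream Kubernetes: PDB percentages round up, while a Deployment's maxSurge rounds up and maxUnavailable rounds down. The function names are hypothetical helpers for illustration, not a Kubernetes API.

```python
def scaled_value(value, total: int, round_up: bool) -> int:
    """Resolve an int-or-percent field such as 2 or "25%" against a replica count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * total
        # Integer arithmetic avoids float rounding surprises at exact boundaries.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def pdb_disruptions_allowed(expected: int, healthy: int,
                            min_available=None, max_unavailable=None) -> int:
    """How many more pods a voluntary disruption (e.g. a drain) may evict right now."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        desired_healthy = scaled_value(min_available, expected, round_up=True)
    else:
        unavailable = scaled_value(max_unavailable, expected, round_up=True)
        desired_healthy = max(0, expected - unavailable)
    return max(0, healthy - desired_healthy)


def rolling_update_bounds(replicas: int, max_surge="25%", max_unavailable="25%"):
    """Return (max pods during the update, min available pods) for a Deployment."""
    surge = scaled_value(max_surge, replicas, round_up=True)
    unavailable = scaled_value(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        unavailable = 1  # Kubernetes forces progress when both resolve to zero
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    print(pdb_disruptions_allowed(5, 5, min_available="50%"))  # 50% of 5 rounds up to 3 pods
    print(rolling_update_bounds(10))  # default 25%/25%: surge 3 (up), unavailable 2 (down)
```

The asymmetric rounding for rolling updates errs on the side of availability: extra pods are allowed generously, while the number of missing pods is kept small.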
## Networking
| Term | Definition |
|---|---|
| Anycast | A routing method where the same IP address is announced from multiple locations. The network routes traffic to the nearest location. Cloudflare uses Anycast for global load balancing. |
| ARP / GARP | Address Resolution Protocol maps IP addresses to MAC addresses. Gratuitous ARP (GARP) is an unsolicited broadcast used by kube-vip and MetalLB to announce VIP ownership. |
| BGP (Border Gateway Protocol) | The routing protocol that connects autonomous systems on the internet. Used by MetalLB and Cilium for advertising service IPs to network routers. |
| CIDR | Classless Inter-Domain Routing — a notation for IP address ranges (e.g., 10.0.0.0/16). Used to define pod networks, service networks, and subnet boundaries. |
| Cilium | An eBPF-based CNI plugin that provides networking, observability (Hubble), load balancing, network policy, and WireGuard encryption. The primary CNI for RCIIS. |
| ClusterIP | The default Kubernetes Service type — an internal-only virtual IP that load-balances traffic across pod endpoints. Not accessible from outside the cluster. |
| CNI (Container Network Interface) | A specification and plugin system for configuring networking in Linux containers. Cilium is the CNI used in RCIIS. |
| CoreDNS | The DNS server running inside Kubernetes that resolves Service names to ClusterIPs. Also used for custom DNS zones in RCIIS. |
| DNS (Domain Name System) | The hierarchical naming system that translates human-readable domain names to IP addresses. |
| eBPF (extended Berkeley Packet Filter) | A Linux kernel technology that allows running sandboxed programs in kernel space without modifying kernel code. Used by Cilium (networking), Falco, and Tracee (security). |
| Egress | Outbound traffic from a pod or cluster to an external destination. Egress network policies control which external services pods can reach. |
| Gateway API | A Kubernetes-native API for managing ingress, load balancing, and traffic routing. The successor to the Ingress resource, supported by Cilium. |
| GENEVE | Generic Network Virtualization Encapsulation, a tunnelling protocol that Cilium can use in tunnel mode to carry pod-to-pod traffic between nodes (VXLAN is the other supported encapsulation). |
| Hubble | The observability layer built into Cilium that provides network flow visibility, DNS query logging, and HTTP request tracing. |
| Ingress | Inbound traffic from outside the cluster to services inside. Also refers to the Kubernetes Ingress resource that defines HTTP routing rules. |
| kube-vip | A lightweight load balancer for the Kubernetes control plane that provides a virtual IP (VIP) for the API server using ARP or BGP. |
| L2 / L4 / L7 | OSI model layers — Layer 2 (data link / MAC), Layer 4 (transport / TCP/UDP), Layer 7 (application / HTTP). Load balancers and firewalls operate at different layers. |
| Load balancer | A component that distributes traffic across multiple backends. External LBs (AWS NLB, HAProxy) front the cluster; internal LBs (Cilium, kube-proxy) distribute within it. |
| MetalLB | A bare-metal load balancer for Kubernetes that assigns external IPs to Services using ARP or BGP. An alternative to Cilium L2 announcements. |
| mTLS (Mutual TLS) | TLS where both client and server authenticate each other with certificates. Used by the Talos API and etcd. Cilium's transparent pod-to-pod encryption uses WireGuard, which is a separate protocol rather than TLS. |
| NAT (Network Address Translation) | Translates private IP addresses to public addresses. NAT gateways provide internet access for private-subnet nodes on AWS. |
| Network policy | A Kubernetes resource that defines firewall rules for pod-to-pod and pod-to-external traffic. Cilium implements network policies using eBPF. |
| NodePort | A Kubernetes Service type that exposes the service on a static port on every node. Rarely used directly — usually fronted by a load balancer. |
| PoP (Point of Presence) | A physical location where a CDN or network provider has servers. Cloudflare has PoPs in 300+ cities for edge caching and traffic routing. |
| Subnet | A subdivision of an IP network. VPCs are divided into subnets (public, private, management) to isolate traffic. |
| TLS (Transport Layer Security) | Cryptographic protocol that provides encryption and authentication for network connections. Successor to SSL. |
| VIP (Virtual IP) | A floating IP address that can move between hosts for failover. Used by kube-vip and Keepalived for control plane HA. |
| VLAN | Virtual LAN — a logical network segment within a physical network. Used to isolate management, storage, and application traffic on bare metal. |
| VPC (Virtual Private Cloud) | An isolated virtual network within a cloud provider (AWS). Contains subnets, route tables, security groups, and NAT gateways. |
| VPN (Virtual Private Network) | An encrypted tunnel between two networks. Used for site-to-site connectivity between data centres or cloud regions. |
| WireGuard | A modern VPN protocol used by Cilium for transparent pod-to-pod encryption across nodes. |
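CIDR arithmetic comes up when sizing pod and service networks and checking that they do not collide. The sketch below uses Python's standard ipaddress module; the CIDR values and the /24 per-node block size are example assumptions, not necessarily the RCIIS settings.

```python
import ipaddress


def cidr_summary(cidr: str) -> dict:
    """Return the address count and first/last address of a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return {
        "addresses": net.num_addresses,
        "first": str(net.network_address),
        "last": str(net.broadcast_address),
    }


def ranges_overlap(a: str, b: str) -> bool:
    """True if two CIDR blocks share any address (a misconfiguration for pod vs service CIDRs)."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))


def per_node_blocks(pod_cidr: str, node_prefix: int) -> list:
    """Split a pod CIDR into equal per-node blocks of the given prefix length."""
    return [str(s) for s in ipaddress.ip_network(pod_cidr).subnets(new_prefix=node_prefix)]


if __name__ == "__main__":
    POD_CIDR = "10.244.0.0/16"     # example value
    SERVICE_CIDR = "10.96.0.0/12"  # example value
    print(cidr_summary(POD_CIDR))
    print(ranges_overlap(POD_CIDR, SERVICE_CIDR))
    blocks = per_node_blocks(POD_CIDR, 24)  # hypothetical /24 per node
    print(len(blocks), blocks[:2])
```

With a /16 pod CIDR split into /24 blocks, the cluster can hold 256 nodes, each with 256 pod addresses; a larger per-node prefix trades pods per node for more nodes.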
## Security & Compliance
| Term | Definition |
|---|---|
| Admission control | The process of intercepting API requests before persistence to validate or modify them. See Admission controller under Kubernetes & Containers. |
| CA (Certificate Authority) | An entity that issues digital certificates. cert-manager automates CA operations in RCIIS; HSMs protect the CA signing key. |
| CIS Benchmark | A set of security configuration recommendations published by the Center for Internet Security. The Kubernetes CIS Benchmark defines hardening rules for clusters. |
| Container escape | An attack where a process breaks out of container isolation to access the host. Detected by Falco and Tracee. |
| cosign | A tool from the Sigstore project for signing and verifying container images. RCIIS uses cosign in CI to sign images and Kyverno to verify signatures at admission. |
| CVE (Common Vulnerabilities and Exposures) | A unique identifier for a publicly disclosed security vulnerability (e.g., CVE-2024-1234). Trivy scans for CVEs in container images. |
| DDoS (Distributed Denial of Service) | An attack that overwhelms a service with traffic from many sources. Mitigated by Cloudflare's DDoS protection, WAF rules, and rate limiting. |
| Defence in depth | A security strategy that layers multiple independent controls so that a failure in one layer does not compromise the system. |
| DEK (Data Encryption Key) | A key used to encrypt data directly. In KMS v2, the DEK is wrapped (encrypted) by a Key Encryption Key stored in the HSM. |
| Encryption at rest | Protecting stored data by encrypting it on disk. Implemented via LUKS2 (Talos), KMS (AWS), or ZFS encryption (Proxmox). |
| Encryption in transit | Protecting data moving across a network using TLS or mTLS. Talos manages all control plane TLS automatically. |
| Falco | A CNCF runtime security tool that detects anomalous syscall behaviour in containers (shell execution, unexpected file access, network connections). |
| FIPS 140-2 / 140-3 | A US government standard for cryptographic module validation. Level 3 requires physical tamper-resistance (HSMs). |
| Fulcio | A Sigstore component that issues short-lived code-signing certificates tied to OIDC identity (e.g., GitHub Actions). Used in keyless signing workflows. |
| HSM (Hardware Security Module) | A tamper-resistant hardware device that stores cryptographic keys and performs signing/encryption operations. Keys never leave the HSM. See Encryption & HSM Provisioning. |
| Image signing | Cryptographically signing a container image to prove its authenticity and integrity. Verified by Kyverno at admission time. |
| ISO 27001 | An international standard for information security management systems (ISMS). Defines controls for access, cryptography, operations, and development. |
| JWT (JSON Web Token) | A compact, signed token used for authentication and authorisation. Keycloak issues JWTs; services verify them using the public keys published at the JWKS URI advertised by the OIDC discovery endpoint. |
| KEK (Key Encryption Key) | A key used to encrypt other keys (DEKs). In KMS v2, the KEK lives in the HSM and wraps DEKs that encrypt Kubernetes Secrets. |
| Key ceremony | A formal, witnessed process for initialising an HSM and generating root cryptographic keys. Documented with photographs and signed logs. |
| KMS (Key Management Service) | A service that manages cryptographic keys. AWS KMS provides cloud-managed keys; KMS v2 is a Kubernetes API for delegating Secret encryption to an external KMS. |
| Kyverno | A Kubernetes-native policy engine that enforces admission policies, mutates resources, and generates reports — without requiring a separate policy language. |
| Lateral movement | An attacker's technique of moving from one compromised component to another within the network. Prevented by network policies and namespace isolation. |
| LUKS2 | Linux Unified Key Setup v2 — the standard for full-disk encryption on Linux. Talos uses LUKS2 for STATE and EPHEMERAL partition encryption. |
| M-of-N secret sharing | A scheme where a secret (e.g., HSM SO PIN) is split into N shares, and any M shares are needed to reconstruct it. Used for HSM key custody. |
| OIDC (OpenID Connect) | An identity layer on top of OAuth 2.0 that provides authentication. Keycloak is the OIDC provider for RCIIS; FluxCD dashboard and the Kubernetes API are OIDC clients. |
| PKCS#11 | A standard API for communicating with cryptographic hardware (HSMs, smart cards). Applications use PKCS#11 to perform signing and encryption without accessing raw key material. |
| PKI (Public Key Infrastructure) | The framework of certificates, CAs, and trust chains that enables TLS and code signing. Talos manages the Kubernetes PKI; cert-manager manages application PKI. |
| Privilege escalation | An attack where a process gains higher permissions than intended. Detected by Falco and Tracee; prevented by Kyverno policies. |
| Rekor | A Sigstore component that provides an immutable transparency log for software signing events. Used to verify that a signature was created at a specific time. |
| SAML | Security Assertion Markup Language — an XML-based protocol for exchanging authentication and authorisation data. Used for enterprise SSO alongside OIDC. |
| SBOM (Software Bill of Materials) | A list of all components (libraries, dependencies) in a software artifact. Used for supply chain transparency and vulnerability tracking. |
| Sigstore | An open-source project for software signing, verification, and transparency. Includes cosign (signing), Fulcio (certificates), and Rekor (transparency log). |
| Supply chain attack | An attack that compromises software through its dependencies, build tools, or distribution channels. Mitigated by image signing and admission control. |
| Syscall | A system call — the interface between user-space programs and the kernel. Falco monitors syscalls to detect anomalous container behaviour. |
| Tracee | An Aqua Security runtime security tool that uses eBPF for deep forensic capture of container and host events. Complements Falco. |
| Trivy | An Aqua Security scanner for vulnerabilities (CVEs), misconfigurations, and secret leaks in container images, Kubernetes resources, and filesystems. |
| WAF (Web Application Firewall) | A firewall that filters HTTP traffic based on rules (SQL injection, XSS, rate limiting). Cloudflare WAF protects RCIIS external endpoints. |
| X.509 | The standard format for public key certificates used in TLS. cert-manager automates X.509 certificate issuance and renewal. |
| Zero trust | A security model that assumes no implicit trust — every request must be authenticated, authorised, and encrypted regardless of network location. |
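M-of-N secret sharing is commonly implemented with Shamir's scheme: the secret is the constant term of a random polynomial of degree M-1, each share is one point on it, and any M points determine the polynomial. The Python sketch below assumes the secret is encoded as an integer smaller than the prime 2**127 - 1; the PIN value is hypothetical. It illustrates the mathematics only and is not a substitute for the HSM's own M-of-N tooling or an audited library.

```python
import secrets

# Field modulus: the Mersenne prime 2**127 - 1. The secret must be smaller.
PRIME = 2**127 - 1


def split_secret(secret: int, m: int, n: int) -> list:
    """Split `secret` into n shares (x, y); any m of them reconstruct it."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    # Random polynomial of degree m - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule evaluates f(x) mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover f(0)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    pin = int.from_bytes(b"example-SO-PIN", "big")  # hypothetical 14-byte PIN
    shares = split_secret(pin, m=3, n=5)
    restored = recover_secret([shares[0], shares[2], shares[4]])
    print(restored.to_bytes(14, "big"))
```

Any three of the five shares recover the PIN; with only two, every candidate secret is equally consistent with the shares, so fewer than M custodians learn nothing.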
## Storage
| Term | Definition |
|---|---|
| Block storage | Storage presented as raw block devices (like a virtual disk). Ceph RBD and AWS EBS provide block storage for Kubernetes PVCs. |
| Ceph | A distributed storage system that provides block (RBD), object (RGW), and file (CephFS) storage. Deployed via Rook-Ceph in RCIIS. |
| CephFS | Ceph's POSIX-compliant distributed filesystem. Used for ReadWriteMany (RWX) workloads where multiple pods need shared access. |
| CSI (Container Storage Interface) | A standard API for storage drivers in Kubernetes. Rook-Ceph implements CSI to provision PVCs from the Ceph cluster. |
| EBS (Elastic Block Store) | AWS block storage service. EBS volumes are attached to EC2 instances and used for Kubernetes PVCs on AWS. |
| Erasure coding | A storage redundancy technique that splits data into fragments with parity, using less space than full replication. Not currently used in RCIIS (replication is used instead). |
| gp3 | An AWS EBS volume type providing baseline 3,000 IOPS and 125 MB/s throughput. The default volume type for RCIIS AWS deployments. |
| IOPS | Input/Output Operations Per Second — a measure of storage performance. gp3 volumes provide 3,000 baseline IOPS, scalable to 16,000. |
| MGR (Ceph Manager) | The Ceph manager daemon that provides monitoring, orchestration, and a management dashboard. Runs alongside MONs. |
| MON (Ceph Monitor) | The Ceph monitor daemon that maintains cluster membership, state maps, and quorum. Deploy an odd number (typically 3 or 5) so that a majority can still form after failures. |
| Object storage | Storage accessed via HTTP APIs (S3, Swift). Ceph RGW provides S3-compatible object storage for backups and log archives. |
| OSD (Object Storage Daemon) | A Ceph daemon that manages a physical or logical disk. Each OSD stores data, handles replication, and participates in recovery. |
| PV (PersistentVolume) | A Kubernetes resource representing a piece of provisioned storage. Created dynamically by a StorageClass or statically by an administrator. |
| PVC (PersistentVolumeClaim) | A request for storage by a pod. The PVC binds to a PV that satisfies its size and access mode requirements. |
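| Quorum | The minimum number of members that must agree for a distributed system to make progress, usually a strict majority (2 of 3, 3 of 5). Ceph MONs and etcd both require a majority quorum. PostgreSQL streaming replication does not need one by itself, although synchronous commit can be configured to wait for ANY n of a set of standbys. |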
| RBD (RADOS Block Device) | Ceph's block storage interface. RBD provides thin-provisioned, snapshotable block devices for Kubernetes PVCs. |
| RGW (RADOS Gateway) | Ceph's S3-compatible object storage gateway. Used for Velero backups, Loki log storage, and CNPG WAL archiving. |
| Rook | A Kubernetes operator that automates deployment and management of Ceph clusters. Rook handles OSD provisioning, MON placement, and cluster health. |
| Snapshot Controller | A Kubernetes controller that manages CSI VolumeSnapshots — point-in-time copies of PVCs used for backup and cloning. |
| StorageClass | A Kubernetes resource that defines how PVCs are dynamically provisioned — which CSI driver, replication factor, and parameters to use. |
| Throughput | The rate of data transfer (MB/s or GB/s). Relevant for streaming workloads, database WAL writes, and backup operations. |
| VolumeSnapshot | A point-in-time copy of a PVC, backed by the CSI driver (e.g., Ceph RBD snapshot). Used for pre-upgrade backups and database cloning. |
| ZFS | A combined filesystem and volume manager with built-in compression, snapshots, and encryption. Used on Proxmox hosts for VM storage. |
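The advice to run an odd number of Ceph MONs (and etcd members) follows from majority-quorum arithmetic: a group of n voters needs floor(n/2) + 1 members to agree. The Python sketch below tabulates this; the helper names are illustrative.

```python
def quorum_size(members: int) -> int:
    """Smallest strict majority of a voting group (Raft in etcd, Paxos in Ceph MONs)."""
    if members < 1:
        raise ValueError("a voting group needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while the survivors still form a majority."""
    return members - quorum_size(members)


if __name__ == "__main__":
    print("members  quorum  tolerated_failures")
    for n in range(1, 8):
        print(f"{n:>7}  {quorum_size(n):>6}  {tolerated_failures(n):>18}")
```

The table shows that 4 members tolerate only one failure, the same as 3, while adding another voter that must participate in every decision. This is why even member counts are avoided.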
## Observability & SRE
For detailed explanations with practical examples and PromQL queries, see SRE & Observability Concepts.
| Term | Definition |
|---|---|
| Alertmanager | The Prometheus component that routes, deduplicates, groups, and silences alerts. Sends notifications to Slack, PagerDuty, email, or webhooks. |
| Availability | The proportion of time a service is operational, expressed as a percentage ("nines"). See Availability & Nines. |
| Blackbox monitoring | Monitoring a service from the outside — probing endpoints without knowledge of internal state. The Blackbox Exporter performs HTTP, TCP, and ICMP probes. |
| Burn rate | The rate at which an error budget is being consumed. A burn rate of 1.0 means the budget is being used evenly; >1.0 means faster than sustainable. See Error Budgets. |
| Cardinality | The number of unique time series in Prometheus. High cardinality (from unbounded label values) causes memory and performance issues. See Cardinality. |
| Chaos engineering | The practice of deliberately injecting failures to verify system resilience. See Chaos Engineering. |
| Counter | A Prometheus metric type that only increases (or resets to zero). Used for totals like request count, bytes transferred. Query counters with rate() or increase(), which handle resets, rather than reading raw values. |
| Dashboard | A visual display of metrics and logs, typically in Grafana. Dashboards provide real-time operational visibility into platform health. |
| Error budget | The amount of unreliability permitted by the SLO. Calculated as 100% - SLO target. See Error Budgets. |
| Game day | A planned exercise where the team practices incident response against a simulated failure. See Game Days. |
| Gauge | A Prometheus metric type that can go up or down. Used for current values like memory usage, temperature, active connections. |
| Golden signals | The four key metrics for any service: latency, traffic, errors, saturation. See Four Golden Signals. |
| Grafana | A visualisation platform for creating dashboards from Prometheus, Loki, and other data sources. |
| Histogram | A Prometheus metric type that counts observations in configurable buckets. Used for latency distributions and calculated with histogram_quantile(). |
| LogQL | The query language for Grafana Loki. Uses label selectors and pipeline stages to filter and aggregate log lines. See Loki & LogQL. |
| Loki | A log aggregation system by Grafana Labs. Indexes log metadata (labels) rather than full text, making it efficient for Kubernetes log storage. |
| MTBF (Mean Time Between Failures) | The average time between consecutive failures. MTBF = MTTF + MTTR. See Incident Metrics. |
| MTTF (Mean Time to Failure) | The average time a system runs before failing. See Incident Metrics. |
| MTTR (Mean Time to Recovery) | The average time from incident detection to service restoration. The most actionable reliability metric. See Incident Metrics. |
| Observability | The ability to understand a system's internal state from its external outputs (metrics, logs, traces). The combination of Prometheus, Loki, and Grafana provides observability for RCIIS. |
| On-call | A rotation where designated team members are available to respond to incidents outside business hours. |
| P50 / P95 / P99 / P999 | Latency percentiles: the latency at or below which 50%, 95%, 99%, or 99.9% of requests complete. For example, a P99 of 200 ms means 99% of requests finished in 200 ms or less. See Latency Percentiles. |
| Post-mortem | A blameless document written after an incident that describes the timeline, root cause, impact, and preventive action items. See Post-Mortems. |
| Prometheus | An open-source monitoring system that collects metrics via a pull model, stores them in a time-series database, and evaluates alert rules. |
| PromQL | The query language for Prometheus. Used in dashboards, alerts, and recording rules. See Prometheus & PromQL. |
| Recording rule | A PromQL expression that is pre-computed at regular intervals and stored as a new time series. Improves query performance for dashboards and alerts. |
| RED method | A monitoring framework for request-driven services: Rate, Errors, Duration. See RED Method. |
| Runbook | A documented procedure linked to an alert that describes how to diagnose and resolve the issue. See Alerting Best Practices. |
| Scrape | The process of Prometheus pulling metrics from a target's /metrics endpoint at a configured interval. |
| ServiceMonitor | A Prometheus Operator CRD that defines how Prometheus should scrape metrics from a Kubernetes Service. |
| SLA (Service Level Agreement) | A contractual commitment guaranteeing a minimum level of service reliability, with consequences for breach. See Service Levels. |
| SLI (Service Level Indicator) | A quantitative measurement of service performance (e.g., error rate, latency). See Service Levels. |
| SLO (Service Level Objective) | A target value for an SLI over a time window (e.g., "99.9% availability over 30 days"). See Service Levels. |
| SRE (Site Reliability Engineering) | A discipline that applies software engineering practices to infrastructure and operations. Originated at Google. See SRE & Observability Concepts. |
| Toil | Repetitive, manual, automatable operational work that does not provide enduring value. See Toil & Automation. |
| USE method | A monitoring framework for infrastructure resources: Utilization, Saturation, Errors. See USE Method. |
| VPA (Vertical Pod Autoscaler) | A Kubernetes component that recommends or automatically adjusts pod CPU and memory requests based on observed usage. Goldilocks uses VPA in recommendation mode. |
| Whitebox monitoring | Monitoring a service from the inside — using internal metrics, logs, and instrumentation. Prometheus scraping application metrics is whitebox monitoring. |
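Several entries above (Error budget, Burn rate, Histogram, P50 / P95 / P99 / P999) reduce to small formulas. The Python sketch below shows error budget minutes, burn rate, and the linear interpolation that Prometheus' histogram_quantile() applies within a bucket. It assumes a 30-day SLO window, non-negative observations such as latencies, and hypothetical bucket counts; it illustrates the arithmetic rather than replacing PromQL.

```python
import math


def error_budget_minutes(slo: float, window_days: float = 30) -> float:
    """Minutes of allowed unreliability in the window: (1 - SLO) x window length."""
    return (1 - slo) * window_days * 24 * 60


def burn_rate(error_ratio: float, slo: float) -> float:
    """Observed error ratio divided by the ratio the SLO allows (1.0 = on budget)."""
    return error_ratio / (1 - slo)


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Buckets are sorted by upper bound, counts are cumulative, and the last
    bucket's bound is math.inf (the "+Inf" bucket). Within the bucket that
    contains the target rank, the value is interpolated linearly.
    """
    if not 0 < q < 1:
        raise ValueError("this sketch only handles 0 < q < 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, count_below = 0.0, 0.0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # the +Inf bucket cannot be interpolated
            in_bucket = cumulative - count_below
            return lower_bound + (upper_bound - lower_bound) * (rank - count_below) / in_bucket
        lower_bound, count_below = upper_bound, cumulative
    return lower_bound  # only reached if the input is not cumulative


if __name__ == "__main__":
    print(round(error_budget_minutes(0.999), 1))  # 99.9% SLO over 30 days
    print(round(burn_rate(0.0144, 0.999), 1))     # 1.44% errors against a 99.9% SLO
    latency = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
    print(round(histogram_quantile(0.9, latency), 3))  # estimated P90 in seconds
```

A 99.9% SLO leaves about 43.2 minutes of budget per 30 days. A burn rate of 14.4 sustained for one hour consumes 2% of that budget, which is why it is a common fast-burn paging threshold. Because the quantile is interpolated inside a bucket, its accuracy depends on how closely the bucket boundaries bracket the latencies you care about.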
## GitOps & CI/CD
| Term | Definition |
|---|---|
| Age | A modern file encryption tool used with SOPS for encrypting Kubernetes secrets. Simpler than GPG with smaller keys. |
| Argo Rollouts | A Kubernetes controller for progressive delivery strategies — canary deployments, blue-green deployments, and analysis-driven rollbacks. |
| Blue-green deployment | A release strategy that runs two identical environments (blue = current, green = new) and switches traffic atomically. Zero-downtime, but requires double the resources while both environments run. |
| Canary deployment | A release strategy that routes a small percentage of traffic to the new version and increases it gradually while metrics stay healthy. A detected regression triggers an automatic rollback. |
| CI/CD | Continuous Integration (automated build + test) and Continuous Delivery/Deployment (automated release to environments). |
| Declarative | A configuration style that describes the desired end state rather than the steps to reach it. Kubernetes manifests and FluxCD Kustomizations are declarative. |
| DependsOn (dependency ordering) | A FluxCD field (spec.dependsOn on Kustomizations and HelmReleases) that orders deployment: a resource waits until the resources it depends on are ready before it is reconciled. |
| Drift | When the actual state of the cluster diverges from the desired state defined in Git. FluxCD detects and can auto-correct drift via continuous reconciliation. |
| Drift detection | The FluxCD behaviour that reverts manual changes to resources it manages. kustomize-controller re-applies the Git-defined state at every reconciliation; prune: true also deletes resources removed from Git, and force: true replaces objects when an immutable field change blocks patching. For HelmReleases, drift detection is enabled via spec.driftDetection.mode. |
| FluxCD | A declarative GitOps continuous delivery tool for Kubernetes. Watches Git repositories and automatically syncs cluster state to match. |
| GitOps | An operational model where Git is the single source of truth for infrastructure and application configuration. Changes are applied via pull requests and automated reconciliation. |
| Harbor | An open-source container registry with vulnerability scanning, image signing, and replication. The RCIIS private registry for Helm charts and container images. |
| Helm chart | A package of Kubernetes YAML templates with a values.yaml file for configuration. Charts are versioned and stored in registries (Harbor, OCI). |
| HelmRelease | A FluxCD resource that declaratively manages Helm chart installations with automated upgrades, rollbacks, and drift detection. |
| Idempotent | An operation that produces the same result regardless of how many times it is applied. kubectl apply and Helm upgrades are idempotent. |
| KSOPS | A Kustomize plugin that integrates SOPS decryption into the Kustomize build process. Enables encrypted secrets in GitOps workflows. |
| Kustomization (FluxCD) | A FluxCD resource that defines a desired state (source repo + target cluster + reconciliation interval) and continuously reconciles the cluster to match. |
| Multi-source Kustomization | A FluxCD Kustomization that references multiple sources — typically a HelmRepository for the chart and a GitRepository for values overrides. |
| OCI (Open Container Initiative) | A set of standards for container image formats and registries. Helm charts can be stored as OCI artifacts in registries like Harbor. |
| Progressive delivery | A release strategy that gradually shifts traffic to a new version based on real-time metrics. Canary and blue-green are types of progressive delivery. |
| Reconciliation | The process of comparing the actual cluster state with the desired state in Git and applying corrections. FluxCD reconciles each resource at the interval set in its spec.interval (for example, 10m) and, when pruning is enabled, deletes resources that were removed from Git. |
| Renovate | An automated dependency update tool that creates pull requests when new versions of Helm charts, container images, or other dependencies are available. |
| SOPS | Secrets OPerationS — a tool for encrypting/decrypting files using Age, PGP, or KMS keys. Only values are encrypted; keys and metadata remain readable. |
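Reconciliation, drift correction, and pruning all reduce to comparing what Git declares with what the controller previously applied. The Python sketch below models that comparison with plain dictionaries keyed by a hypothetical "Kind/name" string. It is a conceptual illustration only; kustomize-controller itself uses server-side apply and tracks the objects it applied in an inventory, and the function names here are invented for the example.

```python
def plan_reconcile(desired: dict, applied: dict, prune: bool = True) -> dict:
    """Work out the actions needed to make `applied` match `desired`.

    desired: objects rendered from Git, keyed by "Kind/name".
    applied: objects this reconciler previously applied (its inventory), with
             their live state. Objects outside the inventory are never touched.
    """
    create = sorted(k for k in desired if k not in applied)
    # Drift: a managed object whose live state no longer matches Git.
    update = sorted(k for k in desired if k in applied and applied[k] != desired[k])
    # Prune: managed objects that were removed from Git.
    delete = sorted(k for k in applied if k not in desired) if prune else []
    return {"create": create, "update": update, "delete": delete}


def reconcile(desired: dict, applied: dict, prune: bool = True) -> dict:
    """Apply the plan in place; running it again with no changes is a no-op."""
    plan = plan_reconcile(desired, applied, prune)
    for key in plan["create"] + plan["update"]:
        applied[key] = dict(desired[key])
    for key in plan["delete"]:
        del applied[key]
    return plan


if __name__ == "__main__":
    git = {"Deployment/web": {"replicas": 3}, "Service/web": {"port": 80}}
    cluster = {"Deployment/web": {"replicas": 1}, "ConfigMap/legacy": {"mode": "old"}}
    print(reconcile(git, cluster))  # creates, corrects drift, prunes
    print(reconcile(git, cluster))  # second pass finds nothing to do
```

The second call returning empty action lists is the idempotence property that makes periodic reconciliation safe to run on a timer.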
## Infrastructure & IaC
| Term | Definition |
|---|---|
| AMI (Amazon Machine Image) | A pre-built virtual machine image for AWS EC2. The Talos AMI provides a pre-installed Talos Linux image for each AWS region. |
| Bare metal | Physical servers without virtualisation. Provides maximum performance but requires manual hardware management, PXE booting, and physical access for maintenance. |
| BMC / IPMI | Baseboard Management Controller / Intelligent Platform Management Interface — remote management hardware on servers for power control, console access, and monitoring without an OS. |
| Cloud-init | A tool for initialising cloud instances on first boot — setting hostname, SSH keys, network config. Used by some Proxmox and cloud deployments; Talos uses its own machine config instead. |
| Control plane | The set of Kubernetes components (API server, etcd, scheduler, controller manager) that manage cluster state. Runs on dedicated nodes in production. |
| Day-0 / Day-1 / Day-2 | Lifecycle phases — Day-0 is planning and design, Day-1 is initial deployment, Day-2 is ongoing operations. See Reliability Engineering Practices. |
| EC2 | Amazon Elastic Compute Cloud — virtual machine instances in AWS. RCIIS uses EC2 for control plane and worker nodes on AWS. |
| HA (High Availability) | A design approach that eliminates single points of failure through redundancy, replication, and automatic failover. For the Kubernetes control plane this means at least three nodes, so that etcd keeps a majority (quorum) after losing one of them. |
| HCL (HashiCorp Configuration Language) | The declarative configuration language used by Terraform for defining infrastructure resources. Supports variables, modules, and expressions. |
| IaC (Infrastructure as Code) | Managing infrastructure through version-controlled configuration files rather than manual processes. In this documentation, Terraform (written in HCL) provisions infrastructure and Helm charts describe cluster workloads. |
| iDRAC / iLO | Dell (iDRAC) and HPE (iLO) implementations of BMC for remote server management. Accessed via web UI or CLI for power, console, and hardware monitoring. |
| Image Factory | A Talos service that builds custom Talos Linux images with specific system extensions (e.g., iscsi-tools, qemu-guest-agent) baked in. |
| Immutable OS | An operating system with a read-only root filesystem that cannot be modified at runtime. Talos Linux is immutable: configuration is applied declaratively via machine config. |
| iPXE / PXE | Network boot mechanisms for loading an OS image from a server over the network. PXE is the standard firmware-based method; iPXE is open-source boot firmware that extends it with features such as booting over HTTP. Used for bare metal Talos installations. |
| Kind | Kubernetes in Docker: a tool for running local Kubernetes clusters using Docker containers as nodes. Used for development and testing. |
| Machine config | The YAML configuration file that defines a Talos node's identity, networking, disk encryption, and cluster membership. Applied at boot or via talosctl apply-config. |
| NTP | Network Time Protocol: synchronises system clocks across servers. Critical for TLS certificate validation, log correlation, and distributed consensus. |
| NVMe / SSD / HDD | Storage device types: NVMe SSDs (fastest, PCIe-attached), SATA/SAS SSDs (fast), and HDDs (slowest, mechanical). NVMe is recommended for Ceph OSDs. |
| OOB (Out-of-Band) | Management access to a server that is independent of the main operating system, typically via IPMI/BMC over a dedicated management network. |
| Proxmox VE | An open-source virtualisation platform based on KVM and LXC. Used as an alternative to cloud providers for running Kubernetes VMs on-premises. |
| RAID | Redundant Array of Independent Disks: combines multiple disks for redundancy or performance. Hardware RAID is configured in BIOS/UEFI before OS installation. |
| Schematic | A Talos configuration that defines which system extensions to include in a custom Talos image. Submitted to Image Factory to produce a downloadable image. |
| System extension | A read-only overlay that adds functionality to Talos Linux (e.g., iscsi-tools, qemu-guest-agent, bpf for Falco). Baked into the image at build time. |
| Talhelper | A helper tool that generates Talos machine configurations from a single talconfig.yaml definition, simplifying multi-node config management. |
| Talos Linux | A minimal, immutable, API-managed Linux distribution designed exclusively for running Kubernetes. There is no SSH, shell, or package manager; all management is via the Talos API. |
| talosctl | The CLI for managing Talos Linux nodes: applying config, upgrading, reading logs, and accessing etcd. Authenticated via mTLS certificates. |
| Terraform | An Infrastructure as Code (IaC) tool by HashiCorp that provisions and manages cloud and on-premises infrastructure using declarative HCL configuration files. Used in RCIIS for provisioning AWS and Proxmox infrastructure. |
| Worker node | A Kubernetes node that runs application workloads (pods). In clusters with dedicated control plane nodes, workers do not run control plane components. |
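The "three or more nodes" rule for an HA control plane comes from etcd's majority quorum. A cluster of n members needs floor(n/2) + 1 of them to accept writes, so it can lose n minus that number. The short Python sketch below makes the arithmetic concrete. It assumes one etcd member per control plane node, which is how Talos runs etcd.

```python
def etcd_quorum(members: int) -> int:
    """Majority of etcd members required to accept writes: floor(n/2) + 1."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while the survivors still form a quorum."""
    return members - etcd_quorum(members)


if __name__ == "__main__":
    for n in range(1, 6):
        print(
            f"{n} control plane node(s): quorum {etcd_quorum(n)}, "
            f"tolerates {tolerated_failures(n)} failure(s)"
        )
```

One or two nodes tolerate no failures. Three tolerate one, and five tolerate two. Even counts add a member without adding fault tolerance (four tolerates only one, like three), which is why control planes are normally sized at three or five nodes.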
Domain & Acronyms
| Term | Definition |
|---|---|
| APISIX | Apache APISIX — a cloud-native API gateway used as the primary ingress and traffic management layer for RCIIS application services. |
| CloudNativePG (CNPG) | A Kubernetes operator for managing PostgreSQL clusters with automated failover, backup, and recovery. Used for Keycloak and RCIIS application databases. |
| DR (Disaster Recovery) | The process and procedures for restoring service after a catastrophic failure. Includes backups (Velero, CNPG), geo-load balancing, and documented recovery runbooks. |
| EAC | East African Community — the regional intergovernmental organisation for which the RCIIS platform is built. Partner states include Kenya, Tanzania, Uganda, Rwanda, Burundi, DRC, South Sudan, and Somalia. |
| ESB (Enterprise Service Bus) | A middleware pattern for integrating applications via a central message broker. The RCIIS ESB handles customs data exchange between partner states. |
| Fluent Bit | A lightweight log processor and forwarder deployed as a DaemonSet. Collects container logs and ships them to Loki. |
| Goldilocks | A tool that runs VPA in recommendation mode and provides a dashboard showing right-sizing suggestions for pod resource requests and limits. |
| HAProxy | A high-performance TCP/HTTP load balancer. Used for Kubernetes API load balancing on bare metal and Proxmox deployments. |
| Kafka | Apache Kafka — a distributed event streaming platform. Deployed via Strimzi in RCIIS for asynchronous customs data processing. |
| Keepalived | A Linux daemon that provides VRRP-based failover for virtual IPs. Used alongside HAProxy for load balancer HA on bare metal. |
| Partner state | A member country of the EAC that participates in the RCIIS customs interconnectivity system. |
| PITR (Point-in-Time Recovery) | Recovering a database to a specific moment: a base backup is restored and archived WAL segments are replayed up to the target time. CloudNativePG supports PITR for PostgreSQL. |
| RCIIS | Regional Customs Interconnectivity Integration System: the platform whose deployment this documentation describes. Connects customs systems of EAC partner states. |
| RPO (Recovery Point Objective) | The maximum acceptable amount of data loss measured in time. An RPO of 1 hour means backups must be no more than 1 hour old. |
| RTO (Recovery Time Objective) | The maximum acceptable time to restore service after a failure. An RTO of 30 minutes means the service must be back within 30 minutes. |
| SIEM (Security Information and Event Management) | A system that aggregates and analyses security events from multiple sources. Falco and Tracee events can be forwarded to a SIEM. |
| Strimzi | A Kubernetes operator for running Apache Kafka clusters. Manages brokers, topics, users, and connectors declaratively via CRDs. |
| Velero | A Kubernetes backup and disaster recovery tool that backs up cluster resources and PersistentVolumes to S3-compatible storage. |
| WAL (Write-Ahead Log) | A database transaction log where changes are written before being applied to data files. PostgreSQL WAL segments are archived by CNPG for continuous backup and PITR. |
| WCO SAFE Framework | World Customs Organization Framework of Standards to Secure and Facilitate Global Trade — an international standard relevant to RCIIS compliance. |
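RPO is a time budget for data loss. Any writes made after the newest restorable point (a backup or an archived WAL segment) are lost when a failure occurs. The minimal Python sketch below checks a single failure against the RPO, and a backup schedule against its worst case. It assumes the latest recoverable point is available as one timestamp and ignores backup duration and upload lag; the timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def data_loss_window(last_restore_point: datetime, failure_time: datetime) -> timedelta:
    """Writes made after the newest restorable point are lost on failure."""
    if failure_time < last_restore_point:
        raise ValueError("failure_time is earlier than last_restore_point")
    return failure_time - last_restore_point


def meets_rpo(last_restore_point: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """True if the data lost at failure_time stays within the RPO budget."""
    return data_loss_window(last_restore_point, failure_time) <= rpo


def schedule_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case: the failure lands just before the next backup, so the loss
    approaches a full interval (backup duration and upload lag are ignored)."""
    return backup_interval <= rpo


if __name__ == "__main__":
    rpo = timedelta(hours=1)
    last_backup = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)  # hypothetical
    print(meets_rpo(last_backup, last_backup + timedelta(minutes=45), rpo))  # True
    print(meets_rpo(last_backup, last_backup + timedelta(minutes=90), rpo))  # False
    print(schedule_meets_rpo(timedelta(hours=4), rpo))  # False
```

With a 1-hour RPO, a four-hourly backup schedule fails even though most failures would lose less than four hours of data. The RPO has to hold in the worst case, which is why continuous WAL archiving is the usual way to meet tight RPOs for PostgreSQL.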