
9.2 Certificate Rotation

This page documents the certificate inventory for the RCIIS platform, rotation procedures, and integration with HSM-backed issuers.

Certificate Inventory

| Certificate | Issuer | Validity | Auto-Rotation | Location |
|---|---|---|---|---|
| Kubernetes API server TLS | Talos (internal CA) | 1 year | Yes (Talos manages) | Talos machine secrets |
| Kubelet serving certificate | Talos (internal CA) | 1 year | Yes (rotate-server-certificates: true) | Talos machine secrets |
| etcd peer and client TLS | Talos (internal CA) | 1 year | Yes (Talos manages) | Talos machine secrets |
| Talos API mTLS | Talos (internal CA) | 10 years | No (manual renewal) | Talos machine secrets |
| Cilium agent certificates | Cilium (internal CA) | Configurable | Yes (Cilium manages) | Cilium secrets |
| Application ingress TLS | cert-manager (Let's Encrypt or internal CA) | 90 days (LE) / configurable | Yes (cert-manager renews) | Kubernetes Secrets |
| Keycloak ingress TLS | cert-manager | 90 days (LE) / configurable | Yes (cert-manager renews) | keycloak namespace Secret |
| Keycloak realm signing key | Keycloak (internal or HSM) | Configurable | Manual rotation | Keycloak database or HSM |
| Weave GitOps server TLS | cert-manager | 90 days (LE) / configurable | Yes (cert-manager renews) | flux-system namespace Secret |

Talos-Managed Certificates

Talos automatically manages all control plane certificates (API server, kubelet, etcd). These rotate automatically and require no manual intervention under normal conditions.

Verify Certificate Expiry

# Check API server certificate
talosctl -n <control-plane-ip> get certificate apiserver

# Check all certificate statuses
talosctl -n <control-plane-ip> get certificates
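As an independent cross-check from outside Talos, the API server's serving certificate can be read over TLS and its expiry converted into days remaining. This is a minimal sketch, assuming OpenSSL and GNU date are installed on the workstation and the API server listens on the default port 6443; the function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the "notAfter=..." line of the certificate served at host:port
# (port 6443 is assumed to be the Kubernetes API server).
remote_notafter() {
  local host="$1" port="${2:-6443}"
  openssl s_client -connect "${host}:${port}" </dev/null 2>/dev/null |
    openssl x509 -noout -enddate
}

# Convert "notAfter=Jan  1 00:00:00 2099 GMT" into whole days from now.
# Relies on GNU date's -d parsing; the result is negative once expired.
days_until_notafter() {
  local end="${1#notAfter=}"
  local end_epoch now_epoch
  end_epoch="$(date -u -d "$end" +%s)"
  now_epoch="$(date -u +%s)"
  echo $(( (end_epoch - now_epoch) / 86400 ))
}
```

Usage: `days_until_notafter "$(remote_notafter <control-plane-ip>)"` prints the number of days until the API server certificate expires.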

Talos API Certificate Renewal

The Talos API certificate has a 10-year validity. If it approaches expiry or needs emergency rotation:

# Regenerate Talos secrets (new CAs and certificates for the Talos API, Kubernetes, and etcd)
talhelper gensecret > talsecret.sops.yaml
sops -e -i talsecret.sops.yaml

# Regenerate machine configs with the new secrets
talhelper genconfig

# gencommand prints a staged apply-config command per node; pipe them to bash to run them
talhelper gencommand apply --extra-flags="--mode=staged" | bash

# Then reboot each node, one at a time, to pick up the new certificates
talosctl -n <node-ip> reboot
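The reboot step can be scripted so that nodes go down strictly one at a time. This is a minimal sketch, assuming a talosctl release where `reboot --wait` blocks until the node is back, and a hypothetical `REBOOT_PAUSE_SECONDS` pause between nodes. Because the nodes now carry new CAs, `TALOSCONFIG` should point at the talosconfig produced by `talhelper genconfig` (assumed here to be `clusterconfig/talosconfig`).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Reboot the given nodes sequentially. `--wait` (assumed available in your
# talosctl version) blocks until each node has come back.
rolling_reboot() {
  local node
  for node in "$@"; do
    echo "Rebooting ${node}"
    talosctl -n "$node" reboot --wait
    # Hypothetical settle time so workloads can reschedule before the next node.
    sleep "${REBOOT_PAUSE_SECONDS:-60}"
  done
}
```

Usage: `TALOSCONFIG=clusterconfig/talosconfig rolling_reboot <cp-1-ip> <cp-2-ip> <cp-3-ip> <worker-ip>`. Listing control plane nodes individually, one per iteration, avoids taking more than one etcd member down at once.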

Warning

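Regenerating Talos secrets creates new CAs for the Talos API, Kubernetes, and etcd. Every node must be reconfigured with the new secrets, and clients need the newly generated talosconfig and kubeconfig. This is a disruptive operation, so plan a maintenance window. See Talos Upgrades for the rolling update procedure.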

cert-manager Managed Certificates

cert-manager handles automatic renewal for all application and infrastructure TLS certificates.

Check Certificate Status

# List all certificates and their status
kubectl get certificates -A

# Check a specific certificate
kubectl describe certificate <name> -n <namespace>

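# List certificates that expire within the next 30 days, with their scheduled renewal time
kubectl get certificates -A -o json | \
  jq -r '.items[]
    | select(.status.notAfter != null)
    | select((.status.notAfter | fromdateiso8601) < (now + 30 * 86400))
    | "\(.metadata.namespace)/\(.metadata.name): expires \(.status.notAfter), renews at \(.status.renewalTime)"'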

Manual Renewal

If a certificate needs immediate renewal:

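# Preferred: trigger renewal via cmctl (the existing Secret keeps serving until the new certificate is issued)
cmctl renew <certificate-name> -n <namespace>

# Alternative: delete the certificate Secret and let cert-manager re-issue it
# (consumers of the Secret may fall back to a default certificate until re-issuance completes)
kubectl delete secret <secret-name> -n <namespace>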
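Neither method reports when the new certificate is actually in place. One way to confirm is to watch the Certificate's `status.revision`, which cert-manager increments on each successful issuance. This is a sketch; the helper names, the `POLL_SECONDS` interval, and the attempt count are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Current issuance revision of a Certificate (empty if never issued).
cert_revision() {
  kubectl get certificate "$1" -n "$2" -o jsonpath='{.status.revision}'
}

# Trigger a renewal, then poll until the revision changes or attempts run out.
renew_and_wait() {
  local name="$1" namespace="$2" attempts="${3:-30}"
  local before i
  before="$(cert_revision "$name" "$namespace")"
  cmctl renew "$name" -n "$namespace"
  for ((i = 0; i < attempts; i++)); do
    if [ "$(cert_revision "$name" "$namespace")" != "$before" ]; then
      echo "renewed ${namespace}/${name}"
      return 0
    fi
    sleep "${POLL_SECONDS:-10}"
  done
  echo "timed out waiting for ${namespace}/${name}" >&2
  return 1
}
```

Usage: `renew_and_wait <certificate-name> <namespace>`; a non-zero exit means the renewal did not complete in time and the troubleshooting steps below apply.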

Troubleshoot Failed Renewals

# Check cert-manager logs
kubectl -n cert-manager logs -l app=cert-manager --tail=50

# Check certificate request status
kubectl get certificaterequests -A
kubectl describe certificaterequest <name> -n <namespace>

# Check ACME order status (for Let's Encrypt)
kubectl get orders -A
kubectl get challenges -A
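To narrow down which certificates need attention, list every Certificate whose `Ready` condition is not `True`. This sketch uses kubectl custom columns; a missing condition (for example, a certificate that has never been issued) shows as `<none>`. The function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print namespace/name and Ready status for every Certificate that is not Ready.
not_ready_certificates() {
  kubectl get certificates -A --no-headers \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status' |
    awk '$3 != "True" { print $1 "/" $2 " ready=" $3 }'
}
```

Each result can then be inspected with `kubectl describe certificate <name> -n <namespace>` and its matching CertificateRequest.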

HSM-Backed Certificate Rotation

When cert-manager is configured with an HSM-backed issuer (see Key Management), certificate rotation involves the HSM for signing operations.

How It Works

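  1. cert-manager detects that a certificate has reached its renewal time (renewBefore ahead of expiry)
  2. cert-manager generates a new CSR (Certificate Signing Request) and, depending on the private key rotation policy, a new private key
  3. cert-manager submits the CSR as a CertificateRequest to the HSM-backed issuer
  4. The HSM signs the certificate using the CA private key (which never leaves the HSM)
  5. The signed certificate is stored in the Kubernetes Secret, alongside the certificate's private key

Only the CA private key is protected by the HSM; each leaf certificate's private key is generated by cert-manager and stored in its Kubernetes Secret. Otherwise the HSM integration is transparent to the rotation process: cert-manager handles renewal automatically, and the only difference is that signing is slower than software-based signing because each operation needs a network round-trip to the HSM.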
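For reference, a Certificate that uses the HSM-backed issuer might look like the sketch below. The `hsm-ca-issuer` ClusterIssuer name comes from this page; the certificate name, namespace, DNS name, duration, and renewal window are hypothetical. `renewBefore` controls when step 1 fires, and `rotationPolicy: Always` makes step 2 generate a fresh leaf key on every renewal. If your HSM issuer is an external issuer type, `issuerRef.kind` and `issuerRef.group` will differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: request a certificate from the HSM-backed ClusterIssuer.
apply_hsm_certificate() {
  local name="$1" namespace="$2" dns_name="$3"
  kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  secretName: ${name}-tls
  dnsNames:
    - ${dns_name}
  duration: 2160h            # 90 days
  renewBefore: 720h          # start renewal 30 days before expiry
  privateKey:
    rotationPolicy: Always   # new leaf key on every renewal
  issuerRef:
    name: hsm-ca-issuer
    kind: ClusterIssuer
    group: cert-manager.io
EOF
}
```

Usage: `apply_hsm_certificate demo app demo.example.internal`.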

Verify HSM-Backed Issuance

# Check the issuer status
kubectl get clusterissuer hsm-ca-issuer -o yaml

# Verify a certificate was signed by the HSM CA
kubectl get secret <tls-secret> -n <namespace> -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -text -noout | grep "Issuer:"
# Expected: Issuer matches the HSM CA certificate subject
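Matching the Issuer line only compares names. To confirm the signature chain, the leaf can be verified against the HSM CA certificate with `openssl verify`. This sketch assumes the CA certificate has been exported to a local PEM file and that the leaf is signed directly by that CA (with an intermediate, pass it via `-untrusted`); the function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Verify the first certificate in a TLS Secret against a CA bundle.
# openssl verify reads the certificate from stdin and prints "stdin: OK" on success.
verify_secret_chain() {
  local secret="$1" namespace="$2" ca_file="$3"
  kubectl get secret "$secret" -n "$namespace" -o jsonpath='{.data.tls\.crt}' |
    base64 -d |
    openssl verify -CAfile "$ca_file"
}
```

Usage: `verify_secret_chain <tls-secret> <namespace> hsm-ca.pem`; a non-zero exit status means the certificate does not chain to that CA.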

CA Certificate Rotation

If the HSM-backed CA certificate itself needs rotation (e.g., approaching expiry or key compromise):

  1. Perform a key ceremony to generate a new CA key pair in the HSM (CloudHSM)
  2. Create and sign the new CA certificate (self-signed or signed by an external root)
  3. Update the CA certificate Secret and the cert-manager ClusterIssuer to reference the new CA certificate
  4. Re-issue all certificates signed by the old CA

# Update the CA certificate Secret. The private key stays in the HSM, so the Secret
# carries only the certificate. The tls.crt key name is an assumption and must match
# what the HSM-backed issuer expects; Secret types are immutable, so an existing
# kubernetes.io/tls Secret of the same name must be deleted first.
kubectl create secret generic hsm-ca-certificate \
  --from-file=tls.crt=new-ca-cert.pem \
  -n cert-manager \
  --dry-run=client -o yaml | kubectl apply -f -

# Trigger re-issuance of all certificates (this also renews certificates from other issuers)
cmctl renew --all-namespaces --all

The key ceremony must follow the same formal process as the initial HSM setup: witnessed, documented, and recorded.
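`cmctl renew --all` also renews certificates from unrelated issuers such as Let's Encrypt, which is unnecessary when only the HSM CA has changed. A narrower sketch renews only Certificates whose `issuerRef` points at the HSM ClusterIssuer; `HSM_ISSUER` and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ClusterIssuer to match; defaults to the name used on this page.
HSM_ISSUER="${HSM_ISSUER:-hsm-ca-issuer}"

# Print "namespace name" for every Certificate issued by the HSM ClusterIssuer.
hsm_certificates() {
  kubectl get certificates -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,ISSUER:.spec.issuerRef.name,KIND:.spec.issuerRef.kind' |
    awk -v issuer="$HSM_ISSUER" '$3 == issuer && $4 == "ClusterIssuer" { print $1, $2 }'
}

# Renew each matching Certificate in its own namespace.
renew_hsm_certificates() {
  local ns name
  hsm_certificates | while read -r ns name; do
    echo "Renewing ${ns}/${name}"
    cmctl renew "$name" -n "$ns"
  done
}
```

Running `hsm_certificates` first gives a dry-run list of what `renew_hsm_certificates` will touch.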

Keycloak Signing Key Rotation

Keycloak uses RSA or EC keys to sign JWT tokens. These keys should be rotated periodically; the rotation schedule below recommends every 6 months.
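To see which key signed a given token, decode the JWT header and read its `kid`, then compare it with the realm's JWKS (`/realms/<realm>/protocol/openid-connect/certs`) or the Keys tab in the admin console. This is a plain-shell sketch that assumes a compact JWS token; the function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Decode the header (first dot-separated segment) of a compact JWT.
# JWT segments are unpadded base64url, so translate the alphabet and re-pad.
jwt_header() {
  local seg="${1%%.*}"
  seg="$(printf '%s' "$seg" | tr '_-' '/+')"   # base64url -> standard base64
  case $(( ${#seg} % 4 )) in                    # restore the padding base64 -d expects
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Extract the "kid" value; Keycloak headers may contain spaces around the colon.
jwt_kid() {
  jwt_header "$1" | sed -n 's/.*"kid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

After a rotation, newly issued tokens should report the new key's `kid`, while older tokens keep the previous one until they expire.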

Rotate Realm Keys

  1. In the Keycloak admin console: Realm Settings > Keys > Providers
  2. Add a new key provider (RSA or EC) with a higher priority than the existing one
  3. The new key becomes the active signing key; the old key remains for verification of existing tokens
  4. After the old tokens expire (determined by token lifespan settings), remove the old key provider
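The same rotation can be driven through the Keycloak Admin REST API, which the rotation schedule lists as an alternative to the console. This sketch registers a new generated RSA key provider (`rsa-generated`) at a chosen priority. The base URL, realm name, `KC_ADMIN_USER`/`KC_ADMIN_PASSWORD` variables, and use of the `admin-cli` password grant are assumptions; paths assume Keycloak 17 or later (no `/auth` prefix), and the realm is assumed to become the component's parent when `parentId` is omitted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions: hypothetical base URL and realm name.
KC_URL="${KC_URL:-https://keycloak.example.internal}"
REALM="${REALM:-rciis}"

# JSON for a generated RSA key provider; the active key with the highest priority signs.
key_provider_payload() {
  local name="$1" priority="$2"
  printf '{"name":"%s","providerId":"rsa-generated","providerType":"org.keycloak.keys.KeyProvider","config":{"priority":["%s"],"keySize":["2048"],"enabled":["true"],"active":["true"]}}\n' \
    "$name" "$priority"
}

# Admin token from the master realm's admin-cli client (password grant assumed enabled).
admin_token() {
  curl -fsS "${KC_URL}/realms/master/protocol/openid-connect/token" \
    -d grant_type=password -d client_id=admin-cli \
    --data-urlencode "username=${KC_ADMIN_USER}" \
    --data-urlencode "password=${KC_ADMIN_PASSWORD}" |
    sed -n 's/.*"access_token":"\([^"]*\)".*/\1/p'
}

# Create the provider as a realm component.
create_key_provider() {
  local name="$1" priority="$2" token
  token="$(admin_token)"
  key_provider_payload "$name" "$priority" |
    curl -fsS -X POST "${KC_URL}/admin/realms/${REALM}/components" \
      -H "Authorization: Bearer ${token}" \
      -H 'Content-Type: application/json' \
      --data-binary @-
}
```

Usage: `create_key_provider rsa-2025-h2 200`, choosing a priority above the current provider's; `GET /admin/realms/<realm>/keys` then shows which key is active.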

HSM-Backed Key Rotation

If Keycloak signing keys are stored in the HSM:

  1. Generate a new signing key in the HSM via pkcs11-tool or the HSM vendor SDK
  2. Update the Keycloak PKCS#11 keystore configuration to reference the new key alias
  3. Restart Keycloak pods to pick up the new key
  4. The old key remains in the HSM for verification until all old tokens expire
  5. After the grace period, mark the old key as inactive in the HSM
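Step 1 with OpenSC's `pkcs11-tool` might look like the sketch below. The module path (assumed to be the CloudHSM Client SDK 5 PKCS#11 library), the `PKCS11_PIN` variable, and the label and ID scheme are assumptions; use the values defined for your HSM on the Key Management page.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed PKCS#11 library path; override for your HSM vendor.
PKCS11_MODULE="${PKCS11_MODULE:-/opt/cloudhsm/lib/libcloudhsm_pkcs11.so}"

# Generate an RSA-2048 signing key pair under a new (hypothetical) label and hex ID.
generate_signing_key() {
  local label="$1" id="$2"
  pkcs11-tool --module "$PKCS11_MODULE" --login --pin "$PKCS11_PIN" \
    --keypairgen --key-type rsa:2048 --label "$label" --id "$id"
}

# List private keys so the old and new labels can be checked side by side.
list_private_keys() {
  pkcs11-tool --module "$PKCS11_MODULE" --login --pin "$PKCS11_PIN" \
    --list-objects --type privkey
}
```

The new label is what step 2 references in the Keycloak PKCS#11 keystore configuration.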

Rotation Schedule

| Certificate Type | Rotation Frequency | Method | Downtime |
|---|---|---|---|
| Application TLS (Let's Encrypt) | Every 60 days (auto) | cert-manager auto-renewal | None |
| Application TLS (internal CA) | Configurable (recommend 90 days) | cert-manager auto-renewal | None |
| Keycloak realm signing key | Every 6 months | Manual via admin console or API | None (graceful rotation) |
| HSM CA certificate | Every 3–5 years | Key ceremony + cert-manager update | Brief (minutes) during rollout |
| Talos control plane certs | Every year (auto) | Talos auto-renewal | None |
| Talos API certificate | Every 10 years | Manual secret regeneration | Rolling reboot required |
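The 60-day figure for Let's Encrypt follows from cert-manager's renewal arithmetic: renewal starts `renewBefore` ahead of expiry, and when `renewBefore` is unset, recent cert-manager releases default it to one third of the certificate's duration. A small sketch of that arithmetic; the function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Days after issuance at which renewal is scheduled:
# duration - renewBefore, with renewBefore defaulting to duration / 3.
renewal_day() {
  local duration_days="$1"
  local renew_before_days="${2:-$(( duration_days / 3 ))}"
  echo $(( duration_days - renew_before_days ))
}
```

`renewal_day 90` prints `60`, matching the Let's Encrypt row; with an explicit 15-day `renewBefore`, `renewal_day 90 15` prints `75`.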

Monitoring

Set up alerts for certificate expiry:

cert-expiry-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: certificate-expiry-alerts
  namespace: cert-manager
spec:
  groups:
    - name: cert-manager
      rules:
        - alert: CertificateExpiringSoon
          expr: >
            certmanager_certificate_expiration_timestamp_seconds - time() < 7 * 24 * 3600
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} expires in < 7 days"

        - alert: CertificateExpiryCritical
          expr: >
            certmanager_certificate_expiration_timestamp_seconds - time() < 24 * 3600
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} expires in < 24 hours"
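Expiry alerts fire only once a certificate is already close to expiring. Failed renewals can surface earlier through cert-manager's `certmanager_certificate_ready_status` metric. This sketch applies an additional rule; the rule name, group name, and 15-minute `for` window are hypothetical, and your Prometheus installation may require extra labels for rule discovery.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Apply a PrometheusRule that fires when a Certificate reports Ready=False.
# The heredoc delimiter is quoted so {{ $labels.* }} reaches Prometheus unexpanded.
apply_not_ready_rule() {
  kubectl apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: certificate-not-ready-alerts
  namespace: cert-manager
spec:
  groups:
    - name: cert-manager-readiness
      rules:
        - alert: CertificateNotReady
          expr: certmanager_certificate_ready_status{condition="False"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} is not Ready"
EOF
}
```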