8.2 Security Audit

Before handing over the environment, verify that every security control is deployed, configured correctly, and functioning. This audit should be performed after all Phase 5 (Install Platform Services) security tools are installed and before external access is enabled.

Pre-Handover Security Checklist

| # | Check | Tool | Pass Criteria | Common Failure Modes |
|---|-------|------|---------------|----------------------|
| 1 | Kyverno admission controller is running | Kyverno | 3 replicas healthy, webhook registered | Pods pending due to resource limits; webhook not recreated after restart |
| 2 | Policy-violating resources are blocked | Kyverno | Test pod with privileged=true is rejected | Policy in Audit mode instead of Enforce; namespace excluded from webhook |
| 3 | Unsigned images are rejected | cosign + Kyverno | Test pod with unsigned image is denied by admission webhook | Public key mismatch; webhookTimeoutSeconds too low for signature verification |
| 4 | Trivy Operator is scanning | Trivy | VulnerabilityReports exist for all namespaces | Scan jobs stuck Pending (insufficient resources); DB download failure |
| 5 | No CRITICAL CVEs in production images | Trivy | criticalCount = 0 across all VulnerabilityReports | New CVE disclosed after last scan; image not yet rescanned |
| 6 | Falco is detecting runtime events | Falco | Test shell exec triggers alert | eBPF probe load failure on Talos; modern_ebpf driver not set |
| 7 | Tracee eBPF programs are loaded | Tracee | DaemonSet running, events captured | hostPID not enabled; kernel BTF not available |
| 8 | Keycloak OIDC is functional | Keycloak | Weave GitOps SSO login succeeds | Realm not configured; TLS certificate mismatch |
| 9 | Kubernetes OIDC auth works | Keycloak | kubectl with OIDC token succeeds | API server OIDC flags not set; issuer URL mismatch |
| 10 | HSM is connected (if applicable) | HSM | cert-manager issues a test certificate via HSM | PKCS#11 library path incorrect; HSM partition locked |
| 11 | Encryption at rest is enabled | Talos/KMS | Disk encryption verified per model | Encryption key not provisioned; wrong partition encrypted |
| 12 | Network policies are enforced | Cilium | Default-deny policies exist, test cross-namespace traffic is blocked | Kyverno generate policy not active; Cilium policy enforcement disabled |
| 13 | RBAC is properly scoped | Kubernetes | No cluster-admin bindings for application service accounts | Helm chart installs ClusterRoleBinding with excessive permissions |
| 14 | CIS Kubernetes Benchmark scan passes | Trivy | No CRITICAL findings in clustercompliancereports | Compliance scanning not enabled; scan not yet run |
| 15 | All required Kyverno policies in Enforce mode | Kyverno | Image allow-list, pod security, resource limits policies are Enforce | Policies still in Audit from initial rollout; PolicyExceptions masking issues |

Verification Procedures

1. Kyverno

Confirm the admission controller is running:

kubectl -n kyverno get pods
# Expect: 3 admission-controller pods Running, 2 background-controller pods Running

kubectl get validatingwebhookconfigurations | grep kyverno
# Expect: kyverno-resource-validating-webhook-cfg
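
The replica check above is read by eye. A minimal sketch of a scripted version follows. It assumes the admission controller Deployment is named `kyverno-admission-controller` (the usual Helm chart default, not confirmed by this guide); `check_ready_replicas` is a hypothetical helper name.

```shell
# check_ready_replicas NAMESPACE DEPLOYMENT EXPECTED
# Succeeds only if the Deployment reports at least EXPECTED ready replicas.
check_ready_replicas() {
  local ns="$1" deploy="$2" expected="$3" ready
  # .status.readyReplicas is omitted by the API when no pod is ready,
  # so an empty result is treated as 0.
  ready="$(kubectl -n "$ns" get deployment "$deploy" \
    -o jsonpath='{.status.readyReplicas}' 2>/dev/null)"
  ready="${ready:-0}"
  if [ "$ready" -ge "$expected" ]; then
    echo "PASS: $ns/$deploy has $ready/$expected ready replicas"
  else
    echo "FAIL: $ns/$deploy has $ready/$expected ready replicas"
    return 1
  fi
}
```

For example, `check_ready_replicas kyverno kyverno-admission-controller 3` returns non-zero if fewer than three replicas are ready, which makes the check usable in a script.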

Test policy enforcement:

# Create a policy-violating pod (should be rejected)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: security-audit-test-privileged
  namespace: default
spec:
  containers:
    - name: test
      image: harbor.devops.africa/rciis/test:latest
      securityContext:
        privileged: true
EOF
# Expected: Error from server: admission webhook denied the request

# Clean up
kubectl delete pod security-audit-test-privileged --ignore-not-found
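
If creating a real pod is undesirable, the same test can probably be run with a server-side dry run, which still passes through admission webhooks as long as the webhook declares dry-run support. The wrapper below is a sketch, not part of the original procedure: `expect_denied` is a hypothetical helper that inverts the exit status, so a denial counts as a pass.

```shell
# expect_denied CMD [ARGS...]
# Runs CMD and succeeds only if it FAILS with an admission denial.
expect_denied() {
  local output
  if output="$("$@" 2>&1)"; then
    echo "FAIL: request was admitted"
    return 1
  fi
  case "$output" in
    *denied*) echo "PASS: request was denied by admission control" ;;
    *) echo "FAIL: command failed for another reason: $output"; return 1 ;;
  esac
}
```

Assuming the manifest above is saved as `privileged-pod.yaml` (a hypothetical file name), the call would be `expect_denied kubectl apply --dry-run=server -f privileged-pod.yaml`. Distinguishing a denial from other failures matters because a connection error would otherwise look like a successful rejection.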

Review policy reports:

# Check for existing violations in audit-mode policies
kubectl get policyreports -A --no-headers | wc -l
kubectl get clusterpolicyreports

# Detailed violations
kubectl get policyreports -A -o json | \
  jq -r '.items[] | select(.summary.fail > 0) |
    "\(.metadata.namespace): \(.summary.fail) failures"'

2. Image Signature Verification

Verify the image verification policy is active:

kubectl get clusterpolicy verify-image-signatures -o yaml | grep validationFailureAction
# Expected: validationFailureAction: Enforce

Test that unsigned images are rejected:

# Try deploying an unsigned image from Harbor (should be rejected)
kubectl run audit-unsigned-test \
  --image=harbor.devops.africa/rciis/test:unsigned \
  --restart=Never 2>&1
# Expected: Error from server: admission webhook denied the request:
#   image signature verification failed

# Clean up
kubectl delete pod audit-unsigned-test --ignore-not-found

Verify that a deployed image carries a valid signature:

# Verify a signature exists on a deployed image
cosign verify --key cosign.pub harbor.devops.africa/rciis/myapp:v1.2.3
# Expected: Verification for harbor.devops.africa/rciis/myapp:v1.2.3 --
#   The following checks were performed:
#   - The cosign claims were validated
#   - The signatures were verified against the specified public key

3. Trivy Operator

Confirm the operator is running:

kubectl -n trivy-system get pods
# Expect: trivy-operator pod Running

Check scan coverage:

# List all vulnerability reports
kubectl get vulnerabilityreports -A --no-headers | wc -l

# Check for CRITICAL findings
kubectl get vulnerabilityreports -A -o json | \
  jq -r '.items[] | select(.report.summary.criticalCount > 0) |
    "\(.metadata.namespace)/\(.metadata.labels["trivy-operator.resource.name"]): \(.report.summary.criticalCount) CRITICAL"'
# Expected: No output (zero critical vulnerabilities)
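
The jq query above prints findings but always exits 0, so it cannot gate a script on its own. The sketch below turns the same data into a pass/fail result using only awk. `fail_on_critical` is a hypothetical helper, and the three-column input format is an assumption matching the `custom-columns` call shown in the comment.

```shell
# fail_on_critical: reads "NAMESPACE NAME CRITICAL_COUNT" rows on stdin,
# prints rows with a non-zero critical count, and exits non-zero if any exist.
#
# Intended input (one row per VulnerabilityReport):
#   kubectl get vulnerabilityreports -A --no-headers \
#     -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,CRITICAL:.report.summary.criticalCount' \
#     | fail_on_critical
fail_on_critical() {
  awk '
    $3 + 0 > 0 { printf "%s/%s: %d CRITICAL\n", $1, $2, $3; bad = 1 }
    END        { exit bad + 0 }
  '
}
```

A missing count, which kubectl prints as `<none>`, evaluates to 0 here and is therefore not flagged.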

Check config audit:

kubectl get configauditreports -A -o json | \
  jq -r '.items[] | select(.report.summary.criticalCount > 0) |
    "\(.metadata.namespace)/\(.metadata.labels["trivy-operator.resource.name"]): \(.report.summary.criticalCount) CRITICAL misconfigs"'

Check for exposed secrets in images:

kubectl get exposedsecretreports -A
# Expected: No reports, or reports with targetCount = 0

4. Falco

Confirm Falco is running with eBPF driver:

kubectl -n falco get pods
# Expect: One Falco pod per node (DaemonSet)

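# --tail=-1 is required: with a label selector, kubectl returns only the last 10 lines per pod by default
kubectl -n falco logs -l app.kubernetes.io/name=falco --tail=-1 | grep -i "driver"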
# Expected: "eBPF probe loaded successfully" or similar

Trigger a test alert:

# Run a shell in a throwaway pod with a TTY attached; this should trigger Falco's
# shell-in-container rule ("Terminal shell in container" in the default ruleset)
kubectl run security-audit-test --image=busybox --rm -it --restart=Never -- /bin/sh -c "echo audit-test && exit"

# Check Falco logs for the alert
kubectl -n falco logs -l app.kubernetes.io/name=falco --tail=30 | grep -i "shell"
# Expected: Alert line containing "Shell spawned in container"
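
An alert can take a few seconds to reach the logs, so a single grep run immediately after the test may miss it. A minimal polling sketch follows; `wait_for_output` is a hypothetical helper and the one-second interval is an arbitrary choice.

```shell
# wait_for_output PATTERN TIMEOUT_SECONDS CMD [ARGS...]
# Re-runs CMD once per second until its output matches PATTERN
# (case-insensitive) or the timeout expires.
wait_for_output() {
  local pattern="$1" timeout="$2" elapsed=0
  shift 2
  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@" 2>/dev/null | grep -qi -- "$pattern"; then
      echo "PASS: found '$pattern' after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "FAIL: '$pattern' not seen within ${timeout}s"
  return 1
}
```

For the Falco check this could be called as `wait_for_output shell 30 kubectl -n falco logs -l app.kubernetes.io/name=falco --tail=50`.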

Verify Falcosidekick is forwarding alerts:

kubectl -n falco get pods -l app.kubernetes.io/name=falcosidekick
# Expect: Running

# Check Falcosidekick metrics
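kubectl -n falco port-forward svc/falco-falcosidekick 2801:2801 &
PF_PID=$!
sleep 2  # give the port-forward time to establish before querying it
curl -s http://localhost:2801/metrics | grep falcosidekick_outputs
kill "$PF_PID"  # stop the background port-forward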

5. Tracee

Confirm Tracee DaemonSet is running:

kubectl -n tracee get ds
# Expect: DESIRED = CURRENT = READY (one per node)

kubectl -n tracee logs -l app.kubernetes.io/name=tracee --tail=10
# Expected: No errors, events being captured

Verify eBPF programs loaded:

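# --tail=-1 is required: with a label selector, kubectl returns only the last 10 lines per pod by default
kubectl -n tracee logs -l app.kubernetes.io/name=tracee --tail=-1 | grep -i "loaded\|attached"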
# Expected: Multiple lines confirming eBPF program attachment

6. Keycloak

Confirm the Keycloak Operator and instance are running:

# Check the operator pod
kubectl -n keycloak get pods -l app.kubernetes.io/name=keycloak-operator
# Expected: 1 pod Running

# Check the Keycloak CR status
kubectl -n keycloak get keycloak rciis-keycloak
# Expected: READY = true

# Check the Keycloak instance pods
kubectl -n keycloak get pods -l app=keycloak
# Expected: 2 pods Running (HA)

# Test OIDC discovery endpoint
curl -s https://auth.rciis.eac.int/realms/rciis/.well-known/openid-configuration | jq .issuer
# Expected: "https://auth.rciis.eac.int/realms/rciis"

Test Weave GitOps SSO login:

# Access the Weave GitOps dashboard and verify OIDC login via Keycloak
# Expected: Browser opens, Keycloak login page appears, login succeeds

Test Kubernetes OIDC auth:

# Requires kubelogin/oidc-login kubectl plugin
kubectl oidc-login get-token \
  --oidc-issuer-url=https://auth.rciis.eac.int/realms/rciis \
  --oidc-client-id=kubernetes
# Expected: Token retrieved, kubectl commands work with OIDC identity

Verify RBAC mapping:

# As an OIDC-authenticated user with platform-admin role
kubectl auth can-i '*' '*' --all-namespaces
# Expected: yes

# As an OIDC-authenticated user with auditor role
kubectl auth can-i get pods --all-namespaces
# Expected: yes
kubectl auth can-i delete pods --all-namespaces
# Expected: no

7. HSM (if applicable)

Verify HSM connectivity:

# Check CloudHSM cluster status
aws cloudhsmv2 describe-clusters \
  --query 'Clusters[0].State' --output text
# Expected: ACTIVE

# Verify KMS Custom Key Store is connected
aws kms describe-custom-key-stores \
  --query 'CustomKeyStores[0].ConnectionState' --output text
# Expected: CONNECTED

# Test PKCS#11 connectivity (from a pod with the PKCS#11 library)
pkcs11-tool --module /usr/lib/libCryptoki2.so --list-slots
# Expected: Slot listing showing the HSM partition

pkcs11-tool --module /usr/lib/libCryptoki2.so --list-objects --type privkey
# Expected: CA signing key and other managed keys listed

Test certificate issuance via HSM:

# Create a test certificate request
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: security-audit-test-cert
  namespace: default
spec:
  secretName: security-audit-test-tls
  issuerRef:
    name: hsm-ca-issuer
    kind: ClusterIssuer
  commonName: security-audit-test.rciis.eac.int
  dnsNames:
    - security-audit-test.rciis.eac.int
  duration: 1h
EOF

# Check certificate status
kubectl -n default get certificate security-audit-test-cert
# Expected: READY = True

# Clean up
kubectl -n default delete certificate security-audit-test-cert
kubectl -n default delete secret security-audit-test-tls --ignore-not-found

8. Encryption at Rest

# Verify Talos disk encryption is active (run on each node)
talosctl -n <node-ip> get systemdisk
# Expected: STATE and EPHEMERAL partitions show encryption enabled

9. Network Policies

# Verify default-deny policies exist in application namespaces
kubectl get networkpolicies -A | grep default-deny
# Expected: default-deny-all in each namespace (generated by Kyverno)

# Test cross-namespace traffic is blocked
kubectl run nettest --image=busybox --rm -it --restart=Never -n default -- \
  wget -qO- --timeout=3 http://keycloak-http.keycloak:80
# Expected: timeout (blocked by default-deny)

10. RBAC Audit

# Check for overly permissive ClusterRoleBindings
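kubectl get clusterrolebindings -o json | \
  jq -r '.items[] | select(.roleRef.name == "cluster-admin") |
    .metadata.name as $binding | .subjects[]? |
    "\($binding): \(.kind)/\(.name)"'
# One line per subject; bindings without subjects are skipped instead of causing a jq error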
# Review: Only system accounts and platform-admin OIDC group should have cluster-admin
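
The manual review can be narrowed to the case the checklist cares about: service accounts outside system namespaces that hold cluster-admin. The filter below is a sketch. `flag_app_service_accounts` and `ALLOWED_NS` are hypothetical names, and treating only `kube-system` as a system namespace is an assumption to adjust per environment.

```shell
# ALLOWED_NS is a hypothetical, space-separated list of namespaces whose
# service accounts may legitimately hold cluster-admin.
ALLOWED_NS="${ALLOWED_NS:-kube-system}"

# flag_app_service_accounts: reads "BINDING KIND NAMESPACE NAME" rows on stdin
# and prints ServiceAccount subjects outside the allowed namespaces.
#
# Intended input, one row per subject:
#   kubectl get clusterrolebindings -o json | jq -r '.items[] |
#     select(.roleRef.name == "cluster-admin") | .metadata.name as $b |
#     .subjects[]? | "\($b) \(.kind) \(.namespace // "-") \(.name)"' \
#     | flag_app_service_accounts
flag_app_service_accounts() {
  awk -v allowed="$ALLOWED_NS" '
    BEGIN { n = split(allowed, a, " "); for (i = 1; i <= n; i++) ok[a[i]] = 1 }
    $2 == "ServiceAccount" && !($3 in ok) {
      printf "%s: ServiceAccount %s/%s has cluster-admin\n", $1, $3, $4
      bad = 1
    }
    END { exit bad + 0 }
  '
}
```

User and Group subjects are ignored by this filter and still need the manual review described above.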

11. CIS Kubernetes Benchmark

# Run a CIS compliance scan (if not already scheduled)
# Trivy Operator runs this automatically if configured in Helm values

# Check compliance report exists
kubectl get clustercompliancereports
# Expected: k8s-cis report exists

# Review results
kubectl get clustercompliancereports k8s-cis -o json | \
  jq '.status.summary | {failCount, passCount}'
# failCount covers failing controls of every severity; use the per-control
# listing below to confirm that none of them is CRITICAL

# List all failing controls
kubectl get clustercompliancereports k8s-cis -o json | \
  jq -r '.status.summaryReport.controlCheck[] |
    select(.totalFail > 0) | "\(.id) \(.name): \(.totalFail) failures (\(.severity))"'

12. Kyverno Policy Enforcement Mode

# List all ClusterPolicies and their enforcement mode
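# The column spec is a single quoted argument; splitting it across indented
# continuation lines would break it into separate words
kubectl get clusterpolicies \
  -o custom-columns='NAME:.metadata.name,ACTION:.spec.validationFailureAction,BACKGROUND:.spec.background'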

# Required policies in Enforce mode:
#   restrict-image-registries       → Enforce
#   enforce-pod-security-restricted → Enforce
#   require-resource-limits         → Enforce
#   verify-image-signatures         → Enforce (if cosign is configured)

# Verify no required policy is still in Audit
kubectl get clusterpolicies -o json | \
  jq -r '.items[] |
    select(.spec.validationFailureAction == "Audit") |
    "\(.metadata.name): WARNING — still in Audit mode"'

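The Audit-mode query in this step does not notice a required policy that is missing from the cluster altogether. The sketch below compares the required names listed in this step against the cluster state. It assumes each policy sets `spec.validationFailureAction` at the policy level, as the commands in this step do; `check_required_enforce` and `REQUIRED_POLICIES` are hypothetical names.

```shell
# Required policy names, copied from the list in this step. Drop
# verify-image-signatures if cosign is not configured.
REQUIRED_POLICIES="restrict-image-registries enforce-pod-security-restricted require-resource-limits verify-image-signatures"

# check_required_enforce: reads "NAME ACTION" rows on stdin and reports any
# required policy that is missing or not in Enforce mode.
#
# Intended input:
#   kubectl get clusterpolicies --no-headers \
#     -o custom-columns='NAME:.metadata.name,ACTION:.spec.validationFailureAction' \
#     | check_required_enforce
check_required_enforce() {
  awk -v required="$REQUIRED_POLICIES" '
    { action[$1] = $2 }
    END {
      n = split(required, names, " ")
      for (i = 1; i <= n; i++) {
        p = names[i]
        if (!(p in action)) { print p ": MISSING"; bad = 1 }
        else if (tolower(action[p]) != "enforce") { print p ": " action[p]; bad = 1 }
      }
      exit bad + 0
    }
  '
}
```

The function exits non-zero when any required policy is missing or not enforcing, so it can serve as the pass/fail result for check 15.
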
Audit Report

Document the results in a security audit report with:

  1. Date and auditor name
  2. Pass/fail status for each check above
  3. List of any exceptions or accepted risks (with rationale)
  4. Remediation plan for any failures
  5. Sign-off by the project security lead
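
The pass/fail status for each check can be collected while the audit runs. The wrapper below is a minimal sketch of one way to do that; `run_check` and the `REPORT` path are hypothetical, and the tab-separated format is only a suggestion.

```shell
# REPORT is a hypothetical output path for the per-check results.
REPORT="${REPORT:-security-audit-results.txt}"

# run_check "DESCRIPTION" CMD [ARGS...]
# Runs CMD, appends a timestamped PASS/FAIL line to $REPORT, and returns
# CMD's exit status so callers can still react to failures.
run_check() {
  local desc="$1" status result
  shift
  if "$@" >/dev/null 2>&1; then
    status=0; result=PASS
  else
    status=$?; result=FAIL
  fi
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$result" "$desc" >> "$REPORT"
  return "$status"
}
```

For example, `run_check "Trivy Operator is running" kubectl -n trivy-system get pods` appends one line to the results file. Exceptions, remediation plans, and sign-off still have to be written by hand.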

This report forms part of the Handover Checklist.