# 5.2.3 Vulnerability Scanning
CI pipeline scanning catches vulnerabilities before deployment, but it cannot detect CVEs discovered after an image is deployed, configuration drift, or secrets accidentally baked into images. The Trivy Operator runs inside the cluster and continuously scans running workloads, generating reports for container image vulnerabilities, Kubernetes misconfigurations, and exposed secrets.
How to use this page
Each component has an Install section showing the Flux HelmRelease, a Configuration section with Helm values, and a Verify section to confirm it is working.
All code blocks are labelled with their file path in the repository. Select your target environment (AWS or Bare Metal) in any tab group — the choice syncs across the entire page.
- **Using the existing `rciis-devops` repository:** All files already exist. Skip the `mkdir` and `git add`/`git commit` commands — they are for users building a new repository. Simply review the files, edit values for your environment, and push.
- **Building a new repository from scratch:** Follow the `mkdir`, file creation, and `git` commands in order.
- **No Git access:** Expand the "Alternative: Helm CLI" block under each Install section.
## Trivy Operator
Trivy Operator continuously scans running workloads and generates reports for container image vulnerabilities (CVEs in OS packages and language dependencies), Kubernetes resource misconfigurations (Pod Security Standards, CIS benchmarks), and exposed secrets (API keys, passwords, tokens leaked into container images).
### What Trivy Operator Scans
| Report Type | CRD | What It Detects |
|---|---|---|
| Vulnerability | `VulnerabilityReport` | CVEs in container images (OS packages and language dependencies) |
| Config Audit | `ConfigAuditReport` | Kubernetes resource misconfigurations (Pod Security Standards, CIS benchmarks) |
| Exposed Secret | `ExposedSecretReport` | Secrets (API keys, tokens, passwords) leaked into container images |
| Infra Assessment | `InfraAssessmentReport` | Misconfigurations in Kubernetes infrastructure components (API server, kubelet) |
| RBAC Assessment | `RbacAssessmentReport` | Overly permissive RBAC roles and bindings |
### Install
The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches (shown in the Configuration section).
Create the base directory and file:
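A minimal sketch of the directory setup, assuming the repository layout used elsewhere on this page (`flux/infra/base/`):

```shell
# Create the base directory that holds the shared HelmRelease manifest
mkdir -p flux/infra/base
```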
| Field | Value | Explanation |
|---|---|---|
| `chart` | `trivy-operator` | The Helm chart name from the Aqua Security registry |
| `version` | `0.27.0` | Pinned chart version — update this to upgrade Trivy Operator |
| `sourceRef.name` | `aquasecurity` | References a HelmRepository CR pointing to https://aquasecurity.github.io/helm-charts |
| `targetNamespace` | `trivy-system` | Trivy Operator runs in `trivy-system` for isolation from workloads |
| `remediation.retries` | `3` | Flux retries up to 3 times if the install or upgrade fails |
Save the following as `flux/infra/base/trivy-operator.yaml`:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: trivy-operator
  namespace: flux-system
spec:
  targetNamespace: trivy-system
  interval: 30m
  chart:
    spec:
      chart: trivy-operator
      version: "0.27.0"
      sourceRef:
        kind: HelmRepository
        name: aquasecurity
        namespace: flux-system
  releaseName: trivy-operator
  install:
    createNamespace: true
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
  values:
    operator:
      scanJobsConcurrentLimit: 3
      scanJobsRetryDelay: 30s
      privateRegistryScanSecretsNames: {"trivy-system": "harbor-creds"}
    trivy:
      dbRepository: ghcr.io/aquasecurity/trivy-db
      javaDbRepository: ghcr.io/aquasecurity/trivy-java-db
      severity: CRITICAL,HIGH,MEDIUM
      timeout: 10m0s
      ignoreUnfixed: false
    compliance:
      cron: "0 2 * * 0"
    scanJob:
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi
    serviceMonitor:
      enabled: true
      interval: 60s
      labels:
        release: prometheus
```
Alternative: Helm CLI
If you do not have Git access, install Trivy Operator directly:
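A sketch of the equivalent Helm CLI install, pinning the same chart version as the HelmRelease. The `values.yaml` path is a placeholder for your environment's values file from the Configuration section:

```shell
# Add the Aqua Security chart repository and refresh the index
helm repo add aquasecurity https://aquasecurity.github.io/helm-charts
helm repo update

# Install the pinned chart version into trivy-system
helm install trivy-operator aquasecurity/trivy-operator \
  --namespace trivy-system --create-namespace \
  --version 0.27.0 \
  -f values.yaml
```

Note that a CLI-managed release drifts outside Flux's reconciliation; prefer the GitOps path when possible.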
### Configuration
The environment patch overrides the base HelmRelease with cluster-specific settings. The values file controls how Trivy Operator behaves. Select your environment below.
Create the environment overlay directory:
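A sketch of the overlay directories, assuming they mirror the Bare Metal manifest path shown later on this page (`flux/infra/baremetal/trivy-operator/`); the AWS path is the analogous assumption:

```shell
# Overlay directory for AWS (assumed path)
mkdir -p flux/infra/aws/trivy-operator
# Overlay directory for Bare Metal (path used by the Extra Manifests section)
mkdir -p flux/infra/baremetal/trivy-operator
```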
#### Environment Patch
The patch file optimizes scan concurrency, resource limits, and severity filtering for the target environment.
Save the following as the patch file for your environment:
On AWS, scan resources are constrained to reduce load on shared EKS clusters. Concurrency is limited to 1 scan at a time, retry delay is increased, and resource requests are reduced.
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: trivy-operator
spec:
  values:
    operator:
      scanJobsConcurrentLimit: 1
      scanJobsRetryDelay: 60s
    trivy:
      severity: CRITICAL,HIGH
    scanJob:
      resources:
        requests:
          cpu: 50m
          memory: 128Mi
        limits:
          cpu: 250m
          memory: 256Mi
```
| Setting | Value | Why |
|---|---|---|
| `scanJobsConcurrentLimit` | `1` | Single concurrent scan reduces CPU contention on shared EKS nodes |
| `scanJobsRetryDelay` | `60s` | Longer delay between retries reduces load spikes |
| `severity` | `CRITICAL,HIGH` | Reduces alert noise — only high-risk findings are reported |
| `scanJob.resources` | 50m/128Mi → 250m/256Mi | Tighter resource constraints for cost optimization |
On Bare Metal, scan resources can be more generous since the cluster is dedicated. Higher concurrency and full severity reporting provide better coverage.
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: trivy-operator
spec:
  values:
    # Default values from base — no overrides needed.
    # Base already optimized for bare metal:
    #   scanJobsConcurrentLimit: 3
    #   severity: CRITICAL,HIGH,MEDIUM
    #   larger resources for full forensic scanning
```
#### Helm Values
The values file controls Trivy Operator's core features: scan concurrency, severity levels, timeout thresholds, and database sources. Select your environment and deployment size:
```yaml
# Trivy Operator — AWS Non-HA configuration
operator:
  scanJobsConcurrentLimit: 1
  scanJobsRetryDelay: 60s
  privateRegistryScanSecretsNames: {"trivy-system": "harbor-creds"}
trivy:
  dbRepository: ghcr.io/aquasecurity/trivy-db
  javaDbRepository: ghcr.io/aquasecurity/trivy-java-db
  severity: CRITICAL,HIGH
  timeout: 10m0s
  ignoreUnfixed: false
compliance:
  cron: ""
scanJob:
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: 250m
      memory: 256Mi
serviceMonitor:
  enabled: true
  interval: 60s
  labels:
    release: prometheus
```
```yaml
# Trivy Operator — Bare Metal HA configuration
operator:
  scanJobsConcurrentLimit: 3
  scanJobsRetryDelay: 30s
  privateRegistryScanSecretsNames: {"trivy-system": "harbor-creds"}
trivy:
  dbRepository: ghcr.io/aquasecurity/trivy-db
  javaDbRepository: ghcr.io/aquasecurity/trivy-java-db
  severity: CRITICAL,HIGH,MEDIUM
  timeout: 10m0s
  ignoreUnfixed: false
compliance:
  cron: "0 2 * * 0"
scanJob:
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
serviceMonitor:
  enabled: true
  interval: 60s
  labels:
    release: prometheus
```
```yaml
# Trivy Operator — Bare Metal Non-HA configuration
operator:
  scanJobsConcurrentLimit: 1
  scanJobsRetryDelay: 60s
  privateRegistryScanSecretsNames: {"trivy-system": "harbor-creds"}
trivy:
  dbRepository: ghcr.io/aquasecurity/trivy-db
  javaDbRepository: ghcr.io/aquasecurity/trivy-java-db
  severity: CRITICAL,HIGH
  timeout: 10m0s
  ignoreUnfixed: false
compliance:
  cron: ""
scanJob:
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: 250m
      memory: 256Mi
serviceMonitor:
  enabled: true
  interval: 60s
  labels:
    release: prometheus
```
Key settings (all environments):

| Setting | HA | Non-HA | Why |
|---|---|---|---|
| `scanJobsConcurrentLimit` | `3` | `1` | HA clusters can handle parallel scans; non-HA reduces resource contention |
| `severity` | `CRITICAL,HIGH,MEDIUM` | `CRITICAL,HIGH` | HA environments report all issues; non-HA focuses on critical findings |
| `compliance.cron` | `"0 2 * * 0"` | `""` | HA runs weekly CIS benchmark scans; non-HA disables scheduled compliance scans |
| `scanJob.resources` | 100m/256Mi → 500m/512Mi | 50m/128Mi → 250m/256Mi | Resource scaling matches cluster capacity |
| `dbRepository` | Aqua Security registry (ghcr.io) | Aqua Security registry (ghcr.io) | External CVE database — pulled on demand during scans |
#### Extra Manifests
Save the following additional manifests for your environment:
No extra manifests required for AWS. The patch and values files are sufficient.
On Bare Metal, deploy a PrometheusRule for Trivy Operator alerting.
Save this as `flux/infra/baremetal/trivy-operator/prometheus-rule-trivy.yaml`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: trivy-operator-alerts
  namespace: trivy-system
  labels:
    release: prometheus
spec:
  groups:
    - name: trivy-operator
      rules:
        - alert: CriticalVulnerabilityFound
          expr: |
            trivy_image_vulnerabilities{severity="Critical"} > 0
          for: 1h
          labels:
            severity: critical
          annotations:
            summary: "Critical CVE in {{ $labels.namespace }}/{{ $labels.resource_name }}"
            description: >
              Image {{ $labels.image_ref }} has {{ $value }} critical
              vulnerabilities. Review the VulnerabilityReport and
              schedule remediation.
            runbook_url: "https://docs.rciis.africa/05-secure/troubleshooting/#trivy-operator"
        - alert: TrivyScanNotRunning
          expr: |
            absent(trivy_operator_build_info)
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "Trivy Operator may not be running"
            description: >
              The Trivy Operator build info metric is absent from
              Prometheus. The operator may be down and scans may
              not be running.
        - alert: TrivyScanJobFailure
          expr: |
            increase(trivy_operator_scan_jobs_total{result="failure"}[1h]) > 3
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Trivy scan jobs failing"
            description: >
              More than 3 Trivy scan jobs have failed in the last hour.
              Check scan job logs for resource or connectivity issues.
        - alert: TrivyDBStale
          expr: |
            (time() - trivy_operator_vulnerability_db_last_update_timestamp) > 172800
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Trivy vulnerability database not updated in 48 hours"
            description: >
              The Trivy vulnerability database has not been updated in over
              48 hours. New CVEs will not be detected. Check network
              connectivity to ghcr.io/aquasecurity/trivy-db.
```
### Commit and Deploy
Once all files are in place, commit and push to trigger Flux deployment:
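A sketch of the commit step, assuming you are working from the repository root; adjust the paths and message to the files you actually created:

```shell
# Stage the HelmRelease, overlays, and extra manifests, then push
git add flux/infra/
git commit -m "Add Trivy Operator HelmRelease and environment overlays"
git push
```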
Flux will detect the new commit and begin deploying Trivy Operator. To trigger an immediate sync instead of waiting for the next poll interval:
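For example, with the Flux CLI, assuming the default `flux-system` GitRepository source:

```shell
# Fetch the latest commit, then apply the Kustomization that owns this release
flux reconcile source git flux-system -n flux-system
flux reconcile kustomization infra-trivy-operator -n flux-system
```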
### Verify
After Trivy Operator is deployed, confirm it is working:
```shell
# Check the operator pod is running
kubectl -n trivy-system get pods
# Expected: trivy-operator pod Running

# Wait for initial scans to complete (2-5 minutes after install)
kubectl get vulnerabilityreports -A --no-headers | wc -l
# Expected: >0 reports

# Check config audit reports
kubectl get configauditreports -A --no-headers | wc -l
# Expected: >0 reports

# Verify the operator is healthy
kubectl -n trivy-system logs deployment/trivy-operator --tail=10
# Expected: No errors, scan jobs completing
```
List vulnerability reports:
```shell
# All vulnerability reports across the cluster
kubectl get vulnerabilityreports -A

# Reports with CRITICAL findings
kubectl get vulnerabilityreports -A -o json | \
  jq -r '.items[] | select(.report.summary.criticalCount > 0) |
    "\(.metadata.namespace)/\(.metadata.labels["trivy-operator.resource.name"]) — CRITICAL: \(.report.summary.criticalCount)"'
```
List config audit reports:
```shell
# All config audit reports
kubectl get configauditreports -A

# Show reports with CRITICAL or HIGH failures
kubectl get configauditreports -A -o json | \
  jq -r '.items[] | select(.report.summary.criticalCount > 0 or .report.summary.highCount > 0) |
    "\(.metadata.namespace)/\(.metadata.labels["trivy-operator.resource.name"]) — CRITICAL: \(.report.summary.criticalCount), HIGH: \(.report.summary.highCount)"'
```
### Flux Operations
This component is managed by Flux as HelmRelease `trivy-operator` and Kustomization `infra-trivy-operator`.
Check whether the HelmRelease and Kustomization are in a Ready state:
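For example, with the Flux CLI:

```shell
# Both should report Ready: True
flux get helmreleases trivy-operator -n flux-system
flux get kustomizations infra-trivy-operator -n flux-system
```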
Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:
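A sketch of the sync command; `--with-source` fetches the latest Git revision before applying:

```shell
flux reconcile kustomization infra-trivy-operator -n flux-system --with-source
```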
Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:
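For example:

```shell
# Re-run the Helm install/upgrade for this release immediately
flux reconcile helmrelease trivy-operator -n flux-system
```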
View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:
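A sketch using `flux logs` to scope helm-controller output to this release:

```shell
flux logs --kind=HelmRelease --name=trivy-operator -n flux-system
```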
Recovering a stalled HelmRelease
If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry automatically. Suspend and resume to clear the failure counter, then reconcile:
```shell
flux suspend helmrelease trivy-operator -n flux-system
flux resume helmrelease trivy-operator -n flux-system
flux reconcile kustomization infra-trivy-operator -n flux-system
```
Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved. See Maintenance — Recovering Stalled Resources for details.
## Next Steps
Proceed to runtime threat detection and behavioral monitoring: