# 5.2.3 Vulnerability Scanning
CI pipeline scanning catches vulnerabilities before deployment, but it cannot detect CVEs discovered after an image is deployed, configuration drift, or secrets accidentally baked into images. The Trivy Operator runs inside the cluster and continuously scans running workloads, generating reports for container image vulnerabilities, Kubernetes misconfigurations, and exposed secrets.
How to use this page
Each component has an Install section showing the Flux HelmRelease, a Configuration section with Helm values, and a Verify section to confirm it is working.
All code blocks are labelled with their file path in the repository. Select your target environment (AWS or Bare Metal) in any tab group — the choice syncs across the entire page.
- **Using the existing `rciis-devops` repository:** All files already exist. Skip the `mkdir` and `git add`/`git commit` commands — they are for users building a new repository. Simply review the files, edit values for your environment, and push.
- **Building a new repository from scratch:** Follow the `mkdir`, file creation, and `git` commands in order.
- **No Git access:** Expand the "Alternative: Helm CLI" block under each Install section.
## Trivy Operator
Trivy Operator continuously scans running workloads and generates reports for container image vulnerabilities (CVEs in OS packages and language dependencies), Kubernetes resource misconfigurations (Pod Security Standards, CIS benchmarks), and exposed secrets (API keys, passwords, tokens leaked into container images).
### What Trivy Operator Scans
| Report Type | CRD | What It Detects |
|---|---|---|
| Vulnerability | `VulnerabilityReport` | CVEs in container images (OS packages and language dependencies) |
| Config Audit | `ConfigAuditReport` | Kubernetes resource misconfigurations (Pod Security Standards, CIS benchmarks) |
| Exposed Secret | `ExposedSecretReport` | Secrets (API keys, tokens, passwords) leaked into container images |
| Infra Assessment | `InfraAssessmentReport` | Misconfigurations in Kubernetes infrastructure components (API server, kubelet) |
| RBAC Assessment | `RbacAssessmentReport` | Overly permissive RBAC roles and bindings |
### Install
The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches (shown in the Configuration section).
Create the base directory and file:
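A minimal sketch of the directory setup, assuming the repository layout used elsewhere on this page (`flux/infra/base/`):

```shell
# Create the base directory that holds the shared HelmRelease manifest
mkdir -p flux/infra/base
```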
| Field | Value | Explanation |
|---|---|---|
| `chart` | `trivy-operator` | The Helm chart name from the Aqua Security registry |
| `version` | `0.27.0` | Pinned chart version — update this to upgrade Trivy Operator |
| `sourceRef.name` | `aquasecurity` | References a HelmRepository CR pointing to https://aquasecurity.github.io/helm-charts |
| `targetNamespace` | `trivy-system` | Trivy Operator runs in `trivy-system` for isolation from workloads |
| `remediation.retries` | `3` | Flux retries up to 3 times if the install or upgrade fails |
Save the following as `flux/infra/base/trivy-operator.yaml`:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: trivy-operator
  namespace: flux-system
spec:
  targetNamespace: trivy-system
  interval: 30m
  chart:
    spec:
      chart: trivy-operator
      version: "0.27.0"
      sourceRef:
        kind: HelmRepository
        name: aquasecurity
        namespace: flux-system
  releaseName: trivy-operator
  install:
    createNamespace: true
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
  values:
    operator:
      scanJobsConcurrentLimit: 3
      scanJobsRetryDelay: 30s
      privateRegistryScanSecretsNames: {"trivy-system": "harbor-creds"}
    trivy:
      dbRepository: ghcr.io/aquasecurity/trivy-db
      javaDbRepository: ghcr.io/aquasecurity/trivy-java-db
      severity: CRITICAL,HIGH,MEDIUM
      timeout: 10m0s
      ignoreUnfixed: false
    compliance:
      cron: "0 2 * * 0"
    scanJob:
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi
    serviceMonitor:
      enabled: true
      interval: 60s
      labels:
        release: prometheus
```
Alternative: Helm CLI
If you do not have Git access, install Trivy Operator directly:
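A sketch of the equivalent Helm CLI install, pinning the same chart version as the HelmRelease. The `values.yaml` path is a placeholder for your environment's values file from the Configuration section:

```shell
# Add the Aqua Security chart repository and refresh the index
helm repo add aquasecurity https://aquasecurity.github.io/helm-charts
helm repo update

# Install the pinned chart version into trivy-system
helm install trivy-operator aquasecurity/trivy-operator \
  --namespace trivy-system --create-namespace \
  --version 0.27.0 \
  -f values.yaml
```

Note that a CLI-managed release drifts outside Flux's reconciliation; prefer the GitOps path when possible.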
### Configuration
The environment patch overrides the base HelmRelease with cluster-specific settings. The values file controls how Trivy Operator behaves. Select your environment below.
Create the environment overlay directory:
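A sketch of the overlay directories, assuming they mirror the Bare Metal manifest path shown later on this page (`flux/infra/baremetal/trivy-operator/`); the AWS path is the analogous assumption:

```shell
# Overlay directory for AWS (assumed path)
mkdir -p flux/infra/aws/trivy-operator
# Overlay directory for Bare Metal (path used by the Extra Manifests section)
mkdir -p flux/infra/baremetal/trivy-operator
```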
#### Environment Patch
The patch file optimizes scan concurrency, resource limits, and severity filtering for the target environment.
Save the following as the patch file for your environment:
On AWS, scan resources are constrained to reduce load on shared EKS clusters. Concurrency is limited to 1 scan at a time, retry delay is increased, and resource requests are reduced.
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: trivy-operator
spec:
  values:
    operator:
      scanJobsConcurrentLimit: 1
      scanJobsRetryDelay: 60s
    trivy:
      severity: CRITICAL,HIGH
    scanJob:
      resources:
        requests:
          cpu: 50m
          memory: 128Mi
        limits:
          cpu: 250m
          memory: 256Mi
```
| Setting | Value | Why |
|---|---|---|
| `scanJobsConcurrentLimit` | `1` | Single concurrent scan reduces CPU contention on shared EKS nodes |
| `scanJobsRetryDelay` | `60s` | Longer delay between retries reduces load spikes |
| `severity` | `CRITICAL,HIGH` | Reduces alert noise — only high-risk findings are reported |
| `scanJob.resources` | 50m/128Mi → 250m/256Mi | Tighter resource constraints for cost optimization |
On Bare Metal, scan resources can be more generous since the cluster is dedicated. Higher concurrency and full severity reporting provide better coverage.
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: trivy-operator
spec:
  values:
    # Default values from base — no overrides needed.
    # Base already optimized for bare metal:
    #   scanJobsConcurrentLimit: 3
    #   severity: CRITICAL,HIGH,MEDIUM
    #   larger resources for full forensic scanning
```
#### Helm Values
The values file controls Trivy Operator's core features: scan concurrency, severity levels, timeout thresholds, and database sources. Select your environment and deployment size:
```yaml
# Trivy Operator — AWS Non-HA configuration
operator:
  scanJobsConcurrentLimit: 1
  scanJobsRetryDelay: 60s
  privateRegistryScanSecretsNames: {"trivy-system": "harbor-creds"}
trivy:
  dbRepository: ghcr.io/aquasecurity/trivy-db
  javaDbRepository: ghcr.io/aquasecurity/trivy-java-db
  severity: CRITICAL,HIGH
  timeout: 10m0s
  ignoreUnfixed: false
compliance:
  cron: ""
scanJob:
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: 250m
      memory: 256Mi
serviceMonitor:
  enabled: true
  interval: 60s
  labels:
    release: prometheus
```
```yaml
# Trivy Operator — Bare Metal HA configuration
operator:
  scanJobsConcurrentLimit: 3
  scanJobsRetryDelay: 30s
  privateRegistryScanSecretsNames: {"trivy-system": "harbor-creds"}
trivy:
  dbRepository: ghcr.io/aquasecurity/trivy-db
  javaDbRepository: ghcr.io/aquasecurity/trivy-java-db
  severity: CRITICAL,HIGH,MEDIUM
  timeout: 10m0s
  ignoreUnfixed: false
compliance:
  cron: "0 2 * * 0"
scanJob:
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
serviceMonitor:
  enabled: true
  interval: 60s
  labels:
    release: prometheus
```
```yaml
# Trivy Operator — Bare Metal Non-HA configuration
operator:
  scanJobsConcurrentLimit: 1
  scanJobsRetryDelay: 60s
  privateRegistryScanSecretsNames: {"trivy-system": "harbor-creds"}
trivy:
  dbRepository: ghcr.io/aquasecurity/trivy-db
  javaDbRepository: ghcr.io/aquasecurity/trivy-java-db
  severity: CRITICAL,HIGH
  timeout: 10m0s
  ignoreUnfixed: false
compliance:
  cron: ""
scanJob:
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: 250m
      memory: 256Mi
serviceMonitor:
  enabled: true
  interval: 60s
  labels:
    release: prometheus
```
Key settings (all environments):

| Setting | HA | Non-HA | Why |
|---|---|---|---|
| `scanJobsConcurrentLimit` | `3` | `1` | HA clusters can handle parallel scans; non-HA reduces resource contention |
| `severity` | `CRITICAL,HIGH,MEDIUM` | `CRITICAL,HIGH` | HA environments report all issues; non-HA focuses on critical findings |
| `compliance.cron` | `"0 2 * * 0"` | `""` | HA runs weekly CIS benchmark scans; non-HA disables scheduled compliance scans |
| `scanJob.resources` | 100m/256Mi → 500m/512Mi | 50m/128Mi → 250m/256Mi | Resource scaling matches cluster capacity |
| `dbRepository` | Aqua Security registry (ghcr.io) | Aqua Security registry (ghcr.io) | External CVE database — pulled on demand during scans |
#### Extra Manifests
Save the following additional manifests for your environment:
No extra manifests required for AWS. The patch and values files are sufficient.
On Bare Metal, deploy a PrometheusRule for Trivy Operator alerting.
Save this as `flux/infra/baremetal/trivy-operator/prometheus-rule-trivy.yaml`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: trivy-operator-alerts
  namespace: trivy-system
  labels:
    release: prometheus
spec:
  groups:
    - name: trivy-operator
      rules:
        - alert: CriticalVulnerabilityFound
          expr: |
            trivy_image_vulnerabilities{severity="Critical"} > 0
          for: 1h
          labels:
            severity: critical
          annotations:
            summary: "Critical CVE in {{ $labels.namespace }}/{{ $labels.resource_name }}"
            description: >
              Image {{ $labels.image_ref }} has {{ $value }} critical
              vulnerabilities. Review the VulnerabilityReport and
              schedule remediation.
            runbook_url: "https://docs.rciis.africa/05-secure/troubleshooting/#trivy-operator"
        - alert: TrivyScanNotRunning
          expr: |
            absent(trivy_operator_build_info)
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "Trivy Operator may not be running"
            description: >
              The Trivy Operator build info metric is absent from
              Prometheus. The operator may be down and scans may
              not be running.
        - alert: TrivyScanJobFailure
          expr: |
            increase(trivy_operator_scan_jobs_total{result="failure"}[1h]) > 3
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Trivy scan jobs failing"
            description: >
              More than 3 Trivy scan jobs have failed in the last hour.
              Check scan job logs for resource or connectivity issues.
        - alert: TrivyDBStale
          expr: |
            (time() - trivy_operator_vulnerability_db_last_update_timestamp) > 172800
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "Trivy vulnerability database not updated in 48 hours"
            description: >
              The Trivy vulnerability database has not been updated in over
              48 hours. New CVEs will not be detected. Check network
              connectivity to ghcr.io/aquasecurity/trivy-db.
```
### Commit and Deploy
Once all files are in place, commit and push to trigger Flux deployment:
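A sketch of the commit step, assuming you are working from the repository root; adjust the paths and message to the files you actually created:

```shell
# Stage the HelmRelease, overlays, and extra manifests, then push
git add flux/infra/
git commit -m "Add Trivy Operator HelmRelease and environment overlays"
git push
```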
Flux will detect the new commit and begin deploying Trivy Operator. To trigger an immediate sync instead of waiting for the next poll interval:
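For example, with the Flux CLI, assuming the default `flux-system` GitRepository source:

```shell
# Fetch the latest commit, then apply the Kustomization that owns this release
flux reconcile source git flux-system -n flux-system
flux reconcile kustomization infra-trivy-operator -n flux-system
```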
### Verify
After Trivy Operator is deployed, confirm it is working:
```shell
# Check the operator pod is running
kubectl -n trivy-system get pods
# Expected: trivy-operator pod Running

# Wait for initial scans to complete (2-5 minutes after install)
kubectl get vulnerabilityreports -A --no-headers | wc -l
# Expected: >0 reports

# Check config audit reports
kubectl get configauditreports -A --no-headers | wc -l
# Expected: >0 reports

# Verify the operator is healthy
kubectl -n trivy-system logs deployment/trivy-operator --tail=10
# Expected: No errors, scan jobs completing
```
List vulnerability reports:
```shell
# All vulnerability reports across the cluster
kubectl get vulnerabilityreports -A

# Reports with CRITICAL findings
kubectl get vulnerabilityreports -A -o json | \
  jq -r '.items[] | select(.report.summary.criticalCount > 0) |
    "\(.metadata.namespace)/\(.metadata.labels["trivy-operator.resource.name"]) — CRITICAL: \(.report.summary.criticalCount)"'
```
List config audit reports:
```shell
# All config audit reports
kubectl get configauditreports -A

# Show reports with CRITICAL or HIGH failures
kubectl get configauditreports -A -o json | \
  jq -r '.items[] | select(.report.summary.criticalCount > 0 or .report.summary.highCount > 0) |
    "\(.metadata.namespace)/\(.metadata.labels["trivy-operator.resource.name"]) — CRITICAL: \(.report.summary.criticalCount), HIGH: \(.report.summary.highCount)"'
```
### Flux Operations
This component is managed by Flux as HelmRelease `trivy-operator` and Kustomization `infra-trivy-operator`.
Check whether the HelmRelease and Kustomization are in a Ready state:
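For example, with the Flux CLI:

```shell
# Both should report Ready: True
flux get helmreleases trivy-operator -n flux-system
flux get kustomizations infra-trivy-operator -n flux-system
```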
Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:
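A sketch of the sync command; `--with-source` fetches the latest Git revision before applying:

```shell
flux reconcile kustomization infra-trivy-operator -n flux-system --with-source
```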
Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:
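For example:

```shell
# Re-run the Helm install/upgrade for this release immediately
flux reconcile helmrelease trivy-operator -n flux-system
```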
View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:
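A sketch using `flux logs` to scope helm-controller output to this release:

```shell
flux logs --kind=HelmRelease --name=trivy-operator -n flux-system
```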
Recovering a stalled HelmRelease
If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry automatically. Suspend and resume to clear the failure counter, then reconcile:
```shell
flux suspend helmrelease trivy-operator -n flux-system
flux resume helmrelease trivy-operator -n flux-system
flux reconcile kustomization infra-trivy-operator -n flux-system
```
Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved. See Maintenance — Recovering Stalled Resources for details.
## Next Steps
Proceed to runtime threat detection and behavioral monitoring: