5.2.4 Runtime Threat Detection¶

Runtime security detects anomalous behaviour inside running containers and on the host — things that static scanning and admission control cannot catch. This section covers two complementary tools: Falco and Tracee.

How to use this page

Each component has an Install section showing the Flux HelmRelease, a Configuration section with Helm values, and a Verify section to confirm it is working.

All code blocks are labelled with their file path in the repository. Select your target environment (AWS or Bare Metal) in any tab group — the choice syncs across the entire page.

Using the existing rciis-devops repository: All files already exist. Skip the mkdir and git add/git commit commands — they are for users building a new repository. Simply review the files, edit values for your environment, and push.
Building a new repository from scratch: Follow the mkdir, file creation, and git commands in order.
No Git access: Expand the "Alternative: Helm CLI" block under each Install section.

Falco vs Tracee¶

Both tools monitor kernel-level events but differ in approach and strengths:

	Falco	Tracee
Maintainer	Sysdig / CNCF (Graduated)	Aqua Security
Detection engine	Kernel module or eBPF probe	eBPF only
Talos compatibility	eBPF driver required (Talos cannot load out-of-tree kernel modules at runtime)	Native eBPF — excellent fit for Talos
Rule language	YAML-based Falco rules (condition/output/priority)	Rego policies + Go signatures
Rule ecosystem	Large community rule library, actively maintained	Growing library, strong container-focused defaults
Forensic capture	Events only (syscall fields, metadata)	Full event capture with optional artifact extraction
Alert forwarding	Falcosidekick (30+ output targets: Slack, PagerDuty, webhook, Kafka, OTEL)	Built-in webhook, OpenTelemetry export
Resource overhead	Low–moderate	Low (pure eBPF, no kernel module)
Best for	Broad runtime policy enforcement with mature alerting	Deep forensic investigation and incident response

Recommendation

Deploy both tools. Falco provides broad detection coverage with a mature rule library and flexible alerting. Tracee adds forensic depth for incident investigations. If you must choose one, Falco is the safer default for teams without deep eBPF experience.

Falco¶

Falco monitors Linux syscalls and Kubernetes audit events in real time, matching them against rules that define suspicious behaviour. When a rule triggers, Falco generates an alert that can be routed to any number of destinations via Falcosidekick.

Install¶

The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches (shown in the Configuration section).

Create the base directory and file:

mkdir -p flux/infra/base

Field	Value	Explanation
`chart`	`falco`	The Helm chart name from the Falco Security registry
`version`	`4.25.1`	Pinned chart version — update this to upgrade Falco
`sourceRef.name`	`falcosecurity`	References a `HelmRepository` CR pointing to `https://falcosecurity.github.io/charts`
`targetNamespace`	`falco`	Falco runs in a dedicated namespace for isolation
`driver.kind`	`modern_ebpf`	Uses CO-RE eBPF — required for Talos Linux (see eBPF Driver Selection below)
`remediation.retries`	`3`	Flux retries up to 3 times if the install or upgrade fails

Save the following as flux/infra/base/falco.yaml:

flux/infra/base/falco.yaml

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: falco
  namespace: flux-system
spec:
  targetNamespace: falco
  interval: 30m
  chart:
    spec:
      chart: falco
      version: "4.25.1"
      sourceRef:
        kind: HelmRepository
        name: falcosecurity
        namespace: flux-system
  releaseName: falco
  install:
    createNamespace: true
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
  values:
    driver:
      kind: modern_ebpf
    tty: true
    falcosidekick:
      enabled: true
      replicaCount: 2
      config:
        slack:
          webhookurl: ""       # Set via SOPS-encrypted secret
          minimumpriority: warning
        webhook:
          address: ""          # Optional: forward to incident management
          minimumpriority: critical
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
    customRules:
      rciis-rules.yaml: |-
        - rule: Shell Spawned in Container
          desc: >
            A shell (bash, sh, zsh) was spawned inside a container.
            This is unexpected in production RCIIS workloads.
          condition: >
            spawned_process
            and container
            and proc.name in (bash, sh, zsh, ash, csh, ksh)
            and not proc.pname in (healthcheck)
          output: >
            Shell spawned in container
            (user=%user.name container=%container.name
            image=%container.image.repository
            shell=%proc.name parent=%proc.pname
            cmdline=%proc.cmdline namespace=%k8s.ns.name
            pod=%k8s.pod.name)
          priority: WARNING
          tags: [shell, container, rciis]

        - rule: Read Sensitive File in Container
          desc: >
            A process inside a container read a sensitive file
            (e.g., /etc/shadow, /etc/passwd, private keys).
          condition: >
            open_read
            and container
            and fd.name pmatch (/etc/shadow, /etc/passwd, /etc/pki/*, /run/secrets/*)
          output: >
            Sensitive file read in container
            (user=%user.name file=%fd.name container=%container.name
            image=%container.image.repository namespace=%k8s.ns.name
            pod=%k8s.pod.name)
          priority: WARNING
          tags: [filesystem, sensitive, rciis]

        - rule: Unexpected Outbound Connection from RCIIS
          desc: >
            An RCIIS application pod made an outbound network connection
            to a destination not in the expected allow-list.
          condition: >
            outbound
            and container
            and k8s.ns.name = "rciis"
            and not fd.sip in (rciis_allowed_destinations)
          output: >
            Unexpected outbound connection from RCIIS pod
            (pod=%k8s.pod.name namespace=%k8s.ns.name
            image=%container.image.repository
            connection=%fd.name dest=%fd.sip:%fd.sport)
          priority: CRITICAL
          tags: [network, rciis, exfiltration]

Alternative: Helm CLI

If you do not have Git access, install Falco directly:

helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm upgrade --install falco falcosecurity/falco \
  --namespace falco \
  --create-namespace \
  --version 4.25.1 \
  -f values.yaml

eBPF Driver Selection¶

Driver	Kernel Requirement	Talos Support	Recommendation
`module`	Kernel headers + DKMS	Not supported — Talos has no kernel headers or compiler toolchain	Do not use
`ebpf`	BPF support, kernel headers at build time	Works — legacy option	Supported but superseded
`modern_ebpf`	Kernel 5.8+ with BTF (BPF Type Format)	Best fit — Talos ships kernel 6.x with BTF enabled	Recommended

The modern_ebpf driver uses CO-RE (Compile Once, Run Everywhere) and requires no kernel headers at runtime. Since Talos Linux ships a modern kernel (6.x) with BTF support enabled, this is the optimal choice.

Talos eBPF Requirement

Talos Linux has an immutable root filesystem, enforces kernel module signing, and ships no kernel headers or compiler toolchain. This means Falco's traditional kernel module driver — which compiles a .ko against the running kernel at runtime via DKMS — cannot work. Talos does support kernel modules, but only those built into the system image or added as system extensions at image build time.

The practical result: set the Falco driver to modern_ebpf (recommended) or ebpf. Do not use the module driver on Talos nodes.

Configuration¶

The environment patch overrides the base HelmRelease with cluster-specific settings. The values file controls how Falco behaves. Select your environment and deployment size below.

Create the environment overlay directory:

AWSBare MetalProxmox VMs

mkdir -p flux/infra/aws/falco

mkdir -p flux/infra/baremetal/falco

mkdir -p flux/infra/baremetal/falco

Environment Patch¶

The patch file sets Falcosidekick replica count and resource limits appropriate for the target environment.

Save the following as the patch file for your environment:

AWSBare MetalProxmox VMs

On AWS, Falco resources are constrained to reduce cost on shared clusters. Falcosidekick runs as a single replica to reduce overhead.

flux/infra/aws/falco/patch.yaml

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: falco
spec:
  chart:
    spec:
      version: "8.0.1"
  values:
    falcosidekick:
      replicaCount: 1
    resources:
      requests:
        cpu: 50m
        memory: 128Mi
      limits:
        cpu: 250m
        memory: 256Mi
    customRules:
      rciis-rules.yaml: |-
        - rule: Shell Spawned in Container
          desc: A shell was spawned inside a container.
          condition: >
            spawned_process
            and container
            and proc.name in (bash, sh, zsh, ash, csh, ksh)
            and not proc.pname in (healthcheck)
          output: >
            Shell spawned in container
            (user=%user.name container=%container.name
            image=%container.image.repository
            namespace=%k8s.ns.name pod=%k8s.pod.name)
          priority: WARNING
          tags: [shell, container, rciis]

        - rule: Read Sensitive File in Container
          desc: A process inside a container read a sensitive file.
          condition: >
            open_read
            and container
            and fd.name pmatch (/etc/shadow, /etc/passwd, /etc/pki/*, /run/secrets/*)
          output: >
            Sensitive file read in container
            (user=%user.name file=%fd.name container=%container.name
            namespace=%k8s.ns.name pod=%k8s.pod.name)
          priority: WARNING
          tags: [filesystem, sensitive, rciis]

Setting	Value	Why
`falcosidekick.replicaCount`	`1`	Single replica reduces cost on shared AWS clusters
`resources`	50m/128Mi → 250m/256Mi	Tighter constraints for cost optimization
Custom rules	Simplified	AWS patch uses a reduced rule set — full rules live in the base

On Bare Metal, Falco runs with the full HA configuration from the base. No resource constraints are overridden — the base values are already optimized for dedicated clusters.

flux/infra/baremetal/falco/patch.yaml

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: falco
spec:
  values:
    # Base values are already optimized for bare metal HA:
    # falcosidekick.replicaCount: 2
    # resources: 100m/256Mi → 500m/512Mi
    # Full custom rules with all three RCIIS detections

Setting	Value	Why
`falcosidekick.replicaCount`	`2` (from base)	HA — two Falcosidekick replicas for alert delivery redundancy
`resources`	100m/256Mi → 500m/512Mi (from base)	Full resources for dedicated bare metal nodes

On Bare Metal, Falco runs with the full HA configuration from the base. No resource constraints are overridden — the base values are already optimized for dedicated clusters.

flux/infra/baremetal/falco/patch.yaml

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: falco
spec:
  values:
    # Base values are already optimized for bare metal HA:
    # falcosidekick.replicaCount: 2
    # resources: 100m/256Mi → 500m/512Mi
    # Full custom rules with all three RCIIS detections

Setting	Value	Why
`falcosidekick.replicaCount`	`2` (from base)	HA — two Falcosidekick replicas for alert delivery redundancy
`resources`	100m/256Mi → 500m/512Mi (from base)	Full resources for dedicated bare metal nodes

Helm Values¶

The values file controls Falco's core features. Select your environment and deployment size:

AWSBare MetalProxmox VMs

Non-HA

flux/infra/aws/falco/values.yaml

# Falco — AWS Non-HA configuration

driver:
  kind: modern_ebpf

tty: true

falcosidekick:
  enabled: true
  replicaCount: 1
  config:
    slack:
      webhookurl: ""
      minimumpriority: warning

resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi

HANon-HA

flux/infra/baremetal/falco/values.yaml

# Falco — Bare Metal HA configuration

driver:
  kind: modern_ebpf

tty: true

falcosidekick:
  enabled: true
  replicaCount: 2
  config:
    slack:
      webhookurl: ""
      minimumpriority: warning
    webhook:
      address: ""
      minimumpriority: critical

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

flux/infra/baremetal/falco/values.yaml

# Falco — Bare Metal Non-HA configuration

driver:
  kind: modern_ebpf

tty: true

falcosidekick:
  enabled: true
  replicaCount: 1
  config:
    slack:
      webhookurl: ""
      minimumpriority: warning

resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi

HANon-HA

flux/infra/baremetal/falco/values.yaml

# Falco — Bare Metal HA configuration

driver:
  kind: modern_ebpf

tty: true

falcosidekick:
  enabled: true
  replicaCount: 2
  config:
    slack:
      webhookurl: ""
      minimumpriority: warning
    webhook:
      address: ""
      minimumpriority: critical

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

flux/infra/baremetal/falco/values.yaml

# Falco — Bare Metal Non-HA configuration

driver:
  kind: modern_ebpf

tty: true

falcosidekick:
  enabled: true
  replicaCount: 1
  config:
    slack:
      webhookurl: ""
      minimumpriority: warning

resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi

Key settings (all environments):

Setting	HA	Non-HA	Why
`driver.kind`	`modern_ebpf`	`modern_ebpf`	Talos-specific — CO-RE eBPF, no kernel headers needed at runtime
`falcosidekick.replicaCount`	`2`	`1`	HA runs two replicas for alert delivery redundancy
`resources`	100m/256Mi → 500m/512Mi	50m/128Mi → 250m/256Mi	Resource scaling matches cluster capacity

Extra Manifests¶

Save the following additional manifests for your environment:

AWSBare MetalProxmox VMs

No extra manifests required for AWS. The patch and values files are sufficient.

On Bare Metal, deploy a ServiceMonitor, PrometheusRule, and Grafana dashboard ConfigMap for Falco alerting and visibility.

ServiceMonitor — save as flux/infra/baremetal/falco/servicemonitor-falcosidekick.yaml:

flux/infra/baremetal/falco/servicemonitor-falcosidekick.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: falcosidekick
  namespace: falco
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: falcosidekick
  endpoints:
    - port: http
      interval: 30s
      path: /metrics

PrometheusRule — save as flux/infra/baremetal/falco/prometheus-rule-falco.yaml:

flux/infra/baremetal/falco/prometheus-rule-falco.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: falco-alerts
  namespace: falco
  labels:
    release: prometheus
spec:
  groups:
    - name: falco
      rules:
        - alert: FalcoCriticalAlert
          expr: |
            increase(falcosidekick_outputs_total{priority="critical"}[5m]) > 0
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: "Falco CRITICAL priority event detected"
            description: >
              A CRITICAL priority Falco event was detected in the last 5 minutes.
              This may indicate a container escape, privilege escalation, or
              data exfiltration attempt. Investigate immediately.
            runbook_url: "https://docs.rciis.eac.int/08-operations/incident-response/"

        - alert: FalcoPodDown
          expr: |
            kube_daemonset_status_number_unavailable{
              namespace="falco",
              daemonset="falco"
            } > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Falco DaemonSet not running on all nodes"
            description: >
              {{ $value }} node(s) do not have a running Falco pod.
              Runtime security monitoring is incomplete.
            runbook_url: "https://docs.rciis.eac.int/05-secure/troubleshooting/#falco"

        - alert: FalcoHighAlertRate
          expr: |
            sum(rate(falcosidekick_outputs_total[5m])) > 100
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Falco alert rate exceeds 100/min"
            description: >
              Falco is generating more than 100 alerts per minute for over 10
              minutes. This may indicate a false positive storm from a noisy
              rule. Review Falco rules and tune conditions.

Grafana Dashboard — save as flux/infra/baremetal/falco/grafana-dashboard-falco.yaml:

flux/infra/baremetal/falco/grafana-dashboard-falco.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-falco
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  falco-dashboard.json: |-
    { "annotations": { "list": [] }, "title": "Falco - Runtime Security Events", "uid": "falco-events" }

Tip

Import the Falcosidekick dashboard (ID: 17514) from Grafana.com for a pre-built visualization of Falco event rates by priority and output target.

On Bare Metal, deploy a ServiceMonitor, PrometheusRule, and Grafana dashboard ConfigMap for Falco alerting and visibility.

ServiceMonitor — save as flux/infra/baremetal/falco/servicemonitor-falcosidekick.yaml:

flux/infra/baremetal/falco/servicemonitor-falcosidekick.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: falcosidekick
  namespace: falco
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: falcosidekick
  endpoints:
    - port: http
      interval: 30s
      path: /metrics

PrometheusRule — save as flux/infra/baremetal/falco/prometheus-rule-falco.yaml:

flux/infra/baremetal/falco/prometheus-rule-falco.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: falco-alerts
  namespace: falco
  labels:
    release: prometheus
spec:
  groups:
    - name: falco
      rules:
        - alert: FalcoCriticalAlert
          expr: |
            increase(falcosidekick_outputs_total{priority="critical"}[5m]) > 0
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: "Falco CRITICAL priority event detected"
            description: >
              A CRITICAL priority Falco event was detected in the last 5 minutes.
              This may indicate a container escape, privilege escalation, or
              data exfiltration attempt. Investigate immediately.
            runbook_url: "https://docs.rciis.eac.int/08-operations/incident-response/"

        - alert: FalcoPodDown
          expr: |
            kube_daemonset_status_number_unavailable{
              namespace="falco",
              daemonset="falco"
            } > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Falco DaemonSet not running on all nodes"
            description: >
              {{ $value }} node(s) do not have a running Falco pod.
              Runtime security monitoring is incomplete.
            runbook_url: "https://docs.rciis.eac.int/05-secure/troubleshooting/#falco"

        - alert: FalcoHighAlertRate
          expr: |
            sum(rate(falcosidekick_outputs_total[5m])) > 100
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Falco alert rate exceeds 100/min"
            description: >
              Falco is generating more than 100 alerts per minute for over 10
              minutes. This may indicate a false positive storm from a noisy
              rule. Review Falco rules and tune conditions.

Grafana Dashboard — save as flux/infra/baremetal/falco/grafana-dashboard-falco.yaml:

flux/infra/baremetal/falco/grafana-dashboard-falco.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-falco
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  falco-dashboard.json: |-
    { "annotations": { "list": [] }, "title": "Falco - Runtime Security Events", "uid": "falco-events" }

Tip

Import the Falcosidekick dashboard (ID: 17514) from Grafana.com for a pre-built visualization of Falco event rates by priority and output target.

Falcosidekick Alert Routing¶

Falcosidekick supports 30+ output destinations. Common configurations for RCIIS:

Destination	Use Case	Priority Filter
Slack	Team notifications	`>= WARNING`
PagerDuty	On-call escalation	`>= CRITICAL`
Webhook	SIEM / incident management integration	`>= NOTICE`
Kafka	Event streaming for analytics	`>= INFORMATIONAL`
Prometheus (metrics)	Dashboard and SLO tracking	All

Falcosidekick Secret Management¶

Webhook URLs and API keys for alert destinations must not be stored in plain text in the Helm values file. Use a Kubernetes Secret referenced by Falcosidekick.

Create a SOPS-encrypted secret:

secret-falcosidekick.yaml

apiVersion: v1
kind: Secret
metadata:
  name: falcosidekick-secrets
  namespace: falco
stringData:
  SLACK_WEBHOOKURL: "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
  PAGERDUTY_ROUTINGKEY: "your-pagerduty-routing-key"

Encrypt with SOPS:

sops -e secret-falcosidekick.yaml > secret-falcosidekick.enc.yaml

Reference the secret in the Helm values:

values addition

falcosidekick:
  existingSecret: falcosidekick-secrets

This follows the same SOPS + KSOPS pattern used throughout the project. See Credential Management for the full SOPS workflow.

Commit and Deploy¶

Once all files are in place, commit and push to trigger Flux deployment:

AWSBare MetalProxmox VMs

git add flux/infra/base/falco.yaml \
        flux/infra/aws/falco/
git commit -m "feat(falco): add runtime security monitoring for AWS environment"
git push

git add flux/infra/base/falco.yaml \
        flux/infra/baremetal/falco/
git commit -m "feat(falco): add runtime security monitoring for bare metal environment"
git push

git add flux/infra/base/falco.yaml \
        flux/infra/baremetal/falco/
git commit -m "feat(falco): add runtime security monitoring for bare metal environment"
git push

Flux will detect the new commit and begin deploying Falco. To trigger an immediate sync instead of waiting for the next poll interval:

flux reconcile kustomization infra-falco -n flux-system --with-source

Verify¶

After Falco is deployed, confirm it is working:

# Check pods are running
kubectl -n falco get pods
# Expected: falco DaemonSet pods Running, falcosidekick Deployment Running

# Check Falco logs for successful eBPF probe load
kubectl -n falco logs -l app.kubernetes.io/name=falco | grep -i "ebpf\|loaded"

# Trigger a test alert — exec into any pod
kubectl run alert-test --image=busybox --rm -it --restart=Never -- /bin/sh -c "echo test && exit"

# Check Falco logs for the "Shell Spawned in Container" alert
kubectl -n falco logs -l app.kubernetes.io/name=falco --tail=20 | grep -i "shell"

AWSBare MetalProxmox VMs

# Verify Falcosidekick is forwarding alerts
kubectl -n falco logs -l app.kubernetes.io/name=falcosidekick --tail=20

# Check Falcosidekick metrics
kubectl -n falco port-forward svc/falco-falcosidekick 2801:2801 &
curl -s http://localhost:2801/metrics | grep falcosidekick_outputs_total
kill %1

# Check Falcosidekick metrics
kubectl -n falco port-forward svc/falco-falcosidekick 2801:2801 &
curl -s http://localhost:2801/metrics | grep falcosidekick_outputs_total
kill %1

Flux Operations¶

This component is managed by Flux as HelmRelease falco and Kustomization infra-falco.

Check whether the HelmRelease and Kustomization are in a Ready state:

flux get helmrelease falco -n flux-system

flux get kustomization infra-falco -n flux-system

Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:

flux reconcile kustomization infra-falco -n flux-system --with-source

Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:

flux reconcile helmrelease falco -n flux-system

View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:

flux logs --kind=HelmRelease --name=falco -n flux-system

Recovering a stalled HelmRelease

If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry automatically. Suspend and resume to clear the failure counter, then reconcile:

flux suspend helmrelease falco -n flux-system
flux resume helmrelease falco -n flux-system
flux reconcile kustomization infra-falco -n flux-system

Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved.

Tracee¶

Tracee uses eBPF to trace system calls, network events, and kernel functions at the OS level. It provides deeper visibility than Falco for forensic investigations — capturing not just that an event happened, but the full context around it (arguments, stack traces, file contents).

Install¶

The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches (shown in the Configuration section).

Create the base directory and file:

mkdir -p flux/infra/base

Field	Value	Explanation
`chart`	`tracee`	The Helm chart name from the Aqua Security registry
`version`	`0.22.1`	Pinned chart version — update this to upgrade Tracee
`sourceRef.name`	`aquasecurity`	References a `HelmRepository` CR pointing to `https://aquasecurity.github.io/helm-charts`
`targetNamespace`	`tracee`	Tracee runs in a dedicated namespace for isolation
`dependsOn`	`falco`	Tracee depends on Falco so Falcosidekick is available for alert forwarding
`remediation.retries`	`3`	Flux retries up to 3 times if the install or upgrade fails

eBPF on Talos

Tracee is pure eBPF and works well on Talos Linux without modification. The Talos kernel ships with full eBPF support enabled. No kernel headers or build tools are needed since Tracee uses CO-RE (Compile Once, Run Everywhere) with BTF.

Save the following as flux/infra/base/tracee.yaml:

flux/infra/base/tracee.yaml

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: tracee
  namespace: flux-system
spec:
  dependsOn:
    - name: falco
  targetNamespace: tracee
  interval: 30m
  chart:
    spec:
      chart: tracee
      version: "0.22.1"
      sourceRef:
        kind: HelmRepository
        name: aquasecurity
        namespace: flux-system
  releaseName: tracee
  install:
    createNamespace: true
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
  values:
    hostPID: true
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
    config:
      output:
        - webhook:http://falcosidekick.falco:2801  # Share Falco's alert pipeline
        - json:/tmp/tracee/events.json
      filter:
        event:
          - security_file_open
          - security_socket_connect
          - security_socket_bind
          - magic_write
          - mem_prot_alert
          - process_execute
          - sched_process_exec
          - anti_debugging
          - ptrace
          - security_bpf
        scope:
          - not container=tracee
          - not container=falco
      capture:
        write: true
        exec: true
        network: true
        dir: /tmp/tracee/captures

Alternative: Helm CLI

If you do not have Git access, install Tracee directly:

helm repo add aquasecurity https://aquasecurity.github.io/helm-charts
helm repo update
helm upgrade --install tracee aquasecurity/tracee \
  --namespace tracee \
  --create-namespace \
  --version 0.22.1 \
  -f values.yaml

Configuration¶

The environment patch overrides the base HelmRelease with cluster-specific settings. Select your environment and deployment size below.

Create the environment overlay directory:

AWSBare MetalProxmox VMs

mkdir -p flux/infra/aws/tracee

mkdir -p flux/infra/baremetal/tracee

mkdir -p flux/infra/baremetal/tracee

Environment Patch¶

The patch file adjusts resource limits and Talos-specific compatibility settings for the target environment.

Save the following as the patch file for your environment:

AWSBare MetalProxmox VMs

On AWS, the Tracee patch applies Talos Linux eBPF compatibility workarounds. The Helm chart hardcodes the containerd socket at /var/run which does not exist on Talos, and several eBPF filesystems must be mounted before Tracee starts.

flux/infra/aws/tracee/patch.yaml

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: tracee
spec:
  chart:
    spec:
      version: "0.24.1"
  values:
    hostPID: true
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 1Gi
    config:
      output:
        format: json
        options:
          parseArguments: true
          stackAddresses: false
          execEnv: false
          execHash: dev-inode
          sortEvents: false
  postRenderers:
    - kustomize:
        patches:
          - target:
              kind: DaemonSet
              name: tracee
            patch: |
              # Talos Linux eBPF compatibility patches
              #
              # Talos does not mount debugfs, tracefs, or bpffs by default
              # and sets kptr_restrict=2. The Helm chart also hardcodes the
              # containerd socket at /var/run which doesn't exist on Talos.
              - op: replace
                path: /spec/template/spec/containers/0/command
                value:
                  - /bin/sh
                  - -c
                  - |
                    set -e
                    echo "Preparing eBPF environment for Talos Linux..."
                    if ! mountpoint -q /sys/kernel/debug 2>/dev/null; then
                      mount -t debugfs debugfs /sys/kernel/debug 2>/dev/null || echo "WARN: Could not mount debugfs"
                    fi
                    if ! mountpoint -q /sys/kernel/tracing 2>/dev/null; then
                      mount -t tracefs tracefs /sys/kernel/tracing 2>/dev/null || echo "WARN: Could not mount tracefs"
                    fi
                    if ! mountpoint -q /sys/fs/bpf 2>/dev/null; then
                      mount -t bpf bpf /sys/fs/bpf 2>/dev/null || echo "WARN: Could not mount bpffs"
                    fi
                    echo 1 > /proc/sys/kernel/kptr_restrict 2>/dev/null || echo "WARN: Could not set kptr_restrict"
                    echo "eBPF environment ready. Starting Tracee..."
                    exec /tracee/tracee --config /tracee/config.yaml --server healthz --server http-address=:3366 --server metrics
              - op: replace
                path: /spec/template/spec/containers/0/args
                value: []
              - op: replace
                path: /spec/template/spec/volumes/2/hostPath/path
                value: /run/containerd/containerd.sock
              - op: add
                path: /spec/template/spec/containers/0/startupProbe
                value:
                  httpGet:
                    path: /healthz
                    port: 3366
                  initialDelaySeconds: 15
                  periodSeconds: 10
                  timeoutSeconds: 5
                  failureThreshold: 30
              - op: replace
                path: /spec/template/spec/containers/0/readinessProbe
                value:
                  httpGet:
                    path: /healthz
                    port: 3366
                  initialDelaySeconds: 30
                  periodSeconds: 10
                  timeoutSeconds: 5
                  failureThreshold: 6

Setting	Value	Why
`postRenderers`	Kustomize patches	Applies Talos-specific runtime fixes after Helm renders the manifests
`command` override	Shell wrapper	Mounts debugfs, tracefs, bpffs before starting Tracee on Talos
`volumes[2].hostPath`	`/run/containerd/containerd.sock`	Fixes hardcoded `/var/run` path — Talos uses `/run`
`startupProbe`	15s delay, 30 failures	Allows extra time for eBPF program loading during startup
`config.output.format`	`json`	Structured output for log aggregation and forwarding

On Bare Metal, the same Talos compatibility patches apply. The patch also pins the chart to the same version used in the AWS overlay for consistency.

flux/infra/baremetal/tracee/patch.yaml

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: tracee
spec:
  chart:
    spec:
      version: "0.24.1"
  values:
    hostPID: true
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 1Gi
    config:
      output:
        format: json
        options:
          parseArguments: true
          stackAddresses: false
          execEnv: false
          execHash: dev-inode
          sortEvents: false

Setting	Value	Why
`resources.limits.cpu`	`1`	Tracee forensic capture is CPU-intensive; 1 full core allowed on dedicated hardware
`resources.limits.memory`	`1Gi`	Full forensic capture (`write`, `exec`, `network`) requires more memory than base
`config.output.format`	`json`	Structured output for log aggregation

On Bare Metal, the same Talos compatibility patches apply. The patch also pins the chart to the same version used in the AWS overlay for consistency.

flux/infra/baremetal/tracee/patch.yaml

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: tracee
spec:
  chart:
    spec:
      version: "0.24.1"
  values:
    hostPID: true
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 1Gi
    config:
      output:
        format: json
        options:
          parseArguments: true
          stackAddresses: false
          execEnv: false
          execHash: dev-inode
          sortEvents: false

Setting	Value	Why
`resources.limits.cpu`	`1`	Tracee forensic capture is CPU-intensive; 1 full core allowed on dedicated hardware
`resources.limits.memory`	`1Gi`	Full forensic capture (`write`, `exec`, `network`) requires more memory than base
`config.output.format`	`json`	Structured output for log aggregation

Helm Values¶

The values file controls Tracee's core event filters and capture settings. Select your environment and deployment size:

AWSBare MetalProxmox VMs

Non-HA

flux/infra/aws/tracee/values.yaml

# Tracee — AWS Non-HA configuration

hostPID: true

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 1Gi

config:
  output:
    format: json
    options:
      parseArguments: true
      stackAddresses: false
      execEnv: false
      execHash: dev-inode
      sortEvents: false

HANon-HA

flux/infra/baremetal/tracee/values.yaml

# Tracee — Bare Metal HA configuration

hostPID: true

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

config:
  output:
    - webhook:http://falcosidekick.falco:2801
    - json:/tmp/tracee/events.json
  filter:
    event:
      - security_file_open
      - security_socket_connect
      - security_socket_bind
      - magic_write
      - mem_prot_alert
      - process_execute
      - sched_process_exec
      - anti_debugging
      - ptrace
      - security_bpf
    scope:
      - not container=tracee
      - not container=falco
  capture:
    write: true
    exec: true
    network: true
    dir: /tmp/tracee/captures

flux/infra/baremetal/tracee/values.yaml

# Tracee — Bare Metal Non-HA configuration

hostPID: true

resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi

config:
  output:
    - json:/tmp/tracee/events.json
  filter:
    event:
      - security_file_open
      - security_socket_connect
      - process_execute
      - sched_process_exec
      - ptrace
      - security_bpf
    scope:
      - not container=tracee
      - not container=falco
  # Forensic capture disabled to save disk space
  capture:
    write: false
    exec: false
    network: false

HANon-HA

flux/infra/baremetal/tracee/values.yaml

# Tracee — Bare Metal HA configuration

hostPID: true

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

config:
  output:
    - webhook:http://falcosidekick.falco:2801
    - json:/tmp/tracee/events.json
  filter:
    event:
      - security_file_open
      - security_socket_connect
      - security_socket_bind
      - magic_write
      - mem_prot_alert
      - process_execute
      - sched_process_exec
      - anti_debugging
      - ptrace
      - security_bpf
    scope:
      - not container=tracee
      - not container=falco
  capture:
    write: true
    exec: true
    network: true
    dir: /tmp/tracee/captures

flux/infra/baremetal/tracee/values.yaml

# Tracee — Bare Metal Non-HA configuration

hostPID: true

resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi

config:
  output:
    - json:/tmp/tracee/events.json
  filter:
    event:
      - security_file_open
      - security_socket_connect
      - process_execute
      - sched_process_exec
      - ptrace
      - security_bpf
    scope:
      - not container=tracee
      - not container=falco
  # Forensic capture disabled to save disk space
  capture:
    write: false
    exec: false
    network: false

Key settings (all environments):

Setting	HA	Non-HA	Why
`capture.write/exec/network`	`true`	`false`	HA captures artefacts for forensics; non-HA saves disk space
`config.output`	webhook + json	json only	HA forwards to Falcosidekick for unified alert pipeline
`resources`	100m/256Mi → 500m/512Mi	50m/128Mi → 250m/256Mi	Resource scaling matches cluster capacity
`filter.event`	Full set (10 events)	Reduced set (6 events)	HA monitors all critical events; non-HA focuses on highest-risk

Tracee Policies¶

Tracee supports custom policies using Rego or Go signatures. Deploy the RCIIS policy to detect sensitive operations.

Save the following as the policy manifest — this applies to all environments:

flux/infra/base/tracee-policy-rciis.yaml

apiVersion: tracee.aquasec.com/v1beta1
kind: Policy
metadata:
  name: rciis-sensitive-operations
  namespace: tracee
spec:
  scope:
    - global
  rules:
    # Detect fileless execution (process running from memory)
    - event: security_file_open
      filters:
        - data.pathname=/proc/self/mem
    # Detect attempts to disable security modules
    - event: security_bpf
    # Detect privilege escalation attempts
    - event: ptrace
      filters:
        - data.request=PTRACE_ATTACH
    # Detect container escape attempts
    - event: security_socket_bind
      filters:
        - data.addr.port=0

Extra Manifests¶

Save the following additional manifests for your environment:

AWSBare MetalProxmox VMs

No extra manifests required for AWS. The patch and values files are sufficient.

On Bare Metal, deploy a ServiceMonitor for Tracee metrics scraping. Tracee exposes Prometheus metrics on port 3366 when started with --server metrics.

Save as flux/infra/baremetal/tracee/servicemonitor-tracee.yaml:

flux/infra/baremetal/tracee/servicemonitor-tracee.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tracee
  namespace: tracee
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: tracee
  endpoints:
    - port: metrics
      interval: 30s

Note

Tracee metrics support depends on the chart version. Verify the Tracee Helm chart exposes a metrics port before deploying this ServiceMonitor.

On Bare Metal, deploy a ServiceMonitor for Tracee metrics scraping. Tracee exposes Prometheus metrics on port 3366 when started with --server metrics.

Save as flux/infra/baremetal/tracee/servicemonitor-tracee.yaml:

flux/infra/baremetal/tracee/servicemonitor-tracee.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tracee
  namespace: tracee
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: tracee
  endpoints:
    - port: metrics
      interval: 30s

Note

Tracee metrics support depends on the chart version. Verify the Tracee Helm chart exposes a metrics port before deploying this ServiceMonitor.

Commit and Deploy¶

Once all files are in place, commit and push to trigger Flux deployment:

AWSBare MetalProxmox VMs

git add flux/infra/base/tracee.yaml \
        flux/infra/aws/tracee/
git commit -m "feat(tracee): add forensic monitoring for AWS environment"
git push

git add flux/infra/base/tracee.yaml \
        flux/infra/baremetal/tracee/
git commit -m "feat(tracee): add forensic monitoring for bare metal environment"
git push

git add flux/infra/base/tracee.yaml \
        flux/infra/baremetal/tracee/
git commit -m "feat(tracee): add forensic monitoring for bare metal environment"
git push

Flux will detect the new commit and begin deploying Tracee. To trigger an immediate sync instead of waiting for the next poll interval:

flux reconcile kustomization infra-tracee -n flux-system --with-source

Verify¶

After Tracee is deployed, confirm it is working:

# Check DaemonSet is running on all nodes
kubectl -n tracee get ds

# Verify eBPF programs are loaded
kubectl -n tracee logs -l app.kubernetes.io/name=tracee | grep -i "loaded\|ready"

# Check captured events (HA only — capture must be enabled)
kubectl -n tracee exec -it ds/tracee -- cat /tmp/tracee/events.json | head -5

AWSBare MetalProxmox VMs

# Verify Tracee health endpoint is responding
kubectl -n tracee port-forward ds/tracee 3366:3366 &
curl -s http://localhost:3366/healthz
kill %1

# Verify Tracee metrics are exposed
kubectl -n tracee port-forward ds/tracee 3366:3366 &
curl -s http://localhost:3366/metrics | grep tracee
kill %1

# Verify Tracee metrics are exposed
kubectl -n tracee port-forward ds/tracee 3366:3366 &
curl -s http://localhost:3366/metrics | grep tracee
kill %1

Flux Operations¶

This component is managed by Flux as HelmRelease tracee and Kustomization infra-tracee.

Check whether the HelmRelease and Kustomization are in a Ready state:

flux get helmrelease tracee -n flux-system

flux get kustomization infra-tracee -n flux-system

Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:

flux reconcile kustomization infra-tracee -n flux-system --with-source

Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:

flux reconcile helmrelease tracee -n flux-system

View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:

flux logs --kind=HelmRelease --name=tracee -n flux-system

Recovering a stalled HelmRelease

If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry automatically. Suspend and resume to clear the failure counter, then reconcile:

flux suspend helmrelease tracee -n flux-system
flux resume helmrelease tracee -n flux-system
flux reconcile kustomization infra-tracee -n flux-system

Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved.

Using Falco and Tracee Together¶

When both tools are deployed, use them for different purposes:

Scenario	Tool	Why
Continuous monitoring and alerting	Falco	Mature alerting pipeline via Falcosidekick, broad rule coverage
Incident investigation and forensics	Tracee	Captures full event context, file contents, and network data
Compliance evidence	Falco	PolicyReport-compatible output, audit trail
Threat hunting	Tracee	Query historical events with rich filter expressions
Container escape detection	Both	Complementary detection signatures

Shared Alert Pipeline¶

Configure Tracee to forward events to Falcosidekick for a unified alert pipeline:

Tracee events  --webhook-->  Falcosidekick  ---->  Slack / PagerDuty / SIEM
Falco events   ----------->  Falcosidekick  ---->  Slack / PagerDuty / SIEM

This gives a single pane of glass for all runtime security events while preserving Tracee's deeper forensic data for on-demand investigation.

Verify Alert Pipeline¶

After deploying both Falco and Tracee with Falcosidekick, verify the full alert pipeline end-to-end:

Step 1 — Trigger a Falco alert:

# Exec into a test pod to trigger "Shell Spawned in Container"
kubectl run alert-test --image=busybox --rm -it --restart=Never -- /bin/sh -c "echo test && exit"

Step 2 — Confirm Falco detected the event:

kubectl -n falco logs -l app.kubernetes.io/name=falco --tail=20 | grep -i "shell"
# Expected: Alert line containing "Shell spawned in container"

Step 3 — Confirm Falcosidekick received and forwarded the alert:

# Check Falcosidekick logs
kubectl -n falco logs -l app.kubernetes.io/name=falcosidekick --tail=20
# Expected: Log lines showing the alert was forwarded to configured outputs

# Check Falcosidekick metrics
kubectl -n falco port-forward svc/falco-falcosidekick 2801:2801 &
curl -s http://localhost:2801/metrics | grep falcosidekick_outputs_total
# Expected: Counter incremented for the configured output (slack, webhook, etc.)
kill %1

Step 4 — Confirm delivery at the destination:

Slack: Check the configured Slack channel for the alert message
Webhook: Check the endpoint's logs for the received payload
Kafka: Consume from the configured topic to verify the event arrived

Step 5 — Verify Tracee → Falcosidekick pipeline (HA configuration only):

If Tracee is configured to forward events to Falcosidekick via webhook:

# Check Tracee logs for webhook delivery
kubectl -n tracee logs -l app.kubernetes.io/name=tracee --tail=20 | grep -i "webhook\|output"
# Expected: Events being sent to http://falcosidekick.falco:2801

# Check Falcosidekick for Tracee-sourced events
kubectl -n falco logs -l app.kubernetes.io/name=falcosidekick --tail=20 | grep -i "tracee"

Tip

If alerts are not arriving at the destination, check connectivity between namespaces. The default-deny NetworkPolicy generated by Kyverno may block Falcosidekick's outbound traffic. Create a NetworkPolicy allowing egress from the falco namespace to external webhook endpoints.

Next Steps¶

Proceed to troubleshooting guidance for security components:

5.2.5 Troubleshooting