
5.2.4 Runtime Threat Detection

Runtime security detects anomalous behaviour inside running containers and on the host — things that static scanning and admission control cannot catch. This section covers two complementary tools: Falco and Tracee.

How to use this page

Each component has an Install section showing the Flux HelmRelease, a Configuration section with Helm values, and a Verify section to confirm it is working.

All code blocks are labelled with their file path in the repository. Select your target environment (AWS or Bare Metal) in any tab group — the choice syncs across the entire page.

  • Using the existing rciis-devops repository: All files already exist. Skip the mkdir and git add/git commit commands — they are for users building a new repository. Simply review the files, edit values for your environment, and push.
  • Building a new repository from scratch: Follow the mkdir, file creation, and git commands in order.
  • No Git access: Expand the "Alternative: Helm CLI" block under each Install section.

Falco vs Tracee

Both tools monitor kernel-level events but differ in approach and strengths:

| | Falco | Tracee |
|---|---|---|
| Maintainer | Sysdig / CNCF (Graduated) | Aqua Security |
| Detection engine | Kernel module or eBPF probe | eBPF only |
| Talos compatibility | eBPF driver required (Talos cannot load out-of-tree kernel modules at runtime) | Native eBPF — excellent fit for Talos |
| Rule language | YAML-based Falco rules (condition/output/priority) | Rego policies + Go signatures |
| Rule ecosystem | Large community rule library, actively maintained | Growing library, strong container-focused defaults |
| Forensic capture | Events only (syscall fields, metadata) | Full event capture with optional artifact extraction |
| Alert forwarding | Falcosidekick (30+ output targets: Slack, PagerDuty, webhook, Kafka, OTEL) | Built-in webhook, OpenTelemetry export |
| Resource overhead | Low–moderate | Low (pure eBPF, no kernel module) |
| Best for | Broad runtime policy enforcement with mature alerting | Deep forensic investigation and incident response |

Recommendation

Deploy both tools. Falco provides broad detection coverage with a mature rule library and flexible alerting. Tracee adds forensic depth for incident investigations. If you must choose one, Falco is the safer default for teams without deep eBPF experience.


Falco

Falco monitors Linux syscalls and Kubernetes audit events in real time, matching them against rules that define suspicious behaviour. When a rule triggers, Falco generates an alert that can be routed to any number of destinations via Falcosidekick.

Install

The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches (shown in the Configuration section).

Create the base directory and file:

mkdir -p flux/infra/base
| Field | Value | Explanation |
|---|---|---|
| chart | falco | The Helm chart name from the falcosecurity Helm repository |
| version | 4.25.1 | Pinned chart version — update this to upgrade Falco |
| sourceRef.name | falcosecurity | References a HelmRepository CR pointing to https://falcosecurity.github.io/charts |
| targetNamespace | falco | Falco runs in a dedicated namespace for isolation |
| driver.kind | modern_ebpf | Uses CO-RE eBPF — required for Talos Linux (see eBPF Driver Selection below) |
| remediation.retries | 3 | Flux retries up to 3 times if the install or upgrade fails |

Save the following as flux/infra/base/falco.yaml:

flux/infra/base/falco.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: falco
  namespace: flux-system
spec:
  targetNamespace: falco
  interval: 30m
  chart:
    spec:
      chart: falco
      version: "4.25.1"
      sourceRef:
        kind: HelmRepository
        name: falcosecurity
        namespace: flux-system
  releaseName: falco
  install:
    createNamespace: true
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
  values:
    driver:
      kind: modern_ebpf
    tty: true
    falcosidekick:
      enabled: true
      replicaCount: 2
      config:
        slack:
          webhookurl: ""       # Set via SOPS-encrypted secret
          minimumpriority: warning
        webhook:
          address: ""          # Optional: forward to incident management
          minimumpriority: critical
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
    customRules:
      rciis-rules.yaml: |-
        - rule: Shell Spawned in Container
          desc: >
            A shell (bash, sh, zsh) was spawned inside a container.
            This is unexpected in production RCIIS workloads.
          condition: >
            spawned_process
            and container
            and proc.name in (bash, sh, zsh, ash, csh, ksh)
            and not proc.pname in (healthcheck)
          output: >
            Shell spawned in container
            (user=%user.name container=%container.name
            image=%container.image.repository
            shell=%proc.name parent=%proc.pname
            cmdline=%proc.cmdline namespace=%k8s.ns.name
            pod=%k8s.pod.name)
          priority: WARNING
          tags: [shell, container, rciis]

        - rule: Read Sensitive File in Container
          desc: >
            A process inside a container read a sensitive file
            (e.g., /etc/shadow, /etc/passwd, private keys).
          condition: >
            open_read
            and container
            and fd.name pmatch (/etc/shadow, /etc/passwd, /etc/pki/*, /run/secrets/*)
          output: >
            Sensitive file read in container
            (user=%user.name file=%fd.name container=%container.name
            image=%container.image.repository namespace=%k8s.ns.name
            pod=%k8s.pod.name)
          priority: WARNING
          tags: [filesystem, sensitive, rciis]

        - rule: Unexpected Outbound Connection from RCIIS
          desc: >
            An RCIIS application pod made an outbound network connection
            to a destination not in the expected allow-list.
          condition: >
            outbound
            and container
            and k8s.ns.name = "rciis"
            and not fd.sip in (rciis_allowed_destinations)
          output: >
            Unexpected outbound connection from RCIIS pod
            (pod=%k8s.pod.name namespace=%k8s.ns.name
            image=%container.image.repository
            connection=%fd.name dest=%fd.sip:%fd.sport)
          priority: CRITICAL
          tags: [network, rciis, exfiltration]
Alternative: Helm CLI

If you do not have Git access, install Falco directly:

helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm upgrade --install falco falcosecurity/falco \
  --namespace falco \
  --create-namespace \
  --version 4.25.1 \
  -f values.yaml

eBPF Driver Selection

| Driver | Kernel Requirement | Talos Support | Recommendation |
|---|---|---|---|
| module | Kernel headers + DKMS | Not supported — Talos has no kernel headers or compiler toolchain | Do not use |
| ebpf | BPF support, kernel headers at build time | Works — legacy option | Supported but superseded |
| modern_ebpf | Kernel 5.8+ with BTF (BPF Type Format) | Best fit — Talos ships kernel 6.x with BTF enabled | Recommended |

The modern_ebpf driver uses CO-RE (Compile Once, Run Everywhere) and requires no kernel headers at runtime. Since Talos Linux ships a modern kernel (6.x) with BTF support enabled, this is the optimal choice.

Talos eBPF Requirement

Talos Linux has an immutable root filesystem, enforces kernel module signing, and ships no kernel headers or compiler toolchain. This means Falco's traditional kernel module driver — which compiles a .ko against the running kernel at runtime via DKMS — cannot work. Talos does support kernel modules, but only those built into the system image or added as system extensions at image build time.

The practical result: set the Falco driver to modern_ebpf (recommended) or ebpf. Do not use the module driver on Talos nodes.
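These prerequisites can be checked before deployment. The sketch below (hypothetical helper `kernel_ok_for_modern_ebpf`, assuming a Linux shell on the node, e.g. via a privileged debug pod) verifies the two modern_ebpf requirements: kernel 5.8 or newer, and BTF exposed at /sys/kernel/btf/vmlinux.

```shell
# Pre-flight sketch for the modern_ebpf driver (assumes a numeric
# major.minor kernel release string, e.g. "6.1.44-talos").
kernel_ok_for_modern_ebpf() {
  major=$(printf '%s' "$1" | cut -d. -f1)
  minor=$(printf '%s' "$1" | cut -d. -f2)
  # modern_ebpf (CO-RE) requires kernel >= 5.8
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 8 ]; }
}

if kernel_ok_for_modern_ebpf "$(uname -r)"; then
  echo "kernel $(uname -r): ok for modern_ebpf"
else
  echo "kernel $(uname -r): too old - fall back to the ebpf driver"
fi

# BTF must be exposed for CO-RE relocation to work:
[ -f /sys/kernel/btf/vmlinux ] && echo "BTF: available" || echo "BTF: missing"
```

On a Talos node both checks should pass, since Talos ships a 6.x kernel with BTF enabled.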

Configuration

The environment patch overrides the base HelmRelease with cluster-specific settings. The values file controls how Falco behaves. Select your environment and deployment size below.

Create the environment overlay directory:

mkdir -p flux/infra/aws/falco
mkdir -p flux/infra/baremetal/falco

Environment Patch

The patch file sets Falcosidekick replica count and resource limits appropriate for the target environment.

Save the following as the patch file for your environment:

On AWS, Falco resources are constrained to reduce cost on shared clusters. Falcosidekick runs as a single replica to reduce overhead.

flux/infra/aws/falco/patch.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: falco
spec:
  values:
    falcosidekick:
      replicaCount: 1
    resources:
      requests:
        cpu: 50m
        memory: 128Mi
      limits:
        cpu: 250m
        memory: 256Mi
    customRules:
      rciis-rules.yaml: |-
        - rule: Shell Spawned in Container
          desc: A shell was spawned inside a container.
          condition: >
            spawned_process
            and container
            and proc.name in (bash, sh, zsh, ash, csh, ksh)
            and not proc.pname in (healthcheck)
          output: >
            Shell spawned in container
            (user=%user.name container=%container.name
            image=%container.image.repository
            namespace=%k8s.ns.name pod=%k8s.pod.name)
          priority: WARNING
          tags: [shell, container, rciis]

        - rule: Read Sensitive File in Container
          desc: A process inside a container read a sensitive file.
          condition: >
            open_read
            and container
            and fd.name pmatch (/etc/shadow, /etc/passwd, /etc/pki/*, /run/secrets/*)
          output: >
            Sensitive file read in container
            (user=%user.name file=%fd.name container=%container.name
            namespace=%k8s.ns.name pod=%k8s.pod.name)
          priority: WARNING
          tags: [filesystem, sensitive, rciis]
| Setting | Value | Why |
|---|---|---|
| falcosidekick.replicaCount | 1 | Single replica reduces cost on shared AWS clusters |
| resources | 50m/128Mi → 250m/256Mi | Tighter constraints for cost optimization |
| Custom rules | Simplified | AWS patch uses a reduced rule set — full rules live in the base |

On Bare Metal, Falco runs with the full HA configuration from the base. No resource constraints are overridden — the base values are already optimized for dedicated clusters.

flux/infra/baremetal/falco/patch.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: falco
spec:
  values:
    # Base values are already optimized for bare metal HA:
    # falcosidekick.replicaCount: 2
    # resources: 100m/256Mi → 500m/512Mi
    # Full custom rules with all three RCIIS detections
| Setting | Value | Why |
|---|---|---|
| falcosidekick.replicaCount | 2 (from base) | HA — two Falcosidekick replicas for alert delivery redundancy |
| resources | 100m/256Mi → 500m/512Mi (from base) | Full resources for dedicated bare metal nodes |


Helm Values

The values file controls Falco's core features. Select your environment and deployment size:

flux/infra/aws/falco/values.yaml
# Falco — AWS Non-HA configuration

driver:
  kind: modern_ebpf

tty: true

falcosidekick:
  enabled: true
  replicaCount: 1
  config:
    slack:
      webhookurl: ""
      minimumpriority: warning

resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi
flux/infra/baremetal/falco/values.yaml
# Falco — Bare Metal HA configuration

driver:
  kind: modern_ebpf

tty: true

falcosidekick:
  enabled: true
  replicaCount: 2
  config:
    slack:
      webhookurl: ""
      minimumpriority: warning
    webhook:
      address: ""
      minimumpriority: critical

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
flux/infra/baremetal/falco/values.yaml
# Falco — Bare Metal Non-HA configuration

driver:
  kind: modern_ebpf

tty: true

falcosidekick:
  enabled: true
  replicaCount: 1
  config:
    slack:
      webhookurl: ""
      minimumpriority: warning

resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi

Key settings (all environments):

| Setting | HA | Non-HA | Why |
|---|---|---|---|
| driver.kind | modern_ebpf | modern_ebpf | Talos-specific — CO-RE eBPF, no kernel headers needed at runtime |
| falcosidekick.replicaCount | 2 | 1 | HA runs two replicas for alert delivery redundancy |
| resources | 100m/256Mi → 500m/512Mi | 50m/128Mi → 250m/256Mi | Resource scaling matches cluster capacity |

Extra Manifests

Save the following additional manifests for your environment:

No extra manifests required for AWS. The patch and values files are sufficient.

On Bare Metal, deploy a ServiceMonitor, PrometheusRule, and Grafana dashboard ConfigMap for Falco alerting and visibility.

ServiceMonitor — save as flux/infra/baremetal/falco/servicemonitor-falcosidekick.yaml:

flux/infra/baremetal/falco/servicemonitor-falcosidekick.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: falcosidekick
  namespace: falco
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: falcosidekick
  endpoints:
    - port: http
      interval: 30s
      path: /metrics

PrometheusRule — save as flux/infra/baremetal/falco/prometheus-rule-falco.yaml:

flux/infra/baremetal/falco/prometheus-rule-falco.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: falco-alerts
  namespace: falco
  labels:
    release: prometheus
spec:
  groups:
    - name: falco
      rules:
        - alert: FalcoCriticalAlert
          expr: |
            increase(falcosidekick_outputs_total{priority="critical"}[5m]) > 0
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: "Falco CRITICAL priority event detected"
            description: >
              A CRITICAL priority Falco event was detected in the last 5 minutes.
              This may indicate a container escape, privilege escalation, or
              data exfiltration attempt. Investigate immediately.
            runbook_url: "https://docs.rciis.eac.int/08-operations/incident-response/"

        - alert: FalcoPodDown
          expr: |
            kube_daemonset_status_number_unavailable{
              namespace="falco",
              daemonset="falco"
            } > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Falco DaemonSet not running on all nodes"
            description: >
              {{ $value }} node(s) do not have a running Falco pod.
              Runtime security monitoring is incomplete.
            runbook_url: "https://docs.rciis.eac.int/05-secure/troubleshooting/#falco"

        - alert: FalcoHighAlertRate
          expr: |
            sum(rate(falcosidekick_outputs_total[5m])) * 60 > 100
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Falco alert rate exceeds 100/min"
            description: >
              Falco is generating more than 100 alerts per minute for over 10
              minutes. This may indicate a false positive storm from a noisy
              rule. Review Falco rules and tune conditions.

Grafana Dashboard — save as flux/infra/baremetal/falco/grafana-dashboard-falco.yaml:

flux/infra/baremetal/falco/grafana-dashboard-falco.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-falco
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  falco-dashboard.json: |-
    { "annotations": { "list": [] }, "title": "Falco - Runtime Security Events", "uid": "falco-events" }

Tip

Import the Falcosidekick dashboard (ID: 17514) from Grafana.com for a pre-built visualization of Falco event rates by priority and output target.


Falcosidekick Alert Routing

Falcosidekick supports 30+ output destinations. Common configurations for RCIIS:

| Destination | Use Case | Priority Filter |
|---|---|---|
| Slack | Team notifications | >= WARNING |
| PagerDuty | On-call escalation | >= CRITICAL |
| Webhook | SIEM / incident management integration | >= NOTICE |
| Kafka | Event streaming for analytics | >= INFORMATIONAL |
| Prometheus (metrics) | Dashboard and SLO tracking | All |
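The minimumpriority filters above work on Falco's severity scale (emergency, alert, critical, error, warning, notice, informational, debug, from most to least severe): an event is forwarded when it is at least as severe as the configured minimum. A small illustrative sketch (hypothetical helpers `priority_rank` and `forwarded`, not part of Falcosidekick):

```shell
# Rank Falco priorities from most severe (0) to least severe (7).
priority_rank() {
  case "$1" in
    emergency) echo 0 ;;  alert) echo 1 ;;
    critical)  echo 2 ;;  error) echo 3 ;;
    warning)   echo 4 ;;  notice) echo 5 ;;
    informational) echo 6 ;; debug) echo 7 ;;
    *) echo 8 ;;  # unknown priorities are never forwarded
  esac
}

# An event passes when its priority is at least as severe as the minimum.
# $1 = event priority, $2 = configured minimumpriority
forwarded() { [ "$(priority_rank "$1")" -le "$(priority_rank "$2")" ]; }

forwarded critical warning && echo "critical vs min warning: forwarded"
forwarded notice warning   || echo "notice vs min warning: dropped"
```

This is why the Slack output with `minimumpriority: warning` receives the custom RCIIS rules (WARNING and CRITICAL) while a webhook set to `critical` only sees the outbound-connection rule.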

Falcosidekick Secret Management

Webhook URLs and API keys for alert destinations must not be stored in plain text in the Helm values file. Use a Kubernetes Secret referenced by Falcosidekick.

Create a SOPS-encrypted secret:

secret-falcosidekick.yaml
apiVersion: v1
kind: Secret
metadata:
  name: falcosidekick-secrets
  namespace: falco
stringData:
  SLACK_WEBHOOKURL: "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
  PAGERDUTY_ROUTINGKEY: "your-pagerduty-routing-key"

Encrypt with SOPS:

sops -e secret-falcosidekick.yaml > secret-falcosidekick.enc.yaml
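Before committing, it is worth confirming the output really is encrypted. A minimal sketch (hypothetical helper `check_encrypted`, not a sops command) that looks for the ENC[...] markers and the top-level sops metadata block that SOPS writes in place of plaintext values:

```shell
# Guard against committing an unencrypted secret by mistake.
# $1 = file that should be SOPS-encrypted
check_encrypted() {
  if grep -q 'ENC\[' "$1" && grep -q '^sops:' "$1"; then
    echo "looks encrypted - safe to commit"
  else
    echo "NOT encrypted - do not commit"
  fi
}
```

Run it as `check_encrypted secret-falcosidekick.enc.yaml` before `git add`.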

Reference the secret in the Helm values:

values addition
falcosidekick:
  existingSecret: falcosidekick-secrets

This follows the same SOPS + KSOPS pattern used throughout the project. See Credential Management for the full SOPS workflow.

Commit and Deploy

Once all files are in place, commit and push to trigger Flux deployment:

git add flux/infra/base/falco.yaml \
        flux/infra/aws/falco/
git commit -m "feat(falco): add runtime security monitoring for AWS environment"
git push
git add flux/infra/base/falco.yaml \
        flux/infra/baremetal/falco/
git commit -m "feat(falco): add runtime security monitoring for bare metal environment"
git push

Flux will detect the new commit and begin deploying Falco. To trigger an immediate sync instead of waiting for the next poll interval:

flux reconcile kustomization infra-falco -n flux-system --with-source

Verify

After Falco is deployed, confirm it is working:

# Check pods are running
kubectl -n falco get pods
# Expected: falco DaemonSet pods Running, falcosidekick Deployment Running
# Check Falco logs for successful eBPF probe load
kubectl -n falco logs -l app.kubernetes.io/name=falco | grep -i "ebpf\|loaded"
# Trigger a test alert — spawn a shell in a short-lived test pod
kubectl run alert-test --image=busybox --rm -it --restart=Never -- /bin/sh -c "echo test && exit"

# Check Falco logs for the "Shell Spawned in Container" alert
kubectl -n falco logs -l app.kubernetes.io/name=falco --tail=20 | grep -i "shell"
# Verify Falcosidekick is forwarding alerts
kubectl -n falco logs -l app.kubernetes.io/name=falcosidekick --tail=20
# Check Falcosidekick metrics
kubectl -n falco port-forward svc/falco-falcosidekick 2801:2801 &
curl -s http://localhost:2801/metrics | grep falcosidekick_outputs_total
kill %1

Flux Operations

This component is managed by Flux as HelmRelease falco and Kustomization infra-falco.

Check whether the HelmRelease and Kustomization are in a Ready state:

flux get helmrelease falco -n flux-system
flux get kustomization infra-falco -n flux-system

Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:

flux reconcile kustomization infra-falco -n flux-system --with-source

Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:

flux reconcile helmrelease falco -n flux-system

View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:

flux logs --kind=HelmRelease --name=falco -n flux-system

Recovering a stalled HelmRelease

If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry automatically. Suspend and resume to clear the failure counter, then reconcile:

flux suspend helmrelease falco -n flux-system
flux resume helmrelease falco -n flux-system
flux reconcile kustomization infra-falco -n flux-system

Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved.


Tracee

Tracee uses eBPF to trace system calls, network events, and kernel functions at the OS level. It provides deeper visibility than Falco for forensic investigations — capturing not just that an event happened, but the full context around it (arguments, stack traces, file contents).

Install

The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches (shown in the Configuration section).

Create the base directory and file:

mkdir -p flux/infra/base
| Field | Value | Explanation |
|---|---|---|
| chart | tracee | The Helm chart name from the Aqua Security Helm repository |
| version | 0.22.1 | Pinned chart version — update this to upgrade Tracee |
| sourceRef.name | aquasecurity | References a HelmRepository CR pointing to https://aquasecurity.github.io/helm-charts |
| targetNamespace | tracee | Tracee runs in a dedicated namespace for isolation |
| dependsOn | falco | Tracee depends on Falco so Falcosidekick is available for alert forwarding |
| remediation.retries | 3 | Flux retries up to 3 times if the install or upgrade fails |

eBPF on Talos

Tracee is pure eBPF and works well on Talos Linux without modification. The Talos kernel ships with full eBPF support enabled. No kernel headers or build tools are needed since Tracee uses CO-RE (Compile Once, Run Everywhere) with BTF.

Save the following as flux/infra/base/tracee.yaml:

flux/infra/base/tracee.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: tracee
  namespace: flux-system
spec:
  dependsOn:
    - name: falco
  targetNamespace: tracee
  interval: 30m
  chart:
    spec:
      chart: tracee
      version: "0.22.1"
      sourceRef:
        kind: HelmRepository
        name: aquasecurity
        namespace: flux-system
  releaseName: tracee
  install:
    createNamespace: true
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
  values:
    hostPID: true
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
    config:
      output:
        - webhook:http://falcosidekick.falco:2801  # Share Falco's alert pipeline
        - json:/tmp/tracee/events.json
      filter:
        event:
          - security_file_open
          - security_socket_connect
          - security_socket_bind
          - magic_write
          - mem_prot_alert
          - process_execute
          - sched_process_exec
          - anti_debugging
          - ptrace
          - security_bpf
        scope:
          - not container=tracee
          - not container=falco
      capture:
        write: true
        exec: true
        network: true
        dir: /tmp/tracee/captures
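During an investigation, the JSON event stream configured above can be triaged with standard tools. A sketch (hypothetical helper `count_by_event`; assumes one JSON object per line, as the json output above produces) that counts events by eventName, with sample lines standing in for /tmp/tracee/events.json:

```shell
# Summarise a Tracee JSON event stream by eventName using only POSIX tools.
count_by_event() {
  sed -n 's/.*"eventName":"\([^"]*\)".*/\1/p' | sort | uniq -c | sort -rn
}

# Sample events; in practice: count_by_event < /tmp/tracee/events.json
printf '%s\n' \
  '{"eventName":"sched_process_exec","processName":"sh"}' \
  '{"eventName":"security_file_open","processName":"cat"}' \
  '{"eventName":"sched_process_exec","processName":"bash"}' |
  count_by_event
```

A sudden spike in one event type (e.g. ptrace or anti_debugging) is often the fastest pointer to which capture artifacts under /tmp/tracee/captures to examine first.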
Alternative: Helm CLI

If you do not have Git access, install Tracee directly:

helm repo add aquasecurity https://aquasecurity.github.io/helm-charts
helm repo update
helm upgrade --install tracee aquasecurity/tracee \
  --namespace tracee \
  --create-namespace \
  --version 0.22.1 \
  -f values.yaml

Configuration

The environment patch overrides the base HelmRelease with cluster-specific settings. Select your environment and deployment size below.

Create the environment overlay directory:

mkdir -p flux/infra/aws/tracee
mkdir -p flux/infra/baremetal/tracee

Environment Patch

The patch file adjusts resource limits and Talos-specific compatibility settings for the target environment.

Save the following as the patch file for your environment:

On AWS, the Tracee patch applies Talos Linux eBPF compatibility workarounds. The Helm chart hardcodes the containerd socket under /var/run, a path that does not exist on Talos, and several eBPF filesystems must be mounted before Tracee starts.

flux/infra/aws/tracee/patch.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: tracee
spec:
  chart:
    spec:
      version: "0.24.1"
  values:
    hostPID: true
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 1Gi
    config:
      output:
        format: json
        options:
          parseArguments: true
          stackAddresses: false
          execEnv: false
          execHash: dev-inode
          sortEvents: false
  postRenderers:
    - kustomize:
        patches:
          - target:
              kind: DaemonSet
              name: tracee
            patch: |
              # Talos Linux eBPF compatibility patches
              #
              # Talos does not mount debugfs, tracefs, or bpffs by default
              # and sets kptr_restrict=2. The Helm chart also hardcodes the
              # containerd socket at /var/run which doesn't exist on Talos.
              - op: replace
                path: /spec/template/spec/containers/0/command
                value:
                  - /bin/sh
                  - -c
                  - |
                    set -e
                    echo "Preparing eBPF environment for Talos Linux..."
                    if ! mountpoint -q /sys/kernel/debug 2>/dev/null; then
                      mount -t debugfs debugfs /sys/kernel/debug 2>/dev/null || echo "WARN: Could not mount debugfs"
                    fi
                    if ! mountpoint -q /sys/kernel/tracing 2>/dev/null; then
                      mount -t tracefs tracefs /sys/kernel/tracing 2>/dev/null || echo "WARN: Could not mount tracefs"
                    fi
                    if ! mountpoint -q /sys/fs/bpf 2>/dev/null; then
                      mount -t bpf bpf /sys/fs/bpf 2>/dev/null || echo "WARN: Could not mount bpffs"
                    fi
                    echo 1 > /proc/sys/kernel/kptr_restrict 2>/dev/null || echo "WARN: Could not set kptr_restrict"
                    echo "eBPF environment ready. Starting Tracee..."
                    exec /tracee/tracee --config /tracee/config.yaml --server healthz --server http-address=:3366 --server metrics
              - op: replace
                path: /spec/template/spec/containers/0/args
                value: []
              - op: replace
                path: /spec/template/spec/volumes/2/hostPath/path
                value: /run/containerd/containerd.sock
              - op: add
                path: /spec/template/spec/containers/0/startupProbe
                value:
                  httpGet:
                    path: /healthz
                    port: 3366
                  initialDelaySeconds: 15
                  periodSeconds: 10
                  timeoutSeconds: 5
                  failureThreshold: 30
              - op: replace
                path: /spec/template/spec/containers/0/readinessProbe
                value:
                  httpGet:
                    path: /healthz
                    port: 3366
                  initialDelaySeconds: 30
                  periodSeconds: 10
                  timeoutSeconds: 5
                  failureThreshold: 6
| Setting | Value | Why |
|---|---|---|
| postRenderers | Kustomize patches | Applies Talos-specific runtime fixes after Helm renders the manifests |
| command override | Shell wrapper | Mounts debugfs, tracefs, bpffs before starting Tracee on Talos |
| volumes[2].hostPath | /run/containerd/containerd.sock | Fixes hardcoded /var/run path — Talos uses /run |
| startupProbe | 15s delay, 30 failures | Allows extra time for eBPF program loading during startup |
| config.output.format | json | Structured output for log aggregation and forwarding |

On Bare Metal, the same Talos compatibility patches apply. The patch also pins the chart to the same version used in the AWS overlay for consistency.

flux/infra/baremetal/tracee/patch.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: tracee
spec:
  chart:
    spec:
      version: "0.24.1"
  values:
    hostPID: true
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 1Gi
    config:
      output:
        format: json
        options:
          parseArguments: true
          stackAddresses: false
          execEnv: false
          execHash: dev-inode
          sortEvents: false
| Setting | Value | Why |
|---|---|---|
| resources.limits.cpu | 1 | Tracee forensic capture is CPU-intensive; one full core is allowed on dedicated hardware |
| resources.limits.memory | 1Gi | Full forensic capture (write, exec, network) requires more memory than the base configuration |
| config.output.format | json | Structured output for log aggregation |


Helm Values

The values file controls Tracee's core event filters and capture settings. Select your environment and deployment size:

flux/infra/aws/tracee/values.yaml
# Tracee — AWS Non-HA configuration

hostPID: true

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 1Gi

config:
  output:
    format: json
    options:
      parseArguments: true
      stackAddresses: false
      execEnv: false
      execHash: dev-inode
      sortEvents: false
flux/infra/baremetal/tracee/values.yaml
# Tracee — Bare Metal HA configuration

hostPID: true

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

config:
  output:
    - webhook:http://falcosidekick.falco:2801
    - json:/tmp/tracee/events.json
  filter:
    event:
      - security_file_open
      - security_socket_connect
      - security_socket_bind
      - magic_write
      - mem_prot_alert
      - process_execute
      - sched_process_exec
      - anti_debugging
      - ptrace
      - security_bpf
    scope:
      - not container=tracee
      - not container=falco
  capture:
    write: true
    exec: true
    network: true
    dir: /tmp/tracee/captures
flux/infra/baremetal/tracee/values.yaml
# Tracee — Bare Metal Non-HA configuration

hostPID: true

resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 250m
    memory: 256Mi

config:
  output:
    - json:/tmp/tracee/events.json
  filter:
    event:
      - security_file_open
      - security_socket_connect
      - process_execute
      - sched_process_exec
      - ptrace
      - security_bpf
    scope:
      - not container=tracee
      - not container=falco
  # Forensic capture disabled to save disk space
  capture:
    write: false
    exec: false
    network: false

Key settings (all environments):

| Setting | HA | Non-HA | Why |
|---|---|---|---|
| capture.write/exec/network | true | false | HA captures artefacts for forensics; non-HA saves disk space |
| config.output | webhook + json | json only | HA forwards to Falcosidekick for a unified alert pipeline |
| resources | 100m/256Mi → 500m/512Mi | 50m/128Mi → 250m/256Mi | Resource scaling matches cluster capacity |
| filter.event | Full set (10 events) | Reduced set (6 events) | HA monitors all critical events; non-HA focuses on the highest-risk ones |
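With JSON output enabled, captured events can be triaged offline with standard shell tools. A minimal sketch against a hypothetical two-field sample (real Tracee events carry many more fields; only the eventName key is assumed here):

```shell
# Hypothetical sample of Tracee JSON output (one event per line);
# real events include process, container, and argument metadata.
cat > /tmp/sample-events.json <<'EOF'
{"eventName":"ptrace","processName":"strace","containerName":"app"}
{"eventName":"sched_process_exec","processName":"sh","containerName":"app"}
{"eventName":"sched_process_exec","processName":"ls","containerName":"app"}
EOF

# Count occurrences per event type for a quick triage of captured output.
grep -o '"eventName":"[^"]*"' /tmp/sample-events.json | sort | uniq -c | sort -rn
```

The same one-liner works directly against /tmp/tracee/events.json inside a Tracee pod when file output is enabled.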

Tracee Policies

Tracee supports custom policies using Rego or Go signatures. Deploy the RCIIS policy to detect sensitive operations.

Save the following as the policy manifest — this applies to all environments:

flux/infra/base/tracee-policy-rciis.yaml
apiVersion: tracee.aquasec.com/v1beta1
kind: Policy
metadata:
  name: rciis-sensitive-operations
  namespace: tracee
spec:
  scope:
    - global
  rules:
    # Detect fileless execution (process running from memory)
    - event: security_file_open
      filters:
        - data.pathname=/proc/self/mem
    # Detect attempts to disable security modules
    - event: security_bpf
    # Detect privilege escalation attempts
    - event: ptrace
      filters:
        - data.request=PTRACE_ATTACH
    # Detect container escape attempts
    - event: security_socket_bind
      filters:
        - data.addr.port=0

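To confirm the first rule fires, generate a matching event: opening /proc/self/mem is enough, and reading zero bytes keeps the test harmless. The in-cluster pod name and image below are illustrative:

```shell
# Opening /proc/self/mem matches the security_file_open rule above;
# reading zero bytes makes the probe side-effect free.
# In-cluster equivalent (pod name/image are illustrative):
#   kubectl run policy-test --image=busybox --rm --restart=Never -- \
#     sh -c 'head -c 0 /proc/self/mem'
head -c 0 /proc/self/mem && echo "triggered security_file_open on /proc/self/mem"
```

Matching events should then appear in the Tracee logs (or in /tmp/tracee/events.json when file output is enabled).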
Extra Manifests

Save the following additional manifests for your environment:

No extra manifests are required for AWS. The patch and values files are sufficient.

On Bare Metal, deploy a ServiceMonitor for Tracee metrics scraping. Tracee exposes Prometheus metrics on port 3366 when started with --server metrics.

Save as flux/infra/baremetal/tracee/servicemonitor-tracee.yaml:

flux/infra/baremetal/tracee/servicemonitor-tracee.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tracee
  namespace: tracee
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: tracee
  endpoints:
    - port: metrics
      interval: 30s

Note

Tracee metrics support depends on the chart version. Verify the Tracee Helm chart exposes a metrics port before deploying this ServiceMonitor.
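One way to check, assuming the chart's default Service name of tracee (adjust for your release name):

```shell
# Print the named ports on the Tracee Service; expect "metrics" in the list.
# (Service name "tracee" is an assumption -- verify with `kubectl -n tracee get svc`.)
kubectl -n tracee get svc tracee -o jsonpath='{.spec.ports[*].name}'
```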


Commit and Deploy

Once all files are in place, commit and push to trigger Flux deployment:

git add flux/infra/base/tracee.yaml \
        flux/infra/aws/tracee/
git commit -m "feat(tracee): add forensic monitoring for AWS environment"
git push
git add flux/infra/base/tracee.yaml \
        flux/infra/baremetal/tracee/
git commit -m "feat(tracee): add forensic monitoring for bare metal environment"
git push

Flux will detect the new commit and begin deploying Tracee. To trigger an immediate sync instead of waiting for the next poll interval:

flux reconcile kustomization infra-tracee -n flux-system --with-source

Verify

After Tracee is deployed, confirm it is working:

# Check DaemonSet is running on all nodes
kubectl -n tracee get ds
# Verify eBPF programs are loaded
kubectl -n tracee logs -l app.kubernetes.io/name=tracee | grep -i "loaded\|ready"
# Check captured events (HA only — capture must be enabled)
kubectl -n tracee exec ds/tracee -- head -5 /tmp/tracee/events.json
# Verify Tracee health endpoint is responding
kubectl -n tracee port-forward ds/tracee 3366:3366 &
curl -s http://localhost:3366/healthz
kill %1
# Verify Tracee metrics are exposed
kubectl -n tracee port-forward ds/tracee 3366:3366 &
curl -s http://localhost:3366/metrics | grep tracee
kill %1

Flux Operations

This component is managed by Flux as HelmRelease tracee and Kustomization infra-tracee.

Check whether the HelmRelease and Kustomization are in a Ready state:

flux get helmrelease tracee -n flux-system
flux get kustomization infra-tracee -n flux-system

Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:

flux reconcile kustomization infra-tracee -n flux-system --with-source

Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:

flux reconcile helmrelease tracee -n flux-system

View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:

flux logs --kind=HelmRelease --name=tracee -n flux-system

Recovering a stalled HelmRelease

If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry automatically. Suspend and resume to clear the failure counter, then reconcile:

flux suspend helmrelease tracee -n flux-system
flux resume helmrelease tracee -n flux-system
flux reconcile kustomization infra-tracee -n flux-system

Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved.


Using Falco and Tracee Together

When both tools are deployed, use them for different purposes:

| Scenario | Tool | Why |
|---|---|---|
| Continuous monitoring and alerting | Falco | Mature alerting pipeline via Falcosidekick, broad rule coverage |
| Incident investigation and forensics | Tracee | Captures full event context, file contents, and network data |
| Compliance evidence | Falco | PolicyReport-compatible output, audit trail |
| Threat hunting | Tracee | Query historical events with rich filter expressions |
| Container escape detection | Both | Complementary detection signatures |

Shared Alert Pipeline

Configure Tracee to forward events to Falcosidekick for a unified alert pipeline:

Tracee events  --webhook-->  Falcosidekick  ---->  Slack / PagerDuty / SIEM
Falco events   ----------->  Falcosidekick  ---->  Slack / PagerDuty / SIEM

This gives a single pane of glass for all runtime security events while preserving Tracee's deeper forensic data for on-demand investigation.
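Recent Falcosidekick releases also expose a /test endpoint that emits a synthetic alert to every configured output, which makes a quick smoke test of the shared pipeline (verify that your Falcosidekick version supports it):

```shell
kubectl -n falco port-forward svc/falco-falcosidekick 2801:2801 &
sleep 2
# POST /test asks Falcosidekick to emit a synthetic "Test rule" alert
# to all configured outputs.
curl -s -X POST http://localhost:2801/test
kill %1
```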

Verify Alert Pipeline

After deploying both Falco and Tracee with Falcosidekick, verify the full alert pipeline end-to-end:

Step 1 — Trigger a Falco alert:

# Exec into a test pod to trigger "Shell Spawned in Container"
kubectl run alert-test --image=busybox --rm -it --restart=Never -- /bin/sh -c "echo test && exit"

Step 2 — Confirm Falco detected the event:

kubectl -n falco logs -l app.kubernetes.io/name=falco --tail=20 | grep -i "shell"
# Expected: Alert line containing "Shell spawned in container"

Step 3 — Confirm Falcosidekick received and forwarded the alert:

# Check Falcosidekick logs
kubectl -n falco logs -l app.kubernetes.io/name=falcosidekick --tail=20
# Expected: Log lines showing the alert was forwarded to configured outputs

# Check Falcosidekick metrics
kubectl -n falco port-forward svc/falco-falcosidekick 2801:2801 &
curl -s http://localhost:2801/metrics | grep falcosidekick_outputs_total
# Expected: Counter incremented for the configured output (slack, webhook, etc.)
kill %1

Step 4 — Confirm delivery at the destination:

  • Slack: Check the configured Slack channel for the alert message
  • Webhook: Check the endpoint's logs for the received payload
  • Kafka: Consume from the configured topic to verify the event arrived

Step 5 — Verify Tracee → Falcosidekick pipeline (HA configuration only):

If Tracee is configured to forward events to Falcosidekick via webhook:

# Check Tracee logs for webhook delivery
kubectl -n tracee logs -l app.kubernetes.io/name=tracee --tail=20 | grep -i "webhook\|output"
# Expected: Events being sent to http://falcosidekick.falco:2801

# Check Falcosidekick for Tracee-sourced events
kubectl -n falco logs -l app.kubernetes.io/name=falcosidekick --tail=20 | grep -i "tracee"

Tip

If alerts are not arriving at the destination, check connectivity between namespaces. The default-deny NetworkPolicy generated by Kyverno may block Falcosidekick's outbound traffic. Create a NetworkPolicy allowing egress from the falco namespace to external webhook endpoints.
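A minimal sketch of such an egress policy. The pod selector label, DNS rule, and destination port are assumptions to verify against your Falcosidekick deployment and configured outputs:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-falcosidekick-egress
  namespace: falco
spec:
  podSelector:
    matchLabels:
      # Label is an assumption; verify against your Falcosidekick pods.
      app.kubernetes.io/name: falcosidekick
  policyTypes:
    - Egress
  egress:
    # Allow DNS resolution.
    - ports:
        - protocol: UDP
          port: 53
    # Allow HTTPS to external webhook endpoints;
    # tighten the CIDR to your destinations in production.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
```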


Next Steps

Proceed to troubleshooting guidance for security components:

5.2.5 Troubleshooting