5.2.4 Runtime Threat Detection¶
Runtime security detects anomalous behaviour inside running containers and on the host — things that static scanning and admission control cannot catch. This section covers two complementary tools: Falco and Tracee.
How to use this page
Each component has an Install section showing the Flux HelmRelease, a Configuration section with Helm values, and a Verify section to confirm it is working.
All code blocks are labelled with their file path in the repository. Select your target environment (AWS or Bare Metal) in any tab group — the choice syncs across the entire page.
- **Using the existing `rciis-devops` repository:** All files already exist. Skip the `mkdir` and `git add`/`git commit` commands — they are for users building a new repository. Simply review the files, edit values for your environment, and push.
- **Building a new repository from scratch:** Follow the `mkdir`, file creation, and `git` commands in order.
- **No Git access:** Expand the "Alternative: Helm CLI" block under each Install section.
Falco vs Tracee¶
Both tools monitor kernel-level events but differ in approach and strengths:
| | Falco | Tracee |
|---|---|---|
| Maintainer | Sysdig / CNCF (Graduated) | Aqua Security |
| Detection engine | Kernel module or eBPF probe | eBPF only |
| Talos compatibility | eBPF driver required (Talos cannot load out-of-tree kernel modules at runtime) | Native eBPF — excellent fit for Talos |
| Rule language | YAML-based Falco rules (condition/output/priority) | Rego policies + Go signatures |
| Rule ecosystem | Large community rule library, actively maintained | Growing library, strong container-focused defaults |
| Forensic capture | Events only (syscall fields, metadata) | Full event capture with optional artifact extraction |
| Alert forwarding | Falcosidekick (30+ output targets: Slack, PagerDuty, webhook, Kafka, OTEL) | Built-in webhook, OpenTelemetry export |
| Resource overhead | Low–moderate | Low (pure eBPF, no kernel module) |
| Best for | Broad runtime policy enforcement with mature alerting | Deep forensic investigation and incident response |
Recommendation
Deploy both tools. Falco provides broad detection coverage with a mature rule library and flexible alerting. Tracee adds forensic depth for incident investigations. If you must choose one, Falco is the safer default for teams without deep eBPF experience.
Falco¶
Falco monitors Linux syscalls and Kubernetes audit events in real time, matching them against rules that define suspicious behaviour. When a rule triggers, Falco generates an alert that can be routed to any number of destinations via Falcosidekick.
Install¶
The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches (shown in the Configuration section).
Create the base directory and file:
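A minimal sketch of the directory step, assuming the `flux/infra/base/` layout used throughout this page (skip it if you are using the existing `rciis-devops` repository — the path already exists):

```shell
# Shared base directory for all environments
mkdir -p flux/infra/base
```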
| Field | Value | Explanation |
|---|---|---|
| `chart` | `falco` | The Helm chart name from the Falco Security registry |
| `version` | `4.25.1` | Pinned chart version — update this to upgrade Falco |
| `sourceRef.name` | `falcosecurity` | References a HelmRepository CR pointing to https://falcosecurity.github.io/charts |
| `targetNamespace` | `falco` | Falco runs in a dedicated namespace for isolation |
| `driver.kind` | `modern_ebpf` | Uses CO-RE eBPF — required for Talos Linux (see eBPF Driver Selection below) |
| `remediation.retries` | `3` | Flux retries up to 3 times if the install or upgrade fails |
Save the following as flux/infra/base/falco.yaml:
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: falco
namespace: flux-system
spec:
targetNamespace: falco
interval: 30m
chart:
spec:
chart: falco
version: "4.25.1"
sourceRef:
kind: HelmRepository
name: falcosecurity
namespace: flux-system
releaseName: falco
install:
createNamespace: true
remediation:
retries: 3
upgrade:
remediation:
retries: 3
values:
driver:
kind: modern_ebpf
tty: true
falcosidekick:
enabled: true
replicaCount: 2
config:
slack:
webhookurl: "" # Set via SOPS-encrypted secret
minimumpriority: warning
webhook:
address: "" # Optional: forward to incident management
minimumpriority: critical
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
customRules:
rciis-rules.yaml: |-
- rule: Shell Spawned in Container
desc: >
A shell (bash, sh, zsh) was spawned inside a container.
This is unexpected in production RCIIS workloads.
condition: >
spawned_process
and container
and proc.name in (bash, sh, zsh, ash, csh, ksh)
and not proc.pname in (healthcheck)
output: >
Shell spawned in container
(user=%user.name container=%container.name
image=%container.image.repository
shell=%proc.name parent=%proc.pname
cmdline=%proc.cmdline namespace=%k8s.ns.name
pod=%k8s.pod.name)
priority: WARNING
tags: [shell, container, rciis]
- rule: Read Sensitive File in Container
desc: >
A process inside a container read a sensitive file
(e.g., /etc/shadow, /etc/passwd, private keys).
condition: >
open_read
and container
and fd.name pmatch (/etc/shadow, /etc/passwd, /etc/pki/*, /run/secrets/*)
output: >
Sensitive file read in container
(user=%user.name file=%fd.name container=%container.name
image=%container.image.repository namespace=%k8s.ns.name
pod=%k8s.pod.name)
priority: WARNING
tags: [filesystem, sensitive, rciis]
- rule: Unexpected Outbound Connection from RCIIS
desc: >
An RCIIS application pod made an outbound network connection
to a destination not in the expected allow-list.
condition: >
outbound
and container
and k8s.ns.name = "rciis"
and not fd.sip in (rciis_allowed_destinations)
output: >
Unexpected outbound connection from RCIIS pod
(pod=%k8s.pod.name namespace=%k8s.ns.name
image=%container.image.repository
connection=%fd.name dest=%fd.sip:%fd.sport)
priority: CRITICAL
tags: [network, rciis, exfiltration]
Alternative: Helm CLI
If you do not have Git access, install Falco directly:
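A sketch of the direct install, mirroring the chart repository, version pin, and driver settings from the HelmRelease above (release name and namespace match the Flux configuration; adjust the version to your target):

```shell
# Add the Falco chart repository and install with the Talos-compatible eBPF driver
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
helm install falco falcosecurity/falco \
  --namespace falco --create-namespace \
  --version 4.25.1 \
  --set driver.kind=modern_ebpf \
  --set tty=true \
  --set falcosidekick.enabled=true
```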
eBPF Driver Selection¶
| Driver | Kernel Requirement | Talos Support | Recommendation |
|---|---|---|---|
| `module` | Kernel headers + DKMS | Not supported — Talos has no kernel headers or compiler toolchain | Do not use |
| `ebpf` | BPF support, kernel headers at build time | Works — legacy option | Supported but superseded |
| `modern_ebpf` | Kernel 5.8+ with BTF (BPF Type Format) | Best fit — Talos ships kernel 6.x with BTF enabled | Recommended |
The modern_ebpf driver uses CO-RE (Compile Once, Run Everywhere) and requires no
kernel headers at runtime. Since Talos Linux ships a modern kernel (6.x) with BTF
support enabled, this is the optimal choice.
Talos eBPF Requirement
Talos Linux has an immutable root filesystem, enforces kernel module signing, and
ships no kernel headers or compiler toolchain. This means Falco's traditional
kernel module driver — which compiles a .ko against the running kernel at
runtime via DKMS — cannot work. Talos does support kernel modules, but only
those built into the system image or added as
system extensions
at image build time.
The practical result: set the Falco driver to modern_ebpf (recommended) or
ebpf. Do not use the module driver on Talos nodes.
Configuration¶
The environment patch overrides the base HelmRelease with cluster-specific settings. The values file controls how Falco behaves. Select your environment and deployment size below.
Create the environment overlay directory:
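For example (the Bare Metal path matches the manifest paths shown later on this page; the AWS path is an assumption — adjust both to your repository layout, and skip this in the existing repository):

```shell
# Per-environment overlay directories for the Falco patch and values files
mkdir -p flux/infra/aws/falco flux/infra/baremetal/falco
```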
Environment Patch¶
The patch file sets Falcosidekick replica count and resource limits appropriate for the target environment.
Save the following as the patch file for your environment:
On AWS, Falco resources are constrained to cut cost on shared clusters, and Falcosidekick runs as a single replica to minimise overhead.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: falco
spec:
chart:
spec:
version: "4.25.1"
values:
falcosidekick:
replicaCount: 1
resources:
requests:
cpu: 50m
memory: 128Mi
limits:
cpu: 250m
memory: 256Mi
customRules:
rciis-rules.yaml: |-
- rule: Shell Spawned in Container
desc: A shell was spawned inside a container.
condition: >
spawned_process
and container
and proc.name in (bash, sh, zsh, ash, csh, ksh)
and not proc.pname in (healthcheck)
output: >
Shell spawned in container
(user=%user.name container=%container.name
image=%container.image.repository
namespace=%k8s.ns.name pod=%k8s.pod.name)
priority: WARNING
tags: [shell, container, rciis]
- rule: Read Sensitive File in Container
desc: A process inside a container read a sensitive file.
condition: >
open_read
and container
and fd.name pmatch (/etc/shadow, /etc/passwd, /etc/pki/*, /run/secrets/*)
output: >
Sensitive file read in container
(user=%user.name file=%fd.name container=%container.name
namespace=%k8s.ns.name pod=%k8s.pod.name)
priority: WARNING
tags: [filesystem, sensitive, rciis]
| Setting | Value | Why |
|---|---|---|
| `falcosidekick.replicaCount` | `1` | Single replica reduces cost on shared AWS clusters |
| `resources` | 50m/128Mi → 250m/256Mi | Tighter constraints for cost optimization |
| Custom rules | Simplified | AWS patch uses a reduced rule set — full rules live in the base |
On Bare Metal, Falco runs with the full HA configuration from the base. No resource constraints are overridden — the base values are already optimized for dedicated clusters.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: falco
spec:
values:
# Base values are already optimized for bare metal HA:
# falcosidekick.replicaCount: 2
# resources: 100m/256Mi → 500m/512Mi
# Full custom rules with all three RCIIS detections
| Setting | Value | Why |
|---|---|---|
| `falcosidekick.replicaCount` | `2` (from base) | HA — two Falcosidekick replicas for alert delivery redundancy |
| `resources` | 100m/256Mi → 500m/512Mi (from base) | Full resources for dedicated bare metal nodes |
Helm Values¶
The values file controls Falco's core features. Select your environment and deployment size:
# Falco — Bare Metal HA configuration
driver:
kind: modern_ebpf
tty: true
falcosidekick:
enabled: true
replicaCount: 2
config:
slack:
webhookurl: ""
minimumpriority: warning
webhook:
address: ""
minimumpriority: critical
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
Key settings (all environments):
| Setting | HA | Non-HA | Why |
|---|---|---|---|
| `driver.kind` | `modern_ebpf` | `modern_ebpf` | Talos-specific — CO-RE eBPF, no kernel headers needed at runtime |
| `falcosidekick.replicaCount` | `2` | `1` | HA runs two replicas for alert delivery redundancy |
| `resources` | 100m/256Mi → 500m/512Mi | 50m/128Mi → 250m/256Mi | Resource scaling matches cluster capacity |
Extra Manifests¶
Save the following additional manifests for your environment:
No extra manifests required for AWS. The patch and values files are sufficient.
On Bare Metal, deploy a ServiceMonitor, PrometheusRule, and Grafana dashboard ConfigMap for Falco alerting and visibility.
ServiceMonitor — save as flux/infra/baremetal/falco/servicemonitor-falcosidekick.yaml:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: falcosidekick
namespace: falco
labels:
release: prometheus
spec:
selector:
matchLabels:
app.kubernetes.io/name: falcosidekick
endpoints:
- port: http
interval: 30s
path: /metrics
PrometheusRule — save as flux/infra/baremetal/falco/prometheus-rule-falco.yaml:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: falco-alerts
namespace: falco
labels:
release: prometheus
spec:
groups:
- name: falco
rules:
- alert: FalcoCriticalAlert
expr: |
increase(falcosidekick_outputs_total{priority="critical"}[5m]) > 0
for: 0m
labels:
severity: critical
annotations:
summary: "Falco CRITICAL priority event detected"
description: >
A CRITICAL priority Falco event was detected in the last 5 minutes.
This may indicate a container escape, privilege escalation, or
data exfiltration attempt. Investigate immediately.
runbook_url: "https://docs.rciis.eac.int/08-operations/incident-response/"
- alert: FalcoPodDown
expr: |
kube_daemonset_status_number_unavailable{
namespace="falco",
daemonset="falco"
} > 0
for: 5m
labels:
severity: warning
annotations:
summary: "Falco DaemonSet not running on all nodes"
description: >
{{ $value }} node(s) do not have a running Falco pod.
Runtime security monitoring is incomplete.
runbook_url: "https://docs.rciis.eac.int/05-secure/troubleshooting/#falco"
- alert: FalcoHighAlertRate
expr: |
sum(rate(falcosidekick_outputs_total[5m])) > 100
for: 10m
labels:
severity: warning
annotations:
summary: "Falco alert rate exceeds 100/min"
description: >
Falco is generating more than 100 alerts per minute for over 10
minutes. This may indicate a false positive storm from a noisy
rule. Review Falco rules and tune conditions.
Grafana Dashboard — save as flux/infra/baremetal/falco/grafana-dashboard-falco.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboard-falco
namespace: monitoring
labels:
grafana_dashboard: "1"
data:
falco-dashboard.json: |-
{ "annotations": { "list": [] }, "title": "Falco - Runtime Security Events", "uid": "falco-events" }
Tip
Import the Falcosidekick dashboard
(ID: 17514) from Grafana.com for a pre-built visualization of Falco event
rates by priority and output target.
Falcosidekick Alert Routing¶
Falcosidekick supports 30+ output destinations. Common configurations for RCIIS:
| Destination | Use Case | Priority Filter |
|---|---|---|
| Slack | Team notifications | >= WARNING |
| PagerDuty | On-call escalation | >= CRITICAL |
| Webhook | SIEM / incident management integration | >= NOTICE |
| Kafka | Event streaming for analytics | >= INFORMATIONAL |
| Prometheus (metrics) | Dashboard and SLO tracking | All |
Falcosidekick Secret Management¶
Webhook URLs and API keys for alert destinations must not be stored in plain text in the Helm values file. Use a Kubernetes Secret referenced by Falcosidekick.
Create a SOPS-encrypted secret:
apiVersion: v1
kind: Secret
metadata:
name: falcosidekick-secrets
namespace: falco
stringData:
SLACK_WEBHOOKURL: "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
PAGERDUTY_ROUTINGKEY: "your-pagerduty-routing-key"
Encrypt with SOPS:
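For example (the file path is an assumption — use wherever you saved the Secret manifest, and make sure your `.sops.yaml` creation rules cover that path):

```shell
# Encrypt the Secret in place; SOPS encrypts the stringData values
# while leaving the manifest structure readable
sops --encrypt --in-place flux/infra/base/falcosidekick-secrets.yaml
```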
Reference the secret in the Helm values:
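One way to wire this up is Falcosidekick's `existingSecret` setting, which loads environment variables (e.g. `SLACK_WEBHOOKURL`, `PAGERDUTY_ROUTINGKEY`) from the keys of a named Secret — a sketch; verify the key against your chart version's values reference:

```yaml
# In the Falco HelmRelease values
falcosidekick:
  enabled: true
  config:
    existingSecret: falcosidekick-secrets
```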
This follows the same SOPS + KSOPS pattern used throughout the project. See Credential Management for the full SOPS workflow.
Commit and Deploy¶
Once all files are in place, commit and push to trigger Flux deployment:
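For example (the commit message is illustrative):

```shell
git add flux/infra
git commit -m "Add Falco runtime threat detection"
git push
```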
Flux will detect the new commit and begin deploying Falco. To trigger an immediate sync instead of waiting for the next poll interval:
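A sketch of the immediate sync, assuming the Kustomization name `infra-falco` stated in the Flux Operations section:

```shell
# Fetch the latest Git revision, then reconcile the Falco Kustomization
flux reconcile source git flux-system -n flux-system
flux reconcile kustomization infra-falco -n flux-system
```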
Verify¶
After Falco is deployed, confirm it is working:
# Check pods are running
kubectl -n falco get pods
# Expected: falco DaemonSet pods Running, falcosidekick Deployment Running
# Check Falco logs for successful eBPF probe load
kubectl -n falco logs -l app.kubernetes.io/name=falco | grep -i "ebpf\|loaded"
# Trigger a test alert — exec into any pod
kubectl run alert-test --image=busybox --rm -it --restart=Never -- /bin/sh -c "echo test && exit"
# Check Falco logs for the "Shell Spawned in Container" alert
kubectl -n falco logs -l app.kubernetes.io/name=falco --tail=20 | grep -i "shell"
Flux Operations¶
This component is managed by Flux as HelmRelease falco and Kustomization infra-falco.
Check whether the HelmRelease and Kustomization are in a Ready state:
Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:
Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:
View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:
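The four operations above can be sketched with the Flux CLI (resource names as stated: HelmRelease `falco`, Kustomization `infra-falco`):

```shell
# 1. Readiness of the HelmRelease and Kustomization
flux get helmrelease falco -n flux-system
flux get kustomization infra-falco -n flux-system

# 2. Immediate sync — pulls the latest Git revision, then re-applies
flux reconcile kustomization infra-falco -n flux-system --with-source

# 3. Re-run the Helm install/upgrade for this release
flux reconcile helmrelease falco -n flux-system

# 4. Recent controller logs scoped to this release
flux logs --kind HelmRelease --name falco -n flux-system --tail 50
```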
Recovering a stalled HelmRelease
If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry
automatically. Suspend and resume to clear the failure counter, then reconcile:
flux suspend helmrelease falco -n flux-system
flux resume helmrelease falco -n flux-system
flux reconcile kustomization infra-falco -n flux-system
Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved.
Tracee¶
Tracee uses eBPF to trace system calls, network events, and kernel functions at the OS level. It provides deeper visibility than Falco for forensic investigations — capturing not just that an event happened, but the full context around it (arguments, stack traces, file contents).
Install¶
The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches (shown in the Configuration section).
Create the base directory and file:
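As with Falco, a minimal sketch assuming the `flux/infra/base/` layout used throughout this page (skip if the directory already exists):

```shell
# Shared base directory for all environments
mkdir -p flux/infra/base
```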
| Field | Value | Explanation |
|---|---|---|
| `chart` | `tracee` | The Helm chart name from the Aqua Security registry |
| `version` | `0.22.1` | Pinned chart version — update this to upgrade Tracee |
| `sourceRef.name` | `aquasecurity` | References a HelmRepository CR pointing to https://aquasecurity.github.io/helm-charts |
| `targetNamespace` | `tracee` | Tracee runs in a dedicated namespace for isolation |
| `dependsOn` | `falco` | Tracee depends on Falco so Falcosidekick is available for alert forwarding |
| `remediation.retries` | `3` | Flux retries up to 3 times if the install or upgrade fails |
eBPF on Talos
Tracee is pure eBPF and works well on Talos Linux without modification. The Talos kernel ships with full eBPF support enabled. No kernel headers or build tools are needed since Tracee uses CO-RE (Compile Once, Run Everywhere) with BTF.
Save the following as flux/infra/base/tracee.yaml:
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: tracee
namespace: flux-system
spec:
dependsOn:
- name: falco
targetNamespace: tracee
interval: 30m
chart:
spec:
chart: tracee
version: "0.22.1"
sourceRef:
kind: HelmRepository
name: aquasecurity
namespace: flux-system
releaseName: tracee
install:
createNamespace: true
remediation:
retries: 3
upgrade:
remediation:
retries: 3
values:
hostPID: true
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
config:
output:
- webhook:http://falcosidekick.falco:2801 # Share Falco's alert pipeline
- json:/tmp/tracee/events.json
filter:
event:
- security_file_open
- security_socket_connect
- security_socket_bind
- magic_write
- mem_prot_alert
- process_execute
- sched_process_exec
- anti_debugging
- ptrace
- security_bpf
scope:
- not container=tracee
- not container=falco
capture:
write: true
exec: true
network: true
dir: /tmp/tracee/captures
Alternative: Helm CLI
If you do not have Git access, install Tracee directly:
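A sketch of the direct install, mirroring the chart repository, version pin, and `hostPID` setting from the HelmRelease above:

```shell
# Add the Aqua Security chart repository and install Tracee
helm repo add aquasecurity https://aquasecurity.github.io/helm-charts
helm repo update
helm install tracee aquasecurity/tracee \
  --namespace tracee --create-namespace \
  --version 0.22.1 \
  --set hostPID=true
```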
Configuration¶
The environment patch overrides the base HelmRelease with cluster-specific settings. Select your environment and deployment size below.
Create the environment overlay directory:
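For example (the Bare Metal path matches the ServiceMonitor path shown later on this page; the AWS path is an assumption — adjust both to your repository layout):

```shell
# Per-environment overlay directories for the Tracee patch and values files
mkdir -p flux/infra/aws/tracee flux/infra/baremetal/tracee
```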
Environment Patch¶
The patch file adjusts resource limits and Talos-specific compatibility settings for the target environment.
Save the following as the patch file for your environment:
On AWS, the Tracee patch applies Talos Linux eBPF compatibility workarounds. The
Helm chart hardcodes the containerd socket at /var/run which does not exist on
Talos, and several eBPF filesystems must be mounted before Tracee starts.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: tracee
spec:
chart:
spec:
version: "0.24.1"
values:
hostPID: true
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: "1"
memory: 1Gi
config:
output:
format: json
options:
parseArguments: true
stackAddresses: false
execEnv: false
execHash: dev-inode
sortEvents: false
postRenderers:
- kustomize:
patches:
- target:
kind: DaemonSet
name: tracee
patch: |
# Talos Linux eBPF compatibility patches
#
# Talos does not mount debugfs, tracefs, or bpffs by default
# and sets kptr_restrict=2. The Helm chart also hardcodes the
# containerd socket at /var/run which doesn't exist on Talos.
- op: replace
path: /spec/template/spec/containers/0/command
value:
- /bin/sh
- -c
- |
set -e
echo "Preparing eBPF environment for Talos Linux..."
if ! mountpoint -q /sys/kernel/debug 2>/dev/null; then
mount -t debugfs debugfs /sys/kernel/debug 2>/dev/null || echo "WARN: Could not mount debugfs"
fi
if ! mountpoint -q /sys/kernel/tracing 2>/dev/null; then
mount -t tracefs tracefs /sys/kernel/tracing 2>/dev/null || echo "WARN: Could not mount tracefs"
fi
if ! mountpoint -q /sys/fs/bpf 2>/dev/null; then
mount -t bpf bpf /sys/fs/bpf 2>/dev/null || echo "WARN: Could not mount bpffs"
fi
echo 1 > /proc/sys/kernel/kptr_restrict 2>/dev/null || echo "WARN: Could not set kptr_restrict"
echo "eBPF environment ready. Starting Tracee..."
exec /tracee/tracee --config /tracee/config.yaml --server healthz --server http-address=:3366 --server metrics
- op: replace
path: /spec/template/spec/containers/0/args
value: []
- op: replace
path: /spec/template/spec/volumes/2/hostPath/path
value: /run/containerd/containerd.sock
- op: add
path: /spec/template/spec/containers/0/startupProbe
value:
httpGet:
path: /healthz
port: 3366
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 30
- op: replace
path: /spec/template/spec/containers/0/readinessProbe
value:
httpGet:
path: /healthz
port: 3366
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 6
| Setting | Value | Why |
|---|---|---|
| `postRenderers` | Kustomize patches | Applies Talos-specific runtime fixes after Helm renders the manifests |
| `command` override | Shell wrapper | Mounts debugfs, tracefs, bpffs before starting Tracee on Talos |
| `volumes[2].hostPath` | `/run/containerd/containerd.sock` | Fixes hardcoded `/var/run` path — Talos uses `/run` |
| `startupProbe` | 15s delay, 30 failures | Allows extra time for eBPF program loading during startup |
| `config.output.format` | `json` | Structured output for log aggregation and forwarding |
On Bare Metal, the same Talos compatibility patches apply. The patch also pins the chart to the same version used in the AWS overlay for consistency.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: tracee
spec:
chart:
spec:
version: "0.24.1"
values:
hostPID: true
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: "1"
memory: 1Gi
config:
output:
format: json
options:
parseArguments: true
stackAddresses: false
execEnv: false
execHash: dev-inode
sortEvents: false
| Setting | Value | Why |
|---|---|---|
| `resources.limits.cpu` | `1` | Tracee forensic capture is CPU-intensive; 1 full core allowed on dedicated hardware |
| `resources.limits.memory` | `1Gi` | Full forensic capture (write, exec, network) requires more memory than base |
| `config.output.format` | `json` | Structured output for log aggregation |
Helm Values¶
The values file controls Tracee's core event filters and capture settings. Select your environment and deployment size:
# Tracee — Bare Metal HA configuration
hostPID: true
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
config:
output:
- webhook:http://falcosidekick.falco:2801
- json:/tmp/tracee/events.json
filter:
event:
- security_file_open
- security_socket_connect
- security_socket_bind
- magic_write
- mem_prot_alert
- process_execute
- sched_process_exec
- anti_debugging
- ptrace
- security_bpf
scope:
- not container=tracee
- not container=falco
capture:
write: true
exec: true
network: true
dir: /tmp/tracee/captures
# Tracee — Bare Metal Non-HA configuration
hostPID: true
resources:
requests:
cpu: 50m
memory: 128Mi
limits:
cpu: 250m
memory: 256Mi
config:
output:
- json:/tmp/tracee/events.json
filter:
event:
- security_file_open
- security_socket_connect
- process_execute
- sched_process_exec
- ptrace
- security_bpf
scope:
- not container=tracee
- not container=falco
# Forensic capture disabled to save disk space
capture:
write: false
exec: false
network: false
Key settings (all environments):
| Setting | HA | Non-HA | Why |
|---|---|---|---|
| `capture.write/exec/network` | `true` | `false` | HA captures artefacts for forensics; non-HA saves disk space |
| `config.output` | webhook + json | json only | HA forwards to Falcosidekick for unified alert pipeline |
| `resources` | 100m/256Mi → 500m/512Mi | 50m/128Mi → 250m/256Mi | Resource scaling matches cluster capacity |
| `filter.event` | Full set (10 events) | Reduced set (6 events) | HA monitors all critical events; non-HA focuses on highest-risk |
Tracee Policies¶
Tracee supports custom policies using Rego or Go signatures. Deploy the RCIIS policy to detect sensitive operations.
Save the following as the policy manifest — this applies to all environments:
apiVersion: tracee.aquasec.com/v1beta1
kind: Policy
metadata:
name: rciis-sensitive-operations
namespace: tracee
spec:
scope:
- global
rules:
# Detect fileless execution (process running from memory)
- event: security_file_open
filters:
- data.pathname=/proc/self/mem
# Detect attempts to disable security modules
- event: security_bpf
# Detect privilege escalation attempts
- event: ptrace
filters:
- data.request=PTRACE_ATTACH
# Detect container escape attempts
- event: security_socket_bind
filters:
- data.addr.port=0
Extra Manifests¶
Save the following additional manifests for your environment:
No extra manifests required for AWS. The patch and values files are sufficient.
On Bare Metal, deploy a ServiceMonitor for Tracee metrics scraping. Tracee
exposes Prometheus metrics on port 3366 when started with --server metrics.
Save as flux/infra/baremetal/tracee/servicemonitor-tracee.yaml:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: tracee
namespace: tracee
labels:
release: prometheus
spec:
selector:
matchLabels:
app.kubernetes.io/name: tracee
endpoints:
- port: metrics
interval: 30s
Note
Tracee metrics support depends on the chart version. Verify the Tracee Helm chart exposes a metrics port before deploying this ServiceMonitor.
Commit and Deploy¶
Once all files are in place, commit and push to trigger Flux deployment:
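For example (the commit message is illustrative):

```shell
git add flux/infra
git commit -m "Add Tracee runtime forensics"
git push
```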
Flux will detect the new commit and begin deploying Tracee. To trigger an immediate sync instead of waiting for the next poll interval:
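A sketch of the immediate sync, assuming the Kustomization name `infra-tracee` stated in the Flux Operations section:

```shell
# Fetch the latest Git revision, then reconcile the Tracee Kustomization
flux reconcile source git flux-system -n flux-system
flux reconcile kustomization infra-tracee -n flux-system
```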
Verify¶
After Tracee is deployed, confirm it is working:
# Verify eBPF programs are loaded
kubectl -n tracee logs -l app.kubernetes.io/name=tracee | grep -i "loaded\|ready"
# Check captured events (HA only — capture must be enabled)
kubectl -n tracee exec -it ds/tracee -- cat /tmp/tracee/events.json | head -5
Flux Operations¶
This component is managed by Flux as HelmRelease tracee and Kustomization infra-tracee.
Check whether the HelmRelease and Kustomization are in a Ready state:
Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:
Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:
View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:
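The four operations above can be sketched with the Flux CLI, using the HelmRelease `tracee` and Kustomization `infra-tracee` named above (both reconciled in `flux-system`):

```shell
# Check Ready status
flux get helmrelease tracee -n flux-system
flux get kustomization infra-tracee -n flux-system

# Immediate sync: pull the latest Git revision, then re-apply the manifests
flux reconcile kustomization infra-tracee -n flux-system --with-source

# Re-run the Helm install/upgrade for this release
flux reconcile helmrelease tracee -n flux-system

# Recent controller logs for this release
flux logs --kind=HelmRelease --name=tracee -n flux-system
```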
Recovering a stalled HelmRelease
If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry
automatically. Suspend and resume to clear the failure counter, then reconcile:
```shell
flux suspend helmrelease tracee -n flux-system
flux resume helmrelease tracee -n flux-system
flux reconcile kustomization infra-tracee -n flux-system
```
Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved.
Using Falco and Tracee Together¶
When both tools are deployed, use them for different purposes:
| Scenario | Tool | Why |
|---|---|---|
| Continuous monitoring and alerting | Falco | Mature alerting pipeline via Falcosidekick, broad rule coverage |
| Incident investigation and forensics | Tracee | Captures full event context, file contents, and network data |
| Compliance evidence | Falco | PolicyReport-compatible output, audit trail |
| Threat hunting | Tracee | Query historical events with rich filter expressions |
| Container escape detection | Both | Complementary detection signatures |
Shared Alert Pipeline¶
Configure Tracee to forward events to Falcosidekick for a unified alert pipeline:
```text
Tracee events --webhook--> Falcosidekick ----> Slack / PagerDuty / SIEM
Falco events  -----------> Falcosidekick ----> Slack / PagerDuty / SIEM
```
This gives a single pane of glass for all runtime security events while preserving Tracee's deeper forensic data for on-demand investigation.
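One way to wire the Tracee side is through the chart's output configuration. The following is a hypothetical Helm values fragment — key names vary across Tracee chart versions, so verify the layout against the chart you deploy:

```yaml
# Hypothetical values sketch — confirm key names against your Tracee chart version
config:
  output:
    webhook:
      - name: falcosidekick
        # Falcosidekick service in the falco namespace, as used in the Verify steps
        url: http://falcosidekick.falco:2801
```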
Verify Alert Pipeline¶
After deploying both Falco and Tracee with Falcosidekick, verify the full alert pipeline end-to-end:
Step 1 — Trigger a Falco alert:
```shell
# Exec into a test pod to trigger "Shell Spawned in Container"
kubectl run alert-test --image=busybox --rm -it --restart=Never -- /bin/sh -c "echo test && exit"
```
Step 2 — Confirm Falco detected the event:
```shell
kubectl -n falco logs -l app.kubernetes.io/name=falco --tail=20 | grep -i "shell"
# Expected: Alert line containing "Shell spawned in container"
```
Step 3 — Confirm Falcosidekick received and forwarded the alert:
```shell
# Check Falcosidekick logs
kubectl -n falco logs -l app.kubernetes.io/name=falcosidekick --tail=20
# Expected: Log lines showing the alert was forwarded to configured outputs

# Check Falcosidekick metrics
kubectl -n falco port-forward svc/falco-falcosidekick 2801:2801 &
curl -s http://localhost:2801/metrics | grep falcosidekick_outputs_total
# Expected: Counter incremented for the configured output (slack, webhook, etc.)
kill %1
```
Step 4 — Confirm delivery at the destination:
- Slack: Check the configured Slack channel for the alert message
- Webhook: Check the endpoint's logs for the received payload
- Kafka: Consume from the configured topic to verify the event arrived
Step 5 — Verify Tracee → Falcosidekick pipeline (HA configuration only):
If Tracee is configured to forward events to Falcosidekick via webhook:
```shell
# Check Tracee logs for webhook delivery
kubectl -n tracee logs -l app.kubernetes.io/name=tracee --tail=20 | grep -i "webhook\|output"
# Expected: Events being sent to http://falcosidekick.falco:2801

# Check Falcosidekick for Tracee-sourced events
kubectl -n falco logs -l app.kubernetes.io/name=falcosidekick --tail=20 | grep -i "tracee"
```
Tip
If alerts are not arriving at the destination, check connectivity between
namespaces. The default-deny NetworkPolicy generated by Kyverno may block
Falcosidekick's outbound traffic. Create a NetworkPolicy allowing egress from
the falco namespace to external webhook endpoints.
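A sketch of such an egress allowance, assuming Falcosidekick pods carry the `app.kubernetes.io/name=falcosidekick` label used in the Verify steps above:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-falcosidekick-egress
  namespace: falco
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: falcosidekick
  policyTypes:
    - Egress
  egress:
    # Allow all egress (including DNS); restrict to your webhook
    # endpoints' CIDRs and ports in production
    - {}
```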
Next Steps¶
Proceed to troubleshooting guidance for security components: