
5.3.3 Backup & Scheduling

The backup strategy combines two complementary approaches: Velero for Kubernetes resource and PVC-level backups using CSI volume snapshots, and CloudNativePG continuous backup for PostgreSQL point-in-time recovery (PITR) using WAL archiving to S3-compatible storage. The Descheduler continuously rebalances pod placement to maintain even resource distribution across nodes.

How to use this page

Each component has an Install section showing the Flux HelmRelease, a Configuration section with Helm values, and a Verify section to confirm it is working.

All code blocks are labelled with their file path in the repository. Select your target environment (AWS or Bare Metal) in any tab group — the choice syncs across the entire page.

  • Using the existing rciis-devops repository: All files already exist. Skip the mkdir and git add/git commit commands — they are for users building a new repository. Simply review the files, edit values for your environment, and push.
  • Building a new repository from scratch: Follow the mkdir, file creation, and git commands in order.
  • No Git access: Expand the "Alternative: Helm CLI" block under each Install section.

Velero

Velero backs up Kubernetes resources and persistent volumes. It uses CSI snapshots for volume backups and stores backup metadata in a cloud object store. On AWS, this is AWS S3. On Bare Metal, it is the in-cluster Ceph Object Store (S3-compatible via RGW). This enables both scheduled backups and on-demand disaster recovery.

Install

The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches (shown in the Configuration section).

Create the base directory and file:

mkdir -p flux/infra/base
Field Value Explanation
chart velero The Helm chart name from the VMware Tanzu registry
version 11.3.2 Pinned chart version — update this to upgrade Velero
sourceRef.name vmware-tanzu References a HelmRepository CR pointing to the VMware Tanzu Helm repository
targetNamespace velero Velero is installed in its own namespace
crds CreateReplace Automatically installs and updates Velero CRDs
remediation.retries 3 Flux retries up to 3 times if the install or upgrade fails

Save the following as flux/infra/base/velero.yaml:

flux/infra/base/velero.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: velero
  namespace: flux-system
spec:
  targetNamespace: velero
  interval: 30m
  chart:
    spec:
      chart: velero
      version: "11.3.2"
      sourceRef:
        kind: HelmRepository
        name: vmware-tanzu
        namespace: flux-system
  releaseName: velero
  install:
    createNamespace: true
    crds: CreateReplace
    remediation:
      retries: 3
  upgrade:
    crds: CreateReplace
    remediation:
      retries: 3
  values:
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 512Mi
    initContainers:
      - name: velero-plugin-for-aws
        image: velero/velero-plugin-for-aws:v1.13.0
        volumeMounts:
          - mountPath: /target
            name: plugins
    configuration:
      features: EnableCSI
      volumeSnapshotLocation: []
    credentials:
      useSecret: true
      existingSecret: velero-s3-credentials
    deployNodeAgent: false
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
        additionalLabels:
          release: prometheus
    schedules: {}
    kubectl:
      image:
        repository: public.ecr.aws/bitnami/kubectl
Alternative: Helm CLI

If you do not have Git access, install Velero directly:

helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update
helm upgrade --install velero vmware-tanzu/velero \
  --namespace velero \
  --create-namespace \
  --version 11.3.2 \
  -f values.yaml

Configuration

The environment patch overrides the base HelmRelease with cluster-specific settings. The values file controls where backups are stored and how Velero behaves. Select your environment below.

Create the environment overlay directory:

mkdir -p flux/infra/aws/velero
mkdir -p flux/infra/baremetal/velero

Environment Patch

The patch file sets the backup storage location. This differs fundamentally between AWS and Bare Metal.

Save the following as the patch file for your environment:

On AWS, Velero stores backup metadata directly in AWS S3. The AWS plugin uses native S3 endpoints — no s3Url or s3ForcePathStyle is needed.

flux/infra/aws/velero/patch.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: velero
spec:
  values:
    resources:
      requests:
        cpu: 25m
        memory: 64Mi
      limits:
        cpu: 250m
        memory: 256Mi
    configuration:
      backupStorageLocation:
        - name: default
          provider: aws
          bucket: rciis-aws-velero-backups
          config:
            region: af-south-1
Setting Value Why
bucket rciis-aws-velero-backups AWS S3 bucket for backup storage
region af-south-1 AWS region where the bucket is located
Resource limits (reduced) CPU 25m, RAM 64Mi AWS deployments require fewer resources than HA bare metal

On Bare Metal, Velero stores backup metadata in the in-cluster Ceph Object Store (RGW). The patch configures S3 compatibility settings for Ceph RGW.

flux/infra/baremetal/velero/patch.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: velero
spec:
  values:
    configuration:
      backupStorageLocation:
        - name: default
          provider: aws
          bucket: velero-backups
          config:
            region: rciis-kenya
            s3ForcePathStyle: true
            s3Url: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc.cluster.local:80
Setting Value Why
bucket velero-backups Ceph RGW bucket for backup storage
region rciis-kenya Region identifier for Ceph RGW (arbitrary)
s3ForcePathStyle true Uses path-style S3 URLs (required for Ceph RGW)
s3Url http://rook-ceph-rgw-... Ceph RGW endpoint within the cluster

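If you are building the repository from scratch, the overlay directory also needs a kustomization.yaml that pulls in the base HelmRelease and applies the patch. A minimal sketch for the Bare Metal overlay (file layout assumed — adapt the paths to your repository):

```yaml
# flux/infra/baremetal/velero/kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/velero.yaml
patches:
  - path: patch.yaml
    target:
      kind: HelmRelease
      name: velero
```

The AWS overlay is identical apart from its patch file.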

Helm Values

The values file controls Velero's backup schedules and feature flags. Save the following as the values file for your environment:

flux/infra/aws/velero/values.yaml
# Velero — AWS HA configuration
# Automated backup schedules, CSI snapshots, S3 backend

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 65534
  runAsGroup: 65534
  seccompProfile:
    type: RuntimeDefault

containerSecurityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL

metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: prometheus

# Automated backup schedules
schedules:
  # Daily namespace backup — retains 30 days
  daily-namespaces:
    disabled: false
    schedule: "0 2 * * *"   # 02:00 UTC daily
    useOwnerReferencesInBackup: false
    template:
      ttl: "720h"           # 30 days
      storageLocation: default
      includedNamespaces:
        - rciis-aws
        - monitoring
        - strimzi-operator
        - cnpg-system
      snapshotMoveData: false

  # Weekly full-cluster backup — retains 90 days
  weekly-full:
    disabled: false
    schedule: "0 3 * * 0"   # 03:00 UTC Sunday
    useOwnerReferencesInBackup: false
    template:
      ttl: "2160h"          # 90 days
      storageLocation: default
      includeClusterResources: true
      snapshotMoveData: false
flux/infra/aws/velero/values.yaml
# Velero — AWS Non-HA configuration
# No automated schedules, reduced resources, on-demand backups only

metrics:
  enabled: true
  serviceMonitor:
    enabled: false

# No automated schedules — create on-demand backups as needed
schedules: {}
flux/infra/baremetal/velero/values.yaml
# Velero — Bare Metal HA configuration
# Automated backup schedules, CSI snapshots, Ceph RGW backend

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 65534
  runAsGroup: 65534
  seccompProfile:
    type: RuntimeDefault

containerSecurityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL

metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: prometheus

# Automated backup schedules
schedules:
  # Daily namespace backup — retains 30 days
  daily-namespaces:
    disabled: false
    schedule: "0 2 * * *"   # 02:00 UTC daily
    useOwnerReferencesInBackup: false
    template:
      ttl: "720h"           # 30 days
      storageLocation: default
      includedNamespaces:
        - rciis-kenya
        - monitoring
        - strimzi-operator
        - cnpg-system
      snapshotMoveData: false

  # Weekly full-cluster backup — retains 90 days
  weekly-full:
    disabled: false
    schedule: "0 3 * * 0"   # 03:00 UTC Sunday
    useOwnerReferencesInBackup: false
    template:
      ttl: "2160h"          # 90 days
      storageLocation: default
      includeClusterResources: true
      snapshotMoveData: false
flux/infra/baremetal/velero/values.yaml
# Velero — Bare Metal Non-HA configuration
# No automated schedules, reduced resources, on-demand backups only

metrics:
  enabled: true
  serviceMonitor:
    enabled: false

# No automated schedules — create on-demand backups as needed
schedules: {}
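With schedules: {} in the Non-HA values, backups are created on demand. Besides the velero CLI, an on-demand backup can also be declared as a Backup CR — a sketch (name and namespace list are illustrative):

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: adhoc-backup          # illustrative name
  namespace: velero
spec:
  includedNamespaces:
    - rciis-kenya             # adjust to your application namespace
  storageLocation: default
  ttl: 720h0m0s               # keep for 30 days
```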

Key settings (all environments):

Setting HA Non-HA Why
schedules.* Daily + weekly Empty {} Automated schedules provide continuous protection vs on-demand only
metrics.serviceMonitor.enabled true false HA exports Velero metrics to Prometheus for monitoring
podSecurityContext Strict (65534:65534) Inherited from base HA enforces non-root execution for security
EnableCSI Enabled in base Enabled in base CSI snapshots required for PVC-level backups

Commit and Deploy

Once all files are in place, commit and push to trigger Flux deployment:

git add flux/infra/base/velero.yaml \
        flux/infra/aws/velero/
git commit -m "feat(velero): add Velero backup for AWS environment"
git push
git add flux/infra/base/velero.yaml \
        flux/infra/baremetal/velero/
git commit -m "feat(velero): add Velero backup for bare metal environment"
git push

Flux will detect the new commit and begin deploying Velero. To trigger an immediate sync instead of waiting for the next poll interval:

flux reconcile kustomization infra-velero -n flux-system --with-source

Extra Manifests - Ceph S3 User

Bare Metal only

This manifest is only required when using Ceph RGW as the backup storage backend. AWS deployments use IAM credentials instead.

Velero needs an S3 user in Ceph to access the backup bucket:

flux/infra/baremetal/velero/velero-s3-user.yaml
apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: velero
  namespace: velero
spec:
  store: ceph-objectstore
  clusterNamespace: rook-ceph
  displayName: "Velero Backup User"
  capabilities:
    user: "*"
    bucket: "*"
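
The base HelmRelease references existingSecret: velero-s3-credentials, which the chart does not create. The Velero chart expects this secret to carry an AWS-style credentials file under the cloud key. A sketch, using the access keys Rook generates for the user above (key values are placeholders — copy them from the Rook-generated object-user secret, and encrypt this file with SOPS before committing):

```yaml
# velero-s3-credentials — placeholder keys, encrypt with SOPS
apiVersion: v1
kind: Secret
metadata:
  name: velero-s3-credentials
  namespace: velero
type: Opaque
stringData:
  cloud: |
    [default]
    aws_access_key_id = <ACCESS_KEY>
    aws_secret_access_key = <SECRET_KEY>
```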

CSI-only backup strategy

With deployNodeAgent: false, only PVCs backed by CSI-compatible storage classes (Ceph RBD) are snapshotted. Ensure all critical workloads use ceph-rbd or ceph-rbd-single storage classes.
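
For CSI snapshots to work, Velero also needs a VolumeSnapshotClass labelled for its use. A sketch for Ceph RBD (driver and secret names are assumptions — match them to your Rook deployment):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ceph-rbd-snapclass
  labels:
    velero.io/csi-volumesnapshot-class: "true"   # tells Velero to use this class
driver: rook-ceph.rbd.csi.ceph.com               # assumed Rook CSI driver name
deletionPolicy: Retain                           # keep snapshots backing Velero backups
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: rook-ceph
```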


Verify

# Check Velero is running
kubectl get pods -n velero

# Verify backup storage location
velero backup-location get

# Create a test backup
velero backup create test-backup --include-namespaces default --wait

# Check backup status
velero backup describe test-backup

# Clean up test backup
velero backup delete test-backup --confirm

Flux Operations

This component is managed by Flux as HelmRelease velero and Kustomization infra-velero.

Check whether the HelmRelease and Kustomization are in a Ready state:

flux get helmrelease velero -n flux-system
flux get kustomization infra-velero -n flux-system

Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:

flux reconcile kustomization infra-velero -n flux-system --with-source

Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:

flux reconcile helmrelease velero -n flux-system

View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:

flux logs --kind=HelmRelease --name=velero -n flux-system

Recovering a stalled HelmRelease

If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry automatically. Suspend and resume to clear the failure counter, then reconcile:

flux suspend helmrelease velero -n flux-system
flux resume helmrelease velero -n flux-system
flux reconcile kustomization infra-velero -n flux-system

Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved. See Maintenance — Recovering Stalled Resources for details.

Next: Continue to CloudNativePG Backups below.


CloudNativePG Backups

CloudNativePG provides continuous backup at the PostgreSQL level using Barman. This is independent of Velero — while Velero backs up Kubernetes resources and PVCs as CSI snapshots, CNPG archives the PostgreSQL Write-Ahead Log (WAL) stream and performs periodic base backups directly to S3-compatible storage.

This enables point-in-time recovery (PITR) for all PostgreSQL databases managed by the CNPG operator (Grafana, Keycloak, application databases).

Operator vs Cluster backups

The CNPG operator (installed in Data Services) does not configure backups itself. Backups are configured per Cluster CR in each application namespace. The examples below show the backup stanza to add to any CNPG Cluster.

Cluster Backup Configuration

Add the backup stanza to any CNPG Cluster CR to enable continuous WAL archiving and base backups. The S3 destination depends on your deployment model:

cluster-with-backup.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-db
  namespace: app-namespace
spec:
  instances: 3
  storage:
    size: 10Gi
    storageClassName: ceph-rbd-single

  backup:
    barmanObjectStore:
      destinationPath: s3://rciis-cnpg-backups/example-db
      s3Credentials:
        accessKeyId:
          name: cnpg-s3-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: cnpg-s3-credentials
          key: ACCESS_SECRET_KEY
      wal:
        compression: gzip
        maxParallel: 2
      data:
        compression: gzip
    retentionPolicy: "30d"

The cnpg-s3-credentials Secret contains AWS IAM credentials:

cnpg-s3-credentials (SOPS-encrypted)
apiVersion: v1
kind: Secret
metadata:
  name: cnpg-s3-credentials
  namespace: app-namespace
type: Opaque
stringData:
  ACCESS_KEY_ID: "<AWS_ACCESS_KEY_ID>"
  ACCESS_SECRET_KEY: "<AWS_SECRET_ACCESS_KEY>"

IAM Roles for Service Accounts (IRSA)

On EKS, prefer IRSA over static credentials. Set backup.barmanObjectStore.s3Credentials.inheritFromIAMRole: true and annotate the CNPG ServiceAccount with the IAM role ARN.
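
A sketch of the IRSA variant (the role ARN and annotation value are placeholders):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-db
  namespace: app-namespace
spec:
  instances: 3
  serviceAccountTemplate:
    metadata:
      annotations:
        # placeholder — IAM role with access to the backup bucket
        eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/cnpg-backup
  backup:
    barmanObjectStore:
      destinationPath: s3://rciis-cnpg-backups/example-db
      s3Credentials:
        inheritFromIAMRole: true   # no static keys needed
    retentionPolicy: "30d"
```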

cluster-with-backup.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-db
  namespace: app-namespace
spec:
  instances: 3
  storage:
    size: 10Gi
    storageClassName: ceph-rbd-single

  backup:
    barmanObjectStore:
      destinationPath: s3://cnpg-backups/example-db
      endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc.cluster.local:80
      s3Credentials:
        accessKeyId:
          name: cnpg-s3-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: cnpg-s3-credentials
          key: ACCESS_SECRET_KEY
      wal:
        compression: gzip
        maxParallel: 2
      data:
        compression: gzip
    retentionPolicy: "30d"

The cnpg-s3-credentials Secret contains the Ceph RGW user credentials:

cnpg-s3-credentials (SOPS-encrypted)
apiVersion: v1
kind: Secret
metadata:
  name: cnpg-s3-credentials
  namespace: app-namespace
type: Opaque
stringData:
  ACCESS_KEY_ID: "<CEPH_RGW_ACCESS_KEY>"
  ACCESS_SECRET_KEY: "<CEPH_RGW_SECRET_KEY>"

Scheduled Base Backups

WAL archiving is continuous, but periodic base backups are needed for efficient recovery. Create a ScheduledBackup CR for each PostgreSQL cluster:

scheduled-backup.yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: example-db-daily
  namespace: app-namespace
spec:
  schedule: "0 2 * * *"           # 02:00 UTC daily
  backupOwnerReference: self
  cluster:
    name: example-db
  method: barmanObjectStore

Backup retention

The retentionPolicy: "30d" in the Cluster CR controls how long base backups and WAL files are retained. The ScheduledBackup creates new base backups on schedule — old base backups and WAL segments beyond the retention window are automatically pruned by Barman.
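
Besides the schedule, a one-off base backup (for example before a risky migration) can be requested with a Backup CR:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: example-db-manual    # illustrative name
  namespace: app-namespace
spec:
  method: barmanObjectStore
  cluster:
    name: example-db
```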

Ceph S3 User for CNPG

Bare Metal only

This manifest is only required when using Ceph RGW as the backup storage backend. AWS deployments use IAM credentials instead.

cnpg-s3-user.yaml
apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: cnpg-backup
  namespace: rook-ceph
spec:
  store: ceph-objectstore
  clusterNamespace: rook-ceph
  displayName: "CNPG Backup User"
  capabilities:
    user: "*"
    bucket: "*"

Recovery

To recover a PostgreSQL cluster to a specific point in time, create a new Cluster CR that bootstraps from the backup:

recovery-cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-db-recovered
  namespace: app-namespace
spec:
  instances: 3
  storage:
    size: 10Gi
    storageClassName: ceph-rbd-single

  bootstrap:
    recovery:
      source: example-db-backup
      recoveryTarget:
        targetTime: "2026-02-15T12:00:00Z"

  externalClusters:
    - name: example-db-backup
      barmanObjectStore:
        destinationPath: s3://rciis-cnpg-backups/example-db
        s3Credentials:
          accessKeyId:
            name: cnpg-s3-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: cnpg-s3-credentials
            key: ACCESS_SECRET_KEY
recovery-cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-db-recovered
  namespace: app-namespace
spec:
  instances: 3
  storage:
    size: 10Gi
    storageClassName: ceph-rbd-single

  bootstrap:
    recovery:
      source: example-db-backup
      recoveryTarget:
        targetTime: "2026-02-15T12:00:00Z"

  externalClusters:
    - name: example-db-backup
      barmanObjectStore:
        destinationPath: s3://cnpg-backups/example-db
        endpointURL: http://rook-ceph-rgw-ceph-objectstore.rook-ceph.svc.cluster.local:80
        s3Credentials:
          accessKeyId:
            name: cnpg-s3-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: cnpg-s3-credentials
            key: ACCESS_SECRET_KEY

Verify

# Check backup status on a CNPG cluster
kubectl get cluster example-db -n app-namespace \
  -o jsonpath='{.status.lastSuccessfulBackup}'

# List backups
kubectl get backups -n app-namespace

# Check WAL archiving — first recoverable point
kubectl get cluster example-db -n app-namespace \
  -o jsonpath='{.status.firstRecoverabilityPoint}'

# Verify scheduled backups
kubectl get scheduledbackups -n app-namespace

Descheduler

The Kubernetes Descheduler evicts pods that violate scheduling constraints or contribute to resource imbalance. It works alongside the default scheduler — the descheduler evicts, and the scheduler re-places pods on better-suited nodes.

Install

The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches.

Create the base directory and file:

mkdir -p flux/infra/base
Field Value Explanation
chart descheduler The Helm chart name from the Descheduler registry
version 0.34.0 Pinned chart version — update this to upgrade Descheduler
sourceRef.name descheduler References a HelmRepository CR pointing to the Descheduler Helm repository
targetNamespace kube-system Descheduler runs in the system namespace
crds CreateReplace Automatically installs and updates Descheduler CRDs
remediation.retries 3 Flux retries up to 3 times if the install or upgrade fails

Save the following as flux/infra/base/descheduler.yaml:

flux/infra/base/descheduler.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: descheduler
  namespace: flux-system
spec:
  targetNamespace: kube-system
  interval: 30m
  chart:
    spec:
      chart: descheduler
      version: "0.34.0"
      sourceRef:
        kind: HelmRepository
        name: descheduler
        namespace: flux-system
  releaseName: descheduler
  install:
    createNamespace: true
    crds: CreateReplace
    remediation:
      retries: 3
  upgrade:
    crds: CreateReplace
    remediation:
      retries: 3
  values:
    replicas: 3
    leaderElection:
      enabled: true
    kind: Deployment
    deschedulerPolicy:
      profiles:
        - name: default
          pluginConfig:
            - name: DefaultEvictor
              args:
                evictLocalStoragePods: false
                evictSystemCriticalPods: false
                nodeFit: true
            - name: LowNodeUtilization
              args:
                useDeviationThresholds: true
                thresholds:
                  cpu: 10
                  memory: 10
                  pods: 10
                targetThresholds:
                  cpu: 20
                  memory: 20
                  pods: 20
            - name: RemovePodsViolatingTopologySpreadConstraint
              args:
                constraints:
                  - DoNotSchedule
          plugins:
            balance:
              enabled:
                - LowNodeUtilization
                - RemovePodsViolatingTopologySpreadConstraint
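The base policy enables only balance plugins. To also evict crash-looping pods, the profile can be extended with the RemovePodsHavingTooManyRestarts plugin — a sketch of the extra stanzas to merge into the profile above (the threshold is illustrative):

```yaml
# fragments to merge into deschedulerPolicy.profiles[0]
pluginConfig:
  - name: RemovePodsHavingTooManyRestarts
    args:
      podRestartThreshold: 100        # illustrative threshold
      includingInitContainers: true
plugins:
  deschedule:
    enabled:
      - RemovePodsHavingTooManyRestarts
```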
Alternative: Helm CLI

If you do not have Git access, install Descheduler directly:

helm repo add descheduler https://kubernetes-sigs.github.io/descheduler
helm repo update
helm upgrade --install descheduler descheduler/descheduler \
  --namespace kube-system \
  --version 0.34.0 \
  -f values.yaml

Configuration

The environment patch overrides the base HelmRelease with cluster-specific resource settings. Only AWS has a patch — Bare Metal uses the base configuration as-is.

Create the environment overlay directory:

mkdir -p flux/infra/aws/descheduler

No overlay needed — Bare Metal uses the base configuration. Skip this step.


Environment Patch

The patch file adjusts resource limits for your deployment model.

AWS deployments typically run on smaller instances, so Descheduler uses reduced resource limits.

flux/infra/aws/descheduler/patch.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: descheduler
spec:
  values:
    replicas: 1
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 128Mi
Setting Value Why
replicas 1 AWS deployments run a single replica; no leader election needed
Resource limits (reduced) CPU 50m, RAM 64Mi AWS instances are smaller than HA bare metal

Bare Metal uses the base configuration with 3 replicas and leader election:

Setting Value Why
replicas 3 HA deployment with leader election for redundancy
Resource limits Chart defaults The base values are sufficient; no environment patch is required
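
Because the descheduler removes pods through the Kubernetes eviction API, PodDisruptionBudgets are honoured. Workloads that must not drop below quorum should carry a PDB — a sketch (name, labels, and counts are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-db-pdb        # illustrative name
  namespace: app-namespace
spec:
  minAvailable: 2             # never evict below 2 of 3 instances
  selector:
    matchLabels:
      cnpg.io/cluster: example-db   # CNPG's per-cluster pod label (assumed)
```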


Commit and Deploy

Once all files are in place, commit and push to trigger Flux deployment:

git add flux/infra/base/descheduler.yaml \
        flux/infra/aws/descheduler/
git commit -m "feat(descheduler): add Descheduler for AWS environment"
git push
git add flux/infra/base/descheduler.yaml
git commit -m "feat(descheduler): add Descheduler for bare metal environment"
git push

Flux will detect the new commit and begin deploying Descheduler. To trigger an immediate sync instead of waiting for the next poll interval:

flux reconcile kustomization infra-descheduler -n flux-system --with-source

Verify

# Check Descheduler is running (Deployment mode)
kubectl get pods -n kube-system -l app.kubernetes.io/name=descheduler

# Check logs for eviction activity
kubectl logs -n kube-system -l app.kubernetes.io/name=descheduler --tail=50

Flux Operations

This component is managed by Flux as HelmRelease descheduler and Kustomization infra-descheduler.

Check whether the HelmRelease and Kustomization are in a Ready state:

flux get helmrelease descheduler -n flux-system
flux get kustomization infra-descheduler -n flux-system

Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:

flux reconcile kustomization infra-descheduler -n flux-system --with-source

Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:

flux reconcile helmrelease descheduler -n flux-system

View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:

flux logs --kind=HelmRelease --name=descheduler -n flux-system

Recovering a stalled HelmRelease

If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry automatically. Suspend and resume to clear the failure counter, then reconcile:

flux suspend helmrelease descheduler -n flux-system
flux resume helmrelease descheduler -n flux-system
flux reconcile kustomization infra-descheduler -n flux-system

Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved. See Maintenance — Recovering Stalled Resources for details.


Next Steps

Backup and scheduling infrastructure is now configured. Proceed to 5.3.4 Identity & Access Management to set up Kubernetes RBAC, role-based access control, and authentication policies for cluster security.