5.3.2 Data Services¶
Database and messaging operators manage the lifecycle of stateful data workloads. These operators handle provisioning, failover, backup, and upgrades of PostgreSQL clusters and Apache Kafka clusters respectively.
How to use this page
Each component has an Install section showing the Flux HelmRelease, a Configuration section with Helm values, and a Verify section to confirm it is working.
All code blocks are labelled with their file path in the repository. Select your target environment (AWS or Bare Metal) in any tab group — the choice syncs across the entire page.
- **Using the existing `rciis-devops` repository:** All files already exist. Skip the `mkdir` and `git add`/`git commit` commands — they are for users building a new repository. Simply review the files, edit values for your environment, and push.
- **Building a new repository from scratch:** Follow the `mkdir`, file creation, and `git` commands in order.
- **No Git access:** Expand the "Alternative: Helm CLI" block under each Install section.
CloudNativePG¶
CloudNativePG is the Kubernetes operator for PostgreSQL. It manages the full lifecycle of PostgreSQL clusters including automated failover, continuous backup, rolling updates, and connection pooling. All PostgreSQL instances in the RCIIS platform (Grafana, Keycloak, application databases) are managed by this operator.
Install¶
The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches (shown in the Configuration section).
Create the base directory and file:
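A minimal sketch of the command, assuming the `flux/infra/base` path used for the file locations on this page:

```shell
# Create the shared base directory for infrastructure HelmReleases
mkdir -p flux/infra/base
```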
| Field | Value | Explanation |
|---|---|---|
| `chart` | `cloudnative-pg` | The Helm chart name from the CloudNativePG registry |
| `version` | `0.27.0` | Pinned chart version — update this to upgrade CloudNativePG |
| `sourceRef.name` | `cnpg` | References a HelmRepository CR pointing to https://cloudnative-pg.github.io/charts |
| `targetNamespace` | `cnpg-system` | Namespace where the CloudNativePG operator runs |
| `crds: CreateReplace` | — | Automatically installs and updates CloudNativePG CRDs |
| `remediation.retries` | `3` | Flux retries up to 3 times if the install or upgrade fails |
Save the following as flux/infra/base/cloudnative-pg.yaml:
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cloudnative-pg
  namespace: flux-system
spec:
  targetNamespace: cnpg-system
  interval: 30m
  chart:
    spec:
      chart: cloudnative-pg
      version: "0.27.0"
      sourceRef:
        kind: HelmRepository
        name: cnpg
        namespace: flux-system
  releaseName: cloudnative-pg
  install:
    createNamespace: true
    crds: CreateReplace
    remediation:
      retries: 3
  upgrade:
    crds: CreateReplace
    remediation:
      retries: 3
  values:
    replicaCount: 2
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        nodeTaintsPolicy: Honor
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: cloudnative-pg
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 128Mi
    monitoring:
      podMonitorEnabled: true
      podMonitorAdditionalLabels:
        release: prometheus
    logLevel: info
    webhook:
      mutating:
        create: true
      validating:
        create: true
    securityContext:
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
    config:
      create: true
      data:
        INHERITED_ANNOTATIONS: "cert-manager.io/*"
        INHERITED_LABELS: "app.kubernetes.io/*"
```
Alternative: Helm CLI
If you do not have Git access, install CloudNativePG directly:
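A sketch of the equivalent Helm CLI install, using the chart repository URL, pinned version, and namespace from the table above:

```shell
# Add the CloudNativePG chart repository and install the pinned chart version
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update
helm upgrade --install cloudnative-pg cnpg/cloudnative-pg \
  --version 0.27.0 \
  --namespace cnpg-system \
  --create-namespace
```

Note that this bypasses Flux: later changes in Git will not be applied automatically to a CLI-managed release.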
Configuration¶
The environment patch overrides the base HelmRelease with cluster-specific settings. The values file controls how CloudNativePG behaves. Select your environment below.
Create the environment overlay directory:
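For example (the `cloudnative-pg` overlay directory names are assumptions, inferred from the `../../base/cloudnative-pg.yaml` references in the environment kustomizations):

```shell
# One overlay directory per environment; directory names are assumed
mkdir -p flux/infra/aws/cloudnative-pg flux/infra/proxmox/cloudnative-pg
```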
Environment Patch¶
The patch file sets resource limits and replica counts appropriate for each environment. AWS reduces replicas and resources for cost optimization. Bare Metal uses the base defaults.
Save the following as the patch file for your environment:
On AWS, CloudNativePG resources are reduced to optimize cloud costs while maintaining operator functionality.
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cloudnative-pg
spec:
  values:
    replicaCount: 1
    topologySpreadConstraints: []
    resources:
      requests:
        cpu: 25m
        memory: 64Mi
      limits:
        cpu: 250m
        memory: 256Mi
```
| Setting | Value | Why |
|---|---|---|
| `replicaCount` | `1` | Single operator instance reduces AWS costs |
| `topologySpreadConstraints` | `[]` | Clears topology spread — not needed for a single replica |
| `resources.requests` | 25m / 64Mi | Minimal resource footprint for AWS |
| `resources.limits` | 250m / 256Mi | Caps resource usage for cost control |
On Bare Metal, CloudNativePG uses the base configuration with full HA capabilities.
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/cloudnative-pg.yaml
```
No environment patch needed. The base HelmRelease provides 2 operator replicas with topology spread constraints and full monitoring enabled.
Helm Values¶
The base HelmRelease already includes comprehensive Helm values. If you need to customize further for your environment, reference these key settings:
| Setting | HA (Default) | Non-HA | Why |
|---|---|---|---|
| `replicaCount` | `2` | `1` | Multiple vs. single operator instances |
| `topologySpreadConstraints` | Enabled | Disabled | Spreads operator pods across nodes |
| `monitoring.podMonitorEnabled` | `true` | `false` | Prometheus metrics for observability |
| `webhook.mutating/validating.create` | `true` | `true` | CRD validation webhooks (recommended) |
Commit and Deploy¶
Once all files are in place, commit and push to trigger Flux deployment:
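A sketch of the Git commands (the commit message is illustrative):

```shell
git add flux/infra
git commit -m "Add CloudNativePG operator"  # message is illustrative
git push
```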
Flux will detect the new commit and begin deploying CloudNativePG. To trigger an immediate sync instead of waiting for the next poll interval:
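Assuming the default GitRepository source name `flux-system`:

```shell
# Pull the latest Git revision immediately instead of waiting for the poll interval
flux reconcile source git flux-system -n flux-system
```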
Verify¶
Creating PostgreSQL clusters
This Verify section checks the operator deployment only; it does not create any PostgreSQL clusters.
Individual PostgreSQL instances are created as Cluster CRs in application namespaces.
See Identity Management for the Keycloak
database example, or the Grafana PostgreSQL instance deployed alongside Prometheus.
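To confirm the operator itself is healthy, a sketch of the usual checks (the namespace comes from `targetNamespace` above; the CRD group follows CloudNativePG defaults):

```shell
# Operator pods should be Running in the target namespace
kubectl get pods -n cnpg-system
# CloudNativePG CRDs (Cluster, Backup, etc.) should be registered
kubectl get crds | grep cnpg.io
```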
Flux Operations¶
This component is managed by Flux as HelmRelease cloudnative-pg and Kustomization infra-cloudnative-pg.
Check whether the HelmRelease and Kustomization are in a Ready state:
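Using the resource names stated above:

```shell
flux get helmreleases cloudnative-pg -n flux-system
flux get kustomizations infra-cloudnative-pg -n flux-system
```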
Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:
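For example:

```shell
# Re-fetch the Git source, then re-apply the kustomization
flux reconcile kustomization infra-cloudnative-pg --with-source -n flux-system
```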
Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:
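For example:

```shell
# Re-run the Helm install/upgrade for this release now
flux reconcile helmrelease cloudnative-pg -n flux-system
```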
View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:
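For example:

```shell
# Controller logs scoped to this HelmRelease
flux logs --kind=HelmRelease --name=cloudnative-pg -n flux-system
```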
Recovering a stalled HelmRelease
If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry automatically. Suspend and resume to clear the failure counter, then reconcile:
```shell
flux suspend helmrelease cloudnative-pg -n flux-system
flux resume helmrelease cloudnative-pg -n flux-system
flux reconcile kustomization infra-cloudnative-pg -n flux-system
```
Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved. See Maintenance — Recovering Stalled Resources for details.
Next: Continue to Strimzi below.
Strimzi¶
Strimzi is the Kubernetes operator for Apache Kafka. It manages Kafka clusters, topics, users, connectors, and bridges. The RCIIS ESB (Enterprise Service Bus) relies on Kafka for asynchronous messaging between customs systems.
Install¶
The base HelmRelease tells Flux which chart to install. This file is shared across all environments — environment-specific settings are applied via patches (shown in the Configuration section).
Create the base directory and file:
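A minimal sketch of the command, assuming the same `flux/infra/base` path used for the CloudNativePG files (the command is idempotent if the directory already exists):

```shell
# Shared base directory for infrastructure HelmReleases
mkdir -p flux/infra/base
```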
| Field | Value | Explanation |
|---|---|---|
| `chart` | `strimzi-kafka-operator` | The Helm chart name from the Strimzi registry |
| `version` | `0.47.0` | Pinned chart version — update this to upgrade Strimzi |
| `sourceRef.name` | `strimzi` | References a HelmRepository CR pointing to https://strimzi.io/charts |
| `targetNamespace` | `strimzi-operator` | Namespace where the Strimzi operator runs |
| `dependsOn` | `prometheus` | Ensures Prometheus is deployed before Strimzi metrics are configured |
| `crds: CreateReplace` | — | Automatically installs and updates Strimzi CRDs |
| `remediation.retries` | `3` | Flux retries up to 3 times if the install or upgrade fails |
Save the following as flux/infra/base/strimzi.yaml:
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: strimzi
  namespace: flux-system
spec:
  dependsOn:
    - name: prometheus
  targetNamespace: strimzi-operator
  interval: 30m
  chart:
    spec:
      chart: strimzi-kafka-operator
      version: "0.47.0"
      sourceRef:
        kind: HelmRepository
        name: strimzi
        namespace: flux-system
  releaseName: strimzi
  install:
    createNamespace: true
    crds: CreateReplace
    remediation:
      retries: 3
  upgrade:
    crds: CreateReplace
    remediation:
      retries: 3
  values:
    replicas: 2
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        nodeTaintsPolicy: Honor
        labelSelector:
          matchLabels:
            name: strimzi-cluster-operator
    serviceAccount: strimzi-cluster-operator
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 1024Mi
    watchNamespaces: []
    watchAnyNamespace: true
    createGlobalResources: true
    createAggregateRoles: true
    featureGates: ""
    logLevel: INFO
    dashboards:
      enabled: true
      namespace: monitoring
      labels:
        grafana_dashboard: "1"
```
Alternative: Helm CLI
If you do not have Git access, install Strimzi directly:
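A sketch of the equivalent Helm CLI install, using the chart repository URL, pinned version, and namespace from the table above:

```shell
# Add the Strimzi chart repository and install the pinned chart version
helm repo add strimzi https://strimzi.io/charts
helm repo update
helm upgrade --install strimzi strimzi/strimzi-kafka-operator \
  --version 0.47.0 \
  --namespace strimzi-operator \
  --create-namespace
```

As with CloudNativePG, a CLI-managed release is not reconciled by Flux.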
Configuration¶
The environment patch overrides the base HelmRelease with cluster-specific settings. The values file controls how Strimzi behaves. Select your environment below.
Create the environment overlay directory:
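For example, using the `flux/infra/aws/strimzi/` and `flux/infra/proxmox/strimzi/` paths referenced in the Extra Manifests section:

```shell
# One overlay directory per environment
mkdir -p flux/infra/aws/strimzi flux/infra/proxmox/strimzi
```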
Environment Patch¶
The patch file sets resource limits and replica counts appropriate for each environment. AWS reduces replicas and resources for cost optimization. Bare Metal uses the base defaults.
Save the following as the patch file for your environment:
On AWS, Strimzi resources are reduced to optimize cloud costs while maintaining operator functionality.
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: strimzi
spec:
  values:
    replicas: 1
    topologySpreadConstraints: []
    resources:
      requests:
        cpu: 50m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
```
| Setting | Value | Why |
|---|---|---|
| `replicas` | `1` | Single operator instance reduces AWS costs |
| `topologySpreadConstraints` | `[]` | Clears topology spread — not needed for a single replica |
| `resources.requests` | 50m / 256Mi | Minimal resource footprint for AWS |
| `resources.limits` | 500m / 512Mi | Caps resource usage for cost control |
On Bare Metal, Strimzi uses the base configuration with full HA capabilities.
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - ../../base/strimzi.yaml
  - bridge-metrics.yaml
  - cluster-operator-metrics.yaml
  - entity-operator-metrics.yaml
  - kafka-resource-metrics.yaml
```
No environment patch needed. The base HelmRelease provides 2 operator replicas with topology spread constraints and full monitoring enabled.
Helm Values¶
The base HelmRelease already includes comprehensive Helm values. If you need to customize further for your environment, reference these key settings:
| Setting | HA (Default) | Non-HA | Why |
|---|---|---|---|
| `replicas` | `2` | `1` | Multiple vs. single operator instances |
| `topologySpreadConstraints` | Enabled | Disabled | Spreads operator pods across nodes |
| `dashboards.enabled` | `true` | `false` | Grafana dashboards for Kafka observability |
| `resources` | 200m/512Mi req, 1000m/1Gi lim | 100m/256Mi req, 500m/384Mi lim | Resource scaling with replica count |
Extra Manifests¶
Strimzi includes Prometheus monitoring definitions as separate manifests. These are deployed from the environment-specific directories alongside the Helm chart configuration.
The AWS environment includes PodMonitor resources that enable Prometheus to scrape
Strimzi metrics. Save each of the following in flux/infra/aws/strimzi/:
bridge-metrics.yaml — Metrics for KafkaBridge components:
```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: bridge-metrics
  namespace: strimzi-operator
  labels:
    app: strimzi
    release: prometheus
spec:
  selector:
    matchLabels:
      strimzi.io/kind: KafkaBridge
  namespaceSelector:
    matchNames:
      - rciis-prod
  podMetricsEndpoints:
    - path: /metrics
      port: rest-api
```
cluster-operator-metrics.yaml — Metrics for the Strimzi cluster operator:
```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cluster-operator-metrics
  namespace: strimzi-operator
  labels:
    app: strimzi
    release: prometheus
spec:
  selector:
    matchLabels:
      strimzi.io/kind: cluster-operator
  namespaceSelector:
    matchNames:
      - rciis-prod
  podMetricsEndpoints:
    - path: /metrics
      port: http
```
entity-operator-metrics.yaml — Metrics for entity operator (users and topics):
```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: entity-operator-metrics
  namespace: strimzi-operator
  labels:
    app: strimzi
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: entity-operator
  namespaceSelector:
    matchNames:
      - rciis-prod
  podMetricsEndpoints:
    - path: /metrics
      port: healthcheck
```
kafka-resource-metrics.yaml — Metrics for Kafka clusters and related resources:
```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-resources-metrics
  namespace: strimzi-operator
  labels:
    app: strimzi
    release: prometheus
spec:
  selector:
    matchExpressions:
      - key: "strimzi.io/kind"
        operator: In
        values: ["Kafka", "KafkaConnect", "KafkaMirrorMaker2"]
  namespaceSelector:
    matchNames:
      - rciis-prod
  podMetricsEndpoints:
    - path: /metrics
      port: tcp-prometheus
      relabelings:
        - separator: ;
          regex: __meta_kubernetes_pod_label_(strimzi_io_.+)
          replacement: $1
          action: labelmap
        - sourceLabels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          targetLabel: namespace
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_name]
          separator: ;
          regex: (.*)
          targetLabel: kubernetes_pod_name
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          separator: ;
          regex: (.*)
          targetLabel: node_name
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_host_ip]
          separator: ;
          regex: (.*)
          targetLabel: node_ip
          replacement: $1
          action: replace
```
The Bare Metal environment includes the same PodMonitor resources. Save each of the
following in flux/infra/proxmox/strimzi/:
bridge-metrics.yaml:
```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: bridge-metrics
  namespace: strimzi-operator
  labels:
    app: strimzi
    release: prometheus
spec:
  selector:
    matchLabels:
      strimzi.io/kind: KafkaBridge
  namespaceSelector:
    matchNames:
      - rciis-prod
  podMetricsEndpoints:
    - path: /metrics
      port: rest-api
```
cluster-operator-metrics.yaml:
```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cluster-operator-metrics
  namespace: strimzi-operator
  labels:
    app: strimzi
    release: prometheus
spec:
  selector:
    matchLabels:
      strimzi.io/kind: cluster-operator
  namespaceSelector:
    matchNames:
      - rciis-prod
  podMetricsEndpoints:
    - path: /metrics
      port: http
```
entity-operator-metrics.yaml:
```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: entity-operator-metrics
  namespace: strimzi-operator
  labels:
    app: strimzi
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: entity-operator
  namespaceSelector:
    matchNames:
      - rciis-prod
  podMetricsEndpoints:
    - path: /metrics
      port: healthcheck
```
kafka-resource-metrics.yaml:
```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-resources-metrics
  namespace: strimzi-operator
  labels:
    app: strimzi
    release: prometheus
spec:
  selector:
    matchExpressions:
      - key: "strimzi.io/kind"
        operator: In
        values: ["Kafka", "KafkaConnect", "KafkaMirrorMaker2"]
  namespaceSelector:
    matchNames:
      - rciis-prod
  podMetricsEndpoints:
    - path: /metrics
      port: tcp-prometheus
      relabelings:
        - separator: ;
          regex: __meta_kubernetes_pod_label_(strimzi_io_.+)
          replacement: $1
          action: labelmap
        - sourceLabels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          targetLabel: namespace
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_name]
          separator: ;
          regex: (.*)
          targetLabel: kubernetes_pod_name
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          separator: ;
          regex: (.*)
          targetLabel: node_name
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_host_ip]
          separator: ;
          regex: (.*)
          targetLabel: node_ip
          replacement: $1
          action: replace
```
PodMonitor resources
The PodMonitor CRs tell Prometheus to scrape metrics from Strimzi components. These are automatically deployed via Kustomize when you include them in the environment directory. They require the Prometheus operator to be installed (see Observability for details).
Commit and Deploy¶
Once all files are in place, commit and push to trigger Flux deployment:
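A sketch of the Git commands (the commit message is illustrative):

```shell
git add flux/infra
git commit -m "Add Strimzi operator"  # message is illustrative
git push
```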
Flux will detect the new commit and begin deploying Strimzi. To trigger an immediate sync instead of waiting for the next poll interval:
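Assuming the default GitRepository source name `flux-system`:

```shell
# Pull the latest Git revision immediately instead of waiting for the poll interval
flux reconcile source git flux-system -n flux-system
```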
Verify¶
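A sketch of the usual checks (the namespace comes from `targetNamespace` above; the CRD group follows Strimzi defaults):

```shell
# Operator pods should be Running in the target namespace
kubectl get pods -n strimzi-operator
# Strimzi CRDs (Kafka, KafkaTopic, KafkaUser, etc.) should be registered
kubectl get crds | grep strimzi.io
```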
Flux Operations¶
This component is managed by Flux as HelmRelease strimzi and Kustomization infra-strimzi.
Check whether the HelmRelease and Kustomization are in a Ready state:
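Using the resource names stated above:

```shell
flux get helmreleases strimzi -n flux-system
flux get kustomizations infra-strimzi -n flux-system
```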
Trigger an immediate sync — pulls the latest Git revision and re-applies the manifests. Use after pushing config changes or to verify a fix:
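For example:

```shell
# Re-fetch the Git source, then re-apply the kustomization
flux reconcile kustomization infra-strimzi --with-source -n flux-system
```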
Trigger a Helm upgrade — re-runs the Helm install/upgrade for this release without waiting for the next interval. Use when the HelmRelease values have changed:
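For example:

```shell
# Re-run the Helm install/upgrade for this release now
flux reconcile helmrelease strimzi -n flux-system
```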
View recent Flux controller logs for this release — useful for diagnosing why a sync or upgrade failed:
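For example:

```shell
# Controller logs scoped to this HelmRelease
flux logs --kind=HelmRelease --name=strimzi -n flux-system
```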
Recovering a stalled HelmRelease
If the HelmRelease shows Stalled with RetriesExceeded, Flux will not retry automatically. Suspend and resume to clear the failure counter, then reconcile:
```shell
flux suspend helmrelease strimzi -n flux-system
flux resume helmrelease strimzi -n flux-system
flux reconcile kustomization infra-strimzi -n flux-system
```
Only run this after confirming the underlying issue (e.g. pod crash, timeout) has been resolved. See Maintenance — Recovering Stalled Resources for details.
Next Steps¶
Data services are now configured. Proceed to 5.3.3 Backup & Scheduling to set up automated backup and retention policies for PostgreSQL and Kafka.