
Kafka Cluster Deployment

Architecture Overview

Without further ado, let's jump straight to the commands:

cd ~/Projects/retail-lakehouse/kafka-cluster
bash install.sh
install.sh
#!/bin/bash

set -e

# Deploy Strimzi Cluster Operator
kubectl create namespace strimzi
kubectl create namespace kafka-cdc

helm repo add strimzi https://strimzi.io/charts/
helm repo update
helm install \
  strimzi-cluster-operator \
  oci://quay.io/strimzi-helm/strimzi-kafka-operator \
  -f values.yaml \
  -n strimzi \
  --version 0.46.1
sleep 5
kubectl wait --for=condition=Ready pod -l name=strimzi-cluster-operator -n strimzi --timeout=1200s

# Deploy Kafka cluster using Strimzi
kubectl apply -f kafka-cluster.yaml -n kafka-cdc
sleep 5
kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=kafka -n kafka-cdc --timeout=1200s
sleep 5
kubectl wait --for=condition=Ready pod -l app.kubernetes.io/name=entity-operator -n kafka-cdc --timeout=1200s

This script will deploy a Kafka Cluster using the Strimzi Operator in the kafka-cdc namespace.

If you'd rather do it manually instead of running the script, keep reading: the rest of this article walks through deploying a Kafka cluster with the Strimzi Operator on Kubernetes step by step, explaining each part along the way.

Deploy the Strimzi Cluster Operator

Before we dive into deploying Kafka, let's talk about what Strimzi is and why we need it.

Strimzi is a Kubernetes operator that makes running Apache Kafka on Kubernetes much easier. Think of it as your Kafka cluster manager: it handles all the complex setup, configuration, and maintenance tasks that would otherwise require manual intervention.

Instead of manually creating Kafka pods, services, and configurations, Strimzi lets you define what you want in simple YAML files, and it takes care of the rest.
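For example (a minimal sketch; the topic name here is purely illustrative), once the Kafka cluster we deploy later in this guide is running, creating a topic is just another Kubernetes resource:

kubectl apply -n kafka-cdc -f - <<'EOF'
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: example-topic
  labels:
    strimzi.io/cluster: kafka-cluster   # ties the topic to the cluster we'll create below
spec:
  partitions: 3
  replicas: 3
EOF

Strimzi's Topic Operator (part of the Entity Operator we'll meet later) notices this resource and creates the matching topic inside Kafka for you.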

You can deploy Strimzi on Kubernetes 1.25 and later using one of the following methods:

  • Deployment files (YAML files)
  • OperatorHub.io
  • Helm chart

In this guide, we'll use the Helm chart method because it's straightforward and allows for easy customization.

First, we need to create two separate Kubernetes namespaces, one for the Strimzi operator and another for our Kafka cluster:

kubectl create namespace strimzi
kubectl create namespace kafka-cdc

Next, we need to add the Strimzi Helm repository to our local Helm setup and install the Strimzi operator with our custom values:

helm repo add strimzi https://strimzi.io/charts/
helm repo update
helm install \
  strimzi-cluster-operator \
  oci://quay.io/strimzi-helm/strimzi-kafka-operator \
  -f ~/Projects/retail-lakehouse/kafka-cluster/values.yaml \
  -n strimzi \
  --version 0.46.1

Here's our customized values.yaml file. The most important setting is watchNamespaces, which tells Strimzi to watch the kafka-cdc namespace where we'll deploy our Kafka cluster.

values.yaml
# Default values for strimzi-kafka-operator.

# Default replicas for the cluster operator
replicas: 1

# If you set `watchNamespaces` to the same value as `.Release.Namespace` (e.g. `helm ... --namespace $NAMESPACE`),
# the chart will fail because duplicate RoleBindings will be attempted to be created in the same namespace
watchNamespaces:
  - kafka-cdc
watchAnyNamespace: false

defaultImageRegistry: quay.io
defaultImageRepository: strimzi
defaultImageTag: 0.46.1

image:
  registry: ""
  repository: ""
  name: operator
  tag: ""
  # imagePullSecrets:
  #   - name: secretname
logVolume: co-config-volume
logConfigMap: strimzi-cluster-operator
logConfiguration: ""
logLevel: ${env:STRIMZI_LOG_LEVEL:-INFO}
fullReconciliationIntervalMs: 120000
operationTimeoutMs: 300000
kubernetesServiceDnsDomain: cluster.local
featureGates: ""
tmpDirSizeLimit: 1Mi

# Example on how to configure extraEnvs
# extraEnvs:
#   - name: JAVA_OPTS
#     value: "-Xms256m -Xmx256m"

extraEnvs: []

tolerations: []
affinity: {}
annotations: {}
labels: {}
nodeSelector: {}
deploymentAnnotations: {}
priorityClassName: ""

podSecurityContext: {}
securityContext: {}
rbac:
  create: yes
serviceAccountCreate: yes
serviceAccount: strimzi-cluster-operator

leaderElection:
  enable: true

# https://kubernetes.io/docs/tasks/run-application/configure-pdb/
podDisruptionBudget:
  enabled: false
  # The PDB definition only has two attributes to control the availability requirements: minAvailable or maxUnavailable (mutually exclusive).
  # Field maxUnavailable tells how many pods can be down and minAvailable tells how many pods must be running in a cluster.

  # The pdb template will check values according to below order
  #
  #  {{- if .Values.podDisruptionBudget.minAvailable }}
  #     minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
  #  {{- end  }}
  #  {{- if .Values.podDisruptionBudget.maxUnavailable }}
  #     maxUnavailable: {{ .Values.podDisruptionBudget.maxUnavailable }}
  #  {{- end }}
  #
  # If both values are set, the template will use the first one and ignore the second one. currently by default minAvailable is set to 1
  minAvailable: 1
  maxUnavailable:

# If you are using the grafana dashboard sidecar,
# you can import some default dashboards here
dashboards:
  enabled: false
  namespace: ~
  label: grafana_dashboard # this is the default value from the grafana chart
  labelValue: "1" # this is the default value from the grafana chart
  annotations: {}
  extraLabels: {}

# Docker images that operator uses to provision various components of Strimzi. To use your own registry prefix the
# repository name with your registry URL.
# Ex) repository: registry.xyzcorp.com/strimzi/kafka
kafka:
  image:
    registry: ""
    repository: ""
    name: kafka
    tagPrefix: ""
kafkaConnect:
  image:
    registry: ""
    repository: ""
    name: kafka
    tagPrefix: ""
topicOperator:
  image:
    registry: ""
    repository: ""
    name: operator
    tag: ""
userOperator:
  image:
    registry:
    repository:
    name: operator
    tag: ""
kafkaInit:
  image:
    registry: ""
    repository: ""
    name: operator
    tag: ""
kafkaBridge:
  image:
    registry: ""
    repository:
    name: kafka-bridge
    tag: 0.32.0
kafkaExporter:
  image:
    registry: ""
    repository: ""
    name: kafka
    tagPrefix: ""
kafkaMirrorMaker2:
  image:
    registry: ""
    repository: ""
    name: kafka
    tagPrefix: ""
cruiseControl:
  image:
    registry: ""
    repository: ""
    name: kafka
    tagPrefix: ""
kanikoExecutor:
  image:
    registry: ""
    repository: ""
    name: kaniko-executor
    tag: ""
mavenBuilder:
  image:
    registry: ""
    repository: ""
    name: maven-builder
    tag: ""
resources:
  limits:
    memory: 384Mi
    cpu: 1000m
  requests:
    memory: 384Mi
    cpu: 200m
livenessProbe:
  initialDelaySeconds: 10
  periodSeconds: 30
readinessProbe:
  initialDelaySeconds: 10
  periodSeconds: 30

createGlobalResources: true
# Create clusterroles that extend existing clusterroles to interact with strimzi crds
# Ref: https://kubernetes.io/docs/reference/access-authn-authz/rbac/#aggregated-clusterroles
createAggregateRoles: false
# Override the exclude pattern for exclude some labels
labelsExclusionPattern: ""
# Controls whether Strimzi generates network policy resources (By default true)
generateNetworkPolicy: true
# Override the value for Connect build timeout
connectBuildTimeoutMs: 300000
# Controls whether Strimzi generates pod disruption budget resources (By default true)
generatePodDisruptionBudget: true

If everything goes well, you'll see output like this:

Result
Pulled: quay.io/strimzi-helm/strimzi-kafka-operator:0.46.1
Digest: sha256:e87ea2a03985f5dd50fee1f8706f737fa1151b86dce5021b6c0798ac8b17e27f
NAME: strimzi-cluster-operator
LAST DEPLOYED: Sun Jun 29 17:25:49 2025
NAMESPACE: strimzi
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing strimzi-kafka-operator-0.46.1

To create a Kafka cluster refer to the following documentation.

https://strimzi.io/docs/operators/latest/deploying.html#deploying-cluster-operator-helm-chart-str

Great! The STATUS: deployed tells us everything went smoothly.

Let's make sure our Strimzi operator is actually running:

helm ls -n strimzi
Result
NAME                        NAMESPACE   REVISION    UPDATED                                 STATUS      CHART                           APP VERSION
strimzi-cluster-operator    strimzi     1           2025-06-29 17:25:49.773026 +0800 CST    deployed    strimzi-kafka-operator-0.46.1   0.46.1
kubectl get all -n strimzi
Result
NAME                                           READY   STATUS    RESTARTS   AGE
pod/strimzi-cluster-operator-74f577b78-s9n25   1/1     Running   0          108s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/strimzi-cluster-operator       1/1     1            1           108s

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/strimzi-cluster-operator-74f577b78   1         1         1       108s

Perfect! At this point, the Strimzi operator is running and watching for Kafka-related resources in the kafka-cdc namespace. It's like having a dedicated Kafka administrator ready to spring into action whenever we create Kafka clusters or related components.
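If you want to double-check that the operator picked up our watchNamespaces setting, one way (assuming the chart wires it into the standard STRIMZI_NAMESPACE environment variable) is to read it straight off the deployment:

kubectl -n strimzi get deployment strimzi-cluster-operator \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="STRIMZI_NAMESPACE")].value}'
# Should print: kafka-cdc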

Deploy a Kafka Cluster

Now that we have Strimzi operator running, let's deploy our actual Kafka cluster!

kubectl create -f kafka-cluster.yaml -n kafka-cdc

This command tells Kubernetes to create both the Kafka cluster and the node pool in our kafka-cdc namespace. You should see output confirming both resources were created:

Result
kafka.kafka.strimzi.io/kafka-cluster created
kafkanodepool.kafka.strimzi.io/dual-role created

Kafka clusters take a bit of time to start up - they need to elect controllers, establish consensus, and create internal topics. Let's check on the progress:

kubectl get all -n kafka-cdc

Initially, you might see pods in Pending or ContainerCreating status. After a minute or two, you should see something like this:

Result
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/kafka-cluster-dual-role-0                           1/1     Running   0          60s
pod/kafka-cluster-dual-role-1                           1/1     Running   0          60s
pod/kafka-cluster-dual-role-2                           1/1     Running   0          60s
pod/kafka-cluster-entity-operator-5b998f6cbf-c8hdf      2/2     Running   0          24s

NAME                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                        AGE
service/kafka-cluster-kafka-bootstrap      ClusterIP   10.105.50.103   <none>        9091/TCP,9092/TCP,9093/TCP                     61s
service/kafka-cluster-kafka-brokers        ClusterIP   None            <none>        9090/TCP,9091/TCP,8443/TCP,9092/TCP,9093/TCP   61s

NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kafka-cluster-entity-operator      1/1     1            1           24s

NAME                                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/kafka-cluster-entity-operator-5b998f6cbf      1         1         1       24s

Components of the Kafka Cluster:

  • Three Kafka Pods: These are your dual-role nodes, numbered 0, 1, and 2
  • Entity Operator: This Strimzi component manages Kafka topics and users for you
  • Bootstrap Service: This is how clients discover and connect to your Kafka cluster (a quick smoke test follows right after this list)
  • Broker Service: This provides direct access to individual brokers when needed
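As a quick smoke test of that bootstrap service, you can run a throwaway producer pod inside the cluster (a sketch: adjust the image tag to match your Strimzi and Kafka versions, and note the topic will be auto-created with the cluster defaults if it doesn't exist yet):

kubectl -n kafka-cdc run kafka-producer -ti --rm=true --restart=Never \
  --image=quay.io/strimzi/kafka:0.46.1-kafka-4.0.0 -- \
  bin/kafka-console-producer.sh \
    --bootstrap-server kafka-cluster-kafka-bootstrap:9092 \
    --topic example-topic

Type a few messages, press Ctrl+C, and if no errors appear the cluster is accepting writes.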

If you are interested in how the Kafka cluster is configured, you can continue reading the next section. If not, you are good to go to the next step: Deploy a MySQL Database.

Deep Dive: Understanding the Kafka Configuration

Let's break down what this configuration is telling Kubernetes to create for us.

Version

kafka-cluster.yaml:cluster
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster
  namespace: kafka-cdc
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 4.0.0
    metadataVersion: 4.0-IV3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      default.replication.factor: 3
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      min.insync.replicas: 2
    template:
      pod:
        securityContext:
          runAsUser: 0
          fsGroup: 0
  entityOperator:
    topicOperator: {}
    userOperator: {}

We're using Kafka 4.0. The spec.kafka.metadataVersion: 4.0-IV3 setting is important (the "IV" stands for "Incompatible Version," which means this version has significant metadata structure changes that aren't backward compatible). We need to explicitly specify this to confirm we understand we're using the latest format.
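If you're curious which metadata version the brokers are actually running once the cluster is up, one way to check (assuming the standard Strimzi Kafka image layout under /opt/kafka) is to ask the cluster itself:

kubectl -n kafka-cdc exec kafka-cluster-dual-role-0 -- \
  /opt/kafka/bin/kafka-features.sh \
  --bootstrap-server kafka-cluster-kafka-bootstrap:9092 describe
# Look for the metadata.version feature in the output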

Fault Tolerance and Consistency

kafka-cluster.yaml:cluster (excerpt)
spec:
  kafka:
    # ...
    config:
      default.replication.factor: 3
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      min.insync.replicas: 2

The configuration section ensures our cluster is production-ready with proper replication and consistency guarantees via settings under spec.kafka.config:

  • default.replication.factor: 3
  • offsets.topic.replication.factor: 3
  • transaction.state.log.replication.factor: 3

We're telling Kafka to keep 3 copies of everything, including your regular topics, consumer offset tracking, and transaction state. This means we can lose one broker and still have all our data.
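You can see these defaults in action by describing a topic once the cluster is up (a sketch, assuming the standard Strimzi image layout; the illustrative example-topic from earlier, or any topic you've created, works here):

kubectl -n kafka-cdc exec kafka-cluster-dual-role-0 -- \
  /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server kafka-cluster-kafka-bootstrap:9092 \
  --describe --topic example-topic
# Each partition should list 3 replicas spread across the 3 brokers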

  • transaction.state.log.min.isr: 2
  • min.insync.replicas: 2

These configurations ensure that at least 2 replicas must acknowledge a write before it's considered successful. This prevents data loss even if a broker fails right after acknowledging a write.
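To actually get that guarantee, producers need to request it with acks=all; for instance (a sketch; the image tag and topic name are illustrative):

kubectl -n kafka-cdc run acks-test -ti --rm=true --restart=Never \
  --image=quay.io/strimzi/kafka:0.46.1-kafka-4.0.0 -- \
  bin/kafka-console-producer.sh \
    --bootstrap-server kafka-cluster-kafka-bootstrap:9092 \
    --topic example-topic \
    --producer-property acks=all

With acks=all and min.insync.replicas=2, a write only succeeds once at least 2 replicas have it; if fewer than 2 are in sync, the producer gets a NotEnoughReplicas error instead of silently risking data loss.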

Listeners: How Clients Connect

kafka-cluster.yaml:cluster (excerpt)
spec:
  kafka:
    # ...
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true

Our configuration sets up two ways for applications to connect to Kafka:

Plain Listener (port 9092)

This is your standard, unencrypted connection. Perfect for development environments, internal applications where network security is handled at other layers, and high-throughput scenarios where TLS overhead isn't desired.

TLS Listener (port 9093)

This provides encrypted connections for cross-namespace communication, production environments with security requirements, and any scenario where data in transit needs protection.
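Inside the Kubernetes cluster, both listeners are reachable through the bootstrap service we saw earlier: kafka-cluster-kafka-bootstrap.kafka-cdc.svc:9092 for plain and :9093 for TLS. For TLS clients, the cluster CA certificate can be pulled from the secret Strimzi generates (the name below follows Strimzi's <cluster>-cluster-ca-cert convention):

kubectl -n kafka-cdc get secret kafka-cluster-cluster-ca-cert \
  -o jsonpath='{.data.ca\.crt}' | base64 -d > ca.crt
# Import ca.crt into your client's truststore to verify the brokers' certificates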

KRaft Mode

kafka-cluster.yaml (excerpt)
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster
  namespace: kafka-cdc
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  # ... (kafka spec as shown in the sections above)
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: dual-role
  namespace: kafka-cdc
  labels:
    strimzi.io/cluster: kafka-cluster
spec:
  replicas: 3
  roles:
    - controller
    - broker
  storage:
    type: jbod
    volumes:
      - id: 1
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
        kraftMetadata: shared

If you're coming from a traditional Kafka background, you might be expecting to see ZooKeeper configurations, but we're going to use Kafka's newer KRaft mode instead. Think of KRaft as Kafka's way of saying "I don't need ZooKeeper anymore. I can manage my own metadata."

Understanding KRaft Mode

If you've worked with Kafka before, you might remember the pain of managing ZooKeeper alongside your Kafka clusters. KRaft mode eliminates that complexity entirely. Here's what makes it special:

What KRaft Replaces:

Instead of relying on an external ZooKeeper ensemble to store Kafka's metadata (like topic configurations, partition assignments, and cluster membership), Kafka brokers now handle this responsibility themselves using a consensus algorithm similar to Raft.
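You can watch this consensus at work once the cluster is running; a rough sketch (adjust the image tag to your versions) that asks the cluster who the current KRaft leader and voters are:

kubectl -n kafka-cdc run quorum-check -ti --rm=true --restart=Never \
  --image=quay.io/strimzi/kafka:0.46.1-kafka-4.0.0 -- \
  bin/kafka-metadata-quorum.sh \
    --bootstrap-server kafka-cluster-kafka-bootstrap:9092 \
    describe --status
# The output lists the current leader, the voter (controller) nodes, and any observers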

Why This Matters:

  • Simplified Operations: One less system to deploy, monitor, and troubleshoot
  • Better Performance: No more network hops to ZooKeeper for metadata operations
  • Improved Reliability: Fewer moving parts means fewer potential failure points

To enable KRaft mode in Strimzi, we add two key annotations to our Kafka resource:

  • strimzi.io/node-pools: enabled: Tells Strimzi we want to use the newer node pool architecture
  • strimzi.io/kraft: enabled: Enables KRaft mode instead of ZooKeeper

The configuration file we'll use defines not just a Kafka cluster, but also something called a KafkaNodePool. This is Strimzi's way of letting you organize your Kafka nodes into logical groups, each with its own roles and storage configuration.

In our case, we're creating a dual-role node pool with 3 nodes where each node acts as both a Controller (managing metadata) and a Broker (handling client data traffic).
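Because node pools are just another custom resource, you can inspect them like anything else in Kubernetes:

kubectl -n kafka-cdc get kafkanodepool dual-role
# Shows the pool with its desired replicas and roles once the operator has reconciled it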

Why choose dual-role nodes?

  • Resource Efficient: Fewer total machines needed
  • Simpler Architecture: No need to separate controller and broker concerns
  • Still Highly Available: With 3 nodes, we can tolerate losing one node

Each node gets a 100GB persistent volume that stores both regular Kafka logs and KRaft metadata. The spec.storage.volumes.kraftMetadata: shared setting means both types of data live on the same disk, which is fine for most use cases.
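You can confirm the volumes were provisioned by listing the PersistentVolumeClaims; the names below follow Strimzi's usual data-<volume-id>-<cluster>-<pool>-<index> pattern, so yours should look similar:

kubectl -n kafka-cdc get pvc
# Expect something like data-1-kafka-cluster-dual-role-0, -1, and -2, each requesting 100Gi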


Conclusion

Congratulations! You now have a fully functional Kafka cluster running in KRaft mode.

In the next section, we'll deploy a MySQL database that will serve as our data source for change data capture.
