How to Deploy Kafka on Kubernetes

Introduction

Deploying Kafka on Kubernetes improves distributed event streaming with container orchestration features such as scalability and fault tolerance. The integration with Kubernetes enables organizations to scale Kafka clusters on demand while maintaining high availability and reliability.

This tutorial shows you how to deploy Kafka brokers on a Kubernetes cluster.

Prerequisites

Kubernetes installed (see our Kubernetes on Ubuntu and Kubernetes on Rocky Linux tutorials).
kubectl installed.
Administrative privileges on the system.

How to Set Up Kafka on Kubernetes

Kafka is a stateful application that records streaming messages from producers and stores them for consumers to use later. In Kubernetes, the recommended way to deploy stateful applications is using stateful sets, workload objects that guarantee uniqueness and ordering of application pods.

Proceed with the steps below to create and test a Kafka stateful set deployment.

Step 1: Create Project Directory

Deploying Kafka requires creating multiple Kubernetes objects using YAML files. To facilitate applying of the configuration, create a project directory to store the files:

mkdir kafka

Go to the newly created kafka directory:

cd kafka

Step 2: Create Network Policy

A network policy defines a network that an application components can use to communicate with each other. To create a network policy for Kafka, use a text editor such as Nano to generate a new YAML file:

nano kafka-network-policy.yaml

Paste the following code into the file:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kafka-network
spec:
  ingress:
    - from:
        - podSelector:
            matchLabels:
              network/kafka-network: "true"
  podSelector:
    matchLabels:
      network/kafka-network: "true"

Save the file and exit.

Step 3: Create ZooKeeper Stateful Set

ZooKeeper is a server for distributed coordination and synchronization of cloud applications. Kafka uses ZooKeeper to track the cluster node status and maintain lists of topics and messages.

Create a YAML file for the ZooKeeper stateful set:

nano zookeeper-stateful-set.yaml

Paste the YAML code below:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    service: zookeeper
  name: zookeeper
spec:
  serviceName: zookeeper
  replicas: 1
  selector:
    matchLabels:
      service: zookeeper
  template:
    metadata:
      labels:
        network/kafka-network: "true"
        service: zookeeper
    spec:
      securityContext:
        fsGroup: 1000
      enableServiceLinks: false
      containers:
        - name: zookeeper
          imagePullPolicy: Always
          image: zookeeper
          ports:
            - containerPort: 2181
          env:
            - name: ZOOKEEPER_CLIENT_PORT
              value: "2181"
            - name: ZOOKEEPER_DATA_DIR
              value: "/var/lib/zookeeper/data"
            - name: ZOOKEEPER_LOG_DIR
              value: "/var/lib/zookeeper/log"
            - name: ZOOKEEPER_SERVER_ID
              value: "1"
          resources: {}
          volumeMounts:
            - mountPath: /var/lib/zookeeper/data
              name: zookeeper-data
            - mountPath: /var/lib/zookeeper/log
              name: zookeeper-log
      hostname: zookeeper
      restartPolicy: Always
  volumeClaimTemplates:
    - metadata:
        name: zookeeper-data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1024Mi
    - metadata:
        name: zookeeper-log
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1024Mi

Save and exit the file.

The code defines the deployment of ZooKeeper with one replica and connects it with the previously defined kafka-network. It also provides the ZooKeeper container image and reserves volume storage for the application's data and logs.

Step 4: Create Zookeeper Service

Deploy ZooKeeper service to create a point of contact between ZooKeeper and Kafka. Generate a YAML file for the service:

nano zookeeper-service.yaml

Create a YAML declaration that defines the service name and provides a port for the service to listen on:

apiVersion: v1
kind: Service
metadata:
  labels:
    service: zookeeper
  name: zookeeper
spec:
  ports:
    - name: "2181"
      port: 2181
      targetPort: 2181
  selector:
    service: zookeeper

Save and exit the file.

Step 5: Create Kafka Stateful Set

Kafka stateful set configures the number of Kafka replicas, passes the necessary environment variables into the containers, and sets up volume claim templates to enable persistent storage.

Make a new YAML file to define the stateful set:

nano kafka-stateful-set.yaml

Paste the following code into the file:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    service: kafka
  name: kafka
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      service: kafka
  template:
    metadata:
      labels:
        network/kafka-network: "true"
        service: kafka
    spec:
      securityContext:
        fsGroup: 1000
      enableServiceLinks: false
      containers:
      - name: kafka
        imagePullPolicy: IfNotPresent
        image: bitnami/kafka:latest
        ports:
          - containerPort: 29092
          - containerPort: 9092
        env:
          - name: KAFKA_ADVERTISED_LISTENERS
            value: "INTERNAL://:29092,LISTENER_EXTERNAL://:9092"
          - name: KAFKA_AUTO_CREATE_TOPICS_ENABLE
            value: "true"
          - name: KAFKA_INTER_BROKER_LISTENER_NAME
            value: "INTERNAL"
          - name: KAFKA_LISTENERS
            value: "INTERNAL://:29092,LISTENER_EXTERNAL://:9092"
          - name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
            value: "INTERNAL:PLAINTEXT,LISTENER_EXTERNAL:PLAINTEXT"
          - name: KAFKA_ZOOKEEPER_CONNECT
            value: "zookeeper:2181"
        resources: {}
        volumeMounts:
          - mountPath: /var/lib/kafka/
            name: kafka-data
      hostname: kafka
      restartPolicy: Always
  volumeClaimTemplates:
    - metadata:
        name: kafka-data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi

Save and exit the file.

Step 6: Create Kafka Headless Service

Each Kafka broker is unique, i.e., clients must connect to a specific broker to retrieve specific data. This property makes it challenging to configure Kafka using a regular Kubernetes service since services act as load balancers and cannot distinguish between pods.

The recommended solution for this problem is to create a headless service. Headless services provide direct access to pods by allowing pods to be addressed by their DNS names.

Create a YAML for the headless service:

nano kafka-service.yaml

Paste the code below into the file. The clusterIP: None line tells Kubernetes that the service is headless.

apiVersion: v1
kind: Service
metadata:
  labels:
    service: kafka
  name: kafka
spec:
  clusterIP: None
  selector:
    service: kafka
  ports:
    - name: internal
      port: 29092
      targetPort: 29092
    - name: external
      port: 30092
      targetPort: 9092

Save the file and exit.

Step 7: Apply Configuration

Use the following command to apply the YAML files created in the previous steps all at once:

kubectl apply -f .

The output shows which objects were created in the cluster.

Applying the YAML files created in the previous steps.

Wait for the containers to initialize, then check if the deployment is up and running by executing the following command:

kubectl get all

The output shows all the pods, deployments, replicas, and stateful sets as ready.

The output of kubectl get all shows a working Kafka deployment.

Step 8: Test Kafka Deployment

Test the Kafka deployment with kcat, a generic non-JVM producer and consumer application. Follow the steps below to deploy kcat and use it to test Kafka:

1. Create a YAML file:

nano kcat-deployment.yaml

2. Paste the code below to create a kcat deployment:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: kcat
  labels:
    app: kcat
spec:
  selector:
    matchLabels:
      app: kcat
  template:
    metadata:
      labels:
        app: kcat
    spec:
      containers:
        - name: kcat
          image: edenhill/kcat:1.7.0
          command: ["/bin/sh"]
          args: ["-c", "trap : TERM INT; sleep 1000 & wait"]

The code uses the official edenhill/kcat container image.

3. Save the file and exit.

4. Apply the manifest with kubectl:

kubectl apply -f kcat-deployment.yaml

5. List the pods available in the cluster:

kubectl get pods

6. Find the kcat pod and copy its name.

The kcat application pod in the kubectl get pods output.

7. Enter the pod by executing the following command:

kubectl exec --stdin --tty [pod-name] -- /bin/sh

The pod shell prompt appears.

8. Enter the command below to send Kafka a test message to ingest:

echo "Test Message" | kcat -P -b kafka:29092 -t testtopic -p -1

If successful, the command prints no output.

9. Switch to the consumer role and query Kafka for messages by typing:

kcat -C -b kafka:29092 -t testtopic -p -1

The test message appears in the output.

Press Control+C to return to the prompt.

Creating Topics in Production Environment

The test procedure described in this step worked because the KAFKA_AUTO_CREATE_TOPICS_ENABLE variable was set to true in the Kafka stateful set declaration. It enabled kcat to accept the message as part of a topic, although the topic had not been created before.

While useful for testing, setting this variable is not recommended in the production environment. The following steps show how to create Kafka topics in production manually:

1. List the pods in the cluster:

kubectl get pods

Locate Kafka pods and choose any of the pod replicas.

A Kafka pod in the output of the kubectl get pods command.

2. Enter the chosen Kafka pod:

kubectl exec --stdin --tty [pod-name] -- /bin/sh

For example, to enter the kafka-0 pod, type:

kubectl exec --stdin --tty kafka-0 -- /bin/sh

3. Execute the following to create a new topic

kafka-topics --bootstrap-server kafka:29092 --topic [topic-name] --create --partitions [number] --replication-factor [number]

For example, create a topic named test with three partitions and the replication factor of three:

kafka-topics --bootstrap-server kafka:29092 --topic test --create --partitions 3 --replication-factor 3

The output confirms the topic creation:

Creating a Kafka topic inside a Kafka pod.

Use the created topic with kcat or another producer to organize event-driven messages.

Kafka and Kubernetes: Challenges

Using Kafka with Kubernetes has multiple benefits, especially if the organization already deploys stateful applications in the cluster. However, the following is a list of potential challenges to consider before deciding to deploy Kafka in a container environment:

Kubernetes works best with stateless apps. While stateful applications can operate inside a cluster, running them requires administrators to give up many features that make Kubernetes popular.
Kafka requires an entire operating system. Kafka stores the message queue data using a file system cache, which requires an OS kernel. While Kafka can run inside a container, it still needs to communicate with the rest of the OS, so it cannot benefit from containerization as a practice.
Kafka cannot work with a single load balancer. As explained before, Kafka brokers are not interchangeable, so putting a load balancer before them results in Kafka breaking. A solution to this problem is setting up one load balancer for each broker, but this setup can create a complicated deployment when many brokers are present.

Conclusion

By following the instructions in this tutorial, you have successfully installed Kafka on Kubernetes. The article also provided a list of challenges for the successful Kubernetes deployment of Kafka.

Read our article on deploying RabbitMQ on Kubernetes to learn more about a message broker that is a viable alternative to Kafka.

Was this article helpful?

YesNo