What You’ll Learn

What You’ll Need

Note: If you have already completed the AWS account steps from the Calisti SMM tutorial, you can safely skip to the “Building the EC2 Instance” step, which will walk you through the setup of the Kubernetes cluster for Streaming Data Manager (SDM).

AWS Instances and Cost

Prior to starting the lab, you’ll need to create an AWS account. This tutorial depends on a large AWS Elastic Compute Cloud (EC2) instance that does not qualify for the free tier within the AWS cloud. As configured and deployed in us-east-1, this instance bills at about $1.25 per hour, which includes the instance and the additional root-block storage. It may bill differently in other regions, and a payment method must be on file before the instance can be provisioned. We will be using Terraform to provision the instance, which will allow us to easily destroy it after we are done with the tutorial. Please remember that this lab will not be free when using these AWS resources.

A Note About AWS Regions

AWS has many different regions in which its data centers and cloud offerings are available. Generally speaking, it is advantageous (from a latency perspective) to choose a region that is geographically close to you. However, it is important to note that different services within the AWS console may either be “global” in scope (such as Identity and Access Management [IAM], wherein changes in this service will affect all regions), or regional (like EC2, such that a key pair generated in one region cannot be used for another region). The scope of the service will be indicated in the upper-right corner of the service window.

Global region

Local region

Adding a Payment Method

Once you are signed up with an AWS account, you will need to access the billing platform by typing billing in the AWS Service search bar and selecting it. Once in the billing service, click Payment preferences and add a new payment method. By default, all major credit and debit cards are accepted, but please ensure that you destroy the lab resources with Terraform when you are finished so that you are not charged for usage beyond the exercises in this tutorial.

AWS Billing Search

AWS payment screen

Creating Access Keys

Once billing information has been input, you will need to create access keys to allow your local machine to provision the EC2 instance using Terraform. This task is accomplished by searching for IAM in the AWS Service search bar and selecting it.

AWS IAM Search

When the IAM service page appears, click Users on the left-hand side of the window to bring up a list of the users within the account. Select your username to open a page with various information about your user. Select Security Credentials in the main window pane, scroll down to Access keys, and click Create access key.

AWS Access Key Page

In the resulting window, select Other, followed by Next. In the next window, you can set an optional description for the access key.

AWS Create Access Key

Once you click Next, you will be presented with the access key and secret key for your account, along with the option to download a CSV file of these credentials. Once you leave this screen, you will not be able to access the secret key. You may want to download the CSV file for safekeeping, but do not commit any of these credentials to a public repository; these credentials will allow anyone to access your account without your knowledge!

AWS Created Access Key
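
If you prefer not to store these keys in a file on disk, note that Terraform’s AWS provider can also read the standard AWS environment variables; a minimal sketch (placeholder values shown) is below, although the terraform.tfvars approach used later in this tutorial works just as well:

export AWS_ACCESS_KEY_ID="AKIAXXXXXXXXXXXXXXXX"
export AWS_SECRET_ACCESS_KEY="<secret access key from this step>"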

Creating an EC2 Key Pair

The final step of gathering the AWS prerequisites will be to create a new EC2 key pair. You can do this by searching for EC2 in the AWS service search bar and selecting it.

AWS EC2 Search

From there, select Key pairs in the left pane (under Network and Security), and then select Create key pair.

Creating EC2 Keypair

Provide a name, ensure that both RSA and .pem are selected, and then click Create key pair.

EC2 key pair options

This action will download the key pair to your local machine. Please do not delete this key pair, because it will be required to connect via Secure Shell (SSH) to the cloud instance that we will stand up in EC2 next.

Installing Terraform and Gathering Files

The EC2 instance will be provisioned in the AWS cloud using Terraform. By using Terraform, we can package the required files for deployment and help ensure that every instance is created in a similar way. Additionally, Terraform will allow us to destroy all configured resources after we are done, to help ensure that we are not charged for usage outside of the lab exploration. In order to use Terraform, you’ll need to install it for your operating system using the instructions found at https://terraform.io/downloads. If you choose the binary download, ensure that it resides in your $PATH or in the folder from which you plan to run the Terraform files.
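
For example, one common install route on macOS uses HashiCorp’s Homebrew tap (adjust for your operating system; the downloads page covers the alternatives), followed by a quick check that the binary is reachable:

brew tap hashicorp/tap
brew install hashicorp/tap/terraform
terraform -version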

Next, you’ll need to download the code for this lab, found here: https://github.com/CiscoLearning/calisti-tutorial-sample-code/. This repository contains all the Terraform configuration required for building the AWS EC2 instance, as well as files that will be needed to complete the tutorial. Once the code has been cloned, you’ll need to open the calisti-tutorial-sample-code/terraform/terraform.tfvars file and add the information that we acquired in the previous steps (AWS IAM access keys, the EC2 key pair name, and the key pair file location).
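
As a rough sketch, cloning the repository looks like the following; the terraform.tfvars entries shown are illustrative placeholders only, so keep the variable names that already exist in the file:

git clone https://github.com/CiscoLearning/calisti-tutorial-sample-code.git
cd calisti-tutorial-sample-code/terraform

# terraform.tfvars (illustrative names and values)
# aws_access_key    = "AKIAXXXXXXXXXXXXXXXX"
# aws_secret_key    = "<secret key from the IAM step>"
# key_pair_name     = "qms-keypair"
# key_pair_location = "~/.ssh/qms-keypair.pem"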

AWS keys in terraform.tfvars

Once this file has been modified to reflect your information, the final piece is to ensure that the region declared in the Terraform configuration matches the region in which you created the EC2 key pair. The region is declared at the top of the panoptica-main.tf file.

AWS region declaration in Terraform

Once these changes have been made, move your terminal into the calisti-tutorial-sample-code/terraform folder, and perform a terraform init followed by a terraform apply -auto-approve. (We’re skipping the plan step because we know what will be created.) This process should only take a couple of minutes, and at the end, you should have a successfully built AWS virtual machine (VM). The output at the end of the Terraform run should indicate the fully qualified domain name (FQDN) of your new AWS EC2 instance, which has been fully updated, with Docker, kubectl, kind, caddy, k9s, and terraform installed, and the calisti-tutorial-sample-code repository cloned directly from GitHub.
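
Concretely, the build step is:

cd calisti-tutorial-sample-code/terraform
terraform init
terraform apply -auto-approve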

When the EC2 instance build is completed, there will be output generated by the Terraform configuration. This will print the required SSH command, including the location of the SSH key from the included variables, as well as the FQDN, which is not known until the instance is created. You can copy and paste this command and hit <ENTER> to connect to the VM.
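
The printed command will look something like the following (the key path and FQDN shown are illustrative):

ssh -i ~/.ssh/qms-keypair.pem ubuntu@ec2-18-209-69-87.compute-1.amazonaws.com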

GIF of AWS instance being created

The goal of these first tasks is to deploy a Kubernetes demo cluster based on kind (Kubernetes in Docker) that will serve as the Kubernetes cluster for Calisti and the included demo application. The included shell script will build a cluster with one control-plane node and four worker nodes, and it is located in the calisti-tutorial-sample-code/calisti/cluster folder within the EC2 instance.

Creating Kind Cluster

To instantiate the kind cluster, we will reference the included shell script found at the folder location above. This file, when executed, will build a kind cluster using the 1.23.13 image, wait for the cluster to become fully active, and then install MetalLB within the cluster in Layer 2 mode. (MetalLB enables Kubernetes services of type LoadBalancer to be installed on bare-metal Kubernetes installations.) If you wish to analyze the full cluster creation script, you can view the file by executing cat calisti-tutorial-sample-code/calisti/cluster/cluster_setup.sh.
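
To run it (the captured output below was taken from a copy in the home directory, so your prompt and path may differ slightly):

cd ~/calisti-tutorial-sample-code/calisti/cluster
bash cluster_setup.sh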

When this shell script is run, you will see output similar to the following:

ubuntu@ip-172-31-0-66:~/calisti$ bash cluster_setup.sh
Creating cluster "demo1" ...
 ✓ Ensuring node image (kindest/node:v1.23.13) 🖼
 ✓ Preparing nodes 📦 📦 📦 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✓ Joining worker nodes 🚜
Set kubectl context to "kind-demo1"
You can now use your cluster with:

kubectl cluster-info --context kind-demo1

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
Waiting for metallb to be deployed...
Deploying metallb
namespace/metallb-system created
customresourcedefinition.apiextensions.k8s.io/addresspools.metallb.io created
customresourcedefinition.apiextensions.k8s.io/bfdprofiles.metallb.io created
customresourcedefinition.apiextensions.k8s.io/bgpadvertisements.metallb.io created
customresourcedefinition.apiextensions.k8s.io/bgppeers.metallb.io created
customresourcedefinition.apiextensions.k8s.io/communities.metallb.io created
customresourcedefinition.apiextensions.k8s.io/ipaddresspools.metallb.io created
customresourcedefinition.apiextensions.k8s.io/l2advertisements.metallb.io created
serviceaccount/controller created
serviceaccount/speaker created
role.rbac.authorization.k8s.io/controller created
role.rbac.authorization.k8s.io/pod-lister created
clusterrole.rbac.authorization.k8s.io/metallb-system:controller created
clusterrole.rbac.authorization.k8s.io/metallb-system:speaker created
rolebinding.rbac.authorization.k8s.io/controller created
rolebinding.rbac.authorization.k8s.io/pod-lister created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:controller created
clusterrolebinding.rbac.authorization.k8s.io/metallb-system:speaker created
secret/webhook-server-cert created
service/webhook-service created
deployment.apps/controller created
daemonset.apps/speaker created
validatingwebhookconfiguration.admissionregistration.k8s.io/metallb-webhook-configuration created
Waiting for metallb to be ready...
pod/controller-7967ffcf8-569k6 condition met
pod/speaker-7bhz7 condition met
pod/speaker-fg2ff condition met
pod/speaker-pjd2k condition met
pod/speaker-pmwh4 condition met
pod/speaker-vhrhj condition met
ipaddresspool.metallb.io/metallb-ippool created
l2advertisement.metallb.io/metallb-l2-mode created

To verify that the entire cluster has been bootstrapped and is ready for Calisti installation, you can use a kubectl command to verify the status of all pods within the cluster:

ubuntu@ip-172-31-0-66:~/calisti$ kubectl get pods -A
NAMESPACE            NAME                                          READY   STATUS    RESTARTS   AGE
kube-system          coredns-64897985d-c7mmd                       1/1     Running   0          13m
kube-system          coredns-64897985d-kq2kt                       1/1     Running   0          13m
kube-system          etcd-demo1-control-plane                      1/1     Running   0          13m
kube-system          kindnet-6z6cm                                 1/1     Running   0          13m
kube-system          kindnet-b88bq                                 1/1     Running   0          13m
kube-system          kindnet-cfj26                                 1/1     Running   0          13m
kube-system          kindnet-k64rq                                 1/1     Running   0          13m
kube-system          kindnet-kpnww                                 1/1     Running   0          13m
kube-system          kube-apiserver-demo1-control-plane            1/1     Running   0          13m
kube-system          kube-controller-manager-demo1-control-plane   1/1     Running   0          13m
kube-system          kube-proxy-5lk8z                              1/1     Running   0          13m
kube-system          kube-proxy-8t467                              1/1     Running   0          13m
kube-system          kube-proxy-hfwvw                              1/1     Running   0          13m
kube-system          kube-proxy-m8c2c                              1/1     Running   0          13m
kube-system          kube-proxy-zgkmj                              1/1     Running   0          13m
kube-system          kube-scheduler-demo1-control-plane            1/1     Running   0          13m
local-path-storage   local-path-provisioner-58dc9cd8d9-ss289       1/1     Running   0          13m
metallb-system       controller-7967ffcf8-569k6                    1/1     Running   0          12m
metallb-system       speaker-7bhz7                                 1/1     Running   0          12m
metallb-system       speaker-fg2ff                                 1/1     Running   0          12m
metallb-system       speaker-pjd2k                                 1/1     Running   0          12m
metallb-system       speaker-pmwh4                                 1/1     Running   0          12m
metallb-system       speaker-vhrhj                                 1/1     Running   0          12m

All pods should be in the Running state. Notice the separate Kubernetes namespace for MetalLB, which was created as part of the cluster build script. We’re now ready to deploy Calisti to the cluster.

Installing Calisti and the Demo Application

In order to install Calisti, you’ll need to create a free Calisti account. Once logged in, click the Connect cluster button, which will take you to a screen that documents the system requirements for the application.

Connect cluster to Calisti

We’re using kind on a large EC2 instance (a c5a.8xlarge provides 32 vCPUs and 64 GB of RAM, and we’ve given the system plenty of GP2 storage as well), so we meet these specifications.

System requirements

Clicking Continue will bring you to a screen where you’ll enter an installation name and, if required, a proxy server. While this installation method automates a large portion of the Calisti installation into the cluster, we’ll use another method that allows us to install all features and helps ensure that we can access the dashboard on our EC2 instance.

Begin by clicking the Download link for the Calisti CLI file for Linux. This will allow us to install SDM and Service Mesh Manager (SMM) into the cluster, as well as the demo application.

In order to install the demo application and access the Calisti dashboard, we’ll need to copy over the Calisti binaries to our EC2 instance. Ensure that you download the Linux binaries, and then copy them to the EC2 instance using Secure Copy Protocol (SCP).

scp -i {KEY_LOCATION} Downloads/smm_1.12.0_linux_amd64.tar.gz ubuntu@{EC2_FQDN}:.

An example of the above:

scp -i ~/.ssh/qms-keypair.pem Downloads/smm_1.12.0_linux_amd64.tar.gz ubuntu@ec2-18-209-69-87.compute-1.amazonaws.com:.

You’ll then need to decompress and untar the compressed file and ensure that the resulting files are executable:

cd ~
tar xvzf smm_1.12.0_linux_amd64.tar.gz
chmod +x smm supertubes

Next, you’ll need to gather the activation command from the Calisti portal, under 02. Get your Activation Credentials. Copy the command (an example is given below) and paste it into the terminal of your EC2 instance.

SMM_REGISTRY_PASSWORD={USER_PASSWORD} ./smm activate \
--host=registry.eticloud.io \
--prefix=smm \
--user={USER_STRING}

This should provide output similar to the following, and your cluster should now be able to access the SMM repositories.

ubuntu@ip-172-31-0-66:~$ SMM_REGISTRY_PASSWORD={USER_PASSWORD} ./smm activate \
> --host=registry.eticloud.io \
> --prefix=smm \
> --user={USER_STRING}
? Are you sure to use the following context? kind-demo1 (API Server: https://127.0.0.1:36263) Yes
✓ Configuring docker image access for Calisti.
✓ If you want to change these settings later, please use the 'registry' and 'activate' commands

Once this step is complete, you can install Calisti, which we will use for the rest of our lab.

./smm install --non-interactive -a --install-sdm -a --additional-cp-settings ~/calisti-tutorial-sample-code/calisti/smm/enable-dashboard-expose.yaml

The output will indicate that it is creating a set of custom resource definitions (CRDs) and then creating Kubernetes deployments in different namespaces. When it is complete, you can perform a kubectl get pods -A. When all pods are in the Running state, you can invoke the following command to check the status of the Istio deployment, upon which SMM is built:

./smm istio cluster status -c ~/.kube/demo1.kconf

If everything is deployed correctly, you should see the following output:

logged in as kubernetes-admin
Clusters
---
Name        Type   Provider  Regions  Version   Distribution  Status  Message
kind-demo1  Local  kind      []       v1.23.13  KIND          Ready


ControlPlanes
---
Cluster     Name                        Version  Trust Domain     Pods                                                  Proxies
kind-demo1  sdm-icp-v115x.istio-system  1.15.3   [cluster.local]  [istiod-sdm-icp-v115x-7fd5ccc9d9-l2zfq.istio-system]  8/8
kind-demo1  cp-v115x.istio-system       1.15.3   [cluster.local]  [istiod-cp-v115x-6bdfb6b4bd-c7qpb.istio-system]       22/22

We’ll now deploy our demo app, which will be installed using the smm binary:

./smm demoapp install --non-interactive

This will take some time. When the command completes, we want to ensure that all pods in the smm-demo namespace are in the running state.

kubectl get pods -n smm-demo

The output should appear similar to the following:

ubuntu@ip-172-31-14-56:~$ kubectl get pods -n smm-demo
NAME                                  READY   STATUS    RESTARTS   AGE
analytics-v1-6c9fd4c7d9-lbq77         2/2     Running   0          43s
bombardier-5f59948978-cdvnw           2/2     Running   0          43s
bookings-v1-6b89b9d965-spgrm          2/2     Running   0          43s
catalog-v1-7dd5b79cf-s2z6s            2/2     Running   0          43s
database-v1-6896cd4b59-kr67p          2/2     Running   0          43s
frontpage-v1-5976b889-jkgqw           2/2     Running   0          43s
movies-v1-9594fff5f-2zm4h             2/2     Running   0          43s
movies-v2-5559c5567c-q8n2s            2/2     Running   0          43s
movies-v3-649b99d977-ttp4j            2/2     Running   0          43s
mysql-669466cc8d-dmg44                2/2     Running   0          42s
notifications-v1-79bc79c89b-zj9xz     2/2     Running   0          42s
payments-v1-547884bfdf-6wpcn          2/2     Running   0          42s
postgresql-768b8dbd7b-khxvs           2/2     Running   0          42s
recommendations-v1-5bdd4cdf5f-cxld2   2/2     Running   0          42s

Finally, we need to expose the Calisti dashboard outside of the cluster. We will accomplish this in two steps. The first step is to determine the IP address of the ingress load balancer that is deployed as part of Calisti inside our Kubernetes cluster. The second step is to use Caddy to forward an external port on our EC2 instance to that internal address. This is accomplished with the following commands:

INGRESSIP=$(kubectl get svc smm-ingressgateway-external -n smm-system -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
echo $INGRESSIP
caddy reverse-proxy --from :8080 --to ${INGRESSIP}:80 > /dev/null 2>&1 &

We can now access the dashboard by opening a web browser and navigating to the FQDN of our EC2 instance on port 8080, such as http://{EC2_FQDN}:8080. We can log in by generating a token from the smm CLI via our SSH session and then pasting it into the web browser.

./smm login --non-interactive

SMM login token

Once logged in, you will be presented with a dashboard indicating the current state of the smm-demo namespace. We’re now ready to explore the capabilities of SDM.

Login page with token

SMM dashboard page

Calisti SDM is the deployment tool for setting up and operating production-ready Apache Kafka clusters on Kubernetes, leveraging a cloud-native technology stack. SDM includes Zookeeper, Koperator, Envoy, and many other components hosted in a managed service mesh. All components are automatically installed, configured, and managed in order to operate a production-ready Kafka cluster on Kubernetes.

Primary Features

Some of the important features of SDM are:

Architecture

calisti sdm architecture

Koperator is a core part of Calisti SDM that helps you create a production-ready Apache Kafka cluster on Kubernetes, with scaling, rebalancing, and alerts-based self-healing. While Koperator itself is an open-source project, SDM extends the functionality of Koperator with commercial features—for example, declarative ACL handling, built-in monitoring, and multiple ways of disaster recovery.

What Makes SDM Unique?

Calisti SDM is specifically built to run and manage Apache Kafka on Kubernetes. Other solutions use Kubernetes StatefulSets to run Kafka, but this approach is not really suitable for Kafka. SDM is based on simple Kubernetes resources (Pods, ConfigMaps, and PersistentVolumeClaims), allowing a much more flexible approach that makes it possible to:

When we deployed Calisti, we enabled the SDM feature. Additionally, the demo app that we deployed earlier includes multiple Kafka brokers as well as several producers (analytics, catalog, and booking services), which push data to a predefined Kafka topic, and one consumer (recommendations service).

Kafka architecture

Navigate to the TOPOLOGY page. In the NAMESPACES drop-down list in the top-left corner, select the smm-demo and kafka namespaces. You will see the traffic between our consumer and producer services and the Kafka cluster.

Kafka topology

Integrated SDM User Interface

There is an integrated SDM user interface for checking detailed information about the Apache Kafka custom resource (CR) objects we deployed.

The MENU > TOPICS page shows information about the cluster’s Kafka topics. When you click a specific topic, the details related to that topic are displayed in the pane that appears on the right side of the window. We can do this for the recommendations-topic that is present within our app. The details will display information such as offset and size for each of its partitions.

Kafka Menu for Topics

Kafka Topic Details

The MENU > BROKERS page shows information about the cluster’s Kafka brokers. In a similar fashion to the topics, if we click a broker, the details related to that broker are displayed in the pane that appears on the right side of the window, such as the number of partitions and log size.

Kafka Broker Details

Calisti provides out-of-the-box support for mTLS-encrypted communication to the Apache Kafka brokers, as well as for managing access using access control lists (ACLs). From the client application’s point of view, no change is required to support these features.

Calisti SDM fully automates managed mTLS encryption and authentication. You don’t need to configure your brokers to use Secure Sockets Layer (SSL) because SDM provides mTLS at the network layer (implemented through a lightweight, managed Istio mesh). All services deployed by SDM—Zookeeper, Koperator, the Kafka cluster, Cruise Control, MirrorMaker 2, and so on—interact with each other using mTLS.

To view the mTLS connections between services, navigate to the Topology view within the Calisti user interface and select the security option in the Edge labels drop-down. Encrypted links will be shown with a closed lock. (Switch back to request rate/throughput at the end.)

Kafka mTLS View

ACLs on Kafka

Using SDM, you can accomplish two tasks without having to modify your Kafka client libraries or code:

The authentication and authorization flow performs the following actions automatically:

kafka ssl

Configuring Default Access

By default, Kafka allows all applications to access any topic if no ACL is present.

We will restrict access to Kafka and also configure which authorization strategy we want to use, which is done with the following lines within the Kafka CR:

authorizer.class.name=kafka.security.authorizer.AclAuthorizer
allow.everyone.if.no.acl.found=false

These two settings, among others, are stored in the Kafka CR. We can use kubectl patch to apply them without having to edit the CR in a text editor:

kubectl patch kafkacluster kafka -n kafka --type=merge --patch '{"spec":{"readOnlyConfig":"auto.create.topics.enable=false\noffsets.topic.replication.factor=2\ncruise.control.metrics.topic.auto.create=true\ncruise.control.metrics.topic.num.partitions=12\ncruise.control.metrics.topic.replication.factor=2\ncruise.control.metrics.topic.min.insync.replicas=1\nsuper.users=User:CN=kafka-default;User:CN=kafka-kafka-operator;User:CN=supertubes-system-supertubes;User:CN=supertubes-system-supertubes-ui\nauthorizer.class.name=kafka.security.authorizer.AclAuthorizer\nallow.everyone.if.no.acl.found=false"}}'
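
To confirm that the patch landed, you can read back the readOnlyConfig block that the patch modifies (the field path is taken from the patch above):

kubectl get kafkacluster kafka -n kafka -o jsonpath='{.spec.readOnlyConfig}'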

Alternatively, you can directly edit the Kafka cluster CR:

kubectl edit kafkacluster kafka -n kafka

This action will open the Kafka CR inside of a vim instance. The lines should be added below line 560 in the CR. Unless you are familiar with the operation of vim, however, it is strongly recommended that you use the patch command.

Editing Kafka CR with vim

Now that the CR has been edited to deny access to producers and consumers by default, we need to create the ACLs that will allow that communication to resume. The ACLs are provided within the ~/calisti-tutorial-sample-code/calisti/sdm/demoapp_kafka_acls.yaml file. For brevity, this file is not included inline, but it can be examined using the following command:

cat ~/calisti-tutorial-sample-code/calisti/sdm/demoapp_kafka_acls.yaml

We can use kubectl to apply the ACL configuration to the cluster:

kubectl apply -f ~/calisti-tutorial-sample-code/calisti/sdm/demoapp_kafka_acls.yaml
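
If you want to confirm that the objects were created, you can list the KafkaACL resources across namespaces (assuming the usual lowercase resource name for the KafkaACL kind shown later in this tutorial):

kubectl get kafkaacl -A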

Connect to Kafka and Manage Access

We’re going to create another service in a dedicated namespace and have it try to access the recommendations-topic to illustrate the ACLs and access restrictions. We’ll start by creating a new namespace for an app called kcat and enable that namespace for Istio sidecar injection.

kubectl create ns kcat
./smm sidecar-proxy auto-inject on kcat

The following YAML file, located at ~/calisti-tutorial-sample-code/calisti/sdm/kcat_deploy.yaml, creates the definition to deploy the kcat pod. The kcat application can simulate the behavior of the producer and consumer sending messages to topics.

apiVersion: v1
kind: Pod
metadata:
  name: kcat
spec:
  containers:
  - name: kafka-test
    image: "edenhill/kcat:1.7.1"
    # Just spin & wait forever
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 3000; done;" ]

We’ll deploy this application using kubectl. Ensure that you include the -n kcat switch so that the application is deployed to the kcat namespace (and gets an Istio sidecar deployed alongside it).

kubectl apply -n kcat -f ~/calisti-tutorial-sample-code/calisti/sdm/kcat_deploy.yaml

Now that the application is deployed, we need to determine which ports our Kafka brokers are using.

kubectl get services -n kafka

Once run, you should receive output similar to the following:

NAME                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                        AGE
kafka-0                       ClusterIP   10.104.208.233   <none>        29092/TCP,29093/TCP,9020/TCP   87m
kafka-1                       ClusterIP   10.101.37.131    <none>        29092/TCP,29093/TCP,9020/TCP   87m
kafka-all-broker              ClusterIP   10.96.228.30     <none>        29092/TCP,29093/TCP            87m
kafka-cruisecontrol-svc       ClusterIP   10.100.16.195    <none>        8090/TCP,9020/TCP              86m
kafka-kminion                 ClusterIP   10.99.224.82     <none>        8080/TCP                       87m
kafka-operator-alertmanager   ClusterIP   10.99.247.101    <none>        9001/TCP                       90m
kafka-operator-authproxy      ClusterIP   10.99.157.137    <none>        8443/TCP                       90m
kafka-operator-operator       ClusterIP   10.106.8.119     <none>        443/TCP                        90m

As you can see, there is a kafka-all-broker service that binds to the Kafka pods on both ports 29092/TCP and 29093/TCP. We’ll use these ports to attempt to connect to the broker using the kcat application deployed earlier. Execute the following command to connect to the shell of the kcat pod:

kubectl exec -n kcat -it kcat -- sh

Note: You are now on the shell of the kcat pod. Notice that the shell prompt has changed.

Start the kcat application by running the following command. This command specifies the broker’s FQDN within the cluster, as well as the port on which we need to connect. The -C flag tells kcat to run as a consumer, pulling messages from the recommendations-topic.

kcat -b kafka-all-broker.kafka.svc.cluster.local:29092 -C -t recommendations-topic
exit

Because the ACL now in place doesn’t allow the kcat pod to reach the Kafka messaging broker, you should receive output similar to the following after entering the kcat command in the shell:

# kcat -b kafka-all-broker.kafka.svc.cluster.local:29092 -C -t recommendations-topic
% ERROR: Topic recommendations-topic error: Broker: Topic authorization failed

Note: Ensure that you enter the exit command to return to the shell of your EC2 instance.

To allow kcat to communicate, let’s add a specific ACL that permits the kcat app to access the topic. This configuration can be found in ~/calisti-tutorial-sample-code/calisti/sdm/kcat-acl.yaml and is shown here:

apiVersion: kafka.banzaicloud.io/v1beta1
kind: KafkaACL
metadata:
  labels:
    app: kcat
  name: kcat
  namespace: kafka
spec:
  clusterRef:
    name: kafka
    namespace: kafka
  kind: User
  name: CN=kcat-default
  roles:
  - name: consumer
    resourceSelectors:
    - name: recommendations-topic
      namespace: kafka
    - name: recommendations-group
      namespace: kafka

Let’s apply the configuration to the cluster:

kubectl apply -f ~/calisti-tutorial-sample-code/calisti/sdm/kcat-acl.yaml

If we reconnect to the kcat pod, it should be possible to access the topic and see the messages:

kubectl exec -n kcat -it kcat -- sh

Note: You are now on the shell of the kcat pod. Notice that the shell prompt has changed.

Invoke the kcat application to connect to the Kafka broker:

kcat -b kafka-all-broker.kafka.svc.cluster.local:29092 -C -t recommendations-topic

The output should be a lot of scrolling messages similar to the following:

...
analytics-message
analytics-message
analytics-message
bookings-message
bookings-message
catalog-message
catalog-message
...

Note: Use CTRL-C to stop the messages from appearing on the screen. However, it may take some time to return you to the shell, depending on how many messages are still queued for display.

Also, you can display the metadata for recommendations-topic by running the following command:

kcat -b kafka-all-broker.kafka.svc.cluster.local:29092 -L -t recommendations-topic
exit

This action will result in output similar to the following:

Metadata for recommendations-topic (from broker -1: kafka-all-broker.kafka.svc.cluster.local:29092/bootstrap):
 2 brokers:
  broker 0 at kafka-0.kafka.svc.cluster.local:29092 (controller)
  broker 1 at kafka-1.kafka.svc.cluster.local:29092
 1 topics:
  topic "recommendations-topic" with 6 partitions:
    partition 0, leader 1, replicas: 1,0, isrs: 1,0
    partition 1, leader 0, replicas: 0,1, isrs: 0,1
    partition 2, leader 1, replicas: 1,0, isrs: 1,0
    partition 3, leader 0, replicas: 0,1, isrs: 0,1
    partition 4, leader 1, replicas: 1,0, isrs: 1,0
    partition 5, leader 0, replicas: 0,1, isrs: 0,1

Finally, we can remove the ACL from the cluster using the kubectl command:

kubectl delete -f ~/calisti-tutorial-sample-code/calisti/sdm/kcat-acl.yaml

Once you start running Apache Kafka in production, an essential task will be managing topics so that producers and consumers can do their work. Usually, you would use one of the Kafka command-line tools to create and manage topics. While it is possible in a Kubernetes environment, it requires a few more steps, such as:

While this approach works, it makes it more difficult to reproduce a setup during disaster recovery or when standing up another (test) environment that mirrors production.

Instead of connecting to specific brokers to create a new topic, we can use the kubectl CLI. Here, we create a new topic declaratively using YAML. The advantage of this is that we can let Calisti do the heavy lifting of creating the topics within the pods; we simply define our configuration changes in YAML, which can be stored in a version control system for use or reference later.

Create a New Kafka Topic via YAML

Let’s define a new Kafka topic using YAML. This configuration is located within ~/calisti-tutorial-sample-code/calisti/sdm/kafka_topic.yaml and is defined below. We can use the kubectl command to deploy this configuration to the cluster.

apiVersion: kafka.banzaicloud.io/v1alpha1
kind: KafkaTopic
metadata:
    name: my-topic
spec:
    clusterRef:
        name: kafka
    name: my-topic
    partitions: 1
    replicationFactor: 1
    config:
        "retention.ms": "604800000"
        "cleanup.policy": "delete"

Then apply the configuration with kubectl:

kubectl apply -n kafka -f ~/calisti-tutorial-sample-code/calisti/sdm/kafka_topic.yaml

Note: Ensure that you use the -n kafka switch to deploy the configuration to the kafka namespace.

We can then use the dashboard to confirm that the topic is present (navigate to MENU > TOPICS).

Kafka Topic List

Alternatively, topics can be created in the user interface.

You might notice the config: key containing additional parameters in the code example above. If the new topic requires different configuration options, please refer to the Kafka documentation.
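
For instance, here is a sketch of a few commonly tuned topic-level settings (the keys are standard Kafka topic configurations; the values are illustrative, not recommendations):

    config:
        "retention.ms": "86400000"        # keep messages for one day
        "cleanup.policy": "compact"       # compact the log instead of deleting
        "max.message.bytes": "2097152"    # allow messages up to roughly 2 MiB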

Update a Kafka Topic via YAML

We are also able to update Kafka topics using kubectl. Let’s increase the number of partitions to 5.

kubectl patch -n kafka kafkatopic my-topic --patch '{"spec": {"partitions": 5}}' --type=merge
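
To double-check the change from the CLI as well, you can read the field back:

kubectl get kafkatopic my-topic -n kafka -o jsonpath='{.spec.partitions}'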

The Calisti user interface reflects the updated setting:

Updated Topic

Calisti SDM exposes Cruise Control and Kafka Java Management Extensions (JMX) metrics to Prometheus and acts as a Prometheus Alertmanager. It receives alerts defined in Prometheus and creates actions based on Prometheus alert annotations so that it can handle and react to alerts automatically, without having to involve human operators.

Vertical Capacity Scaling

There are many situations in which the horizontal scaling of a cluster is impossible. When only one broker is being throttled and needs more CPU or additional disks (because it handles the most partitions), a StatefulSet-based solution is of little help because it does not distinguish between the specifications of replicas. Handling such a case requires unique broker configurations. If we need to add a new disk to a single broker, a StatefulSet-based solution wastes a lot of disk space (and money): because it can’t add a disk to a specific broker, the StatefulSet adds one to every replica.

Graceful Kafka Cluster Scaling

To scale Kafka clusters both up and down gracefully, SDM integrates LinkedIn’s Cruise Control and is configured to react to events. The three default actions are:

Additionally, you can define custom actions.

koperator alerts

Let’s apply a policy that will trigger an alert in Prometheus whenever a broker reports under-replicated partitions (that is, a broker is not working) for more than 30 seconds. This configuration is defined in ~/calisti-tutorial-sample-code/calisti/sdm/kafka_scaling.yaml.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: kafka-rules
    banzaicloud.io/managed-by: supertubes
  name: kafka-autoscale-rules
  namespace: kafka
spec:
  groups:
  - name: KafkaAutoscaleRules
    rules:
    - alert: BrokerUnderReplicated
      expr: kafka_server_replicamanager_underreplicatedpartitions_value > 0
      for: 30s
      labels:
        severity: alert
      annotations:
        description: 'broker underreplicated'
        summary: 'broker underreplicated'
        brokerConfigGroup: 'default'
        command: 'upScale'

Apply the rule to the cluster:

kubectl apply -f ~/calisti-tutorial-sample-code/calisti/sdm/kafka_scaling.yaml

This policy will result in the deployment of a new broker as soon as the alert is generated in Prometheus.

If we navigate to the BROKERS menu item, we will see that we currently have two brokers within our cluster.

Kafka Brokers Normal

In order to generate an alert, we will need to connect to one of the brokers and stop the Kafka process. We’ll start by connecting to the shell of a broker:

kubectl exec -it -n kafka "$(kubectl get pods --selector=app=kafka -n kafka -o jsonpath='{.items[0].metadata.name}')" -- /bin/bash

Note: You are now on the shell of a Kafka pod within the cluster. Note the shell prompt change.

Then, stop the running Kafka process:

kill -STOP "$(ps -ef | grep '[j]ava' | awk '{print $2}')"
exit

After 2 to 5 minutes, a new Kafka broker should be created.
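
If you want to watch this happen from the CLI, the same label selector used above works here (press CTRL-C to stop watching):

kubectl get pods -n kafka --selector=app=kafka -w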

Kafka Brokers Recreated

Using the supertubes utility provided with Calisti, we can also manage the Kafka brokers. We will restart the Kafka process in the broker and then manually remove a broker using supertubes.

Let’s connect to the broker terminal shell:

kubectl exec -it -n kafka "$(kubectl get pods --selector=app=kafka -n kafka -o jsonpath='{.items[0].metadata.name}')" -- /bin/bash

Note: You are now on the shell of a Kafka pod within the cluster. Note the shell prompt change.

Resume the stopped Kafka process. You’ll see that we now have three total brokers.

kill -CONT "$(ps -ef | grep '[j]ava' | awk '{print $2}')"
exit

Kafka Brokers Extra

Finally, we can use the supertubes binary to downscale the number of brokers:

cd ~
./supertubes cluster broker remove --broker-id="$(kubectl get kafkacluster kafka -n kafka -o jsonpath='{.spec.brokers[0].id}')"

This action will generate the following output:

ubuntu@ip-172-31-86-37:~$ ./supertubes cluster broker remove --broker-id="$(kubectl get kafkacluster kafka -n kafka -o jsonpath='{.spec.brokers[0].id}')"
? Are you sure to use the current context? kind-demo1 (API Server: https://127.0.0.1:35503) Yes
? Kafka cluster (namespace "kafka"): kafka
2023-06-23T22:51:19.973Z	INFO	minimum broker count	{"count": 2}
2023-06-23T22:51:19.973Z	INFO	removing brokers {"broker ids": [0], "cluster": "kafka", "namespace": "kafka"}
? Existing resource kafkacluster.kafka.banzaicloud.io:kafka/kafka is not yet managed by us Manage this resource from now on
2023-06-23T22:51:28.184Z	INFO kafkacluster.kafka.banzaicloud.io:kafka/kafka configured

The broker-id is the broker ID number (for example, 0 or 1). It takes 2 to 5 minutes to see the changes reflected in the user interface.
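
You can list the broker IDs currently defined in the CR with the same jsonpath pattern used above:

kubectl get kafkacluster kafka -n kafka -o jsonpath='{.spec.brokers[*].id}'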

Kafka Brokers Downscale

With SDM, adding a new disk to any broker is as easy as changing a CR configuration. Similarly, any broker-specific configuration can be done on a broker-by-broker basis.

Congratulations! You have successfully completed the tutorial exploring Calisti SDM and the features it provides for your Kubernetes environment!

Learn More