Strimzi Training - Lab 5

Lab 5 uses Strimzi 0.6.0. It takes you through different aspects of monitoring Strimzi.

  • Check out this repository, which will be used during the lab:
    • git clone
  • Go to the lab-5 directory
    • cd lab-5
  • Start your OpenShift cluster
    • You should use OpenShift 3.9 or higher
    • Run minishift start or oc cluster up
  • Log in as a cluster administrator
    • oc login -u system:admin
  • Install the Cluster Operator
    • oc apply -f install/
  • Install the Kafka cluster
    • oc apply -f kafka.yaml
  • Install the Hello World application
    • oc apply -f hello-world.yaml
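The setup steps above can also be scripted. This is a minimal Python sketch, assuming the repository is already cloned, you are in the lab-5 directory and the OpenShift cluster is running; the function name run_setup is hypothetical and not part of the lab. By default it only returns the planned commands (a dry run) so you can review them first.

```python
import shlex
import subprocess

# The setup commands from the list above, in order. Cloning the repository
# and changing into lab-5 are assumed to have been done already.
SETUP_STEPS = [
    "oc login -u system:admin",
    "oc apply -f install/",
    "oc apply -f kafka.yaml",
    "oc apply -f hello-world.yaml",
]


def run_setup(steps=SETUP_STEPS, dry_run=True):
    """Run each step in order and stop at the first failing command.

    With dry_run=True nothing is executed; the parsed argument lists are
    returned so the plan can be checked before running it for real.
    """
    planned = []
    for step in steps:
        argv = shlex.split(step)
        planned.append(argv)
        if not dry_run:
            # check=True raises CalledProcessError if the command fails,
            # so later steps are not applied on top of a broken one.
            subprocess.run(argv, check=True)
    return planned


if __name__ == "__main__":
    for argv in run_setup(dry_run=True):
        print(" ".join(argv))
```

Call run_setup(dry_run=False) to actually execute the steps.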

Command line tools

  • Exec into one of the Kafka pods
    • oc exec -ti my-cluster-kafka-0 -- bash
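Most tools in the following sections are run inside a Kafka pod. When scripting, a single command can be run through oc exec without -ti, because a non-interactive command needs neither a TTY (-t) nor stdin (-i). A minimal sketch, assuming the oc client is logged in to the cluster; exec_argv and run_in_pod are hypothetical helper names:

```python
import subprocess

KAFKA_POD = "my-cluster-kafka-0"


def exec_argv(command, pod=KAFKA_POD):
    """Build the oc exec argument list that runs `command` in the pod.

    `bash -c` lets the command use shell features such as pipes.
    """
    return ["oc", "exec", pod, "--", "bash", "-c", command]


def run_in_pod(command, pod=KAFKA_POD):
    """Run the command in the pod and return its standard output."""
    result = subprocess.run(
        exec_argv(command, pod), check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, print(run_in_pod("ls bin/")) lists the Kafka tools available in the pod.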

Topic monitoring

  • Topics can be monitored using the utility
  • Describe all topics
    • bin/ --zookeeper localhost:2181 --describe
  • List all under-replicated partitions
    • bin/ --zookeeper localhost:2181 --describe --under-replicated-partitions (should be empty in our cluster)
  • List unavailable partitions
    • bin/ --zookeeper localhost:2181 --describe --unavailable-partitions (should be empty in our cluster)

Consumer group monitoring

  • List consumer groups
    • bin/ --bootstrap-server my-cluster-kafka-bootstrap:9092 --list
  • Show consumer group details
    • bin/ --bootstrap-server my-cluster-kafka-bootstrap:9092 --describe --group my-hello-world-consumer
    • Notice that all partitions are assigned to a single consumer
    • Scale the consumer to 3 instances with oc scale deployment hello-world-consumer --replicas=3
    • Check the consumer group details again and notice how the group rebalanced the partitions across the new replicas
    • Scale the consumer down to 0 instances with oc scale deployment hello-world-consumer --replicas=0
    • After the pods are terminated, check the consumer group again and notice how the lag starts increasing
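  • Reset the consumer group offsets for the topic my-topic (offsets can only be reset while the group has no active members, which is why the consumer was scaled to 0)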
    • Reset the offset to the latest message with bin/ --bootstrap-server my-cluster-kafka-bootstrap:9092 --group my-hello-world-consumer --reset-offsets --topic my-topic --execute --to-latest and check that the lag disappeared
    • Or to the earliest message with bin/ --bootstrap-server my-cluster-kafka-bootstrap:9092 --group my-hello-world-consumer --reset-offsets --topic my-topic --execute --to-earliest and check that the offsets are now 0.
    • Scale the consumer back up to at least one pod, for example with oc scale deployment hello-world-consumer --replicas=3
    • Verify that the new pods start consuming messages from offset 0
    • On your own: Experiment with the other ways to reset the offsets, for example based on time or to a specific offset
  • Offsets can be observed from different perspectives
    • The default view is per partition
    • You can also display the offsets per group member (client):
    • bin/ --bootstrap-server my-cluster-kafka-bootstrap:9092 --describe --group my-hello-world-consumer --members --verbose
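In the default (per-partition) describe output, the lag of each partition is its LOG-END-OFFSET minus its CURRENT-OFFSET, and the CONSUMER-ID column shows which member owns the partition. The following sketch (hypothetical function names) parses that whitespace-separated table, recomputes the lag and groups partitions by member, which makes the rebalancing and the growing lag described above easy to follow. It assumes the header row starts with TOPIC and that - is printed for missing values, for example when the group has no active members.

```python
from collections import defaultdict


def parse_group(describe_output):
    """Parse the table printed by the consumer group describe command."""
    header, rows = None, []
    for line in describe_output.splitlines():
        cols = line.split()
        if cols and cols[0] == "TOPIC":
            header = cols
        elif header and len(cols) == len(header):
            rows.append(dict(zip(header, cols)))
        # Anything else (blank lines, notices) is ignored.
    return rows


def _offset(value):
    return None if value == "-" else int(value)


def lag(row):
    """LOG-END-OFFSET minus CURRENT-OFFSET, or None if either is unknown."""
    current = _offset(row["CURRENT-OFFSET"])
    end = _offset(row["LOG-END-OFFSET"])
    if current is None or end is None:
        return None
    return end - current


def total_lag(rows):
    """Sum of all known partition lags."""
    return sum(value for value in (lag(r) for r in rows) if value is not None)


def partitions_by_member(rows):
    """Map each CONSUMER-ID ('-' if unassigned) to its partition numbers."""
    members = defaultdict(list)
    for row in rows:
        members[row["CONSUMER-ID"]].append(int(row["PARTITION"]))
    return dict(members)
```

After a reset with --to-latest, every row should have CURRENT-OFFSET equal to LOG-END-OFFSET, so total_lag returns 0.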

Log dirs

  • The log directories used to store a topic can be described using
    • Run bin/ --bootstrap-server my-cluster-kafka-bootstrap:9092 --describe --topic-list my-topic to list where the data of our topic is located
    • A formatted example of the output can be found in kafka-log-dirs-example.json
    • This tool shows the size and offset lag of each partition log and the directory it is stored in (useful when JBOD storage is used, which is currently not supported by Strimzi / AMQ Streams)
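The log-dirs tool prints a few informational lines followed by a single line of JSON. Assuming the layout of Kafka's log-dirs JSON (a brokers list whose entries contain logDirs, each with a partitions list carrying partition, size, offsetLag and isFuture), the sketch below lists on which broker and in which directory each replica of a topic is stored and how many bytes each broker holds. The function names and the sample data used to check it are made up for illustration.

```python
import json
from collections import defaultdict


def parse_log_dirs_output(output):
    """Return the JSON document, assumed to be the last non-empty output line."""
    lines = [line for line in output.splitlines() if line.strip()]
    return json.loads(lines[-1])


def topic_placement(log_dirs, topic):
    """List (broker, log dir, partition, size in bytes) for each replica."""
    placement = []
    for broker in log_dirs["brokers"]:
        for log_dir in broker["logDirs"]:
            for part in log_dir["partitions"]:
                # Partition names look like "<topic>-<number>", e.g. my-topic-0.
                if part["partition"].rsplit("-", 1)[0] == topic:
                    placement.append((broker["broker"], log_dir["logDir"],
                                      part["partition"], part["size"]))
    return placement


def size_per_broker(placement):
    """Total bytes of the topic stored on each broker."""
    totals = defaultdict(int)
    for broker, _log_dir, _partition, size in placement:
        totals[broker] += size
    return dict(totals)
```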

Kafka Connect

  • Deploy Kafka Connect
    • oc apply -f connect.yaml
    • The deployment also creates an OpenShift Route for the Kafka Connect REST API
  • Deploy the FileStreamSink connector from your local command line:
    • curl -X POST -H "Content-Type: application/json" --data '{ "name": "sink-test", "config": { "connector.class": "FileStreamSink", "tasks.max": "1", "topics": "my-topic", "file": "/tmp/test.sink.txt" } }'
  • Check that the connector has been deployed and that it works:
    • curl
    • curl
    • curl
  • Try to deploy another connector which will fail:
    • curl -X POST -H "Content-Type: application/json" --data '{ "name": "sink-failing", "config": { "connector.class": "FileStreamSink", "tasks.max": "1", "topics": "my-topic", "file": "/root/" } }'
  • Check the state of the failing connector:
    • curl
    • curl
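The same calls can be made from Python with the standard library. This is a minimal sketch against the Kafka Connect REST API (POST /connectors to create a connector, GET /connectors/<name>/status for its state); CONNECT_URL is a hypothetical placeholder for the host of the Route created by connect.yaml, and the helper names are made up. failed_tasks picks out the tasks reported as FAILED together with the first line of their stack trace, which is usually where the reason for a failure like the one above shows up.

```python
import json
import urllib.request

# Hypothetical placeholder: use the host of the Kafka Connect Route here.
CONNECT_URL = "http://my-connect.example.com"


def create_connector_request(name, config, base_url=CONNECT_URL):
    """Build (but do not send) a POST /connectors request."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_url(name, base_url=CONNECT_URL):
    """URL of the status resource of one connector."""
    return f"{base_url}/connectors/{name}/status"


def get_json(url):
    """Send a GET request and decode the JSON response."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def failed_tasks(status):
    """(task id, first trace line) for every task in state FAILED."""
    failed = []
    for task in status.get("tasks", []):
        if task.get("state") == "FAILED":
            trace_lines = (task.get("trace") or "").splitlines()
            failed.append((task["id"], trace_lines[0] if trace_lines else ""))
    return failed
```

For example, urllib.request.urlopen(create_connector_request("sink-test", config)) creates the connector, and failed_tasks(get_json(status_url("sink-failing"))) lists the failing tasks.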

Prometheus metrics

  • Check the kafka.yaml and connect.yaml files
    • Check the metrics configuration in the metrics fields
  • Deploy the Prometheus and Grafana installation
    • oc apply -f prometheus/
  • Open Grafana on address
    • Login with username admin and password admin
    • Click the Add data source button
    • Add a new data source with the following options:
      • Name: Prometheus
      • Type: Prometheus
      • URL: http://prometheus:9090
      • Press the Add button and make sure it says Success: Data source is working
      • In the same window, switch to the Dashboards tab and import the Prometheus Stats dashboard
    • Click the icon in the top left corner, select Dashboards and then Home, and then select the Prometheus Stats dashboard in the menu next to the icon
      • Verify that you see the Target Scrapes and Scrape Duration charts
      • These show how Prometheus scrapes the Kafka metrics
    • Click the icon in the top left corner and select Dashboards and then Import
      • In the import window, select the dashboard.json file from this directory and Prometheus as the data source and import it
      • Select the Kafka Dashboard and have a look at the metrics
    • Play with the Kafka components and watch how the changes are reflected in the metrics. For example:
      • Scale the consumer down to 0 using oc scale deployment hello-world-consumer --replicas=0
      • Scale the producer to 0 using oc scale deployment hello-world-producer --replicas=0
      • Change how often the producer sends messages (environment variable DELAY_MS in the producer deployment)
    • Try to add a new chart to the dashboard
      • Click on the title of the Bytes Out Per Second chart and select duplicate
      • Click the title of the newly added chart and select Edit
      • In the General tab change the title to Bytes In Per Second: my-topic
      • In the Metrics tab, change the query to sum(kafka_server_brokertopicmetrics_bytesinpersec_topic_my_topic)
      • On your own: Try to do this for other topics and find out which topic generates the most traffic
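Instead of comparing topics one chart at a time, you can ask Prometheus for all per-topic series at once through its HTTP API (/api/v1/query) and compare them. The sketch assumes Prometheus is reachable on PROMETHEUS_URL (a hypothetical value, for example after oc port-forward to the Prometheus pod) and that, as in the query above, each topic has one metric named kafka_server_brokertopicmetrics_bytesinpersec_topic_<topic>, with dashes in topic names turned into underscores (my-topic appears as my_topic). The per-broker samples of each metric are summed, mirroring the sum() in the chart query.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

# Assumption: Prometheus reachable locally, e.g. through oc port-forward.
PROMETHEUS_URL = "http://localhost:9090"
PREFIX = "kafka_server_brokertopicmetrics_bytesinpersec_topic_"
# Select every series whose metric name starts with PREFIX.
QUERY = '{__name__=~"' + PREFIX + '.+"}'


def query_url(query, base_url=PROMETHEUS_URL):
    """Instant query URL for the Prometheus HTTP API."""
    return base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": query})


def fetch(query, base_url=PROMETHEUS_URL):
    """Run the query and decode the JSON response."""
    with urllib.request.urlopen(query_url(query, base_url)) as response:
        return json.load(response)


def bytes_in_per_topic(response):
    """Sum the samples of each per-topic metric across brokers."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    totals = defaultdict(float)
    for series in response["data"]["result"]:
        name = series["metric"]["__name__"]
        # Each sample value is [timestamp, "value as string"].
        totals[name[len(PREFIX):]] += float(series["value"][1])
    return dict(totals)


def busiest_topic(totals):
    """Topic name (as it appears in the metric name) with the most bytes in."""
    return max(totals, key=totals.get)
```

For example, busiest_topic(bytes_in_per_topic(fetch(QUERY))) returns the topic receiving the most traffic at the time of the query.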