Cruise Control with MSK #2146

Open
UdayaPriyaKannan opened this issue Apr 30, 2024 · 4 comments

@UdayaPriyaKannan
UdayaPriyaKannan commented Apr 30, 2024

I'm looking for help getting Cruise Control working against an AWS MSK cluster. I set up the configuration as per these instructions. All metrics from MSK are pushed to Prometheus, and we are not explicitly filtering any of them. From the Cruise Control host, we can also reach the JMX and Node metrics directly on ports 11001 and 11002 of the brokers. I configured the Cruise Control server and UI successfully, but I see the following in the Cruise Control UI:

Kafka cluster state metrics such as partition count and replicas are visible, but the Kafka cluster load, Kafka partition load, and Resource distribution tabs are unavailable and report a GET request failure.

ERROR: Error processing GET request '/load' due to: 'com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1712057449014] (index [1, -1]). Window index (current: 0, oldest: 0).

I'm not able to dry-run any Kafka cluster administration tasks; they fail with the same exception as above.

Both Cruise Control and the UI are the latest from GitHub.
The Kafka version in Amazon MSK is 3.2.0 and the Cruise Control version being used is 2.5.137.
In the monitored windows, I observe 0% training.
Initially, we created the __CruiseControlMetrics topic manually, since it was not present and auto.create.topics.enable is set to false in the default configuration of MSK nodes.
The topics __KafkaCruiseControlPartitionMetricSamples and __KafkaCruiseControlModelTrainingSamples were created automatically and contain data, whereas the __CruiseControlMetrics topic is empty.
I also see the following line in the Cruise Control server logs:
App info kafka.consumer for KafkaCruiseControlSampleStore-consumer-unregistered

@marcelloromani

NotEnoughValidWindowsException means that CC hasn't been able to collect enough data yet about the MSK cluster.
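
One way to watch whether Cruise Control has collected any windows yet is to poll its state endpoint and look at the monitor substate. The sketch below is only an illustration: the `/kafkacruisecontrol/state` path with `substates=monitor&json=true` is the usual REST form, but the JSON keys used here (`MonitorState`, `state`, `numMonitoredWindows`, `trainingPct`) are assumptions that should be checked against the response your version returns, and the base URL in the usage comment is hypothetical.

```python
import json
import urllib.request


def fetch_monitor_state(base_url: str, timeout: float = 10.0) -> dict:
    """GET the monitor substate from the Cruise Control state endpoint as JSON."""
    url = f"{base_url.rstrip('/')}/kafkacruisecontrol/state?substates=monitor&json=true"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_monitor_state(payload: dict) -> str:
    """Condense the monitor substate into one line.

    The key names below are assumed; adjust them to match your response.
    """
    monitor = payload.get("MonitorState", {})
    state = monitor.get("state", "UNKNOWN")
    windows = monitor.get("numMonitoredWindows", 0)
    training = monitor.get("trainingPct", 0.0)
    if windows > 0:
        verdict = "load model may be available"
    else:
        verdict = "no windows yet; /load will likely fail"
    return f"state={state} windows={windows} trainingPct={training} -> {verdict}"


# Usage (hypothetical host and port):
#   print(summarize_monitor_state(fetch_monitor_state("http://localhost:9090")))
```

If the window count stays at zero long after startup, the sampler is probably not producing usable samples, which points at the metric source rather than at the window configuration.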

In my experience, metrics from MSK must be fetched from the open monitoring ports using Prometheus. The default instructions do not work because with MSK you can't just "drop a jar in the Kafka server classpath".
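
As a quick sanity check for this setup, the sketch below reads a `cruisecontrol.properties` file and reports whether it points Cruise Control at Prometheus. It assumes the two settings described in the AWS guide, `metric.sampler.class` set to the Prometheus sampler class and `prometheus.server.endpoint` set to your Prometheus address; the helper names and the file path in the usage comment are hypothetical. With this sampler the reporter jar is not involved, so an empty `__CruiseControlMetrics` topic is probably expected rather than a symptom.

```python
PROMETHEUS_SAMPLER = (
    "com.linkedin.kafka.cruisecontrol.monitor.sampling.prometheus.PrometheusMetricSampler"
)


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_sampler_config(text: str) -> list:
    """Return a list of problems found; an empty list means both settings look right."""
    props = parse_properties(text)
    problems = []
    if props.get("metric.sampler.class") != PROMETHEUS_SAMPLER:
        problems.append("metric.sampler.class is not the Prometheus sampler")
    if not props.get("prometheus.server.endpoint"):
        problems.append("prometheus.server.endpoint is not set")
    return problems


# Usage (hypothetical path):
#   with open("config/cruisecontrol.properties") as fh:
#       print(check_sampler_config(fh.read()))
```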

I started my journey here: https://docs.aws.amazon.com/msk/latest/developerguide/cruise-control.html

@UdayaPriyaKannan
Author

WARN Skip generating metric sample for broker 2 because the following required metrics are missing [ALL_TOPIC_REPLICATION_BYTES_OUT, ALL_TOPIC_BYTES_OUT, BROKER_PRODUCE_TOTAL_TIME_MS_MEAN, BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_MAX, ALL_TOPIC_BYTES_IN, BROKER_PRODUCE_REQUEST_QUEUE_TIME_MS_MEAN, BROKER_CONSUMER_FETCH_TOTAL_TIME_MS_MEAN, BROKER_REQUEST_QUEUE_SIZE, ALL_TOPIC_FETCH_REQUEST_RATE, BROKER_CONSUMER_FETCH_REQUEST_QUEUE_TIME_MS_MAX, ALL_TOPIC_MESSAGES_IN_PER_SEC, BROKER_FOLLOWER_FETCH_TOTAL_TIME_MS_MAX, BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_MEAN, BROKER_FOLLOWER_FETCH_REQUEST_QUEUE_TIME_MS_MEAN, ALL_TOPIC_PRODUCE_REQUEST_RATE, BROKER_FOLLOWER_FETCH_REQUEST_RATE, BROKER_PRODUCE_TOTAL_TIME_MS_MAX, BROKER_FOLLOWER_FETCH_LOCAL_TIME_MS_MEAN, BROKER_PRODUCE_LOCAL_TIME_MS_MEAN, BROKER_FOLLOWER_FETCH_TOTAL_TIME_MS_MEAN, BROKER_REQUEST_HANDLER_AVG_IDLE_PERCENT, BROKER_PRODUCE_REQUEST_QUEUE_TIME_MS_MAX, BROKER_CONSUMER_FETCH_LOCAL_TIME_MS_MAX, ALL_TOPIC_REPLICATION_BYTES_IN, BROKER_CONSUMER_FETCH_REQUEST_QUEUE_TIME_MS_MEAN, BROKER_PRODUCE_LOCAL_TIME_MS_MAX, BROKER_FOLLOWER_FETCH_REQUEST_QUEUE_TIME_MS_MAX, BROKER_RESPONSE_QUEUE_SIZE, BROKER_CONSUMER_FETCH_TOTAL_TIME_MS_MAX]. (com.linkedin.kafka.cruisecontrol.monitor.sampling.SamplingUtils)
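
Warnings like the one above are hard to scan by eye. A small sketch such as the following can pull out the broker id and the missing metric names and group them by family, which makes it easier to see whether whole families (for example every `ALL_TOPIC_*` metric) are absent. The regular expression is written against the message format shown in this log line only; the function names are illustrative.

```python
import re
from collections import defaultdict

WARN_PATTERN = re.compile(
    r"Skip generating metric sample for broker (\d+) because the following "
    r"required metrics are missing \[([^\]]*)\]"
)


def parse_missing_metrics(log_line: str):
    """Return (broker_id, [metric names]) or None if the line does not match."""
    match = WARN_PATTERN.search(log_line)
    if match is None:
        return None
    broker_id = int(match.group(1))
    names = [n.strip() for n in match.group(2).split(",") if n.strip()]
    return broker_id, names


def group_by_prefix(names):
    """Group metric names into families such as ALL_TOPIC and BROKER."""
    groups = defaultdict(list)
    for name in names:
        if name.startswith("ALL_TOPIC_"):
            prefix = "ALL_TOPIC"
        else:
            prefix = name.split("_", 1)[0]
        groups[prefix].append(name)
    return {prefix: sorted(members) for prefix, members in groups.items()}
```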

I followed the instructions in the developer guide, but a lot of broker metrics are missing. Please help me figure out what's wrong.

@micr01996

Hello @UdayaPriyaKannan, have you solved this issue?
I'm getting the same error, and I have replicated the configuration from AWS Labs. It seems that, because we're already scraping metrics from MSK, some conflict is happening. I left a large window to make sure that CC has enough time to collect the metrics.
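
To rule out the scraping side, it may help to ask Prometheus directly which broker instances currently return samples for a given query. The sketch below uses the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and its documented response shape; the base URL and the query in the usage comment are placeholders, and the metric names Cruise Control actually needs depend on your exporter and sampler configuration.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{query}"


def run_query(prometheus_base: str, promql: str, timeout: float = 10.0) -> dict:
    """Execute the instant query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_url(prometheus_base, promql), timeout=timeout) as resp:
        return json.load(resp)


def instances_with_samples(response: dict) -> set:
    """Return the `instance` labels that have a current sample in the response."""
    if response.get("status") != "success":
        return set()
    results = response.get("data", {}).get("result", [])
    return {
        r.get("metric", {}).get("instance", "<no instance label>")
        for r in results
        if r.get("value")
    }


# Usage (placeholder address and query; substitute a broker metric you expect):
#   print(instances_with_samples(run_query("http://localhost:9090", "up")))
```

If a broker is missing from the result for a metric the sampler queries, the warning about missing required metrics for that broker would follow naturally.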

@UdayaPriyaKannan
Author

@micr01996 No, the issue is not solved yet.
My training stopped at 20%.
I'm able to do a PLE dry run, but other operations in the Kafka cluster administration tab throw the NotEnoughValidWindowsException.
The Kafka partition load and resource distribution tabs also throw the same exception.


3 participants