CC reporting less disk for partition load #2155
Comments
@rmb938 thanks for the report, do you mind sharing more details/evidence?
Yup I can provide some more details and evidence. Give me a day or so to
recollect the data. Unfortunately I did not save my initial findings.
When comparing the output of kafka-log-dirs with Cruise Control's partition load REST API, Cruise Control seems to report a smaller disk size.
This leads to the broker load showing less disk than it should, and to the cluster not balancing disk correctly when disk is set as a goal.
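For anyone wanting to reproduce the kafka-log-dirs side of this comparison, here is a minimal sketch that sums replica sizes per broker. It assumes the JSON layout printed by `kafka-log-dirs --describe` (a `brokers` list whose entries hold `logDirs`, each with `partitions` carrying a `size` in bytes). The function name and the sample numbers are made up for illustration.

```python
import json
from collections import defaultdict


def broker_disk_bytes(log_dirs_json: str) -> dict[int, int]:
    """Sum the on-disk size of every replica hosted on each broker."""
    data = json.loads(log_dirs_json)
    totals: dict[int, int] = defaultdict(int)
    for broker in data["brokers"]:
        for log_dir in broker["logDirs"]:
            for partition in log_dir["partitions"]:
                totals[broker["broker"]] += partition["size"]
    return dict(totals)


# Hypothetical sample in the assumed kafka-log-dirs JSON shape.
SAMPLE = json.dumps({
    "version": 1,
    "brokers": [
        {"broker": 1, "logDirs": [{"logDir": "/data/kafka", "error": None, "partitions": [
            {"partition": "events-0", "size": 100, "offsetLag": 0, "isFuture": False},
            {"partition": "events-1", "size": 90, "offsetLag": 0, "isFuture": False},
        ]}]},
        {"broker": 2, "logDirs": [{"logDir": "/data/kafka", "error": None, "partitions": [
            {"partition": "events-0", "size": 140, "offsetLag": 0, "isFuture": False},
        ]}]},
    ],
})

totals = broker_disk_bytes(SAMPLE)
print(totals)
```

These per-broker totals are what I would expect the broker disk load in Cruise Control to roughly match.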
Looking into this further, it seems that CC only reports the partition disk size from the leader; it doesn't also use the partition disk sizes from the followers.
Most of the time the leader and followers have very similar partition sizes, so this doesn't matter much. However, because each Kafka broker runs its log cleaner independently, the size of each replica of a partition can be different.
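To illustrate the effect, here is a small sketch that contrasts the two ways of accounting for disk. It is only a model of the behavior described above, not Cruise Control's actual code. The `broker_load` helper, the topic name, and all sizes are hypothetical.

```python
from collections import defaultdict

TopicPartition = tuple[str, int]


def broker_load(
    replica_sizes: dict[TopicPartition, dict[int, int]],
    leaders: dict[TopicPartition, int],
    leader_only: bool,
) -> dict[int, int]:
    """Per-broker disk total.

    leader_only=True charges every replica with the leader's size (the
    suspected behavior); leader_only=False uses each replica's own size.
    """
    totals: dict[int, int] = defaultdict(int)
    for tp, sizes in replica_sizes.items():
        leader_size = sizes[leaders[tp]]
        for broker, size in sizes.items():
            totals[broker] += leader_size if leader_only else size
    return dict(totals)


# Hypothetical replica sizes: {partition: {broker_id: size}}.
replica_sizes = {
    ("events", 0): {1: 100, 2: 140, 3: 100},  # broker 2 has not compacted yet
    ("events", 1): {1: 90, 2: 80, 3: 80},     # broker 1 has not compacted yet
}
leaders = {("events", 0): 1, ("events", 1): 2}

estimated = broker_load(replica_sizes, leaders, leader_only=True)
actual = broker_load(replica_sizes, leaders, leader_only=False)
gap = {b: actual[b] - estimated[b] for b in actual}
print(estimated, actual, gap)
```

With leader-only accounting all three brokers look identical, even though broker 2 actually holds noticeably more data than the others.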
In extremely large Kafka clusters that have hundreds of terabytes of data and billions of messages per topic, this difference adds up, and because Cruise Control is unaware of it when determining broker load, the cluster is left unbalanced.
In the worst case I have seen, there is around a 1-2TB difference between what kafka-log-dirs says and what CC reports as broker disk usage, but I've seen differences ranging from a few megabytes to around 100-200GB. This is relatively small compared to the overall cluster size, but since Cruise Control doesn't know about it, the brokers do end up unbalanced over time.