rebalance command
rebalance is used for:
- targeted broker storage rebalancing*
- incremental scaling
*In contrast to storage rebalancing in rebuild (which requires that 100% of partitions for a targeted topic are relocated), rebalance is used for partial partition rebalancing from most to least storage utilized brokers.
Rebalance works by examining the free storage on all referenced brokers and selecting those that are more than 20% below the harmonic mean (configurable via the --storage-threshold parameter). For each broker targeted for partition offloading, partitions are planned for relocation to the least-utilized, most-suitable destination.
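The selection step can be sketched in a few lines of Python. This is an illustration rather than topicmappr's actual code: the function name `offload_targets` and the exact comparison (free storage strictly below `hmean * (1 - threshold)`) are assumptions. The free storage values are copied from the "before" column of the first example's output further down, so the result can be compared against the broker list printed there.

```python
from statistics import harmonic_mean

# Free storage (GB) per broker, taken from the "Storage free change
# estimations" output of the first example (values before rebalancing).
free_gb = {
    1200: 3587.97, 1201: 2708.39, 1202: 2209.01, 1203: 1865.20,
    1205: 2120.30, 1208: 3224.55, 1209: 1912.19, 1211: 1873.23,
    1212: 1916.88, 1213: 3165.90, 1214: 1556.82, 1215: 2091.04,
    1216: 2150.41, 1217: 1816.75, 1220: 2877.80, 1223: 2347.95,
    1224: 1977.97, 1225: 1960.09, 1234: 2109.06, 1235: 3369.32,
    1236: 2656.35, 1247: 1956.20, 1254: 2416.52, 1255: 1850.83,
    1256: 1986.07, 1267: 2301.33, 1376: 2065.64,
}


def offload_targets(free_gb, threshold):
    """Return the IDs of brokers whose free storage is more than
    `threshold` (a fraction, e.g. 0.05 for 5%) below the harmonic mean.

    Hypothetical helper; the exact comparison is an assumption.
    """
    hmean = harmonic_mean(list(free_gb.values()))
    cutoff = hmean * (1 - threshold)
    return sorted(b for b, free in free_gb.items() if free < cutoff)


# Same threshold as the first example (--storage-threshold 0.05).
targets = offload_targets(free_gb, 0.05)
print(targets)
```

With a 5% threshold this selects the same twelve brokers that the example output lists as targeted for partition offloading.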
Destination broker suitability is determined as either:
- (locality scoped) the least utilized broker with the same rack.id as the offload target
- (non locality scoped) the least utilized broker that wouldn't result in duplicate rack.id values in the resulting replica list
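The two suitability rules can be sketched as follows. The function `pick_destination`, the broker IDs, the rack names and the free storage figures are all hypothetical. "Least utilized" is interpreted as "most free storage", and brokers already in the replica list are excluded, which is an assumption the text does not state explicitly.

```python
def pick_destination(free_gb, rack, source, replicas, locality_scoped):
    """Pick a destination broker for one partition replica.

    free_gb:  broker ID -> free storage in GB
    rack:     broker ID -> rack.id
    source:   the offload target currently holding the replica
    replicas: the partition's current replica list (includes source)
    Returns a broker ID, or None if no broker is suitable.
    """
    # Assumption: a broker already holding a replica is never a candidate.
    candidates = [b for b in free_gb if b not in replicas]
    if locality_scoped:
        # Only brokers in the same rack.id as the offload target.
        candidates = [b for b in candidates if rack[b] == rack[source]]
    else:
        # Avoid duplicating a rack.id held by the remaining replicas.
        used = {rack[b] for b in replicas if b != source}
        candidates = [b for b in candidates if rack[b] not in used]
    if not candidates:
        return None
    # Least utilized is taken to mean most free storage.
    return max(candidates, key=lambda b: free_gb[b])


# Hypothetical four-broker cluster spread across three racks.
free_gb = {1: 500, 2: 900, 3: 1200, 4: 1500}
rack = {1: "a", 2: "a", 3: "b", 4: "c"}

scoped = pick_destination(free_gb, rack, 1, [1, 3], locality_scoped=True)
unscoped = pick_destination(free_gb, rack, 1, [1, 3], locality_scoped=False)
print(scoped, unscoped)
```

In the locality scoped case only broker 2 shares rack "a" with the source; without scoping, broker 4 has the most free storage and its rack does not collide with the remaining replica on rack "b".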
A partition relocation plan is then computed. The relocation planner runs in a fair-share, first-fit descending fashion: it iterates over each broker targeted for partition offloading and plans a relocation for the largest partition that won't exceed the upper/lower storage bounds. The storage bounds are determined by the --tolerance parameter (the default being automatic, optimal selection). Each broker is allowed to schedule at most one partition relocation before the scheduler moves on to the next broker. The offload broker list is iterated over until no more relocations can be scheduled. The relocation plan is then translated to a partition map and stored for the user to apply (as a kafka-reassign-partitions compatible file).
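A simplified sketch of the fair-share, first-fit descending loop is shown below. It is not the tool's implementation: `plan_relocations` and all of its data are hypothetical, rack.id constraints are ignored, and the destination is simply the broker with the most free storage at the time of each decision.

```python
def plan_relocations(free_gb, partitions, sources, dests, src_max, dst_min):
    """Plan relocations round-robin across the offload targets.

    free_gb:    broker ID -> free storage in GB
    partitions: source broker ID -> list of (name, size_gb) it holds
    src_max:    upper bound on a source's free storage after a move
    dst_min:    lower bound on a destination's free storage after a move
    Returns (plan, resulting free storage).
    """
    free = dict(free_gb)
    # Largest partitions first (the "descending" part).
    remaining = {
        b: sorted(partitions[b], key=lambda p: p[1], reverse=True)
        for b in sources
    }
    plan = []
    scheduled = True
    while scheduled:  # keep cycling until a full pass schedules nothing
        scheduled = False
        for src in sources:  # each source gets at most one move per pass
            for part in remaining[src]:
                name, size = part
                if free[src] + size > src_max:
                    continue  # would free up too much on the source
                dest = max(dests, key=lambda b: free[b])
                if free[dest] - size < dst_min:
                    continue  # would consume too much on the destination
                free[src] += size
                free[dest] -= size
                remaining[src].remove(part)
                plan.append((src, name, dest))
                scheduled = True
                break  # first fit found; move on to the next source
    return plan, free


# Hypothetical cluster: mean free storage 900GB, bounds at +/-20%.
free_gb = {1: 400, 2: 500, 3: 1400, 4: 1300}
partitions = {
    1: [("p0", 700), ("p1", 300), ("p2", 250)],
    2: [("p3", 400), ("p4", 200)],
}
plan, free_after = plan_relocations(
    free_gb, partitions, sources=[1, 2], dests=[3, 4],
    src_max=1080, dst_min=720,
)
print(plan)
print(free_after)
```

Here broker 1 cannot move its largest partition (p0) without exceeding the source bound, so it falls through to p1, and on the next pass to p2. The loop stops once a full pass over both sources schedules nothing.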
- Rebalance takes an input topic list (similar to rebuild: comma delimited with regex support) and a broker list. Typically the broker list would include all brokers that the target topic(s) currently occupy. Removing brokers is not allowed in rebalance; only adding new brokers is permitted. All 'mapped' brokers (that is, brokers that hold at least one partition for any topic referenced in the --topics input) can be automatically referenced with -1 as an input to --brokers. -1 automatically expands to the mapped broker IDs.
- Rebalance uses the same broker/topic metrics mechanism as rebuild (both of which can be supplemented with metricsfetcher).
- As an alternative to --storage-threshold, brokers below a given amount of free storage in gigabytes can be targeted for offload using the --storage-threshold-gb flag.
- Relocations can be scoped by rack.id via the --locality-scoped flag. For instance, if rack.id values reflected physical data centers, performing a rebalance with a locality scope would rebalance partitions among the brokers of each data center in isolation.
- The --tolerance flag specifies the upper and lower storage bounds; these are boundaries that limit how much data can be moved from offload targets and to destination targets, expressed as a distance (in percent) from the arithmetic mean of free storage. For example, with a tolerance of 10% and a broker mean storage free of 800GB, a partition cannot be moved when:
  - the source free storage would exceed 880GB (mean+10%)
  - the destination free storage would drop below 720GB (mean-10%)
  Specifying a value of 0 (default) results in topicmappr automatically choosing an optimal tolerance value. It does this by computing a map for every tolerance value between 1 and 100 in parallel, then chooses the result with the lowest resulting storage utilization {range-spread, std. deviation}.
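The tolerance arithmetic above can be checked with a short sketch. The helper names (`storage_bounds`, `move_allowed`, `best_tolerance`) are hypothetical. Treating the {range-spread, std. deviation} pair as a tuple compared in that order is one assumed reading of how the automatic selection ranks results, and the candidate metrics are made-up numbers.

```python
def storage_bounds(mean_free_gb, tolerance):
    """Return (source upper bound, destination lower bound) in GB for a
    tolerance given as a fraction of the mean (0.10 for 10%)."""
    return mean_free_gb * (1 + tolerance), mean_free_gb * (1 - tolerance)


def move_allowed(src_free, dst_free, size_gb, bounds):
    """True if moving a partition of size_gb keeps both brokers in bounds."""
    src_max, dst_min = bounds
    return src_free + size_gb <= src_max and dst_free - size_gb >= dst_min


def best_tolerance(results):
    """Pick the tolerance whose planned map scored lowest.

    results: tolerance -> (range spread %, std. deviation GB).
    Assumption: tuples are compared lexicographically.
    """
    return min(results, key=lambda t: results[t])


# The 800GB mean / 10% tolerance case from the text.
bounds = storage_bounds(800, 0.10)
print([round(b, 2) for b in bounds])              # 880GB and 720GB

print(move_allowed(700, 1000, 150, bounds))   # within both bounds
print(move_allowed(700, 1000, 200, bounds))   # source would exceed 880GB
print(move_allowed(500, 800, 100, bounds))    # destination would drop below 720GB

# Made-up metrics for three candidate tolerance values.
candidates = {0.05: (40.0, 300.0), 0.10: (25.0, 210.0), 0.20: (25.0, 190.0)}
print(best_tolerance(candidates))
```

The same bounds calculation applied to the first example's mean (2299.03GB at 20% tolerance) lands on the source and destination limits shown in its "Rebalance parameters" output.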
Fetching up-to-date metrics data with metricsfetcher:
$ metricsfetcher --broker-storage-query "avg:system.disk.free{cluster:kafka-test,device:/data}" --partition-size-query "max:kafka.log.partition.size{cluster:kafka-test} by {topic,partition}"
Submitting max:kafka.log.partition.size{cluster:kafka-test} by {topic,partition}.rollup(avg, 3600)
success
Submitting avg:system.disk.free{cluster:kafka-test,device:/data} by {broker_id}.rollup(avg, 3600)
success
Data written to ZooKeeper
Running rebalance for "test-topic" and providing all of the brokers that "test-topic" partitions reside on (--brokers is given an explicit broker list here but, as described in the usage section, --brokers -1 would yield the same results):
$ topicmappr rebalance --topics "test-topic" --brokers 1200,1201,1202,1203,1205,1208,1209,1211,1212,1213,1214,1215,1216,1217,1220,1223,1224,1225,1234,1235,1236,1247,1254,1255,1256,1267,1376 --storage-threshold 0.05 --tolerance 0.2 | grep -v no-op
Topics:
test-topic
Validating broker list:
OK
Rebalance parameters:
Free storage mean, harmonic mean: 2299.03GB, 2199.97GB
Broker free storage limits (with a 20.00% tolerance from mean):
Sources limited to <= 2758.83GB
Destinations limited to >= 1839.22GB
Brokers targeted for partition offloading (>= 5.00% threshold below hmean):
1203
1209
1211
1212
1214
1217
1224
1225
1247
1255
1256
1376
Broker 1203 relocations planned:
[800.20GB] test-topic p117 -> 1200
Broker 1209 relocations planned:
[827.74GB] test-topic p119 -> 1235
Broker 1211 relocations planned:
[602.12GB] test-topic p125 -> 1236
Broker 1212 relocations planned:
[825.81GB] test-topic p22 -> 1208
Broker 1214 relocations planned:
[678.96GB] test-topic p59 -> 1213
[510.32GB] test-topic p37 -> 1213
Broker 1217 relocations planned:
[none]
Broker 1224 relocations planned:
[692.60GB] test-topic p118 -> 1220
Broker 1225 relocations planned:
[255.21GB] test-topic p75 -> 1216
Broker 1247 relocations planned:
[none]
Broker 1255 relocations planned:
[660.11GB] test-topic p20 -> 1235
Broker 1256 relocations planned:
[none]
Broker 1376 relocations planned:
[none]
Partition map changes:
test-topic p20: [1255 1203] -> [1235 1203] replaced broker
test-topic p22: [1211 1212] -> [1211 1208] replaced broker
test-topic p37: [1217 1214] -> [1217 1213] replaced broker
test-topic p59: [1236 1214] -> [1236 1213] replaced broker
test-topic p75: [1225 1209] -> [1216 1209] replaced broker
test-topic p117: [1203 1247] -> [1200 1247] replaced broker
test-topic p118: [1247 1224] -> [1247 1220] replaced broker
test-topic p119: [1225 1209] -> [1225 1235] replaced broker
test-topic p125: [1212 1211] -> [1212 1236] replaced broker
Broker distribution:
degree [min/max/avg]: 2/7/4.30 -> 2/7/4.81
-
Broker 1200 - leader: 5, follower: 3, total: 8
Broker 1201 - leader: 4, follower: 4, total: 8
Broker 1202 - leader: 5, follower: 5, total: 10
Broker 1203 - leader: 4, follower: 5, total: 9
Broker 1205 - leader: 5, follower: 5, total: 10
Broker 1208 - leader: 4, follower: 5, total: 9
Broker 1209 - leader: 5, follower: 4, total: 9
Broker 1211 - leader: 5, follower: 4, total: 9
Broker 1212 - leader: 5, follower: 4, total: 9
Broker 1213 - leader: 4, follower: 6, total: 10
Broker 1214 - leader: 5, follower: 3, total: 8
Broker 1215 - leader: 5, follower: 5, total: 10
Broker 1216 - leader: 6, follower: 5, total: 11
Broker 1217 - leader: 5, follower: 5, total: 10
Broker 1220 - leader: 5, follower: 5, total: 10
Broker 1223 - leader: 5, follower: 5, total: 10
Broker 1224 - leader: 5, follower: 4, total: 9
Broker 1225 - leader: 4, follower: 5, total: 9
Broker 1234 - leader: 5, follower: 5, total: 10
Broker 1235 - leader: 4, follower: 6, total: 10
Broker 1236 - leader: 4, follower: 6, total: 10
Broker 1247 - leader: 5, follower: 5, total: 10
Broker 1254 - leader: 5, follower: 5, total: 10
Broker 1255 - leader: 4, follower: 5, total: 9
Broker 1256 - leader: 5, follower: 5, total: 10
Broker 1267 - leader: 5, follower: 4, total: 9
Broker 1376 - leader: 5, follower: 5, total: 10
Storage free change estimations:
range: 2031.15GB -> 971.02GB
range spread: 130.47% -> 53.45%
std. deviation: 521.41GB -> 305.21GB
-
Broker 1200: 3587.97 -> 2787.77 (-800.20GB, -22.30%)
Broker 1201: 2708.39 -> 2708.39 (+0.00GB, 0.00%)
Broker 1202: 2209.01 -> 2209.01 (+0.00GB, 0.00%)
Broker 1203: 1865.20 -> 2665.40 (+800.20GB, 42.90%)
Broker 1205: 2120.30 -> 2120.30 (+0.00GB, 0.00%)
Broker 1208: 3224.55 -> 2398.75 (-825.81GB, -25.61%)
Broker 1209: 1912.19 -> 2739.93 (+827.74GB, 43.29%)
Broker 1211: 1873.23 -> 2475.35 (+602.12GB, 32.14%)
Broker 1212: 1916.88 -> 2742.69 (+825.81GB, 43.08%)
Broker 1213: 3165.90 -> 1976.62 (-1189.28GB, -37.57%)
Broker 1214: 1556.82 -> 2746.10 (+1189.28GB, 76.39%)
Broker 1215: 2091.04 -> 2091.04 (+0.00GB, 0.00%)
Broker 1216: 2150.41 -> 1895.21 (-255.21GB, -11.87%)
Broker 1217: 1816.75 -> 1816.75 (+0.00GB, 0.00%)
Broker 1220: 2877.80 -> 2185.20 (-692.60GB, -24.07%)
Broker 1223: 2347.95 -> 2347.95 (+0.00GB, 0.00%)
Broker 1224: 1977.97 -> 2670.58 (+692.60GB, 35.02%)
Broker 1225: 1960.09 -> 2215.30 (+255.21GB, 13.02%)
Broker 1234: 2109.06 -> 2109.06 (+0.00GB, 0.00%)
Broker 1235: 3369.32 -> 1881.47 (-1487.85GB, -44.16%)
Broker 1236: 2656.35 -> 2054.22 (-602.12GB, -22.67%)
Broker 1247: 1956.20 -> 1956.20 (+0.00GB, 0.00%)
Broker 1254: 2416.52 -> 2416.52 (+0.00GB, 0.00%)
Broker 1255: 1850.83 -> 2510.94 (+660.11GB, 35.67%)
Broker 1256: 1986.07 -> 1986.07 (+0.00GB, 0.00%)
Broker 1267: 2301.33 -> 2301.33 (+0.00GB, 0.00%)
Broker 1376: 2065.64 -> 2065.64 (+0.00GB, 0.00%)
New partition maps:
test-topic.json
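The contents of test-topic.json are not shown here. As a rough sketch, a file compatible with kafka-reassign-partitions follows Kafka's reassignment JSON layout (a version field plus a list of topic/partition/replicas entries). The helper `reassignment_json` is hypothetical, and only three of the changed partitions from the output above are included for brevity.

```python
import json

# New replica lists for a few partitions, copied from the
# "Partition map changes" output above.
new_replicas = {
    20: [1235, 1203],
    22: [1211, 1208],
    37: [1217, 1213],
}


def reassignment_json(topic, replicas_by_partition):
    """Render a partition map in Kafka's reassignment JSON layout."""
    return json.dumps({
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": p, "replicas": r}
            for p, r in sorted(replicas_by_partition.items())
        ],
    })


print(reassignment_json("test-topic", new_replicas))
```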
Results after applying test-topic.json (red bars indicate start and finish events from autothrottle):
NOTE: this has been deprecated in favor of the scale subcommand.
The rebalance command can effectively be used for scaling a topic incrementally (introducing new brokers in addition to existing brokers). This is done by providing the existing brokers list hosting a topic along with additional brokers.
The default --storage-threshold of 0.2 is best suited for targeting moderate to extreme outlier brokers in a normal rebalance scenario. In a scaling scenario, it is usually desirable to draw partitions from most or all of the original brokers and relocate them to the newly provided brokers.
There are several ways to do this:
- setting --storage-threshold to 0 to automatically target all original brokers (preferred)
- setting an explicit --storage-threshold-gb value
- lowering the --storage-threshold value
If a scale up is intended that will target all original brokers, it's highly recommended to add an equal number of brokers per rack.id used. Otherwise, brokers will not be able to schedule relocations unless --locality-scoped is set to false.
Lastly, it's likely that a non-default --tolerance value will be optimal. In testing, scaling an existing broker pool that was mostly in balance showed optimal partition placement with a tolerance value of 0.02.
Example running a scale up where the broker list includes the original 18 brokers a topic was mapped to with an additional 6 new brokers:
$ topicmappr rebalance --topics test-topic --brokers 1652,1653,1654,1655,1656,1657,1658,1659,1660,1661,1662,1663,1664,1665,1666,1667,1668,1669,1670,1671,1672,1673,1674,1675 --storage-threshold 0 --tolerance 0.02 | grep -v no-op
Topics:
test-topic
Validating broker list:
New broker 1670
New broker 1675
New broker 1671
New broker 1673
New broker 1674
New broker 1672
-
6 additional brokers added
-
OK
Rebalance parameters:
Free storage mean, harmonic mean: 2319.90GB, 2170.92GB
Broker free storage limits (with a 2.00% tolerance from mean):
Sources limited to <= 2366.29GB
Destinations limited to >= 2273.50GB
Brokers targeted for partition offloading (>= 0.00% threshold below hmean):
1652
1653
1654
1655
1656
1657
1658
1659
1660
1661
1662
1663
1664
1665
1666
1667
1668
1669
Broker 1660 relocations planned:
[191.10GB] test-topic p17 -> 1671
[181.75GB] test-topic p67 -> 1674
[176.98GB] test-topic p13 -> 1671
Broker 1659 relocations planned:
[181.50GB] test-topic p69 -> 1674
[168.80GB] test-topic p10 -> 1671
[155.98GB] test-topic p15 -> 1674
Broker 1661 relocations planned:
[202.70GB] test-topic p8 -> 1674
[184.24GB] test-topic p70 -> 1674
Broker 1653 relocations planned:
[162.43GB] test-topic p7 -> 1673
[158.50GB] test-topic p65 -> 1675
[116.04GB] test-topic p39 -> 1675
Broker 1667 relocations planned:
[181.61GB] test-topic p16 -> 1672
[172.74GB] test-topic p11 -> 1670
[98.39GB] test-topic p118 -> 1670
Broker 1664 relocations planned:
[216.87GB] test-topic p18 -> 1671
[151.79GB] test-topic p19 -> 1671
Broker 1658 relocations planned:
[184.24GB] test-topic p70 -> 1675
[181.75GB] test-topic p67 -> 1673
Broker 1657 relocations planned:
[216.87GB] test-topic p18 -> 1673
[202.79GB] test-topic p68 -> 1675
Broker 1654 relocations planned:
[181.50GB] test-topic p69 -> 1675
[178.05GB] test-topic p6 -> 1673
[57.76GB] test-topic p96 -> 1673
Broker 1668 relocations planned:
[191.10GB] test-topic p17 -> 1670
[178.05GB] test-topic p6 -> 1672
Broker 1662 relocations planned:
[149.87GB] test-topic p14 -> 1674
[142.93GB] test-topic p56 -> 1674
Broker 1666 relocations planned:
[202.79GB] test-topic p68 -> 1672
[154.61GB] test-topic p73 -> 1670
[45.10GB] test-topic p45 -> 1672
Broker 1669 relocations planned:
[190.34GB] test-topic p12 -> 1670
[168.56GB] test-topic p66 -> 1672
Broker 1655 relocations planned:
[202.70GB] test-topic p8 -> 1675
[168.56GB] test-topic p66 -> 1673
Broker 1663 relocations planned:
[155.98GB] test-topic p15 -> 1670
[142.54GB] test-topic p5 -> 1670
[57.76GB] test-topic p96 -> 1672
Broker 1656 relocations planned:
[190.34GB] test-topic p12 -> 1673
[157.66GB] test-topic p9 -> 1675
Broker 1665 relocations planned:
[157.66GB] test-topic p9 -> 1672
[149.70GB] test-topic p57 -> 1672
Broker 1652 relocations planned:
[172.74GB] test-topic p11 -> 1671
[111.72GB] test-topic p59 -> 1671
Partition map changes:
test-topic p5: [1663 1655] -> [1670 1655] replaced broker
test-topic p6: [1654 1668] -> [1673 1672] replaced broker
test-topic p7: [1653 1669] -> [1673 1669] replaced broker
test-topic p8: [1655 1661] -> [1675 1674] replaced broker
test-topic p9: [1656 1665] -> [1675 1672] replaced broker
test-topic p10: [1667 1659] -> [1667 1671] replaced broker
test-topic p11: [1652 1667] -> [1671 1670] replaced broker
test-topic p12: [1669 1656] -> [1670 1673] replaced broker
test-topic p13: [1660 1654] -> [1671 1654] replaced broker
test-topic p14: [1657 1662] -> [1657 1674] replaced broker
test-topic p15: [1659 1663] -> [1674 1670] replaced broker
test-topic p16: [1661 1667] -> [1661 1672] replaced broker
test-topic p17: [1668 1660] -> [1670 1671] replaced broker
test-topic p18: [1664 1657] -> [1671 1673] replaced broker
test-topic p19: [1666 1664] -> [1666 1671] replaced broker
test-topic p39: [1665 1653] -> [1665 1675] replaced broker
test-topic p45: [1656 1666] -> [1656 1672] replaced broker
test-topic p56: [1658 1662] -> [1658 1674] replaced broker
test-topic p57: [1665 1660] -> [1672 1660] replaced broker
test-topic p59: [1663 1652] -> [1663 1671] replaced broker
test-topic p65: [1652 1653] -> [1652 1675] replaced broker
test-topic p66: [1669 1655] -> [1672 1673] replaced broker
test-topic p67: [1660 1658] -> [1674 1673] replaced broker
test-topic p68: [1657 1666] -> [1675 1672] replaced broker
test-topic p69: [1659 1654] -> [1674 1675] replaced broker
test-topic p70: [1661 1658] -> [1674 1675] replaced broker
test-topic p73: [1666 1660] -> [1670 1660] replaced broker
test-topic p96: [1654 1663] -> [1673 1672] replaced broker
test-topic p118: [1667 1664] -> [1670 1664] replaced broker
Broker distribution:
degree [min/max/avg]: 7/11/8.89 -> 4/10/7.92
-
Broker 1652 - leader: 6, follower: 6, total: 12
Broker 1653 - leader: 6, follower: 5, total: 11
Broker 1654 - leader: 5, follower: 6, total: 11
Broker 1655 - leader: 6, follower: 6, total: 12
Broker 1656 - leader: 6, follower: 6, total: 12
Broker 1657 - leader: 6, follower: 6, total: 12
Broker 1658 - leader: 7, follower: 5, total: 12
Broker 1659 - leader: 5, follower: 7, total: 12
Broker 1660 - leader: 5, follower: 6, total: 11
Broker 1661 - leader: 6, follower: 6, total: 12
Broker 1662 - leader: 7, follower: 6, total: 13
Broker 1663 - leader: 6, follower: 5, total: 11
Broker 1664 - leader: 7, follower: 5, total: 12
Broker 1665 - leader: 6, follower: 6, total: 12
Broker 1666 - leader: 7, follower: 5, total: 12
Broker 1667 - leader: 6, follower: 5, total: 11
Broker 1668 - leader: 6, follower: 6, total: 12
Broker 1669 - leader: 5, follower: 8, total: 13
Broker 1670 - leader: 5, follower: 2, total: 7
Broker 1671 - leader: 3, follower: 4, total: 7
Broker 1672 - leader: 2, follower: 6, total: 8
Broker 1673 - leader: 3, follower: 4, total: 7
Broker 1674 - leader: 4, follower: 3, total: 7
Broker 1675 - leader: 3, follower: 4, total: 7
Storage free change estimations:
range: 330.33GB -> 149.22GB
range spread: 19.12% -> 6.70%
std. deviation: 79.92GB -> 38.49GB
-
Broker 1652: 2057.61 -> 2342.07 (+284.46GB, 13.82%)
Broker 1653: 1894.79 -> 2331.75 (+436.96GB, 23.06%)
Broker 1654: 1943.69 -> 2361.00 (+417.31GB, 21.47%)
Broker 1655: 1969.27 -> 2340.53 (+371.26GB, 18.85%)
Broker 1656: 2007.44 -> 2355.43 (+347.99GB, 17.34%)
Broker 1657: 1943.51 -> 2363.17 (+419.65GB, 21.59%)
Broker 1658: 1941.90 -> 2307.89 (+365.99GB, 18.85%)
Broker 1659: 1778.32 -> 2284.60 (+506.28GB, 28.47%)
Broker 1660: 1727.29 -> 2277.11 (+549.82GB, 31.83%)
Broker 1661: 1841.17 -> 2228.11 (+386.94GB, 21.02%)
Broker 1662: 1957.37 -> 2250.16 (+292.79GB, 14.96%)
Broker 1663: 2005.48 -> 2361.76 (+356.28GB, 17.77%)
Broker 1664: 1921.78 -> 2290.43 (+368.65GB, 19.18%)
Broker 1665: 2021.37 -> 2328.72 (+307.35GB, 15.21%)
Broker 1666: 1958.24 -> 2360.74 (+402.50GB, 20.55%)
Broker 1667: 1903.49 -> 2356.23 (+452.73GB, 23.78%)
Broker 1668: 1948.37 -> 2317.52 (+369.15GB, 18.95%)
Broker 1669: 1958.26 -> 2317.16 (+358.89GB, 18.33%)
Broker 1670: 3483.02 -> 2377.33 (-1105.69GB, -31.75%)
Broker 1671: 3483.02 -> 2293.05 (-1189.97GB, -34.16%)
Broker 1672: 3483.02 -> 2341.81 (-1141.22GB, -32.77%)
Broker 1673: 3483.02 -> 2327.28 (-1155.75GB, -33.18%)
Broker 1674: 3483.02 -> 2284.06 (-1198.97GB, -34.42%)
Broker 1675: 3483.02 -> 2279.61 (-1203.42GB, -34.55%)
New partition maps:
test-topic.json
After applying the map:
While running any of the above operations, it's possible to finally optimize each broker's leader to follower ratio using the --optimize-leadership flag.
See the leadership optimization section in the Rebuild command documentation.
Enabling --verbose will give per offload target, per partition placement decision information.
Common reasons an offload target schedules no relocations (shown as [none] in the output):
- It has few, large partitions and even the smallest one available would free up too much storage on the source or consume too much on any destination.
- All partitions examined were too large to find an optimal relocation. Increasing the --partition-limit flag beyond the default of 30 increases the likelihood of finding a possible relocation (if the broker holds more than 30 partitions).
- No suitable destination brokers have enough free storage. Possible actions:
  - adding additional brokers to the congested rack.id locality
  - disabling locality scoping (--locality-scoped=false)
The storage range is a key metric in improving storage balance. Sometimes a poor range can be a result of offload targets being unable to schedule relocations (see above). Factors such as partition counts, distribution, sizes, broker counts, replica locality and other constraints make this a difficult problem to optimize for.
Likewise, which brokers to target for offloading is an influencing factor. Larger --storage-threshold values (such as the default 20%) are intended to target outlier brokers. If balance is somewhat good to begin with, lower values (such as 5% in the example) can be used to target more brokers, which opens more opportunity for improved balance. At some point, it may be best to use the rebuild command with the storage placement functionality and just build a storage optimal map from scratch on a new set of target brokers.
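For reference, the storage metrics reported in the example output can be reproduced with a small sketch. The function name `storage_metrics` and the before/after lists are hypothetical. Range spread is computed as the range divided by the minimum, which matches the figures in the first example. Using the population standard deviation is an assumption.

```python
from statistics import pstdev


def storage_metrics(free_gb):
    """Return (range GB, range spread %, std. deviation GB) for a list
    of per-broker free storage values."""
    lo, hi = min(free_gb), max(free_gb)
    rng = hi - lo
    return rng, rng / lo * 100, pstdev(free_gb)


# Hypothetical free storage before and after a rebalance.
before = [400, 500, 1400, 1300]
after = [950, 900, 850, 900]
print([round(v, 2) for v in storage_metrics(before)])
print([round(v, 2) for v in storage_metrics(after)])

# Range and spread only depend on the extremes, so the first example's
# pre-rebalance minimum (broker 1214) and maximum (broker 1200) are
# enough to check those two figures; the std. deviation is ignored here.
rng, spread, _ = storage_metrics([1556.82, 3587.97])
print(round(rng, 2), round(spread, 2))
```

The last line should agree with the "range" and "range spread" values printed before the arrow in the first example's output (2031.15GB and 130.47%).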