LevelDB compaction performance issue with large worlds (especially converted worlds) #6580
Labels: Category: Core (related to internal functionality), Performance, Status: Debugged (cause of the bug has been found, but not fixed)
Problem description
LevelDB automatic level compaction tends to continuously hammer the disk with larger worlds (several GB).
Compacting 10 GB of data (at level 4) can take around 30 minutes of continuous I/O usage. LevelDB attempts to distribute this work in order to not max out I/O, but this appears to manifest as continuous background load on disks.
Wtf is compaction? Why is this happening?
Compaction orders all the data keys in a DB level according to the DB's comparator (by default a simple bytewise comparator, which sorts like alphabetical order but over raw byte values).
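To make that concrete, here's a minimal PHP sketch (illustrative only, not LevelDB code) of that ordering; strcmp() compares byte-by-byte, which is the same ordering the default comparator produces:

```php
<?php
// Bytewise ordering, as performed by LevelDB's default comparator.
$keys = ["\x01\x00", "\x00\xff", "\x00\x01", "\xff"];

usort($keys, 'strcmp');

foreach($keys as $key){
    echo bin2hex($key), "\n";
}
// Output:
// 0001
// 00ff
// 0100
// ff
```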
When a level in the DB fills up, the data from the lower level is merged with the higher level, and the higher level's data may be rebuilt to order it correctly.
During compaction, any files whose key ranges do not overlap with the higher level are directly moved to the higher level without rebuilding. This is cheap, so we want this to be the norm.
Files whose key ranges do overlap are costly to merge into the higher level, because the new data must be sorted with the set of existing data in the higher level.
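For intuition, here's a toy sketch of that overlap decision (a hypothetical helper, not LevelDB's actual planner code):

```php
<?php
// Whether two key ranges [minA, maxA] and [minB, maxB] overlap under
// bytewise comparison. Files whose range doesn't overlap the next level
// can be moved as-is; overlapping files must be merge-rewritten.
function rangesOverlap(string $minA, string $maxA, string $minB, string $maxB) : bool{
    return strcmp($maxA, $minB) >= 0 && strcmp($maxB, $minA) >= 0;
}

var_dump(rangesOverlap("c", "f", "e", "k")); // bool(true)  - expensive merge
var_dump(rangesOverlap("a", "b", "e", "k")); // bool(false) - cheap move
```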
This means that key structure is critical to getting the best compaction performance. Keys that are accessed at similar times (e.g. chunks with similar coordinates) should have leading key bits that are as similar as possible, to minimize range overlap.
However, thanks to Mojang's choice of key format, the data from lower levels is practically guaranteed to overlap because the key ordering is so disorganized relative to actual data usage patterns. This means that every compaction is I/O intensive. This is a big problem with large worlds, where the I/O cost becomes very obvious and sometimes continuous.
There are two main problems (see the sketch after this list):

1) Keys are ordered by X coordinate first, then Z. New chunks are usually generated adjacent to existing ones, but this ordering scatters them across the keyspace: given existing keys for chunk 0,0, chunk 0,1, chunk 1,0 and chunk 1,1, a newly generated chunk 0,2 will land in the middle of this order, requiring a lot of data to be moved. Scale this up 10 million times and you can see why large DBs are a problem.
2) Coordinates are encoded as little-endian, so the most significant bytes appear last; under a bytewise comparator this scrambles numeric order, a bit like sorting a list of names by strrev(name).
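Here's a minimal PHP sketch of both effects, assuming the vanilla chunk index key layout of 4-byte little-endian X followed by 4-byte little-endian Z (dimension and tag bytes omitted for brevity):

```php
<?php
// Vanilla-style chunk index keys: 4-byte little-endian X, then 4-byte
// little-endian Z. pack("V") = uint32 little-endian.
function chunkKey(int $x, int $z) : string{
    return pack("VV", $x, $z);
}

$chunks = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 2], [256, 0]];
$keys = array_map(fn(array $c) => chunkKey($c[0], $c[1]), $chunks);

// Sort bytewise, like LevelDB's default comparator
sort($keys, SORT_STRING);

foreach($keys as $key){
    $c = unpack("V2c", $key);
    echo "chunk {$c['c1']},{$c['c2']}\n";
}
// Output - 0,2 lands in the middle of the existing chunks, and 256,0
// sorts *before* 1,0 because little-endian puts the low byte first:
// chunk 0,0
// chunk 0,1
// chunk 0,2
// chunk 256,0
// chunk 1,0
// chunk 1,1
```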
Proposed solution
There are a couple of ways to solve this problem (a sketch of both follows this list):
2a) Use Z-order (Morton) codes or another space-filling curve - this would structure keys to look like XZXZXZXZ at the bit level instead of XXXXZZZZ.
2b) Encode coordinates in big-endian to make sure data for related chunks appears nearer together. Little-endian makes no sense for this use case.
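A rough sketch of what both encodings could look like; the helper names are hypothetical, and the sign bit of each coordinate is flipped so that negative coordinates sort before positive ones under a bytewise comparator:

```php
<?php
// Hypothetical key builders for the two proposed encodings.

// Flip the sign bit so signed coordinates sort correctly bytewise.
function unsign(int $v) : int{
    return ($v & 0xFFFFFFFF) ^ 0x80000000;
}

// 2b) Big-endian X,Z: most significant bytes first, so numeric order and
// bytewise order agree along each axis. "N" = uint32 big-endian.
function bigEndianKey(int $x, int $z) : string{
    return pack("NN", unsign($x), unsign($z));
}

// 2a) Z-order (Morton) code: interleave the bits of X and Z so keys look
// like XZXZXZXZ at the bit level, keeping spatial neighbours close.
function spread(int $v) : int{
    $v = ($v | ($v << 16)) & 0x0000FFFF0000FFFF;
    $v = ($v | ($v << 8))  & 0x00FF00FF00FF00FF;
    $v = ($v | ($v << 4))  & 0x0F0F0F0F0F0F0F0F;
    $v = ($v | ($v << 2))  & 0x3333333333333333;
    $v = ($v | ($v << 1))  & 0x5555555555555555;
    return $v;
}

function mortonKey(int $x, int $z) : string{
    // "J" = uint64 big-endian, so the high interleaved bits come first
    return pack("J", (spread(unsign($x)) << 1) | spread(unsign($z)));
}

// Neighbouring chunks now share long key prefixes:
echo bin2hex(mortonKey(0, 0)), "\n"; // c000000000000000
echo bin2hex(mortonKey(1, 0)), "\n"; // c000000000000002
echo bin2hex(mortonKey(0, 1)), "\n"; // c000000000000001
```

With either encoding, chunks generated near each other should end up sharing long key prefixes, so the cheap trivial-move compaction path becomes the common case.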
The bottom line is that there's no way to fix this without deviating from the Mojang world format and forcing users to rebuild their worlds.
Really, this is a change that Mojang should make, but I doubt it'll ever happen, since vanilla worlds rarely grow big enough for this kind of problem to show up.
Alternative solutions that don't require API changes
There doesn't seem to be a good alternative solution to this problem. I considered triggering compactions more frequently and on specific key ranges, but I don't think that would address the problem of constantly hammering I/O. In addition, compacting by key range isn't very helpful anyway, on account of the poor cache locality of the key layout (see the sketch below).
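For reference, the rejected idea would look roughly like this. It assumes the php-leveldb extension's compactRange() method and a placeholder world path - verify both against the binding actually in use:

```php
<?php
// Manually compact one key range (assumes php-leveldb's compactRange()).
// "/path/to/world/db" is a placeholder path.
$db = new LevelDB("/path/to/world/db");

// Because of the key layout problems described above, chunks that are close
// together in the world are *not* contiguous in the keyspace, so no key range
// usefully corresponds to "the area players are using" - which is why this
// doesn't really help.
$db->compactRange("\x00\x00\x00\x00", "\xff\xff\xff\xff");
```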