LevelDB compaction performance issue with large worlds (especially converted worlds) #6580

Open
dktapps opened this issue Dec 20, 2024 · 1 comment
Labels
Category: Core Related to internal functionality Performance Status: Debugged Cause of the bug has been found, but not fixed

Comments

@dktapps
Member

dktapps commented Dec 20, 2024

Problem description

LevelDB automatic level compaction tends to continuously hammer the disk with larger worlds (several GB).

Compacting 10 GB of data (at level 4) can take around 30 minutes of continuous I/O usage. LevelDB attempts to spread this work out so as not to max out I/O, but in practice this manifests as continuous background load on disks.

Wtf is compaction? Why is this happening?

Compaction orders all the data keys in a DB level according to the DB's comparator (by default a simple bytewise comparator that sorts lexicographically, like alphabetical order but over byte values).
When a level in the DB fills up, its data is merged into the next (higher) level, and that level's files may be rebuilt to keep the keys correctly ordered.

During compaction, any files whose key ranges do not overlap with the higher level are directly moved to the higher level without rebuilding. This is cheap, so we want this to be the norm.
Files whose key ranges do overlap are costly to merge into the higher level, because the new data must be sorted with the set of existing data in the higher level.
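The move-vs-merge decision described above can be sketched roughly like this (a simplified illustration in Python, not LevelDB's actual code; `compact` and the range representation are hypothetical):

```python
# Each "file" is represented as a (smallest_key, largest_key) pair of bytes.

def overlaps(file_a, file_b):
    # Two key ranges overlap if neither ends before the other begins.
    return file_a[0] <= file_b[1] and file_b[0] <= file_a[1]

def compact(file, next_level_files):
    victims = [f for f in next_level_files if overlaps(file, f)]
    if not victims:
        return "move"  # cheap: the file is re-linked into the next level as-is
    return f"merge {len(victims)} files"  # costly: all overlapping data is rewritten

# Non-overlapping key range: a trivial move.
print(compact((b"a", b"c"), [(b"d", b"f"), (b"g", b"k")]))  # move
# Overlapping key range: an expensive merge.
print(compact((b"c", b"h"), [(b"d", b"f"), (b"g", b"k")]))  # merge 2 files
```

The more a new file's key range overlaps with the next level, the more existing data must be rewritten, which is exactly why key layout matters so much below.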

This means that key structure is critical to getting best compaction performance. Keys that are accessed at similar times (e.g. chunks with similar coordinates) should have leading key bits that are as similar as possible to minimize range overlap.

However, thanks to Mojang's choice of key format, the data from lower levels is practically guaranteed to overlap because the key ordering is so disorganized relative to actual data usage patterns. This means that every compaction is I/O intensive. This is a big problem with large worlds, where the I/O cost becomes very obvious and sometimes continuous.

There are two main problems:

  • Keys are formatted XXXXZZZZ, so chunks with the same X coordinate appear next to each other regardless of how far apart they are in the actual world. e.g. if you have chunks (0,0), (0,1), (1,0) and (1,1), chunk (0,2) will land in the middle of this order, requiring a lot of data to be moved. Scale this up 10 million times and you can see why large DBs are a problem.
  • Keys use little-endian packed int32s, and since keys are compared byte-by-byte, chunks with similar coordinates appear about as far apart in the DB as they possibly could (e.g. X coordinates are ordered like 0, 65536, 256, 1, 65537, 257). This is like a library index sorting book titles by strrev(title).
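The byte-order problem is easy to reproduce. A small sketch (in Python purely for illustration; the server itself is PHP) sorting X coordinates by their encoded bytes, the way LevelDB's default bytewise comparator would:

```python
import struct

xs = [0, 1, 256, 257, 65536, 65537]

# The comparator sees only bytes, so the encoding determines the sort order.
little = sorted(xs, key=lambda x: struct.pack("<i", x))  # Mojang's encoding
big = sorted(xs, key=lambda x: struct.pack(">i", x))     # big-endian alternative

print(little)  # [0, 65536, 256, 1, 65537, 257] -- neighbours scattered
print(big)     # [0, 1, 256, 257, 65536, 65537] -- numeric order preserved
```

Note that big-endian only matches numeric order here because all the example values are non-negative; signed coordinates would additionally need a sign-flip or offset to sort correctly.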

Proposed solution

There are a couple of ways to solve this problem:

  1. Use a custom comparator to sort the keys optimally - not very portable, because the whole DB must be rebuilt if the comparator changes
  2. Change the key structure - also not ideal, as it would also require a full DB rebuild
    2a) Use Z-order (Morton) codes or another space-filling curve - this would structure keys to look like XZXZXZXZ at the bit level instead of XXXXZZZZ.
    2b) Encode in big-endian so that related chunks' data appears nearer each other. Little-endian doesn't make sense for this case.
  3. Split the DB up into "regions", similar to Anvil & MCRegion. This would largely avoid the key overlap problem by ensuring that overlapping keys don't appear in the same DB in the first place.
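Option 2a can be sketched with a minimal Morton encoder (a hypothetical illustration in Python, not the actual implementation), which interleaves the bits of X and Z so that nearby chunks get numerically nearby codes:

```python
def morton2d(x: int, z: int, bits: int = 32) -> int:
    """Interleave the low `bits` bits of x and z: ... x1 z1 x0 z0."""
    code = 0
    for i in range(bits):
        code |= ((z >> i) & 1) << (2 * i)      # z bits land in even positions
        code |= ((x >> i) & 1) << (2 * i + 1)  # x bits land in odd positions
    return code

# Nearby chunks get nearby codes: the 2x2 block at the origin maps to 0..3.
print(sorted(morton2d(x, z) for x in (0, 1) for z in (0, 1)))  # [0, 1, 2, 3]
```

Real chunk coordinates can be negative, so a production version would first bias them into an unsigned range (and would use bit-twiddling rather than a loop for speed); this sketch only shows the locality property.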

The bottom line is that there's no way to fix this without deviating from the Mojang world format and forcing users to rebuild their worlds.
Really, this is a change that Mojang should make, but I doubt it'll ever happen, since vanilla worlds rarely grow big enough for this kind of problem to show up.

Alternative solutions that don't require API changes

There doesn't seem to be a good alternative solution to this problem. I considered triggering compactions more frequently and on specific key sets, but I don't think that would address the problem of constantly hammering I/O. In addition, compacting by key range isn't very helpful anyway, given the poor key locality.

@dktapps dktapps added Category: Core Related to internal functionality Status: Debugged Cause of the bug has been found, but not fixed Performance labels Dec 20, 2024
@dktapps
Member Author

dktapps commented Dec 21, 2024

Sadly local experiments with improved key structure didn't make a huge difference. It definitely improved things but not nearly as much as I'd hoped.

dktapps added a commit that referenced this issue Dec 21, 2024
This new impl (which is not loadable by vanilla) is targeted at very large worlds, which experience significant I/O performance issues due to a variety of issues described in #6580.

Two main changes are made in RegionizedLevelDB:
- First, multiple LevelDBs are used, each covering a fixed NxN segment of terrain, similar to Anvil in Java. However, there's no constraint on these region sizes. Several experimental sizes are supported by default in WorldProviderManager.
- Second, bigEndianLong(morton2d(chunkX, chunkZ)) is used for chunk keys instead of littleEndianInt(chunkX).littleEndianInt(chunkZ). This new scheme has much better cache locality than Mojang's version, which reduces overlap and costly DB compactions.
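A rough sketch of the new key scheme versus the old one (in Python for illustration; the actual implementation is PHP, and this `morton2d` is a hypothetical loop-based stand-in):

```python
import struct

def morton2d(x: int, z: int, bits: int = 32) -> int:
    # Bit-interleave x and z (x in odd bit positions, z in even).
    code = 0
    for i in range(bits):
        code |= ((z >> i) & 1) << (2 * i)
        code |= ((x >> i) & 1) << (2 * i + 1)
    return code

def new_key(x: int, z: int) -> bytes:
    # bigEndianLong(morton2d(chunkX, chunkZ))
    return struct.pack(">Q", morton2d(x, z))

def old_key(x: int, z: int) -> bytes:
    # littleEndianInt(chunkX) . littleEndianInt(chunkZ)
    return struct.pack("<ii", x, z)

# Under the new scheme a 4x4 neighbourhood sorts in Z-order, so the 2x2
# block nearest the origin is contiguous in the DB:
chunks = [(x, z) for x in range(4) for z in range(4)]
print(sorted(chunks, key=lambda c: new_key(*c))[:4])
# first four: (0, 0), (0, 1), (1, 0), (1, 1)
```

Keeping spatially adjacent chunks adjacent in key order is what reduces range overlap between levels, and with it the cost of compactions.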

The following new provider options are available as a result of this change:
- custom-leveldb-regions-32
- custom-leveldb-regions-64
- custom-leveldb-regions-128
- custom-leveldb-regions-256

Smaller sizes will likely be less space-efficient, but will also probably have better performance.
Once a sweet spot is found, a default will be introduced.

Note that the different variations of custom-leveldb-regions-* are not cross-compatible.
Conversion between the different formats is necessary if you want to change formats.