Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compact() method for maps doesn't check if a rehash is needed #1668

Open
brunnsbe opened this issue Aug 15, 2024 · 5 comments
Open

Compact() method for maps doesn't check if a rehash is needed #1668

brunnsbe opened this issue Aug 15, 2024 · 5 comments

Comments

@brunnsbe
Copy link

In for example LongDoubleHashMap the compact() method directly calls:
this.rehash(this.smallestPowerOfTwoGreaterThan(this.size()));

This rehashes also the maps in those cases where the map cannot be compacted. It would be better to first check if this.smallestPowerOfTwoGreaterThan(this.size()) != getTableSize() too quickly see if we can compact the map or not.

@mohrezaei
Copy link
Member

The only reason to call compact() is because the map has undergone some removal. In other words, the code is expected to look like:

map.remove(k1);
map.remove(k2);
... // maybe a loop to remove stuff
//at the end:
map.compact();

If you're calling compact otherwise, it would be odd.

If a map has undergone removal, compact can not only change the size, but also remove sentinels (aka "tombstones").

This is how something is removed:

        this.keysValues[index] = REMOVED_KEY;
        this.keysValues[index + 1] = EMPTY_VALUE;

The map does keep track of occupiedWithData and occupiedWithSentinels, which in theory could be used to make compact a bit smarter (your check with size is insufficient), but it would be good to understand how you're trying to use compact that this in an issue.

@brunnsbe
Copy link
Author

Thank you for your swift reply!
In my use case I'm using compact after removing items from a map, my idea was that it's unnecessary work to rehash everything an recreate the internal arrays unless their length change. The maps I'm building can be quite enormous, with over one billion of elements so I'm interested in compacting them if possible, but if the map's internal arrays still stay the same length the compact call doesn't give much of a value. 🤔

@brunnsbe
Copy link
Author

brunnsbe commented Aug 22, 2024

After investigating the compact() behavior further it seems that it does unnecessary work.
Let's say that I have a map with the size 2916 and the current length of its internal arrays as 16384.
In compact() it first calls:

this.rehash(this.smallestPowerOfTwoGreaterThan(this.size()));

The this.smallestPowerOfTwoGreaterThan(this.size()) returns 4096.
Inside rehash(int newCapacity) it calls this.smallestPowerOfTwoGreaterThan(this.size()) which changes the internal arrays to be of size 4096. Then the code starts copying the values to the new arrays and inside addKeyValueAtIndex(long key, V value, int index) there is a call to maxOccupiedWithData(), which for 4096 returns 2048. This leads to the addKeyValueAtIndex() eventually calling this.rehashAndGrow(); which grows the internal arrays to the length of 8192.

Would it be possible that with the compact() call not do the extra grow from 4096 to 8192 as one could assume that after calling compact() that the map won't be modified anymore. The other option would be to initially in compact set it to 8192 (using e.g. smallestPowerOfTwoGreaterThan(this.fastCeil(size << 1))) and avoid the extra rehash.

@mohrezaei
Copy link
Member

Good catch. That should be fixed, and the optimization regarding compact only if size changes or there are sentinels can be applied as well.

Do you still need another function that only compacts if there is a memory benefit?

@brunnsbe
Copy link
Author

brunnsbe commented Aug 27, 2024

Thank you!

Do you still need another function that only compacts if there is a memory benefit?

How would that differ from "and the optimization regarding compact only if size changes or there are sentinels can be applied as well"? Or do you mean that compact() always would try to compact() and we would have a separate isCompactable() (horrible name) or something similar? Yes, a method like that would be useful unless compact internally would call it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants