
it = map.erase(it) potentially passes entries twice #20

Open
martinus opened this issue Jul 31, 2019 · 6 comments

martinus commented Jul 31, 2019

I've stumbled upon a bug in my robin_hood map, which also uses backward shift deletion, and wondered how you solve this issue in tsl::robin_map; it seems you have exactly the same problem. (see martinus/robin-hood-hashing#42)

The problem is that, due to backward shift deletion, when iterating over a map and erasing elements it is possible for the same elements to be visited twice.

Here is a reproducer where I insert two elements into the map. When the second element is erased, the first element wraps around and is iterated over a second time. Currently I think it is best to simply document this behavior, because I am afraid there is no easy workaround (short of abandoning backward shift deletion).

#include <iostream>
#include <tsl/robin_map.h>

template <typename T>
struct BadHash {
    size_t operator()(T const& val) const {
        return static_cast<size_t>(val);
    }
};

int main(int, char**) {
    tsl::robin_map<int, int, BadHash<int>> map;
    map.reserve(16); // bucket count becomes 32, so the bucket mask is 31

    map.emplace(31, 1);        // hashes to the last bucket (31)
    map.emplace(1024 + 31, 2); // also hashes to bucket 31, probes past the end and wraps to bucket 0

    // first entry in the map
    auto it = map.begin();
    std::cout << it->first << "->" << it->second << std::endl;

    // second entry in the map, erase it
    ++it;
    std::cout << it->first << "->" << it->second << std::endl;
    it = map.erase(it);

    // both elements have now been visited, so we should be at the map's end, but
    // due to backward shift deletion the first element was moved behind the cursor
    if (it != map.end()) {
        std::cout << it->first << "->" << it->second << std::endl;
        it = map.erase(it); // guard the erase: erasing end() would be undefined behaviour
    }
    std::cout << "are we now at the end? " << (it == map.end()) << std::endl;
}

I wonder what you think about this issue?

martinus changed the title from "it = map.erase(it) potentially passes object twice" to "it = map.erase(it) potentially passes entries twice" on Jul 31, 2019
Tessil added a commit that referenced this issue on Jul 31, 2019

Tessil commented Jul 31, 2019

Hello,

Thank you very much for the report. I indeed completely missed that case when implementing the backward shift deletion.

I started implementing a fix in the fix_issue_20 branch. I still have to check that everything works correctly and improve the documentation before merging the changes.

Basically, I check whether we wrap around during the backward shift deletion, and if so I just return end().

Thibaut


martinus commented Aug 1, 2019

I think simply checking for a wrap-around is not enough. E.g. when you insert 3 elements at location 30, two elements sit at the end of the table and one wraps around. When erasing the second-to-last element, the backward shift wraps around, but there is still one element left to be iterated.


Tessil commented Aug 1, 2019

Indeed, I went a bit too fast with the proposed solution; it will not work in multiple cases. I'll check this evening to see how we can handle it.


Tessil commented Aug 10, 2019

Hi,

I checked a bit and currently I see the following possible solutions:

  • Keep the same behaviour as std::unordered_map:
    • Don't wrap around; instead, add a buffer at the end of the buckets (not counted in bucket_count) to store the elements that would have wrapped around. Rehash the table if the buffer gets full, regardless of the current load factor. The problem is the risk of a memory DoS attack, as an attacker may force the hash table to rehash repeatedly by inserting elements that land at the end.
    • When creating an iterator, store a reference to the current last element of the hash table, and propagate that reference to the iterator returned by erase. If on erase the iterator's element is the same as the stored last element, return end() after erasing it instead of the next element.
    • When calling begin(), return the first element that was not wrapped around. When incrementing the iterator, wrap around and continue until the last wrapped element.
  • Accept a different behaviour compared to std::unordered_map:
    • Leave it as it is and document the difference. When using erase in a loop, it's usually to erase elements that meet a condition (e.g. if(it->first % 2 == 0) it = erase(it);), not to erase everything that comes after an element.
    • Don't return anything, i.e. change the return type of erase to void. Not really viable, as erasing while iterating can be quite useful.

The problem is more complex than expected, and I have to think a bit to see whether I can find another solution, or decide which of these to pick.

Thibaut

martinus commented:

I also thought about adding an overflow buffer at the end. It might even have a slight performance advantage, because there is less need to mask with & (tablesize - 1). The DoS attack could also be mitigated with a hash that e.g. mixes in the this pointer, so the positions become less predictable.

Your iterator idea brings me to a new one: how about storing a counter of the number of remaining items instead of a reference? begin() sets the counter to size(), and end() sets it to 0. Each ++it or it = erase(it) decrements the counter, and comparing iterators just compares the counters. That needs no special wrap-around handling, and begin() and end() stay fast operations.


Tessil commented Aug 11, 2019

Adding a counter would also be possible, but the problem is that you also need to initialize it in methods other than begin(). For example, insertion must return an iterator to the inserted element; how can you efficiently compute the number of remaining elements until end()?

I also realised that a reference to the last element is not really possible either, as it may be invalidated. It also has the same problem as the counter: how do you efficiently get a reference to the last element in some methods? You could maintain a pointer to the last element in the class and update it on each insert/erase, but it's getting quite complex.
