-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[API Proposal]: Add System.Collections.Concurrent.ConcurrentSet<T> #110825
Comments
That would be weird. The convention for hashed collections is that equality comparisons and hash codes are provided by one of:
|
Could your use case be satisfied with |
I honestly didn't think about that point enough, good catch. This serves the intended purpose regardless. Edited the original comment. |
Semantically, yes if this was just a HashSet underneath that implemented IDictionary<T,T> with a bonded Key-Value object (the value assignment also updates the key) but in implementation/as a substitute to ConcurrentSet, no because this results in a duplication of the structs. The example I used is simple but in more complex cases this would add a lot of memory overhead and potentially desync between the key and stored value without boilerplate code. |
This seems related to #21603 (comment). TL;DR we've been resisting the inclusion of methods like Apart from that, I don't believe much has changed from when #39919 got closed. Even if it was implemented as a wrapper over ConcurrentDictionary it would be creating a substantial yet trivial new API surface. |
I read through a large chunk of the discussion and I mostly agree with @theodorzoulias . I can understand not adding more definitions. I would be satisfied with a concurrent The reason this is not a trivial API surface is Databases. What this API proposal fundamentally serves is having data sets with unique keys, foreign keys and/or compound keys defined within the data struct. This type of lookup is already commonplace! However, .NET doesn't support it for some reason. The original authors for the HasSet API adding TryGetValue did not make a mistake. It was a thoughtful decision to implement a collection that supported developer-designed complex lookups on a collection without data duplication/desync (Dictionary KVP) without having O(n) lookup time. Look, being frank. Are you suggesting that we all just use ConcurrentBag and live with O(n), or conversely use ConcurrentDictionary<T,T> with developer-made boilerplate and have double the memory footprint for value keys? Implementing a custom key is a recipe for data desync between the key and the value stored, among other things. What is the option we're supposed to use for a concurrent-safe, multi-key/compound-key query on a collection? Because right now, there isn't one outside of me maintaning a bunch of dictionaries, one for every key type! |
Agree to disagree, I suppose. Even if we did add ConcurrentSet in the future, I doubt we would be adding a
I'm not convinced that duplicating the key is going to be the dominant size factor in the context of concurrent collections. Happy to be proven wrong if you have profiling data to share.
Can you clarify what you mean by this? Are you suggesting that mutations in the value can change its equality? |
The whole purpose of this, is as stated in the remarks on MSDN, and that is literally the use-case. How can you say that it's a 'mistake' when .NET does not provide an alternative API for this? The use case here is one that's a billion-dollar industry: databases. The entire purpose of this is to support a "Queryable Dataset" with O(1) lookup time,; with the constraint of requiring a unique key in your query. Again, I asked you if there's an alternative and you ignored by question.
So, the first issue here is that so long as the For example, let's say we want to store some data for a resource, with the struct having the ability to store a reference to that resource when instantiated. It is entirely possible for the I undserstand that this is not "best practices" but that doesn't mean that this isn't a real and subtle pitfall. // note: the struct is immutable
var resource = new ResourceStruct
{
Name = "name",
Location = "/some/file/location.bin",
Instance = SomeResourceFactory()
};
_myResourceCache[resource] = resource; // this resource key has a reference to the object We decide it's time to unload resources, but we keep the metadata struct alive, just without the object reference: //key implicit conversion and type implements IEqualityComparer<ResourceStruct>
_myResourceCache[cacheRef] = new ResourceStruct
{
Name = "name",
Location = "/some/file/location.bin",
Instance = null // This doesn't matter because the 'key' object in the Dictionary still has a strong handle!
}; This is a pitfall. You need to clear the entry from the Dictionary first and then re-add it or the key object will hold a handle in itself. I am aware that I could store the reference elsewhere, but this example is just an illustration. And before you say "Just make a custom key struct", I already have one, the data type's Interface implements And this goes outside of unloading obviously. Any mutable collection, even if the entries themselves are immutable, are going to have this problem so long as they can be replaced/updated. You can only update the value in a Dictionary, never the key, because the Dictionary was designed around immutable keys. ==== You already have This is hacky maybe, but we use the === |
I don't have an opinion on whether ConcurrentSet<T> should be added to .NET. But if you need it in your application now, I think you should try copying the ConcurrentDictionary<TKey, TValue> source code and removing TValue. You'd have to maintain the result yourself but you wouldn't need to wait for Microsoft decisions and new versions of .NET. ConcurrentDictionary<TKey, TValue> does use some internal parts of .NET Runtime. It uses NonRandomizedStringEqualityComparer; but you wrote that you'd use ConcurrentSet<T> with T being a struct rather than System.String, so you can strip that out too. ConcurrentDictionary<TKey, TValue> has some code using |
It's a matter of cost vs. benefit. Implementing collections is expensive, and our resources are finite. It's not every year that we allocate engineering time to work on a new collection type, but if we do we want to make sure that the new functionality has proportional impact. Besides that point, it is not the goal of |
Note too that especially in the concurrent collection space, most of the functionality is also covered by implementations of memory caching libraries or databases, which may be more relevant solutions to the actual domain-specific problem being solved in a lot of these cases. |
Edit: I am aware of Redis, and some others. This is a fair point. My use case is that I have a giant collection of resources that I need to Query based on unique keys and it needs to be updated based on both disk-cached information and in-memory compiled assemblies + loaded binary assemblies. I essentially need an in-memory database with high performance read/write but the keys are simple and unique and a I want to avoid having to write and maintain a collection of HashSets with my own threading controls. I know it would take months to implement but I've come across this whole "Lightweight, Concurrent, In-Memory Database" issue several times over the years and it's bugged me that a package doesn't exist that does the job as I've described. |
You might want to look at MS faster or MS Garnet for your usecase. Garnet is basically the successor of Faster. |
What about |
See this comment here: #110825 (comment) The fundamental problem is that the dictionaries do not give you access to the key object without doing an O(n) search on the |
I don't know what you mean by "the key object"
Why would it be O(n)?
I don't know why that would matter. -- |
The object/entity stored as the key in the kvpair?
Because you cannot retrieve the key using the hash lookup. You have to query/loop over the keys list.
Because structs are value types, meaning I can't cheat by doing Dict<T,T> and make the key and value the same reference pointer without wrapping it. Edit: Also, since the type is immutable, even if they were reference types, any upserting would cause the key and value objects to be different instances.
See my previous post. |
WDYM then that they do not give you access to it?
I don't understand what you mean by 'retrieve the key'. You have the key. You use it to lookup a value (in the dict case) or test for containment (in the set case). Why would you need to 'retrieve the key'. Furthermore, how would a ConcurrentSet help here? How does a concurrent set hel you 'retrieve the key'.
I don't knkow what this means.
i hve. it's still not making sense to me. I'm also not seeing how a hypothetical concurrent set makes this possible. A concurrent set it just a set that can be used safely/concurrently from multiple threads. It would otherwise be the same as any other set type in the bcl (like HashSet). |
You clearly have not? The key is one or more members of the data object using a custom This allows me to find the other data object of equality in the dictionary, which has actual data on it. The problem is the real data object is being stored as a key, not a value, since it's used as a key. HashSet solves this implementating This was all in my post above. #110825 (comment) |
Background and motivation
Copying this issue but with an addition to the type's use case.
I didn't see any open issues on this repo for it. If there are, sorry in advance.
This is something I was searching for when I needed to create a collection with near O(1) lookup of combined key-values, where the struct itself contains the key and value data, this is very useful in both resource lookups with custom Multi-Key (OR compare logic against select members) types for dictionaries.
This is very useful for in-memory resources lookups from disk-loaded or network-retrieved assets. See below for an example of a sample based on our usage. The implementation is similar to the previous proposal but with the addition of some new functions to justify it's use.
This should already be a thing, because .NET already implements this in the default
HashSet<T>
with my exact justification in the remarks: sourceAPI Proposal
Edit: removed
IEqualityComparer<T>
constraint.Proposal for modified continuation of implementation of:
ConcurrentSet<T>
.API Usage
Alternative Designs
No response
Risks
The breaking change would be the addition of
AddOrReplace()
toGenericICollection<T>
and other related interfaces.The text was updated successfully, but these errors were encountered: