Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interest in a weighted index based on balanced trees? #1053

Open
marcusklaas opened this issue Sep 28, 2020 · 9 comments
Open

Interest in a weighted index based on balanced trees? #1053

marcusklaas opened this issue Sep 28, 2020 · 9 comments
Labels
E-help-wanted Participation: help wanted F-new-int Functionality: new, within Rand

Comments

@marcusklaas
Copy link

marcusklaas commented Sep 28, 2020

Background

What is your motivation?
The current WeightedIndex is pretty good. Lookups perform very well. And while it's possible to do updates, they are potentially slow as the (average) complexity is O(n). For matchmaking systems where updates are about as frequent as samples, this may be better served by an underlying datastructure which offers quick updates.

What type of application is this? (E.g. cryptography, game, numerical simulation)
Real-time matchmaking

Feature request

If I'm not mistaken, balanced tree structures like AVL trees or red-black trees enable O(log(n)) operations (search/ sample, update, insert, delete). The constant coefficient will obviously be worse than an implementation using a vector, but for large n, it should pay off.

Is this something that could have a place in rand? I'd be willing to take a shot at it if so.

@vks
Copy link
Collaborator

vks commented Sep 28, 2020

I'm not sure how significant the performance gains world be, but in principle such an alternative implementation could live in rand_distr. There is a precedent: we already have a WeightedIndex implementation employing the alias method there.

@dhardy
Copy link
Member

dhardy commented Sep 29, 2020

update_weights is not appropriate for your use-case?

Well, that doesn't handle insertions/deletions. One way of getting around that might be to allow dummy entries with weight zero and over-allocate initially, periodically re-allocating, and tracking a "free list". Getting good performance may require a little more work but it's not obvious which would perform best.

Also note that due to floating-point inaccuracies you might want to periodically re-calculate the cumulative weights anyway.

@dhardy
Copy link
Member

dhardy commented Sep 29, 2020

Sorry, I'm forgetting that one still needs to adjust all cumulative weights after the modified entry, which makes that API especially bad for random updates. Yes, a tree based structure storing the sub-tree total in each node could be a decent alternative, if you don't want to just create a new WeightedIndex for each sample (O(n^2), assuming the number of samples is proportional to the number of entries).

@dhardy dhardy added E-help-wanted Participation: help wanted F-new-int Functionality: new, within Rand T-distributions labels Jul 8, 2021
@xmakro
Copy link
Contributor

xmakro commented Dec 26, 2023

I've recently implemented the 'tree based structure storing the sub-tree total in each node' outlined by @dhardy, and would be interested in upstreaming it.

The implementation is very straight forward. The tree is stored as a flat vector, and the value of each node is defined as follows:

value(i) =  weight(i) + value(2*i+1) * value(2*i+2)

For sampling an index, we sample a target_weight uniformly between between 0 and total_weight of the entire tree. Then we start walking the tree from the root and repeatedly descend to the child until we reach the node index that corresponds to the target_weight.

The runtimes of the operations are:

  • Constructing the initial tree from a slice of weights: O(n)
  • Updating the weight of a node (we walk up the tree): O(log n)
  • Sampling an index (we walk down the tree): O(log n)

Would a PR for this be welcome? I'm happy to work through any discussions.

Open questions are:

  • What should happen when the total sum of weights in the tree is zero? WeightedIndex raises an error before the update. I actually found this inconvenient and would prefer if we either switch to uniform sampling or raise an error on sampling.
  • Should we allow adding new nodes? It would be easy to support.

@dhardy
Copy link
Member

dhardy commented Jan 10, 2024

Sorry for not answering sooner.

Would a PR for this be welcome? I'm happy to work through any discussions.

Yes, I think so — but I'm not so sure where. We already have rand::distributions::WeightedIndex and rand_distr::weighted_alias::WeightedAliasIndex. Probably the only reason to have WeightedIndex in rand is for use by rand::seq::SliceRandom so likely to rand_distr. @vks?

What should happen when the total sum of weights in the tree is zero?

For usage with a fixed distribution of weights it surely makes sense to error as soon as possible (i.e. at construction time). For usage with mutating weights, perhaps there is utility in not raising an error until sampling time. I don't think we should just switch to linear sampling (there may be arguments around rounding errors, but if an item is added with weight=0 I would expect it to never be sampled).

Should we allow adding new nodes?

It sounds like the whole point of a tree-based weighted distribution is to support mutation, so yes. As a result I would expect the API may need to differ a little from other weighted distributions.

@xmakro
Copy link
Contributor

xmakro commented Jan 10, 2024

No worries, hope you had a good new years.

Sounds good. I'll work on a PR targeting rand_distr. And I'll raise an error at sampling time then and allow adding nodes.

Perhaps we could add a link to the tree-based implementation in the documentation of rand::distributions::WeightedIndex::update_weights, so that people can find it if they have the use case.

I saw that rand_distr was last published 2 years ago. Would we be able to release a new version in the foreseeable future?

@xmakro
Copy link
Contributor

xmakro commented Jan 10, 2024

Please take a look at #1372 which should be as discussed.

@dhardy
Copy link
Member

dhardy commented Jan 10, 2024

Perhaps we could add a link to the tree-based implementation in the documentation of rand::distributions::WeightedIndex::update_weights, so that people can find it if they have the use case.

Please do.

@xmakro
Copy link
Contributor

xmakro commented Jan 12, 2024

Updated the docs and also added overflow handling. PTAL when you get a chance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-help-wanted Participation: help wanted F-new-int Functionality: new, within Rand
Projects
None yet
Development

No branches or pull requests

4 participants