Sequential selection WIP #81
@camallen the easiest solution would just be to tell the DB to
I'd be hesitant to add more load to that database, though I could be convinced by some quality benchmarking across varying data set sizes that shows minimal impact on the DB. As the number of linked subjects gets large, the ORDER BY clause can be slow in my experience. We worked on and benchmarked this a bit before the Panoptes rebuild and the Cellect implementation.

Is there a way to load the data without ordering in the DB and then re-order it in memory, or to use a data structure that orders on the priority field?

For reference, Cellect uses some internal data structures to handle this, then adds the data to a Fibonacci heap for ordered priority access using this comparator function: https://github.com/parrish/diff_set/blob/a37aaf87f5761c5b3ff0d814df9193f0d5ba1c3c/ext/diff_set/priority_set.h#L36 It then accesses the data in order.
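To make the "load unordered, then order in memory" idea concrete, here is a minimal sketch in Python (for illustration only, not the project's Elixir code). It uses the standard library's `heapq`, which is a binary heap rather than the Fibonacci heap Cellect uses. The `(subject_id, priority)` row shape, the function name, and the assumption that a lower priority value is served first are all hypothetical.

```python
import heapq


def load_in_priority_order(rows):
    """Yield subject ids in ascending priority order.

    `rows` is an iterable of (subject_id, priority) pairs fetched from the
    database WITHOUT an ORDER BY clause (hypothetical row shape).
    """
    # Put priority first so the heap orders on it; ties fall back to subject_id.
    heap = [(priority, subject_id) for subject_id, priority in rows]
    heapq.heapify(heap)  # O(n) to build, done in application memory
    while heap:
        _priority, subject_id = heapq.heappop(heap)  # O(log n) per pop
        yield subject_id


# Rows arrive in arbitrary order, as they would from an unordered query.
rows = [(101, 3.0), (102, 1.0), (103, 2.0)]
ordered = list(load_in_priority_order(rows))
```

A heap pays off mainly when only the first few items are needed or when items are added incrementally; if the whole set is consumed at once, a plain in-memory sort does the same job.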
Yeah we can do that, but currently at that point in the code there's no knowledge of how the workflow is configured, so it can't make that decision there. We'll have to thread the data through in that case.
Circling waaaaay back to this now: the sorting needs to occur either on the DB side or in the Elixir process. The implication a while back was that this data was pulled from the DB and cached in Designator's Subject model, but in revisiting this I can't find anywhere where the
Yeah that appears to be dead code. Workflow is where it's at.
FWIW, after Cam's comments and the ensuing delay, I would suggest doing the sorting in Elixir (at the point just after we retrieve from the DB, and before it gets written to the SubjectSetCache) until we have evidence that it's not feasible. It's easy to implement, and it won't cause the entire database to fall over; at most it'll make reloading slow.
The above includes the priority when pulling from SMS, then sorts by it and returns only the ids in that order. When called from the sync or async reloader, this should push them into the cache in that order. Selection basically assumes that the order of the array as it exists in the cache is the order in which subjects will be selected. So any stream involved in a request for subjects for a
Questions this raised:
Going to keep at this, happy to talk it through if anyone has any insights.
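For anyone following along, the flow described in the comment above (fetch priority with each id, sort, keep only the ids, write them to the cache, select from the front) can be sketched roughly as follows. This is Python for illustration only, not the actual Elixir change; the function names, the dict standing in for the cache, the cache key, and the ascending sort direction are all assumptions.

```python
def ids_in_priority_order(rows):
    """Sort (subject_id, priority) rows by priority and return only the ids.

    `sorted` is stable, so rows with equal priority keep their fetched order.
    """
    ordered = sorted(rows, key=lambda row: row[1])
    return [subject_id for subject_id, _priority in ordered]


def reload_cache(cache, key, rows):
    """Stand-in for the reloader: store the already-ordered id list."""
    cache[key] = ids_in_priority_order(rows)


def select(cache, key, limit):
    """Stand-in for sequential selection: take ids from the front of the list."""
    return cache[key][:limit]


cache = {}  # hypothetical stand-in for the subject set cache
key = ("workflow-1", "set-1")  # hypothetical cache key
rows = [(7, 2), (3, 1), (9, 3), (5, 1)]  # unordered, as fetched
reload_cache(cache, key, rows)
first_two = select(cache, key, 2)
```

Because selection only ever reads the cached list from the front, the ordering has to be right at the moment the reloader writes it; nothing downstream re-sorts.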
Excellent. We need to ensure consistency with the internal API selector, that is
Closing because the remote repo has been deleted.
Selection currently follows whatever order the subject ids happen to be in within the cache, not priority order.
Fixed #77