[MRG] allow index from a signature zipfile #74
With full GTDB Indexing (390k sigs): 3h 40min
Gather (28 threads, 64 queries): 2-3min
And with
Using 1 thread ... is just as fast? Hmm...
On an
Is the global thread pool being limited properly for
Or, is rdb Running again, this time with On
On
on another run, 1 thread: 5min
So yes, it seems threads aren't doing much here.
That's awesome! Great job, @bluegenes!
* try adding rdb index and manysearch * init testing * use pathlist loading for better errs; more tests * also check intersect_hashes * add test for index check * add multiquery mastiff gather * init mastiff gather testing * remove original single-query mastiff search, gather * more cleanup * MRG: fix `if let` warnings (#63) * fix threads for changes from main * rm threshold * [MRG] allow index from a signature zipfile (#74) * zipfile hackaround * fix * fix tests * clean up; unify search testing; pin core to commit * upd py toml * test index zip * add some indexed fastmultigather testing * add cargo lock * more index tests * indexed multigather tests * revert to branch while trying upds * better help; avoid recalc threshold * EXP: try fix CI for rocksdb (#80) * Add trial workflow * ok weird removing sourmash * try again * do the test * remove maturin CI for the moment * try caching rust build stuff * fix yaml syntax * test actions --------- Co-authored-by: C. Titus Brown <[email protected]>
* [MRG] add mastiff interface functions (#58) * try adding rdb index and manysearch * init testing * use pathlist loading for better errs; more tests * also check intersect_hashes * add test for index check * add multiquery mastiff gather * init mastiff gather testing * remove original single-query mastiff search, gather * more cleanup * MRG: fix `if let` warnings (#63) * fix threads for changes from main * rm threshold * [MRG] allow index from a signature zipfile (#74) * zipfile hackaround * fix * fix tests * clean up; unify search testing; pin core to commit * upd py toml * test index zip * add some indexed fastmultigather testing * add cargo lock * more index tests * indexed multigather tests * revert to branch while trying upds * better help; avoid recalc threshold * EXP: try fix CI for rocksdb (#80) * Add trial workflow * ok weird removing sourmash * try again * do the test * remove maturin CI for the moment * try caching rust build stuff * fix yaml syntax * test actions --------- Co-authored-by: C. Titus Brown <[email protected]> * improve gather output * cargo lock * re add jaccard * added test for max cont * version and cite * rm warning * bump versions --------- Co-authored-by: Tessa Pierce Ward <[email protected]>
* add max_containment column * change variable name * another variable rename * bump version * MRG: re-add Jaccard; many UX output improvements (#85) * [MRG] add mastiff interface functions (#58) * try adding rdb index and manysearch * init testing * use pathlist loading for better errs; more tests * also check intersect_hashes * add test for index check * add multiquery mastiff gather * init mastiff gather testing * remove original single-query mastiff search, gather * more cleanup * MRG: fix `if let` warnings (#63) * fix threads for changes from main * rm threshold * [MRG] allow index from a signature zipfile (#74) * zipfile hackaround * fix * fix tests * clean up; unify search testing; pin core to commit * upd py toml * test index zip * add some indexed fastmultigather testing * add cargo lock * more index tests * indexed multigather tests * revert to branch while trying upds * better help; avoid recalc threshold * EXP: try fix CI for rocksdb (#80) * Add trial workflow * ok weird removing sourmash * try again * do the test * remove maturin CI for the moment * try caching rust build stuff * fix yaml syntax * test actions --------- Co-authored-by: C. Titus Brown <[email protected]> * improve gather output * cargo lock * re add jaccard * added test for max cont * version and cite * rm warning * bump versions --------- Co-authored-by: Tessa Pierce Ward <[email protected]> * cleanup * wat * dup simple test * test fix --------- Co-authored-by: C. Titus Brown <[email protected]> Co-authored-by: Tessa Pierce Ward <[email protected]>
If I'm not mistaken, we can't yet robustly read signatures from sig `zip` files into Rust. This is an experimental hackaround that extracts the files from the zip and then passes the new filepaths into the branchwater/mastiff `index` function.

ChatGPT helped me a lot with the Rust, so I make no guarantees on it ... just that it seems to work for me.
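For readers who want the shape of the workaround, here is a rough sketch of the "extract, then pass paths" idea. It is written in Python for brevity and is not the Rust code from this PR. The helper name `extract_signatures` and the `.sig` / `.sig.gz` suffix filter are assumptions made for illustration. The returned paths stand in for what would be handed to the `index` function.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_signatures(zip_path, dest_dir, suffixes=(".sig", ".sig.gz")):
    """Extract signature-like members of a zip and return their new paths.

    Hypothetical helper: the suffix filter is an assumption, not taken
    from the PR. Non-matching members (e.g. a manifest) are skipped.
    """
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Skip directory entries and anything that is not a signature file.
            if info.is_dir() or not info.filename.endswith(suffixes):
                continue
            # ZipFile.extract returns the path of the file it wrote.
            extracted.append(Path(zf.extract(info, path=dest_dir)))
    return extracted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny stand-in zip so the sketch runs without real data.
        demo_zip = Path(tmp) / "demo.zip"
        with zipfile.ZipFile(demo_zip, "w") as zf:
            zf.writestr("signatures/example.sig.gz", b"placeholder")
            zf.writestr("notes.txt", "not a signature")
        paths = extract_signatures(demo_zip, Path(tmp) / "extracted")
        # These paths are what would be passed on to the indexing step.
        for p in paths:
            print(p)
```

The trade-off is the one implied above: extraction costs extra disk space and I/O, but it lets the indexing code keep working from plain file paths.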
indexing gtdb:
db is `6.7G`.
and running gather with podar-ref set: