Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use BOSS indexing in DBGSuccinct + make RowDiff independent #484

Open
wants to merge 29 commits into
base: master
Choose a base branch
from

Conversation

adamant-pwn
Copy link
Contributor

WIP pull request for CI.

@adamant-pwn
Copy link
Contributor Author

Hi @hmusta, can you took a look here? I think it's still far from complete, but I'd still apreciate some input on the general direction to make sure that what I'm doing is aligned with what we want.

Base automatically changed from sshash_module to master July 25, 2024 07:16
@adamant-pwn
Copy link
Contributor Author

First of all, I'll wait for workflows to run through to make sure it still works as intended on DBGSuccinct. After that, I wonder if there is some simple enough way to test this with some other graph types. Some of my older notes from a few months ago say the following:

  • Test_row_diff.cpp checks work with annotations
  • Look into build_anno_graph
  • Run LabeledAlignerTest on other graph types

Would be nice to check whether it is still relevant.

@hmusta
Copy link
Collaborator

hmusta commented Oct 8, 2024

We can test this by modifying the following block in build_anno_graph to work with DeBruijnGraph:

if constexpr(std::is_same_v<Annotation, RowDiffColumnAnnotator>) {

Then, we can add instantiations for other graph types (let's start with DBGHashFast for now) with RowDiff annotations in these places:

template std::unique_ptr<AnnotatedDBG> build_anno_graph<DBGSuccinct, RowDiffColumnAnnotator>(uint64_t, const std::vector<std::string> &, const std::vector<std::string>&, DeBruijnGraph::Mode, bool);

class AnnotatedDBGWithNTest : public ::testing::Test {};

class AnnotatedDBGNoNTest : public ::testing::Test {};

typedef ::testing::Types<std::pair<DBGHashFast, annot::ColumnCompressed<>>,

metagraph/src/annotation/row_diff_builder.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/row_diff_builder.cpp Outdated Show resolved Hide resolved
distance[v] = 0;
(*terminal)[v] = true;
};
call_nodes([&](node_index v) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we know this works, can we add num_threads as a parameter to call_nodes so we can do this with multiple threads?

The set_bit, fetch_bit, etc. methods used in BOSS::row_diff_traverse can be used here for thread-safe atomic operations on sdsl::bit_vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, might be best to get back to it after some initial version is landed.

adamant-pwn and others added 4 commits October 8, 2024 14:00
* Update download-artifact

* Touch commit for workflows

* Update upload-artifact to @v4

* Own artifact name for _noAVX
@adamant-pwn
Copy link
Contributor Author

adamant-pwn commented Oct 10, 2024

I hopefully fixed all unit tests. It seems that some integration tests still fail, so I'll look into it further over the next days. In the meantime I think it'd be nice to try to get it reviewed and merged, as the change becomes large and difficult to manage within one PR. Then, the tests for other graphs can be added in a separate PR.

@adamant-pwn adamant-pwn changed the title Implement RowDiff in DeBruijnGraph class Make RowDiff independent from DBGSuccinct + use BOSS indexing in DBGSuccinct Oct 15, 2024
@adamant-pwn adamant-pwn changed the title Make RowDiff independent from DBGSuccinct + use BOSS indexing in DBGSuccinct Use BOSS indexing in DBGSuccinct + make RowDiff independent Oct 15, 2024
Comment on lines +197 to +201
for (node_index i = 1; i <= max_index() && !terminate(); ++i) {
if (is_valid(i)) {
callback(i);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when it exists, we can probably replace this with valid_edges_->call_ones, but with a try-catch to implement terminate. What do you think? It should be more efficient than calling operator[] on all elements in the vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally dislike the idea of using exceptions for control flow. I implemented this approach for now, but I feel like there should be other ways to work with it. For example, encode termination condition with the return value (instead of void). But it seems like it will require another huge overhaul of the codebase...

On the same note, it seems that some functions use terminate() for this, and others stop_early() 😔

@@ -260,34 +272,34 @@ TEST(RowDiff, GetAnnotationBifurcation) {
RowDiff<ColumnMajor> annot(&graph, std::move(mat));
annot.load_anchor(fterm_temp.name());

EXPECT_EQ("CTAG", graph.get_node_sequence(4));
EXPECT_EQ("CTAG", graph.get_node_sequence(graph.select_node(4)));
ASSERT_THAT(annot.get_rows({3})[0], ElementsAre(0, 1));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be graph.select_node(4)-1? Same for the ones below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, eyes, but it's only really needed for masked version, while without masks it's consecutive indexing, so select_node is excessive. Do you still want me to add it?

metagraph/tests/annotation/test_converters.cpp Outdated Show resolved Hide resolved
metagraph/tests/annotation/test_converters.cpp Outdated Show resolved Hide resolved
metagraph/tests/annotation/row_diff/test_row_diff.cpp Outdated Show resolved Hide resolved
metagraph/tests/annotation/row_diff/test_row_diff.cpp Outdated Show resolved Hide resolved
metagraph/tests/annotation/row_diff/test_row_diff.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/row_diff_builder.cpp Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants