Use BOSS indexing in DBGSuccinct + make RowDiff independent #484
Open: wants to merge 29 commits into master

Commits:
f2e15e4 Add row_diff_traverse, row_diff_successor (adamant-pwn, May 23, 2024)
f3fb0b0 Update, propagate to DBGSuccinct (adamant-pwn, Jul 2, 2024)
d7878bf Use graph::DeBruijnGraph in build_pred_succ and assign_anchors (adamant-pwn, Oct 7, 2024)
3da66a9 Update download-artifact (adamant-pwn, Oct 8, 2024)
3a16332 Apply suggestions from code review (adamant-pwn, Oct 8, 2024)
76b29b7 Merge master to rowdiff (#504) (adamant-pwn, Oct 8, 2024)
3a9f1c6 Try simplifying build_pred_succ, temporarily rollback to graph.get_la… (adamant-pwn, Oct 8, 2024)
b18d463 Special handling of last.size() (adamant-pwn, Oct 8, 2024)
629cb09 Use last instead of rd_succ if it's empty (adamant-pwn, Oct 9, 2024)
71c19a1 Add checks to fix integration tests (adamant-pwn, Oct 9, 2024)
90abc08 Use BOSS index space in DBGSuccinct (adamant-pwn, Oct 9, 2024)
f019a05 override final for call_nodes + add select/rank node + some fixes (adamant-pwn, Oct 9, 2024)
358e445 num_nodes() -> max_index() for dbg_succ_ (adamant-pwn, Oct 9, 2024)
1068116 Add is_valid checks to nodes in DBGSuccinct (adamant-pwn, Oct 9, 2024)
42707ff Fix DBGSuccinct tests (adamant-pwn, Oct 9, 2024)
a49f07f Fix AnnotatedDBG test group (adamant-pwn, Oct 9, 2024)
68dd5f5 Fix RowDiff tests (adamant-pwn, Oct 10, 2024)
ee0c4a8 Return npos in certain callbacks for dummy nodes (adamant-pwn, Oct 10, 2024)
a46f6a4 Validate edges on BOSS edge -> DBG node transition (adamant-pwn, Oct 10, 2024)
60851da More validate_edge checks (adamant-pwn, Oct 10, 2024)
91ea56f Use is_valid in adjacent_outgoing_rc_strand (adamant-pwn, Oct 10, 2024)
1976be1 Fix identation + annotations without succ (adamant-pwn, Oct 10, 2024)
4e971da Fix RowDiff test (adamant-pwn, Oct 10, 2024)
7737e26 Move get_last, row_diff_traverse, row_diff_successor into row_diff_bu… (adamant-pwn, Oct 10, 2024)
04740f1 Preserve lifetime of get_last (adamant-pwn, Oct 10, 2024)
e2347d6 Fix integration tests (adamant-pwn, Oct 15, 2024)
94e9f90 Fix integration tests + return dict in _get_stats (adamant-pwn, Oct 15, 2024)
3bc714d Apply review suggestions (adamant-pwn, Nov 4, 2024)
3a87651 Use valid_edges_->call_ones (adamant-pwn, Nov 4, 2024)

.github/workflows/main.yml: 10 changes (4 additions, 6 deletions)
@@ -7,8 +7,6 @@ on:
tags:
- 'v*'
pull_request:
branches:
- master

env:
REGISTRY: ghcr.io
@@ -127,9 +125,9 @@ jobs:
run: mv metagraph/build/metagraph_${{ matrix.alphabet }} metagraph/build/metagraph_${{ matrix.alphabet }}_noAVX
- name: upload static binary
if: ${{ matrix.build_static == 'ON' && matrix.compiler == 'g++-11' }}
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@v4
with:
name: metagraph_${{ matrix.alphabet }}_linux_x86
name: metagraph_${{ matrix.alphabet }}${{ matrix.with_avx == 'OFF' && '_noAVX' || '' }}_linux_x86
path: metagraph/build/metagraph_${{ matrix.alphabet }}${{ matrix.with_avx == 'OFF' && '_noAVX' || '' }}

- name: run unit tests
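The new artifact name uses the `cond && a || b` idiom, which GitHub Actions expressions use in place of a ternary operator. One plausible reason for the rename, not stated in the PR, is that `actions/upload-artifact@v4` no longer allows multiple uploads to the same artifact name, so AVX and no-AVX builds of one alphabet need distinct names. The sketch below is a hypothetical Python mirror of the old and new `name:` templates, not part of the PR; the matrix values (`DNA`, `Protein`, `ON`, `OFF`) are assumptions, since the workflow's matrix definition is not shown in this diff.

```python
from itertools import product


def old_artifact_name(alphabet: str, with_avx: str) -> str:
    # Old template: the AVX setting is ignored, so it never affects the name.
    return f"metagraph_{alphabet}_linux_x86"


def new_artifact_name(alphabet: str, with_avx: str) -> str:
    # Mirrors ${{ matrix.with_avx == 'OFF' && '_noAVX' || '' }}.
    # Python's `and`/`or` short-circuit like the workflow's `&&`/`||`:
    # the result is '_noAVX' when the condition holds, otherwise ''.
    suffix = (with_avx == 'OFF') and '_noAVX' or ''
    return f"metagraph_{alphabet}{suffix}_linux_x86"


# Hypothetical matrix values; the real matrix is not shown in this diff.
ALPHABETS = ["DNA", "Protein"]
AVX_SETTINGS = ["ON", "OFF"]

combos = list(product(ALPHABETS, AVX_SETTINGS))
old_names = [old_artifact_name(a, v) for a, v in combos]
new_names = [new_artifact_name(a, v) for a, v in combos]

# prints: 2 distinct old names for 4 builds
print(len(set(old_names)), "distinct old names for", len(combos), "builds")
# prints: 4 distinct new names for 4 builds
print(len(set(new_names)), "distinct new names for", len(combos), "builds")
```

The idiom is only safe because `'_noAVX'` is a non-empty (truthy) string; if the "true" branch could be empty, `&& ... ||` would silently fall through to the "false" branch.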
@@ -213,7 +211,7 @@ jobs:
python-version: 3.8

- name: fetch static binary
uses: actions/download-artifact@v2
uses: actions/download-artifact@v4
with:
path: artifacts

@@ -285,7 +283,7 @@ jobs:
run: git submodule update --init --recursive

- name: fetch static binary
uses: actions/download-artifact@v2
uses: actions/download-artifact@v4
with:
path: artifacts


metagraph/integration_tests/base.py: 13 changes (11 additions, 2 deletions)
@@ -37,10 +37,19 @@ def setUpClass(cls):
def _get_stats(graph_path):
stats_command = METAGRAPH + ' stats ' + graph_path + ' --mmap'
res = subprocess.run(stats_command.split(), stdout=PIPE, stderr=PIPE)
-assert(res.returncode == 0)
+if res.returncode != 0:
+    raise AssertionError(f"Command '{stats_command}' failed with return code {res.returncode} and error: {res.stderr.decode()}")
stats_command = METAGRAPH + ' stats ' + graph_path + MMAP_FLAG
res = subprocess.run(stats_command.split(), stdout=PIPE, stderr=PIPE)
-return res
+parsed = dict()
+parsed['returncode'] = res.returncode
+res = res.stdout.decode().split('\n')[2:]
+for line in res:
+    if ': ' in line:
+        x, y = map(str.strip, line.split(':', 1))
+        assert(x not in parsed or parsed[x] == y)
+        parsed[x] = y
+return parsed

@staticmethod
def _build_graph(input, output, k, repr, mode='basic', extra_params=''):
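`_get_stats` now returns a dict of parsed `key: value` lines instead of the raw `subprocess` result. Below is a minimal standalone sketch of that parsing loop; the function name `parse_stats` is hypothetical, and it takes a captured stdout string rather than invoking the `metagraph` binary, so it runs without the tool installed. The two header lines in `SAMPLE` are placeholders (the real first two lines of `metagraph stats` output are not shown here); only the three key/value lines come from the assertions in this diff.

```python
def parse_stats(stdout: str, returncode: int = 0) -> dict:
    """Hypothetical standalone version of the parsing added to _get_stats."""
    parsed = {'returncode': returncode}
    # Skip the first two lines, as _get_stats does with [2:].
    for line in stdout.split('\n')[2:]:
        if ': ' in line:
            # Split on the first ':' only, so a value may itself contain ':'.
            key, value = map(str.strip, line.split(':', 1))
            # A key repeated with a different value would be ambiguous.
            assert key not in parsed or parsed[key] == value
            parsed[key] = value
    return parsed


# Placeholder header lines followed by key/value lines taken from the tests.
SAMPLE = "\n".join([
    "header line 1",
    "header line 2",
    "k: 11",
    "nodes (k): 16438",
    "mode: basic",
    "",
])

stats = parse_stats(SAMPLE)
print(stats['k'], stats['nodes (k)'], stats['mode'])  # prints: 11 16438 basic
```

Note that every parsed value is a string (hence `'11'` rather than `11` in the updated tests), while `returncode` stays an integer. Replacing the bare `assert` on the return code with an explicit `raise AssertionError(...)` also keeps that check active under `python -O`, which strips `assert` statements, and puts the failing command and its stderr in the error message.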

metagraph/integration_tests/test_align.py: 108 changes (48 additions, 60 deletions)
@@ -35,11 +35,10 @@ def test_simple_align_all_graphs(self, representation):
k=11, repr=representation,
extra_params="--mask-dummy")

-res = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
-params_str = res.stdout.decode().split('\n')[2:]
-self.assertEqual('k: 11', params_str[0])
-self.assertEqual('nodes (k): 16438', params_str[1])
-self.assertEqual('mode: basic', params_str[2])
+params = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
+self.assertEqual('11', params['k'])
+self.assertEqual('16438', params['nodes (k)'])
+self.assertEqual('basic', params['mode'])

stats_command = '{exe} align --align-only-forwards -i {graph} --align-min-exact-match 0.0 {reads}'.format(
exe=METAGRAPH,
@@ -68,11 +67,10 @@ def test_simple_align_map_all_graphs(self, representation):
k=11, repr=representation,
extra_params="--mask-dummy")

-res = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
-params_str = res.stdout.decode().split('\n')[2:]
-self.assertEqual('k: 11', params_str[0])
-self.assertEqual('nodes (k): 16438', params_str[1])
-self.assertEqual('mode: basic', params_str[2])
+params = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
+self.assertEqual('11', params['k'])
+self.assertEqual('16438', params['nodes (k)'])
+self.assertEqual('basic', params['mode'])

stats_command = '{exe} align -i {graph} --map --count-kmers {reads}'.format(
exe=METAGRAPH,
@@ -99,11 +97,10 @@ def test_simple_align_map_all_graphs_subk(self, representation):
k=11, repr=representation,
extra_params="--mask-dummy")

-res = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
-params_str = res.stdout.decode().split('\n')[2:]
-self.assertEqual('k: 11', params_str[0])
-self.assertEqual('nodes (k): 16438', params_str[1])
-self.assertEqual('mode: basic', params_str[2])
+params = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
+self.assertEqual('11', params['k'])
+self.assertEqual('16438', params['nodes (k)'])
+self.assertEqual('basic', params['mode'])

stats_command = '{exe} align -i {graph} --map --count-kmers --align-length 10 {reads}'.format(
exe=METAGRAPH,
@@ -134,11 +131,10 @@ def test_simple_align_map_canonical_all_graphs(self, representation):
k=11, repr=representation, mode='canonical',
extra_params="--mask-dummy")

-res = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
-params_str = res.stdout.decode().split('\n')[2:]
-self.assertEqual('k: 11', params_str[0])
-self.assertEqual('nodes (k): 32782', params_str[1])
-self.assertEqual('mode: canonical', params_str[2])
+params = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
+self.assertEqual('11', params['k'])
+self.assertEqual('32782', params['nodes (k)'])
+self.assertEqual('canonical', params['mode'])

stats_command = '{exe} align -i {graph} --map --count-kmers {reads}'.format(
exe=METAGRAPH,
@@ -165,11 +161,10 @@ def test_simple_align_json_all_graphs(self, representation):
k=11, repr=representation,
extra_params="--mask-dummy")

-res = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
-params_str = res.stdout.decode().split('\n')[2:]
-self.assertEqual('k: 11', params_str[0])
-self.assertEqual('nodes (k): 16438', params_str[1])
-self.assertEqual('mode: basic', params_str[2])
+params = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
+self.assertEqual('11', params['k'])
+self.assertEqual('16438', params['nodes (k)'])
+self.assertEqual('basic', params['mode'])

stats_command = '{exe} align --align-only-forwards -i {graph} --align-min-exact-match 0.0 {reads}'.format(
exe=METAGRAPH,
@@ -189,11 +184,10 @@ def test_simple_align_fwd_rev_comp_all_graphs(self, representation):
k=11, repr=representation,
extra_params="--mask-dummy")

-res = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
-params_str = res.stdout.decode().split('\n')[2:]
-self.assertEqual('k: 11', params_str[0])
-self.assertEqual('nodes (k): 16438', params_str[1])
-self.assertEqual('mode: basic', params_str[2])
+params = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
+self.assertEqual('11', params['k'])
+self.assertEqual('16438', params['nodes (k)'])
+self.assertEqual('basic', params['mode'])

stats_command = '{exe} align -i {graph} --align-min-exact-match 0.0 {reads}'.format(
exe=METAGRAPH,
@@ -222,11 +216,10 @@ def test_simple_align_canonical_all_graphs(self, representation):
k=11, repr=representation, mode='canonical',
extra_params="--mask-dummy")

-res = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
-params_str = res.stdout.decode().split('\n')[2:]
-self.assertEqual('k: 11', params_str[0])
-self.assertEqual('nodes (k): 32782', params_str[1])
-self.assertEqual('mode: canonical', params_str[2])
+params = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
+self.assertEqual('11', params['k'])
+self.assertEqual('32782', params['nodes (k)'])
+self.assertEqual('canonical', params['mode'])

stats_command = '{exe} align -i {graph} --align-min-exact-match 0.0 {reads}'.format(
exe=METAGRAPH,
@@ -256,11 +249,10 @@ def test_simple_align_canonical_subk_succinct(self, representation):
k=11, repr=representation, mode='canonical',
extra_params="--mask-dummy")

-res = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
-params_str = res.stdout.decode().split('\n')[2:]
-self.assertEqual('k: 11', params_str[0])
-self.assertEqual('nodes (k): 32782', params_str[1])
-self.assertEqual('mode: canonical', params_str[2])
+params = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
+self.assertEqual('11', params['k'])
+self.assertEqual('32782', params['nodes (k)'])
+self.assertEqual('canonical', params['mode'])

stats_command = '{exe} align -i {graph} --align-min-exact-match 0.0 --align-min-seed-length 10 {reads}'.format(
exe=METAGRAPH,
@@ -286,11 +278,10 @@ def test_simple_align_primary_all_graphs(self, representation):
k=11, repr=representation, mode='primary',
extra_params="--mask-dummy")

-res = self._get_stats(self.tempdir.name + '/genome.MT.primary' + graph_file_extension[representation])
-params_str = res.stdout.decode().split('\n')[2:]
-self.assertEqual('k: 11', params_str[0])
-self.assertEqual('nodes (k): 16391', params_str[1])
-self.assertEqual('mode: primary', params_str[2])
+params = self._get_stats(self.tempdir.name + '/genome.MT.primary' + graph_file_extension[representation])
+self.assertEqual('11', params['k'])
+self.assertEqual('16391', params['nodes (k)'])
+self.assertEqual('primary', params['mode'])

stats_command = '{exe} align -i {graph} --align-min-exact-match 0.0 {reads}'.format(
exe=METAGRAPH,
@@ -320,11 +311,10 @@ def test_simple_align_primary_subk_succinct(self, representation):
k=11, repr=representation, mode='primary',
extra_params="--mask-dummy")

-res = self._get_stats(self.tempdir.name + '/genome.MT.primary' + graph_file_extension[representation])
-params_str = res.stdout.decode().split('\n')[2:]
-self.assertEqual('k: 11', params_str[0])
-self.assertEqual('nodes (k): 16391', params_str[1])
-self.assertEqual('mode: primary', params_str[2])
+params = self._get_stats(self.tempdir.name + '/genome.MT.primary' + graph_file_extension[representation])
+self.assertEqual('11', params['k'])
+self.assertEqual('16391', params['nodes (k)'])
+self.assertEqual('primary', params['mode'])

stats_command = '{exe} align -i {graph} --align-min-exact-match 0.0 --align-min-seed-length 10 {reads}'.format(
exe=METAGRAPH,
@@ -349,11 +339,10 @@ def test_simple_align_fwd_rev_comp_json_all_graphs(self, representation):
output=self.tempdir.name + '/genome.MT',
k=11, repr=representation)

-res = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
-params_str = res.stdout.decode().split('\n')[2:]
-self.assertEqual('k: 11', params_str[0])
-self.assertEqual('nodes (k): 16461', params_str[1])
-self.assertEqual('mode: basic', params_str[2])
+params = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
+self.assertEqual('11', params['k'])
+self.assertEqual('16461', params['nodes (k)'])
+self.assertEqual('basic', params['mode'])

stats_command = '{exe} align --json -i {graph} --align-min-exact-match 0.0 {reads}'.format(
exe=METAGRAPH,
@@ -375,11 +364,10 @@ def test_simple_align_edit_distance_all_graphs(self, representation):
output=self.tempdir.name + '/genome.MT',
k=11, repr=representation)

-res = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
-params_str = res.stdout.decode().split('\n')[2:]
-self.assertEqual('k: 11', params_str[0])
-self.assertEqual('nodes (k): 16461', params_str[1])
-self.assertEqual('mode: basic', params_str[2])
+params = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation])
+self.assertEqual('11', params['k'])
+self.assertEqual('16461', params['nodes (k)'])
+self.assertEqual('basic', params['mode'])

stats_command = '{exe} align --json --align-edit-distance -i {graph} --align-min-exact-match 0.0 {reads}'.format(
exe=METAGRAPH,
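The test updates switch from positional checks (`params_str[0] == 'k: 11'`) to lookups by key (`params['k'] == '11'`). A minimal sketch of why that is more robust, using hypothetical output lines: if the stats output ever gained an extra field before `mode`, the positional check would fail while the keyed check would still pass. The helper names and the `extra field` line are invented for illustration and are not part of the PR.

```python
def parse_kv(lines):
    # Same rule as _get_stats: keep lines containing ': ', split on the first ':'.
    pairs = (line.split(':', 1) for line in lines if ': ' in line)
    return {key.strip(): value.strip() for key, value in pairs}


def positional_check(lines):
    # Old style: each field must sit at a fixed index.
    return (lines[0] == 'k: 11'
            and lines[1] == 'nodes (k): 16438'
            and lines[2] == 'mode: basic')


def keyed_check(lines):
    # New style: each field is looked up by name, wherever it appears.
    params = parse_kv(lines)
    return (params.get('k') == '11'
            and params.get('nodes (k)') == '16438'
            and params.get('mode') == 'basic')


ORIGINAL = ["k: 11", "nodes (k): 16438", "mode: basic"]
# Hypothetical output with an invented extra field inserted before 'mode'.
EXTENDED = ["k: 11", "nodes (k): 16438", "extra field: 42", "mode: basic"]

for label, lines in [("original", ORIGINAL), ("extended", EXTENDED)]:
    # prints "original True True", then "extended False True"
    print(label, positional_check(lines), keyed_check(lines))
```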