Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

why alignment results are 16 characters shorter than input sequences? #2

Open
chenying2016 opened this issue Aug 20, 2017 · 1 comment

Comments

@chenying2016
Copy link

chenying2016 commented Aug 20, 2017

In bench.c, we make sequences a and b identical sequences and both of length 10000. And we then add a statement
printf("m(%lld)\n", m);
after line 3674 of file gaba.c.
When we compile and run bench.c, it always output
m(9984)
while the normal output should be
m(10000).
Is there something going wrong?

@ocxtal
Copy link
Owner

ocxtal commented Aug 21, 2017

First of all, I'm sorry the bench.c being incorrect in calling the libgaba APIs. The correct one will be the following:

/* root coordinates; align from the heads */
uint32_t asp = 0, bsp = 0;

/* margin */
char const m[96] = {0};

/* build section structs */
gaba_section_t asec = gaba_build_section(0, a, strlen(a));
gaba_section_t bsec = gaba_build_section(2, b, strlen(b));
gaba_section_t msec = gaba_build_section(4, &m[32], 32);

/* keep current pointers in ap and bp */
gaba_section_t const *ap = &asec, *bp = &bsec;

/* fill the body of the banded matrix */
gaba_fill_t *f = gaba_dp_fill_root(
	dp, ap, asp, bp, bsp);

/* keep section with the maximum cell value in m */
gaba_fill_t *m = f;

/* fill-in the tail of the banded matrix */
uint32_t flag = GABA_STATUS_TERM;
do {
	if (f->status & GABA_STATUS_UPDATE_A) {
		ap = &msec;
	}
	if (f->status & GABA_STATUS_UPDATE_B) {
		bp = &msec;
	}
	flag |= f->status & (
		GABA_STATUS_UPDATE_A | GABA_STATUS_UPDATE_B
	);
	f = gaba_dp_fill(dp, f, ap, bp);
	m = (f->max > m->max)? f : m;
} while (!(flag & f->status));

/* fill-in is done, m holds block with the maximum */
printf("score(%lld)\n", m->max);

The behavior of the library comes from its design principle to handle each input sequence as a concatenation of sequence segments. The library forwards the front of the band (or ``extends the band'') till 32-base-long subsequence pair is available (see Fig in https://github.com/ocxtal/libgaba#sequences). So, to align toward the tips of two sequences we have to pass margin or dummy sequences after the sequences of interest.

Thanks,

Hajime Suzuki

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants