Processing w/ a combined genome for spike normalization #3

Open
cmuyehara opened this issue Oct 1, 2021 · 1 comment

cmuyehara commented Oct 1, 2021

Hi,

I've been trying to use your pipeline to align samples that have Drosophila spike-ins. Rather than doing sequential alignment, I generated a combined mouse and Drosophila genome w/ the dmel chromosomes named in the format "dm6_{chrom}". I didn't recover any signal along the Dmel genome. The problem seems to be that when you filter out rRNA and chrM, you also pass the reads through grep "_" -v, which drops every read on a chromosome with an underscore in its name, including all of the dm6_ chromosomes. This happens here:

zcat ${TMPDIR}/$j.bed.gz | grep "rRNA\|chrM" -v | grep "_" -v | sort-bed - | gzip > ${TMPDIR}/$j.nr.rs.bed.gz
and
zcat ${TMPDIR}/$j.bed.gz | grep "rRNA\|chrM" -v | grep "_" -v | sort-bed - | gzip > ${TMPDIR}/$j.nr.rs.bed.gz

I edited those lines to remove the grep '_' -v section while still removing the rRNA and chrM reads, and it seems to have fixed the problem. However, I was wondering why that was there. In the mm10 annotation I'm using, none of the chromosomes have '_' in them.
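
For anyone who wants to keep dropping underscore-named contigs (for example, names like chr1_GL456210_random) while retaining the spike-in chromosomes, a minimal sketch of an alternative filter follows. It is not the pipeline's own code: filter_reads is a hypothetical helper, and the "dm6_" prefix is an assumption taken from the naming scheme described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): reads BED lines on stdin,
# drops rRNA/chrM reads and underscore-named contigs, but keeps any
# chromosome whose name starts with the spike-in prefix (assumed "dm6_").
filter_reads() {
    local spike_prefix="${1:-dm6_}"
    grep -v -e 'rRNA' -e 'chrM' \
        | awk -F'\t' -v p="$spike_prefix" 'index($1, p) == 1 || $1 !~ /_/'
}

# Small demonstration on made-up BED records.
printf 'chr1\t10\t20\nchr1_GL456210_random\t5\t15\ndm6_chr2L\t1\t9\nchrM\t3\t8\n' \
    | filter_reads dm6_
# Prints the chr1 and dm6_chr2L records only.
```

In the two pipeline lines quoted above, this would stand in for the grep "rRNA\|chrM" -v | grep "_" -v stage, with sort-bed and gzip left unchanged. Note that a spike-in mitochondrial chromosome (dm6_chrM under this naming) is still removed by the chrM filter, and any spike-in contigs that carry the prefix are kept.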

I'd also recommend documenting that behavior, since aligning to a combined genome seems to be a relatively common way of doing spike-in normalization.

edmundmiller added a commit to nf-core/nascent that referenced this issue Mar 31, 2024

khdsudeep commented Oct 8, 2024

Hi @cmuyehara,

I have a quick follow-up question that you may have already figured out. I’m also trying to normalize using the Drosophila spike-in, but I ran into a problem. Since the pipeline seems to remove unaligned reads from the BAM file, I can’t get the total read count (the aligned reads are reported as the total reads). Could you explain how you calculated the scale factor (normalization factor)? I have human-aligned and Drosophila-aligned reads from the pipeline. I’d really appreciate your help!

Thank you in advance!

-Sudeep
