Processing w/ a combined genome for spike normalization #3

cmuyehara · 2021-10-01T21:18:43Z

Hi,

I've been trying to use your pipeline to align samples that contain Drosophila spike-ins. Rather than aligning sequentially, I generated a combined mouse and Drosophila genome with the D. melanogaster chromosomes named in the format "dm6_{chrom}", but I didn't recover any signal along the Drosophila genome. The problem seems to be that when the pipeline filters out rRNA and chrM reads, it also passes the reads through grep '_' -v, which discards every line containing an underscore, including all reads on the dm6_ chromosomes. This happens here:

proseq2.0/proseq2.0.bsh, line 873 in c3260bd:

           zcat ${TMPDIR}/$j.bed.gz | grep "rRNA\|chrM" -v | grep "_" -v | sort-bed - | gzip > ${TMPDIR}/$j.nr.rs.bed.gz

and

proseq2.0/proseq2.0.bsh, line 1166 in c3260bd:

           zcat ${TMPDIR}/$j.bed.gz | grep "rRNA\|chrM" -v | grep "_" -v | sort-bed - | gzip > ${TMPDIR}/$j.nr.rs.bed.gz

I edited those lines to remove the grep '_' -v step while still filtering out rRNA and chrM reads, and that seems to have fixed the problem. However, I was wondering why the step was there, since none of the chromosome names in the mm10 annotation I'm using contain '_'.
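The grep '_' -v step was presumably meant to drop unplaced and random contigs (names like chrUn_... or chr1_..._random in full UCSC assemblies). A minimal sketch of a filter that keeps that behavior for host contigs while exempting spike-in contigs, assuming the spike-in chromosomes share the "dm6_" prefix used above. It tests only the chromosome column for chrM and "_", while rRNA is still matched anywhere on the line, as in the original grep. `filter_bed` is a hypothetical helper, not part of proseq2.0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter a BED stream read from stdin.
# Assumes spike-in chromosomes start with a shared prefix (default "dm6_").
filter_bed() {
  local spike_prefix="${1:-dm6_}"
  awk -v p="$spike_prefix" '
    $0 ~ /rRNA/ { next }               # drop rRNA reads (matched anywhere on the line, like the original grep)
    $1 ~ /chrM$/ { next }              # drop mitochondrial reads, host or spike-in
    index($1, p) == 1 { print; next }  # keep spike-in contigs even though their names contain "_"
    index($1, "_") { next }            # drop other contigs containing "_" (e.g. chrUn_... or *_random)
    { print }                          # keep everything else
  '
}
```

In the lines quoted above, it could stand in for the two greps, e.g. `zcat ${TMPDIR}/$j.bed.gz | filter_bed dm6_ | sort-bed - | gzip > ${TMPDIR}/$j.nr.rs.bed.gz`.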

I would also recommend documenting this behavior, since aligning to a combined genome seems to be a relatively common approach to spike-in normalization.



khdsudeep · 2024-10-08T17:33:30Z

Hi @cmuyehara,

I have a quick follow-up question that you may already have figured out. I'm also trying to normalize using a Drosophila spike-in, but I've run into a problem. Since the pipeline seems to remove unaligned reads from the BAM file, I can't get the total read count (the aligned reads are reported as the total reads). Could you explain how you calculated the scale factor (normalization factor)? I have human-aligned and Drosophila-aligned reads from the pipeline. I'd really appreciate your help!

Thank you in advance!

-Sudeep
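One common convention (an assumption here, not necessarily what was done in this thread) scales each sample by the smallest spike-in read count across samples divided by that sample's spike-in read count. Under this convention only the reads aligned to each genome are needed, not a total that includes unaligned reads. A minimal sketch, assuming gzipped BED files from a combined genome with one line per read and spike-in contigs prefixed "dm6_"; `spike_scale_factors` is a hypothetical helper. If host and spike-in reads live in separate files, each count could instead come from `gzip -dc file | wc -l`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: spike_scale_factors PREFIX FILE.bed.gz...
# Prints: file, host reads, spike-in reads, scale factor (tab-separated).
# Scale factor = smallest non-zero spike-in count / this sample's spike-in count.
spike_scale_factors() {
  local prefix="$1"; shift
  local f
  for f in "$@"; do
    # one summary line per file: name, host reads, spike-in reads
    gzip -dc "$f" | awk -v p="$prefix" -v name="$f" '
      index($1, p) == 1 { spike++; next }
      { host++ }
      END { printf "%s\t%d\t%d\n", name, host, spike }'
  done | awk -F'\t' 'BEGIN { OFS = "\t" }
    {
      n = $3 + 0
      file[NR] = $1; host[NR] = $2 + 0; spike[NR] = n
      # track the smallest non-zero spike-in count
      if (n > 0 && (!seen || n < min)) { min = n; seen = 1 }
    }
    END {
      for (i = 1; i <= NR; i++)
        print file[i], host[i], spike[i], (spike[i] > 0 ? min / spike[i] : "NA")
    }'
}
```

For example, `spike_scale_factors dm6_ sample1.bed.gz sample2.bed.gz`. Multiplying each sample's coverage by its factor brings every sample to the spike-in read depth of the sample with the fewest spike-in reads.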

edmundmiller added a commit to nf-core/nascent that referenced this issue (Mar 31, 2024):

fix(dreg): Remove grep "_" -v (1bc468c)

Think this is breaking the analysis Danko-Lab/proseq2.0#3 Danko-Lab/proseq2.0#4 cmuyehara/proseq2.0@b4349ba
