Low level of divergence in the sample being assembled #1889

YuanwenGuo · 2021-01-31T17:05:39Z

Hello,
I am trying to assemble ~20 linear plasmids that are very similar to each other. The plasmids (average length ~16 kbp) are almost identical, except for a single region (~20 bp) that differs in each one. In other words, all the plasmids share the same backbone and only the insert differs. I already tried reducing correctedErrorRate to 0.13 (we have Oxford Nanopore (OXN) data), but we still cannot recover all 20 sequences in the final assembly; we get only ~10.

I realize this is a special case for any genome assembler, but since we have had really good results with Canu for regular genome assembly, we would like to give it a try.

I am wondering how Canu will treat these localized 20 bp mismatches during assembly, and whether there are any parameters I can adjust in Canu to make it work. My current command is:
canu maxMemory=80 redMemory=50 oeaMemory=50 gridOptions="--time=100:00:00 --partition=**" -p $prefix -d $dir genomeSize=320k correctedErrorRate=0.13 gnuplotTested=true -nanopore-raw $OXN_file

In case it helps, we have high-coverage raw data, a few thousand X.

Thank you!
Yuanwen

skoren · 2021-02-01T18:46:06Z

A 20 bp difference out of 16 kb is quite small given nanopore error rates; in general I'd expect the correction to corrupt that difference. Ideally, I'd suggest something similar to what I suggested in #1885: use a consensus plasmid to map reads and call variants. The variant callers can give you the reads supporting each variant, so you can bin and assemble subsets rather than mixing them all.
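
As a rough illustration of the first steps of that strategy, the sketch below maps raw reads to a consensus plasmid and lists the reads that overlap the variable insert. It assumes minimap2 and samtools are installed; the function names, file names, and the region string are hypothetical placeholders, not something from this thread. Variant calling and the per-variant binning are not shown, since they depend on the caller you choose.

```shell
# Sketch only: all paths and the region are placeholders you would replace.

# Map raw nanopore reads to the consensus plasmid and write a sorted, indexed BAM.
map_reads() {
    local ref=$1 reads=$2 out_bam=$3
    minimap2 -ax map-ont "$ref" "$reads" | samtools sort -o "$out_bam" -
    samtools index "$out_bam"
}

# Print the unique names of reads whose alignments overlap the insert region,
# e.g. region "plasmid:8000-8020" (hypothetical contig name and coordinates).
reads_over_region() {
    local bam=$1 region=$2
    samtools view "$bam" "$region" | cut -f1 | sort -u
}
```

Something like `map_reads consensus.fasta reads.fastq mapped.bam` followed by `reads_over_region mapped.bam plasmid:8000-8020 > insert_reads.txt` would give a read list to start binning from, once the reads are grouped by the variant they support.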

You could also try re-calling the ONT data with a newer basecaller (like bonito or a recent guppy version). Those reads are accurate enough to be assembled without correction, as I suggested in #1715 (-untrimmed 'batOptions=-eg 0.12 -sb 0.01' 'correctedErrorRate=0.12' 'maxInputCoverage=100' -pacbio-hifi <your nanopore fastq>), assuming you're using Canu 2.1.1.
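
Spelled out as a full command, those options might be combined as in the sketch below. The options themselves are copied from the reply above; the wrapper function name is hypothetical, the prefix, output directory, and FASTQ path are placeholders, and genomeSize=320k is simply carried over from the original command in this thread.

```shell
# Sketch only: wraps the suggested Canu 2.1.1 options in a small function.
# $1 = assembly prefix, $2 = output directory, $3 = re-basecalled nanopore FASTQ.
run_canu_uncorrected() {
    local prefix=$1 dir=$2 reads=$3
    canu -p "$prefix" -d "$dir" genomeSize=320k \
        -untrimmed \
        'batOptions=-eg 0.12 -sb 0.01' \
        correctedErrorRate=0.12 \
        maxInputCoverage=100 \
        -pacbio-hifi "$reads"
}
```

Quoting `batOptions` keeps the two bogart flags together as a single argument, which is why the original suggestion wraps it in quotes.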

YuanwenGuo · 2021-02-01T21:28:02Z

I appreciate the suggestion! We probably will try the targeted assembly strategy first to see how it works.

Best,
Yuanwen

skoren closed this as completed Feb 24, 2021
