Low level of divergence in the sample being assembled #1889

Closed
YuanwenGuo opened this issue Jan 31, 2021 · 2 comments

Comments

@YuanwenGuo

Hello,
I am trying to assemble ~20 linear plasmids that are nearly identical to each other. The plasmids (average length ~16 kbp) all share the same backbone and differ only in a single ~20 bp insert region. I have already tried reducing correctedErrorRate to 0.13 (we have Oxford Nanopore data), but the final assembly still contains only ~10 of the 20 sequences.

I realize this is a special case for any genome assembler, but since we have had really good results with Canu for regular genome assembly, we would like to give it a try.

I am wondering how Canu treats these localized 20 bp mismatches during assembly, and whether there are any parameters I can adjust to make this work. My current command is:
canu maxMemory=80 redMemory=50 oeaMemory=50 gridOptions="--time=100:00:00 --partition=**" -p $prefix -d $dir genomeSize=320k correctedErrorRate=0.13 gnuplotTested=true -nanopore-raw $OXN_file

In case it helps, we have high-coverage raw data, on the order of a few thousand X.

Thank you!
Yuanwen

@skoren
Member

skoren commented Feb 1, 2021

A 20 bp difference out of 16 kb is quite small given nanopore error rates, so in general I'd expect the correction step to corrupt that difference. Ideally, I'd suggest something similar to what I suggested in #1885: use a consensus plasmid to map the reads and call variants. The variant callers can report which reads support each variant, so you can bin the reads and assemble each subset separately rather than mixing them all together.
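The binning step can be illustrated with a minimal Python sketch. It is not the mapping and variant-calling workflow itself: as a simplifying assumption, it locates the variable region by exact matching of two hypothetical flanking backbone sequences (`left_flank`, `right_flank`) and groups read IDs by the sequence found between them. Real nanopore reads contain errors, so exact flank matching would miss many reads; in practice the read-to-variant assignment would come from the alignments and the variant caller. The function names and the toy sequences are made up for illustration.

```python
from collections import defaultdict


def extract_insert(read_seq, left_flank, right_flank):
    """Return the sequence between the two flanks, or None if a flank is missing.

    Assumption: flanks are matched exactly, which is only realistic for
    error-free toy data; real reads would need alignment-based extraction.
    """
    start = read_seq.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read_seq.find(right_flank, start)
    if end == -1:
        return None
    return read_seq[start:end]


def bin_reads_by_insert(reads, left_flank, right_flank):
    """Group read IDs by the insert sequence they carry.

    `reads` maps read ID -> sequence. Reads without both flanks are skipped.
    """
    bins = defaultdict(list)
    for read_id, seq in reads.items():
        insert = extract_insert(seq, left_flank, right_flank)
        if insert is not None:
            bins[insert].append(read_id)
    return dict(bins)


# Toy data: two plasmid variants sharing the same (hypothetical) backbone flanks.
LEFT, RIGHT = "ACGTACGT", "TTGGCCAA"
reads = {
    "read1": "GGG" + LEFT + "AAAAACCCCC" + RIGHT + "CCC",
    "read2": "GGG" + LEFT + "AAAAACCCCC" + RIGHT + "CCC",
    "read3": "GGG" + LEFT + "CCCCCGGGGG" + RIGHT + "CCC",
    "read4": "GGGACGTTTGGCC",  # neither flank present, so this read is skipped
}
bins = bin_reads_by_insert(reads, LEFT, RIGHT)
for insert, read_ids in bins.items():
    print(insert, read_ids)
```

Each resulting bin is a list of read IDs that could be used to pull the matching reads out of the original FASTQ and assemble that subset on its own.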

You could also try re-basecalling the ONT data with a newer basecaller (such as bonito or a recent guppy version). The resulting higher-accuracy reads can be assembled without correction, as I suggested in #1715 (-untrimmed 'batOptions=-eg 0.12 -sb 0.01' 'correctedErrorRate=0.12' 'maxInputCoverage=100' -pacbio-hifi <your nanopore fastq>), assuming you're using Canu 2.1.1.

@YuanwenGuo
Author

I appreciate the suggestion! We will probably try the targeted assembly strategy first and see how it works.

Best,
Yuanwen

@skoren skoren closed this as completed Feb 24, 2021