Flye does not generate any output ("No disjointigs were assembled" message) #128
Comments
Interesting, it does look like a lot of overlaps were found, but no disjointigs were assembled. Could you send me the full flye.log? I also suggest trying --meta mode: it is more robust in solid k-mer selection when there is contamination or artificial sequence introduced by the instrument. |
[2019-06-22 11:00:05] root: INFO: Starting Flye 2.4.2-release
[2019-06-23 17:20:11] INFO: Assembled 0 disjointigs |
Thank you, this does indeed look strange. Maybe the high coverage confuses Flye, but I also suspect there might be some non-target reads in the sample. I suggest trying two more runs: (i) metagenome mode, (ii) normal mode with |
I just finished the two Flye runs that you suggested. Both of them completed, but the assembly with ''--asm-coverage 50'' seems better (in terms of N50, total size, etc.) |
Glad that it helped! |
The solution of normal mode with |
@fenderglass Since I am running flye with |
@ptrebert Seems strange. My only guess would be that PacBio reads might not be properly split into subreads (we had a couple cases like that before). Try to process the reads with https://github.com/fenderglass/pbclip - it should tell you if there is a significant amount of "chimeric" subreads. Alternatively, you can also try to run with |
@fenderglass |
ping: testing Flye |
@fenderglass
Could you help with interpreting these numbers (I may want to get in touch with the seq lab about this sample)? I'll try to assemble to output FASTA now with flye v2.7b, let's see what happens. |
pbclip finds PacBio reads that were not properly split into subreads. Depending on the DNA library, the polymerase might make multiple passes over the fragment (this is what is used to produce high-quality CCS reads). However, fragments in CLR libraries (at least from the assembly perspective) are not expected to be read multiple times to produce longer reads. When multiple passes do happen, such reads should be split into subreads (each subread is a single polymerase pass). Typically this is handled by the PacBio software at the FASTQ generation stage. The numbers suggest that ~40% of your reads have multiple polymerase passes. This is a lot (a typical value would be 1-2%) and suggests that there is indeed an issue with subread splitting. The chopped reads are the reads that pbclip was able to split into parts successfully. The bad reads are the reads with the same pattern that pbclip was not able to recover. Feel free to run the latest Flye version on the output produced by pbclip - I think it should work now. You can also double-check with the lab whether they performed subread splitting or have the raw PacBio files to regenerate valid FASTQs. |
@fenderglass |
probably last comment regarding this: even with the corrected reads (FASTA input now), flye 2.7b fails to assemble disjointigs. Seems like there is something else off about this data... |
@ptrebert I see - this could be tricky sometimes. Did you have any luck with other assemblers? Wtdbg2 might be a fast way to check. |
@fenderglass If I find the time, I'll try another assembler. For now, I asked the sequencing centre to double-check everything about this particular sample, let's see if they find something... |
@fenderglass |
@ptrebert good to know, thanks for the update! At this early stage of assembly, not much can be inferred from the logs, I think. I guess if the log shows that "Overlap-based coverage" is reasonable (let's say, >10), but no disjointigs are produced, then there is a problem somewhere. |
No, they all show a zero for the "overlap-based coverage". Whatever the problem is, it's in the data then... thanks for all your support! |
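The check described above can be scripted. The following is a minimal sketch, assuming the log wording matches the lines quoted in this thread ("Estimated coverage", "Overlap-based coverage", "Assembled N disjointigs"); other Flye versions may phrase these messages differently. The function names and the default threshold of 10 (taken from the ">10" rule of thumb above) are illustrative, not part of Flye.

```python
import re

# Patterns follow the log lines quoted in this thread; the exact wording in
# other Flye versions is an assumption.
PATTERNS = {
    "estimated_coverage": re.compile(r"Estimated coverage:\s*(\d+)"),
    "overlap_coverage": re.compile(r"Overlap-based coverage:\s*(\d+)"),
    "disjointigs": re.compile(r"Assembled (\d+) disjointigs"),
}


def summarize_flye_log(text):
    """Return the last value seen for each metric, or None if it is absent."""
    summary = {name: None for name in PATTERNS}
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                summary[name] = int(match.group(1))
    return summary


def diagnose(summary, min_overlap_coverage=10):
    """Turn the rule of thumb from this thread into a short message."""
    overlap = summary["overlap_coverage"]
    disjointigs = summary["disjointigs"]
    if overlap is None or disjointigs is None:
        return "log incomplete: assembly stage did not report these values"
    if disjointigs > 0:
        return "disjointigs were assembled"
    if overlap >= min_overlap_coverage:
        return "overlaps found but no disjointigs: try --meta or --asm-coverage"
    return "almost no overlaps: the input reads themselves look problematic"
```

To use it, read the log file into a string first, for example `diagnose(summarize_flye_log(open("flye.log").read()))`.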
Hello all, I am working on a Mycobacterium ulcerans genome which was sequenced with Oxford Nanopore technology. I am trying to do de novo assembly with Flye, but I run into a warning and the pipeline stops. The command I used is I get this message below WARNING: Expected read coverage is 4744, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly? |
@jotes35 your expected genome size is 50kb (0.05 Mb). It needs to be "5m", not "0.05m" (assuming you are aiming for 5 Mb genome). |
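To see why the unit matters, here is a small sketch of the arithmetic behind the "Expected read coverage" warning: total read bases divided by the stated genome size. The helper names are hypothetical, and the suffix handling (k, m, g) is an assumption modelled on the values used in this thread, not a copy of Flye's own parser.

```python
def parse_genome_size(text):
    """Convert a size such as '5m', '0.05m', '15k' or '2.4g' to base pairs."""
    multipliers = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(round(float(text[:-1]) * multipliers[text[-1]]))
    return int(text)


def expected_coverage(total_read_bases, genome_size):
    """Expected coverage = total bases in the reads / genome size in bp."""
    return total_read_bases / parse_genome_size(genome_size)


# Numbers from the log in the original report: 10964270213 bases, 10m genome.
print(int(expected_coverage(10_964_270_213, "10m")))  # 1096

# The same read set evaluated against '0.05m' and '5m' differs by 100x.
bases = 4744 * parse_genome_size("0.05m")  # total bases inferred from the warning
print(round(expected_coverage(bases, "5m"), 2))  # 47.44
```

With the corrected "5m" value, the inferred read set gives roughly 47x, which is in the usual working range rather than several thousand-fold.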
Please is there a way to know the expected genome size before hand? |
@fenderglass is there a way to know the expected genome size before starting the assembly? |
@jotes35 Please check the FAQ - it provides some answers to your question. Let me know if anything is unclear. |
Hello, I have the same problem: "No disjointigs were assembled". The expected genome size is 110M and my expected coverage is about 49. I tried --meta and different --asm-coverage values (since my overall coverage is smaller than 50x), but it didn't solve the issue. My N50 is quite high; could that be the reason I am getting the error? |
@matteo1313 It seems that you have ~800kb of reads for a bacterium of size 1.6Mb, so there is simply not enough coverage to assemble. You typically need at least 10x, and 30x+ is recommended. Also, your read N50 is 70kb, which seems too good to be true for a bacterium - something might be wrong with the input data formatting. |
I'm also encountering this error. I'm running Flye as a plugin in Geneious Prime. My data consists of Nanopore reads generated from a cDNA library produced from RNA extracted from a cell culture infected with a virus. I'm trying to assemble the viral genome. I've filtered my reads by mapping against the host transcriptome, but this process is imperfect. I think that of the ~100,000 unmapped reads I have left, about 90% are viral. The virus has a segmented genome consisting of eight segments, with a total size of about 15 Kb. I've tried setting the genome size to various values including 15k, 100k and 2.4g (the approximate size of the host genome), but I keep getting the same error message.
ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
Failed to run: C:\WINDOWS\System32\bash.exe -c '/mnt/c/Users/sgodwin/AppData/Local/Geneious/plugins/Flye/resources/Windows/bin/flye' --nano-corr input_0_Unpaired.fastq --threads 24 --genome-size 15k --meta --iterations 1 --out-dir out >stdout.txt 2>stderr.txt, exit code: 1
Flye reported the following errors:
[2022-09-30 17:43:19] INFO: Starting Flye 2.7-b1585
[2022-09-30 17:43:31] INFO: Filling index table (1/2)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% |
@Scott-Godwin you are using an outdated version of Flye. The latest release (2.9+) was optimized for viral assembly and should work better for you. |
I stopped using Flye because it did not work on all my virus FASTQ files. What tools are people using for viral assembly now? I want to try it again. I remember my error was with the genome size. Thanks! |
@fenderglass Can I run Flye 2.9 from a bash terminal on a windows machine? I'm a wet lab guy. I'm a total beginner when it comes to all things bioinformatics. |
@Scott-Godwin No, you can't. But you can install WSL (Windows Subsystem for Linux) and a Linux distribution like Ubuntu. |
Hi, I installed the new version of Flye and I am still getting "Pipeline aborted". Also, do you know why Canu can assemble contigs with this FASTQ file but Flye cannot? I am trying to understand the theory behind different long-read de novo assemblers and why some can assemble and some cannot, even though I am using the same FASTQ file. Thanks! flye --nano-raw barcode01.fastq --out-dir barcode01.flye --meta --threads 20 |
Looks like my N50 is <1kb, so Flye can't assemble anything where the N50 is <1kb? What does N50 mean? |
https://en.wikipedia.org/wiki/N50,_L50,_and_related_statistics
N50
N50 statistic defines assembly quality in terms of contiguity. Given a set of contigs, the N50 is defined as the sequence length of the shortest contig at 50% of the total assembly length. It can be thought of as the point of half of the mass of the distribution; the number of bases from all contigs longer than the N50 will be close to the number of bases from all contigs shorter than the N50. For example, consider 9 contigs with the lengths 2, 3, 4, 5, 6, 7, 8, 9, and 10; their sum is 54, half of the sum is 27, and the size of the genome also happens to be 54. 50% of this assembly would be 10 + 9 + 8 = 27 (half the length of the sequence). Thus the N50=8, which is the size of the contig which, along with the larger contigs, contain half of sequence of a particular genome. Note: When comparing N50 values from different assemblies, the assembly sizes must be the same size in order for N50 to be meaningful.
N50 can be described as a weighted median statistic such that 50% of the entire assembly is contained in contigs or scaffolds equal to or larger than this value.
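The definition quoted above translates directly into a few lines of Python. This is only an illustrative sketch (the function name `n50` is my own); it works the same way for read lengths as for contig lengths.

```python
def n50(lengths):
    """Length of the sequence at which the running total, taken from the
    longest sequence downwards, first reaches half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")


# The worked example from the quoted text: 10 + 9 + 8 = 27 = 54 / 2.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

A read N50 below 1 kb therefore means that half of all sequenced bases sit in reads shorter than 1 kb.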
|
@ChristopherRichie Thank you! I figured out that metaFlye is based on a de Bruijn graph and Canu is an overlap graph (OLC) based method. |
I've had this issue when using |
Hello, I'm working with Nanopore data from alpacas. I have tried all the different parameters, but each run gives the same error, and I'm unsure what the problem is. I have been using the adapter- and barcode-trimmed FASTQ file as input to --nano-raw. I have tried all the troubleshooting steps mentioned above in the discussion but end up with the same error. [2023-04-27 12:58:27] root: INFO: Starting Flye 2.9.2-b1786 |
@PavithraV0223 could you tell us more about your sample? And please attach a log with |
Hello, I am having similar issues. I have tried the --meta mode and the --asm-coverage 50 without success. [2023-05-22 09:29:27] root: INFO: Starting Flye 2.9-b1778
[2023-05-22 09:30:42] INFO: Assembled 0 disjointigs |
@emmannaemeka it seems like you're assembling very short reads. Flye really needs reads of at least a few kb to work. |
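A quick way to check read lengths before starting an assembly is to count how many bases sit in reads above some cutoff. The sketch below is an illustration only: it assumes plain four-line FASTQ records (no wrapped sequence lines), optional gzip compression detected by the `.gz` suffix, and a 1000 bp cutoff chosen to match the "<1kb" case discussed earlier in this thread rather than any documented Flye threshold.

```python
import gzip


def read_lengths(path):
    """Return the length of every read in a four-line-per-record FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = []
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:  # the second line of each record is the sequence
                lengths.append(len(line.strip()))
    return lengths


def length_report(lengths, min_length=1000):
    """Summarise how much of the data is in reads of at least min_length bp."""
    long_reads = [n for n in lengths if n >= min_length]
    return {
        "reads": len(lengths),
        "total_bases": sum(lengths),
        "long_reads": len(long_reads),
        "long_bases": sum(long_reads),
    }
```

For example, `length_report(read_lengths("barcode01.fastq"))` shows at a glance whether most of the bases are in reads long enough to overlap; dividing `long_bases` by the genome size gives a rough usable coverage.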
Trying to fix Error "No disjointigs were assembled", based on mikolmogorov/Flye#128
trying --meta, the other suggestion from mikolmogorov/Flye#128 --asm-coverage requires genome size estimate
I encountered a similar issue.
I have attached the log file. Upon checking the fastq.gz file via pbclip, the result shows: The |
@miniluphy your read error rate is ~13%, so these are not HiFi reads. If the data is PacBio, use |
Hi, I have more a conceptual question that arose while solving a similar issue as the one mentioned in this thread. With the I guess the If this question doesn't belong here, but should be a separate "issue", I will change it :) Thank you for any additional information you can provide ! |
@SinaedaA sorry for the late response! The shorter fragments may be plasmids. You can try to visualize the assembly graph using Bandage to see if they form separate connected components and are circular. To check for strain heterogeneity, you can run flye with |
I have been trying to assemble a 10Mb genome with uncorrected nanopore data (3-4 chromosomes expected). We have a lot of data, is that the reason Flye fails at the end?
[2019-06-22 11:00:05] INFO: >>>STAGE: configure
[2019-06-22 11:00:05] INFO: Configuring run
[2019-06-22 11:00:27] INFO: Total read length: 10964270213
[2019-06-22 11:00:27] INFO: Input genome size: 10000000
[2019-06-22 11:00:27] INFO: Estimated coverage: 1096
[2019-06-22 11:00:27] WARNING: Expected read coverage is 1096, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2019-06-22 11:00:27] INFO: Reads N50/N90: 29675 / 9753
[2019-06-22 11:00:27] INFO: Minimum overlap set to 5000
[2019-06-22 11:00:27] INFO: Selected k-mer size: 15
[2019-06-22 11:00:27] INFO: >>>STAGE: assembly
[2019-06-22 11:00:27] INFO: Assembling disjointigs
[2019-06-22 11:00:27] INFO: Reading sequences
[2019-06-22 11:01:01] INFO: Generating solid k-mer index
[2019-06-22 11:01:17] INFO: Counting k-mers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-06-22 11:02:49] INFO: Counting k-mers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-06-22 11:08:39] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-06-22 11:13:50] INFO: Extending reads
[2019-06-22 12:54:29] INFO: Overlap-based coverage: 1177
[2019-06-22 12:54:29] INFO: Median overlap divergence: 0.119637
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-06-23 17:20:11] INFO: Assembled 0 disjointigs
[2019-06-23 17:20:23] INFO: Generating sequence
[2019-06-23 17:22:11] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
flye --nano-raw one.fastq --out-dir flye --genome-size 10m --threads 20