No disjointigs were assembled #211

Closed
Chandrima-04 opened this issue Jan 30, 2020 · 9 comments

Comments

@Chandrima-04

Hi,
I looked through the suggestions and tried the --meta parameter as well as --asm-coverage, but no assembly is produced. I have raw metagenomic data from Oxford Nanopore. I am getting an overlap-based coverage of 1. In past issues, I have seen that the runs which failed assembly had a coverage of 0. I am attaching the log file too!
flye.log

@mikolmogorov
Owner

Hi,

From what I can tell, you are assembling a very short sequence (e.g. 100kb) - is that so? Flye was not designed for that, unfortunately (e.g. for amplicons / viral sequences).

@Chandrima-04
Author

No, it is a metagenome, mostly bacterial!

@mikolmogorov
Owner

Ok. I don't see anything wrong in the log file otherwise. Most likely, there is simply not enough coverage to assemble any chromosomes. There are only 38 Mb of reads, which would not be sufficient to assemble even an isolate, and the size of a metagenome could be much larger - we have experience in assembling gigabases.
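
To put the 38 Mb figure in perspective, here is a rough back-of-the-envelope sketch. It assumes mean coverage is simply total read bases divided by genome size. The 5 Mb genome size is an assumed, illustrative value for a single bacterial genome and does not come from the log.

```python
def estimate_coverage(total_read_bases: int, genome_size: int) -> float:
    """Return mean depth of coverage: total bases in reads / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


# 38 Mb of reads is the figure quoted in this thread.
total_read_bases = 38_000_000
# ASSUMPTION: 5 Mb is only an illustrative size for one bacterial genome.
assumed_genome_size = 5_000_000

coverage = estimate_coverage(total_read_bases, assumed_genome_size)
print(f"Approximate coverage: {coverage:.1f}x")
```

Under that assumption a single genome would get under 8x, and in a metagenome the same reads are spread over many genomes, so the per-genome depth would be lower still.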

@ptrebert

I have the same error with Flye 2.6 (installed via Conda) with a PacBio human dataset (uncorrected reads, ~80x total coverage; assembled with preset --pacbio-raw and --asm-coverage 50; expected genome size was set to 3.1g).

@frihaka

frihaka commented Feb 3, 2020

Hi,
thanks a lot for this software, it's really great. I am using it for bacterial genome assembly.
Flye 2.6 has been working very well with other datasets so far - 16-plexed ones.

But with datasets with higher depth, I cannot make it work anymore.
I am running the default command:

flye --pacbio-raw bbmap_fasta/dataset.fasta --genome-size 1.1m --out-dir flye_default_param/dataset --threads 12

The run fails with the same error message as for other users above:

[2020-02-03 06:58:49] root: INFO: Starting Flye 2.6-release
[2020-02-03 06:58:49] root: DEBUG: Cmd: /home/user/miniconda2/bin/flye --pacbio-raw bbmap_fasta/dataset.fasta --genome-size 1.1m --out-dir flye_default_param/dataset --threads 12
[2020-02-03 06:58:49] root: DEBUG: Python version: 2.7.17 |Anaconda, Inc.| (default, Oct 21 2019, 19:04:46) 
[GCC 7.3.0]
[2020-02-03 06:58:49] root: INFO: >>>STAGE: configure
[2020-02-03 06:58:49] root: INFO: Configuring run
[2020-02-03 07:00:10] root: INFO: Total read length: 3291173741
[2020-02-03 07:00:10] root: INFO: Input genome size: 1100000
[2020-02-03 07:00:10] root: INFO: Estimated coverage: 2991
[2020-02-03 07:00:10] root: WARNING: Expected read coverage is 2991, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2020-02-03 07:00:10] root: INFO: Reads N50/N90: 5434 / 1988
[2020-02-03 07:00:10] root: INFO: Minimum overlap set to 2000
[2020-02-03 07:00:10] root: INFO: Selected k-mer size: 15
[2020-02-03 07:00:10] root: INFO: >>>STAGE: assembly
[2020-02-03 07:00:10] root: INFO: Assembling disjointigs
[2020-02-03 07:00:10] root: DEBUG: -----Begin assembly log------
[2020-02-03 07:00:10] root: DEBUG: Running: flye-assemble --reads bbmap_fasta/dataset.fasta --out-asm flye_default_param/dataset/00-assembly/draft_assembly.fasta --genome-size 1100000 --config /home/user/miniconda2/lib/python2.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg --log flye_default_param/dataset/flye.log --threads 12 --min-ovlp 2000 --kmer 15
[2020-02-03 07:00:10] DEBUG: Build date: Sep 19 2019 20:15:45
[2020-02-03 07:00:10] DEBUG: Total RAM: 376 Gb
[2020-02-03 07:00:10] DEBUG: Available RAM: 368 Gb
[2020-02-03 07:00:10] DEBUG: Total CPUs: 40
[2020-02-03 07:00:10] DEBUG: Parameters:
[2020-02-03 07:00:10] DEBUG: 	big_genome_threshold=29000000
[2020-02-03 07:00:10] DEBUG: 	low_cutoff_warning=1
[2020-02-03 07:00:10] DEBUG: 	hard_min_coverage_rate=10
[2020-02-03 07:00:10] DEBUG: 	assemble_kmer_sample=1
[2020-02-03 07:00:10] DEBUG: 	repeat_graph_kmer_sample=1
[2020-02-03 07:00:10] DEBUG: 	read_align_kmer_sample=1
[2020-02-03 07:00:10] DEBUG: 	maximum_jump=1500
[2020-02-03 07:00:10] DEBUG: 	maximum_overhang=1500
[2020-02-03 07:00:10] DEBUG: 	repeat_kmer_rate=100
[2020-02-03 07:00:10] DEBUG: 	assemble_ovlp_relative_divergence=0.10
[2020-02-03 07:00:10] DEBUG: 	repeat_graph_ovlp_divergence=0.15
[2020-02-03 07:00:10] DEBUG: 	read_align_ovlp_divergence=0.25
[2020-02-03 07:00:10] DEBUG: 	max_coverage_drop_rate=5
[2020-02-03 07:00:10] DEBUG: 	chimera_window=100
[2020-02-03 07:00:10] DEBUG: 	min_reads_in_disjointig=4
[2020-02-03 07:00:10] DEBUG: 	max_inner_reads=10
[2020-02-03 07:00:10] DEBUG: 	max_inner_fraction=0.25
[2020-02-03 07:00:10] DEBUG: 	add_unassembled_reads=0
[2020-02-03 07:00:10] DEBUG: 	max_separation=500
[2020-02-03 07:00:10] DEBUG: 	unique_edge_length=50000
[2020-02-03 07:00:10] DEBUG: 	min_repeat_res_support=0.51
[2020-02-03 07:00:10] DEBUG: 	out_paths_ratio=5
[2020-02-03 07:00:10] DEBUG: 	graph_cov_drop_rate=5
[2020-02-03 07:00:10] DEBUG: 	coverage_estimate_window=100
[2020-02-03 07:00:10] DEBUG: 	extend_contigs_with_repeats=1
[2020-02-03 07:00:10] DEBUG: 	min_read_cov_cutoff=3
[2020-02-03 07:00:10] DEBUG: 	short_tip_length=10000
[2020-02-03 07:00:10] DEBUG: 	long_tip_length=100000
[2020-02-03 07:00:10] DEBUG: 	max_bubble_length=50000
[2020-02-03 07:00:10] DEBUG: Running with k-mer size: 15
[2020-02-03 07:00:10] DEBUG: Running with minimum overlap 2000
[2020-02-03 07:00:10] DEBUG: Metagenome mode: N
[2020-02-03 07:00:10] INFO: Reading sequences
[2020-02-03 07:01:06] DEBUG: Building positional index
[2020-02-03 07:01:06] DEBUG: Total sequence: 3291173741 bp
[2020-02-03 07:01:06] DEBUG: Expected read coverage: 2991
[2020-02-03 07:01:06] INFO: Generating solid k-mer index
[2020-02-03 07:01:06] DEBUG: Hard threshold set to 5
[2020-02-03 07:01:06] DEBUG: Started k-mer counting
[2020-02-03 07:01:20] INFO: Counting k-mers (1/2):
[2020-02-03 07:01:50] INFO: Counting k-mers (2/2):
[2020-02-03 07:03:00] DEBUG: Estimated minimum kmer coverage: 507
[2020-02-03 07:03:00] DEBUG: Filtered 88309991 erroneous k-mers
[2020-02-03 07:03:00] DEBUG: Repetitive k-mer frequency: 95540
[2020-02-03 07:03:00] DEBUG: Filtered 14 repetitive k-mers (1.27291e-05)
[2020-02-03 07:03:00] INFO: Filling index table
[2020-02-03 07:03:01] DEBUG: Sampling rate: 1
[2020-02-03 07:03:01] DEBUG: Solid k-mers: 1099828
[2020-02-03 07:03:01] DEBUG: K-mer index size: 1045597867
[2020-02-03 07:03:01] DEBUG: Mean k-mer frequency: 950.692
[2020-02-03 07:03:52] DEBUG: Sorting k-mer index
[2020-02-03 07:04:09] DEBUG: Peak RAM usage: 6 Gb
[2020-02-03 07:04:09] DEBUG: Estimating k-mer identity bias
[2020-02-03 07:04:53] DEBUG: Median overlap divergence: 0.170039
[2020-02-03 07:04:53] DEBUG: K-mer estimate bias: -0.00528269
[2020-02-03 07:04:53] DEBUG: Max divergence threshold set to 0.270039
[2020-02-03 07:04:53] INFO: Extending reads
[2020-02-03 07:04:53] DEBUG: Estimating overlap coverage
[2020-02-03 07:09:37] INFO: Overlap-based coverage: 1685
[2020-02-03 07:09:37] INFO: Median overlap divergence: 0.174149
[2020-02-03 07:09:37] DEBUG: Sequence divergence distribution: 

    |                               **                     |                                             
    |                               ***                    |                                             
    |                               ****                   |                                             
    |                              *****                   |                                             
    |                              *****                   |                                             
    |                              *****                   |                                             
    |                              ********                |                                             
    |                              ********                |                                             
    |                             **********               |                                             
    |                             **********               |                                             
    |                             **********               |                                             
    |                            *************             |                                             
    |                            *************             |                                             
    |                            *************             |                                             
    |                            **************            |                                             
    |                            ***************  *        |                                             
    |                            **************** * **     |                                             
    |                           ***********************    |                                             
    |                          *************************   |        *                                    
    |                        ********************************   * * **     * **  *   * **      *         
    ----------------------------------------------------------------------------------------------------
    0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

    Q25 = 0.16, Q50 = 0.17, Q75 = 0.2

[2020-02-03 13:37:09] INFO: Assembled 0 disjointigs
[2020-02-03 13:37:09] INFO: Generating sequence
[2020-02-03 13:37:09] DEBUG: Writing FASTA
[2020-02-03 13:37:09] DEBUG: Peak RAM usage: 26 Gb
-----------End assembly log------------
[2020-02-03 13:37:10] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
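
As a quick sanity check on the log above, the sketch below pulls the total read length and input genome size out of three lines copied from it and recomputes the coverage. The helper name `read_int_field` is hypothetical. Rounding down with integer division is an assumption that happens to match this log, not a statement about how Flye computes the value internally.

```python
import re

# Three lines copied verbatim from the log above.
LOG_EXCERPT = """\
[2020-02-03 07:00:10] root: INFO: Total read length: 3291173741
[2020-02-03 07:00:10] root: INFO: Input genome size: 1100000
[2020-02-03 07:00:10] root: INFO: Estimated coverage: 2991
"""


def read_int_field(log_text: str, label: str) -> int:
    """Return the integer that follows 'label:' in the log text (hypothetical helper)."""
    match = re.search(re.escape(label) + r":\s*(\d+)", log_text)
    if match is None:
        raise ValueError(f"{label!r} not found in log")
    return int(match.group(1))


total_bases = read_int_field(LOG_EXCERPT, "Total read length")
genome_size = read_int_field(LOG_EXCERPT, "Input genome size")
reported = read_int_field(LOG_EXCERPT, "Estimated coverage")

# ASSUMPTION: coverage is total bases divided by genome size, rounded down.
recomputed = total_bases // genome_size
print(f"recomputed={recomputed}x reported={reported}x")
```

A coverage of roughly 3000x for a 1.1 Mb genome is what triggers the warning in the log, which asks whether the genome size was entered correctly.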

@mikolmogorov
Owner

@ptrebert @frihaka please follow the suggestions from #128.

I am marking this issue as a duplicate. Please continue the discussion in #128 if those solutions did not help.

@frihaka

frihaka commented Feb 7, 2020

Sorry, I had missed #128.

Indeed, playing with --asm-coverage values and --meta options solved the issue. Thanks!
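
For readers trying the same fix, here is a minimal sketch of how the earlier command could be extended with these options. The exact values that worked were not posted, so the `--asm-coverage 50` used here is only an illustrative value borrowed from another comment in this thread, and `build_flye_command` is a hypothetical helper. The script only prints the command and does not run Flye.

```python
import shlex
from typing import List, Optional


def build_flye_command(
    reads: str,
    genome_size: str,
    out_dir: str,
    threads: int,
    asm_coverage: Optional[int] = None,
    meta: bool = False,
) -> List[str]:
    """Assemble a flye argument list (hypothetical helper; nothing is executed)."""
    cmd = [
        "flye",
        "--pacbio-raw", reads,
        "--genome-size", genome_size,
        "--out-dir", out_dir,
        "--threads", str(threads),
    ]
    if asm_coverage is not None:
        # Reduced coverage for the initial disjointig assembly.
        cmd += ["--asm-coverage", str(asm_coverage)]
    if meta:
        # Metagenome / uneven coverage mode.
        cmd.append("--meta")
    return cmd


# ASSUMPTION: 50 is an illustrative value, not the one the commenter used.
cmd = build_flye_command(
    "bbmap_fasta/dataset.fasta",
    "1.1m",
    "flye_default_param/dataset",
    threads=12,
    asm_coverage=50,
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing `meta=True` appends `--meta` instead of, or in addition to, the coverage option, which makes it easy to try the two suggestions separately.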

@yige-luo

> Hi,
>
> From what I can tell, you are assembling a very short sequence (e.g. 100kb) - is that so? Flye was not designed for that, unfortunately (e.g. for amplicons / viral sequences).

Hi,

I have a quick question - can the latest Flye version (2.8.2) handle very short assemblies (amplicon/viral)?

@mikolmogorov
Owner

@drosophila92 There were no significant changes with that. Flye might assemble some, but full support is not guaranteed.
