large and discontinuous assembly results using only HiFi sequencing data #702

Open
biozzq opened this issue Sep 5, 2024 · 0 comments

Dear @chhylp123

hifiasm is a very useful tool, but getting a good genome assembly out of it still seems to require a deep understanding of the software. I recently used HiFi data to assemble a genome of about 2.8 Gb. The statistics for the HiFi data are as follows:
| format | type | num_seqs | sum_len | min_len | avg_len | max_len |
|---|---|---|---|---|---|---|
| FASTQ | DNA | 7,926,350 | 196,547,172,078 | 173 | 24,796.7 | 73,674 |
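
As a rough cross-check on sequencing depth, the total base count in the table can be divided by the genome size. The sketch below is only arithmetic on the figures above. It assumes the genome size is either the expected 2.8 Gb or the 3.0 Gb passed to `--hg-size`; the function name `depth` is just illustrative.

```python
# Figures copied from the read statistics table above.
TOTAL_BASES = 196_547_172_078
NUM_READS = 7_926_350


def depth(total_bases: int, genome_size: float) -> float:
    """Expected mean read depth = total sequenced bases / haploid genome size."""
    return total_bases / genome_size


# 2.8 Gb is the expected genome size; 3.0 Gb is the value given to --hg-size.
for size in (2.8e9, 3.0e9):
    print(f"{size / 1e9:.1f} Gb -> {depth(TOTAL_BASES, size):.1f}x")

# Sanity check that the table is self-consistent (sum_len / num_seqs = avg_len).
print(f"mean read length: {TOTAL_BASES / NUM_READS:.1f}")
```

With these numbers the estimate comes out near 70x for a 2.8 Gb genome and near 66x for 3.0 Gb.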

The command I used is as follows:
hifiasm -o ${output} -t 32 --hg-size 3.0g ${input}.fastq

The log (attached as log.txt) shows peak_hom: 66 and peak_het: 64, which are nearly identical. Will this affect the assembly result? The assembled genome is slightly larger than expected and consists of several thousand contigs. For a diploid genome with 60X HiFi data, I'm not sure which key parameters I may have overlooked that led to this.

Additionally, are there other tools for assessing the reliability of the current assembly? For example, can I align the sequencing reads back to the assembled genome and check the alignment rate, the depth of coverage across the genome, and the number of large structural variants, especially homozygous ones? Are there any other assessment methods you would recommend? Thank you very much.
log.txt
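
For the contig count and total size mentioned above, a small script can summarise an assembly graph before any read alignment is done. This is a minimal sketch, not part of hifiasm. It assumes the contigs are in a GFA file whose `S` lines carry either the sequence itself or an `LN:i:` length tag, and the helper names `contig_lengths_from_gfa` and `n50` are hypothetical.

```python
from typing import Iterable, List


def contig_lengths_from_gfa(lines: Iterable[str]) -> List[int]:
    """Collect one length per GFA segment (S) line."""
    lengths = []
    for line in lines:
        if not line.startswith("S\t"):
            continue  # skip links, paths, and other record types
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        if seq != "*":
            lengths.append(len(seq))
            continue
        # Sequence omitted: fall back to the optional LN:i:<length> tag.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                lengths.append(int(tag[5:]))
                break
    return lengths


def n50(lengths: List[int]) -> int:
    """Smallest contig length such that contigs this long or longer
    cover at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input
```

To use it, open the primary contig GFA (the path depends on the `-o` prefix, so it is left as an assumption here), pass the file handle to `contig_lengths_from_gfa`, and then report `len(lengths)`, `sum(lengths)`, and `n50(lengths)`. Comparing the total against the expected 2.8 Gb shows how much larger the assembly is.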

Best wishes,
Zheng zhuqing
