bcftools header check err #2312

Open
xiucz opened this issue Nov 5, 2024 · 2 comments

xiucz commented Nov 5, 2024

Hi, bcftools team,
I want to normalize clinvar_20240902.vcf.gz according to HGVS rules:

~/bcftools1.21/bin/bcftools norm -f ~/hs37d5/hs37d5.fasta -O v -g ~/GRCh37_latest_genomic.gff --force -m-both \
clinvar_20240902.vcf -o clinvar_20240902.norm.vcf.gz

However, it gives an error:

[W::vcf_parse] Contig '1' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_format] Invalid BCF, CONTIG id=1 not present in the header

So I added contig lines to the VCF header, but the error still appears. Can you give me some advice?

...
##ID=<Description="ClinVar Variation ID">
##contig=<ID=M,length=16571,assembly=hg19>
###contig=<ID=1,length=249250621,assembly=hg19>
###contig=<ID=2,length=243199373,assembly=hg19>
###contig=<ID=3,length=198022430,assembly=hg19>
###contig=<ID=4,length=191154276,assembly=hg19>
###contig=<ID=5,length=180915260,assembly=hg19>
###contig=<ID=6,length=171115067,assembly=hg19>
###contig=<ID=7,length=159138663,assembly=hg19>
###contig=<ID=8,length=146364022,assembly=hg19>
###contig=<ID=9,length=141213431,assembly=hg19>
###contig=<ID=10,length=135534747,assembly=hg19>
###contig=<ID=11,length=135006516,assembly=hg19>
###contig=<ID=12,length=133851895,assembly=hg19>
###contig=<ID=13,length=115169878,assembly=hg19>
###contig=<ID=14,length=107349540,assembly=hg19>
###contig=<ID=15,length=102531392,assembly=hg19>
###contig=<ID=16,length=90354753,assembly=hg19>
###contig=<ID=17,length=81195210,assembly=hg19>
###contig=<ID=18,length=78077248,assembly=hg19>
###contig=<ID=19,length=59128983,assembly=hg19>
###contig=<ID=20,length=63025520,assembly=hg19>
###contig=<ID=21,length=48129895,assembly=hg19>
###contig=<ID=22,length=51304566,assembly=hg19>
###contig=<ID=X,length=155270560,assembly=hg19>
###contig=<ID=Y,length=59373566,assembly=hg19>
##INFO=<ID=AF_ESP,Number=1,Type=Float,Description="allele frequencies from GO-ESP">
...

Best,
xiucz

Member

pd3 commented Nov 6, 2024

Any chance you could share a small test case to reproduce the problem? If the header contains the appropriate contig line, it surely shouldn't complain.
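
One way to check whether the header really declares every contig used in the records is sketched below. This is an illustration, not a bcftools feature: `undeclared_contigs` is a hypothetical helper name, it uses only `awk`, and it assumes an uncompressed VCF in which a contig counts as declared only if its line begins with exactly `##contig=<ID=`. Under that assumption, a line with any other prefix (for example three hashes) is treated as an ordinary header line, not as a declaration.

```shell
# Hypothetical helper: print each contig that appears in a record
# but has no well-formed "##contig=<ID=...>" line in the header.
undeclared_contigs() {
    awk -F'\t' '
        # A declaration must start with exactly two hashes.
        /^##contig=<ID=/ {
            id = $0
            sub(/^##contig=<ID=/, "", id)   # drop the prefix
            sub(/[,>].*$/, "", id)          # drop everything after the ID
            declared[id] = 1
            next
        }
        # Skip every other header line (anything else starting with "#").
        /^#/ { next }
        # Data record: report the CHROM value once if it was never declared.
        !($1 in declared) && !($1 in seen) { seen[$1] = 1; print $1 }
    ' "$1"
}
```

Running `undeclared_contigs clinvar_20240902.vcf` would list the contigs that this check considers missing; empty output means every contig in the records has a matching declaration.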

Author

xiucz commented Nov 7, 2024

@pd3
Thanks for your quick reply. Here is the subset file:

head -n 500  clinvar_20240902.vcf > subset.vcf.txt

subset.vcf.txt

Best,
xiucz
