Fails on returning empty sequence #1

Open
harish0201 opened this issue Oct 21, 2022 · 2 comments

Comments

@harish0201

Hi!

Thank you for the wonderful tool! I was testing this out earlier this week and ran into an issue as follows:

INFO [job extract_protein] /tmp/48sbgcsb$ samtools faidx --output output.fasta --region-file /tmp/qkhr8fv0/stgf37b22d0-9409-4aeb-ab28-6ba52a517ec7/proteins1.txt /tmp/qkhr8fv0/stgf94fefbd-b6be-49c5-8ce9-50df72640298/proteins.fasta
samtools: /d/Apps/Miniconda/bin/../lib/libtinfow.so.6: no version information available (required by samtools)
samtools: /d/Apps/Miniconda/bin/../lib/libncursesw.so.6: no version information available (required by samtools)
samtools: /d/Apps/Miniconda/bin/../lib/libncursesw.so.6: no version information available (required by samtools)
[W::fai_get_val] Reference W2RFS7 not found in FASTA file, returning empty sequence
[faidx] Failed to fetch sequence in W2RFS7
WARNING [job extract_protein] exited with status: 1
WARNING [job extract_protein] completed permanentFail
WARNING [step extract_protein] completed permanentFail
INFO [workflow extract_region_pairs] completed permanentFail
ERROR Workflow cannot make any more progress.
WARNING Final process status is permanentFail
Traceback (most recent call last):
  File "/d/Apps/Miniconda/bin/lukasa.py", line 71, in <module>
    workflow_output_str = subprocess.check_output(cwl_commandline)
  File "/d/Apps/Miniconda/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/d/Apps/Miniconda/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['cwltool', '--no-container', '/d/Apps/Miniconda/share/lukasa/lukasa.cwl', '/tmp/tmp7njg_qgc']' returned non-zero exit status 1.

This is an excerpt from the log. When I ran MetaEuk separately it did not fail, and I extracted the sequences it reported for this particular protein.

Here are the protein sequences:

W2RFS7|canuscaffold_137_pilon|-|59|5.917e-10|1|7332|7430|7430[7430]:7332[7332]:99[99]
MSKRVPSTLKMDFQNFPVKRGSRSETIATGKPC
W2RFS7|canuscaffold_265_pilon|-|57|2.367e-09|1|3396|3494|3494[3494]:3396[3396]:99[99]
MSNRVPSTLKMDFQNFPVKRGSLSETIANGKPC
W2RFS7|canuscaffold_98_pilon|-|59|5.917e-10|1|7564|7662|7662[7662]:7564[7564]:99[99]
MSKRVPSTLKIDFQNFPVKRGSRSETIATGKPC
W2RFS7|canuscaffold_339_pilon|+|47|2.424e-06|1|114232|114321|114232[114232]:114321[114321]:90[90]
RVPSTLKINFQNFPVKRGSLSEIIATGKPC
W2RFS7|canuscaffold_68_pilon|-|43|3.878e-05|1|13126|13224|13224[13224]:13126[13126]:99[99]
ISIRVPSPLKMDFQNCPVKRGSLSEKIAVGRPC
W2RFS7|canuscaffold_97_pilon|+|57|2.367e-09|1|3248|3346|3248[3248]:3346[3346]:99[99]
MSKRVPSTSKMDFQNFPVKRGSRSETIATGKPC
W2RFS7|canuscaffold_54_pilon|+|58|1.183e-09|1|4187|4285|4187[4187]:4285[4285]:99[99]
MSKRVPSTLKMDFQNFPVKRGSLSETIANGKPC
W2RFS7|canuscaffold_259_pilon|-|48|1.212e-06|1|4997|5083|5083[5083]:4997[4997]:87[87]
VPSTLKMDFQNFPVNRASLSEIIATGKPC
W2RFS7|canuscaffold_115_pilon|-|59|5.917e-10|1|9053|9151|9151[9151]:9053[9053]:99[99]
MSKRVPSTLKIDFQNFPVKRGSRSETIATGKPC
W2RFS7|canuscaffold_319_pilon|-|58|1.183e-09|1|8945|9043|9043[9043]:8945[8945]:99[99]
MSKRVPSTLKMDFQNFPVKRGSLSETIANGKPC
W2RFS7|canuscaffold_210_pilon|+|40|0.0003102|1|7133|7201|7133[7133]:7201[7201]:69[69]
MSKRVPSTLKMDFQNFPVERGSR
W2RFS7|canuscaffold_272_pilon|+|55|9.468e-09|1|5125|5223|5125[5125]:5223[5223]:99[99]
MSKRVPSTLKMDFQNFPLKQGSLSETIANGKPC
W2RFS7|canuscaffold_12_pilon|+|55|9.468e-09|1|57255|57353|57255[57255]:57353[57353]:99[99]
MSKRVPSTLKIDFQNLPVKRGSRSETIANGRPC
W2RFS7|canuscaffold_88_pilon|-|59|5.917e-10|1|19769|19867|19867[19867]:19769[19769]:99[99]
MSKRVPSTLKMDFQNFPVKRGSRSETIATGKPC

The codon sequences:

W2RFS7|canuscaffold_137_pilon|-|59|5.917e-10|1|7332|7430|7430[7430]:7332[7332]:99[99]
ATGTCCAAACGTGTGCCGAGCACCTTGAAGATGGACTTCCAAAACTTCCCAGTGAAGCGAGGGTCTCGGTCAGAGACAATTGCCACAGGCAAGCCGTGT
W2RFS7|canuscaffold_265_pilon|-|57|2.367e-09|1|3396|3494|3494[3494]:3396[3396]:99[99]
ATGTCCAATCGTGTGCCGAGCACCTTGAAGATGGACTTCCAAAACTTCCCAGTGAAGCGAGGATCTCTATCAGAGACAATTGCCAACGGCAAACCGTGT
W2RFS7|canuscaffold_98_pilon|-|59|5.917e-10|1|7564|7662|7662[7662]:7564[7564]:99[99]
ATGTCCAAACGTGTGCCGAGCACCTTGAAGATAGACTTCCAAAACTTCCCAGTGAAGCGAGGGTCTCGGTCAGAGACAATTGCCACGGGCAAACCGTGT
W2RFS7|canuscaffold_339_pilon|+|47|2.424e-06|1|114232|114321|114232[114232]:114321[114321]:90[90]
CGGGTGCCAAGCACCTTGAAGATAAACTTCCAAAACTTTCCCGTGAAGCGAGGATCTCTATCAGAGATAATTGCCACAGGGAAACCGTGT
W2RFS7|canuscaffold_68_pilon|-|43|3.878e-05|1|13126|13224|13224[13224]:13126[13126]:99[99]
ATATCTATTCGGGTGCCAAGCCCGTTGAAGATGGACTTCCAAAATTGCCCCGTGAAGCGAGGATCTCTATCAGAGAAAATTGCGGTAGGCAGACCGTGT
W2RFS7|canuscaffold_97_pilon|+|57|2.367e-09|1|3248|3346|3248[3248]:3346[3346]:99[99]
ATGTCCAAACGTGTGCCGAGCACCTCGAAGATGGACTTCCAAAACTTCCCAGTGAAGCGAGGGTCTCGGTCAGAGACAATTGCCACAGGCAAGCCGTGT
W2RFS7|canuscaffold_54_pilon|+|58|1.183e-09|1|4187|4285|4187[4187]:4285[4285]:99[99]
ATGTCCAAACGTGTGCCGAGCACCTTGAAGATGGATTTCCAAAACTTCCCAGTGAAGCGAGGATCTCTATCAGAGACAATTGCCAACGGCAAACCGTGT
W2RFS7|canuscaffold_259_pilon|-|48|1.212e-06|1|4997|5083|5083[5083]:4997[4997]:87[87]
GTGCCAAGCACCTTGAAGATGGACTTCCAAAACTTTCCAGTGAATCGAGCATCTCTATCAGAGATAATTGCGACAGGAAAACCGTGT
W2RFS7|canuscaffold_115_pilon|-|59|5.917e-10|1|9053|9151|9151[9151]:9053[9053]:99[99]
ATGTCCAAACGTGTGCCGAGCACCTTGAAGATAGACTTCCAAAACTTCCCAGTGAAGCGAGGGTCTCGGTCAGAGACAATTGCCACGGGCAAACCGTGT
W2RFS7|canuscaffold_319_pilon|-|58|1.183e-09|1|8945|9043|9043[9043]:8945[8945]:99[99]
ATGTCCAAACGTGTGCCGAGCACCTTGAAGATGGACTTCCAAAACTTCCCAGTGAAGCGAGGATCTCTATCAGAGACAATTGCCAACGGCAAACCGTGT
W2RFS7|canuscaffold_210_pilon|+|40|0.0003102|1|7133|7201|7133[7133]:7201[7201]:69[69]
ATGTCCAAACGTGTGCCGAGCACCTTGAAGATGGATTTCCAAAACTTCCCAGTAGAGCGAGGATCTCGA
W2RFS7|canuscaffold_272_pilon|+|55|9.468e-09|1|5125|5223|5125[5125]:5223[5223]:99[99]
ATGTCCAAACGTGTGCCGAGCACATTGAAGATGGACTTCCAAAACTTCCCACTGAAGCAAGGATCTCTATCAGAGACAATTGCCAACGGCAAACCGTGT
W2RFS7|canuscaffold_12_pilon|+|55|9.468e-09|1|57255|57353|57255[57255]:57353[57353]:99[99]
ATGTCCAAACGTGTGCCGAGCACCTTGAAGATAGACTTCCAAAACTTACCAGTGAAGCGGGGATCTCGATCAGAGACAATTGCCAACGGCAGACCGTGT
W2RFS7|canuscaffold_88_pilon|-|59|5.917e-10|1|19769|19867|19867[19867]:19769[19769]:99[99]
ATGTCCAAACGTGTGCCGAGCACCTTGAAGATGGACTTCCAAAACTTCCCAGTGAAGCGAGGGTCTCGGTCAGAGACAATTGCCACAGGCAAGCCGTGT

I'm not sure where the problem lies; the run stops at this point with no obvious cause. I can send the protein and contig FASTA files if needed.
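For anyone hitting the same message: the `Reference W2RFS7 not found in FASTA file` warning means that a name in the region file does not match any sequence name in the FASTA. `samtools faidx` takes the sequence name to be the header text after `>` up to the first whitespace. Below is a minimal sketch of a pre-flight check, assuming the region file holds plain sequence names (no `name:start-end` ranges). The helper names `fasta_names` and `missing_regions` and the example records are made up for illustration and are not part of lukasa or samtools.

```python
from typing import Iterable, List, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names as samtools faidx sees them:
    the header text after '>' up to the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def missing_regions(fasta_lines: Iterable[str], region_names: Iterable[str]) -> List[str]:
    """Return the requested names that are absent from the FASTA headers."""
    known = fasta_names(fasta_lines)
    return [name for name in region_names if name not in known]


# Hypothetical records: the first header uses a pipe-delimited ID,
# so its full name is 'tr|W2RFS7|W2RFS7_SPECIES', not 'W2RFS7'.
example_fasta = [
    ">tr|W2RFS7|W2RFS7_SPECIES some description",
    "MSKRV",
    ">P12345",
    "MKT",
]
print(missing_regions(example_fasta, ["W2RFS7", "P12345"]))  # ['W2RFS7']
```

Running a check like this on the protein FASTA before starting the workflow would point at the mismatching names directly, rather than leaving them to surface as a `permanentFail` deep in the log.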

@harish0201
Author

harish0201 commented Oct 21, 2022

Ah, I think I got this fixed. My protein sequence headers had the format:

>tr|protein|protein_species

and the contig FASTA headers were:
>canuscaffold_1_pilon

I changed the two to:
>protein_species or >protein
>canuscaffold_1

Now it seems to run without a hitch.
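The renaming described above can be scripted. This is a minimal sketch, not part of lukasa: `simplify_header` and `simplify_fasta` are hypothetical helper names, and the choice of which pipe-delimited field to keep (`keep_field`) and which suffix to drop (`strip_suffix`) are assumptions based on the header formats quoted in this comment. Any description after the first whitespace in a header is discarded.

```python
from typing import Iterable, Iterator


def simplify_header(line: str, keep_field: int = 1, strip_suffix: str = "") -> str:
    """Shorten one FASTA header line; non-header lines pass through unchanged."""
    if not line.startswith(">"):
        return line
    tokens = line[1:].split()
    if not tokens:
        return line
    seq_id = tokens[0]  # drop any description after the first whitespace
    fields = seq_id.split("|")
    if len(fields) > keep_field:
        # Keep a single pipe-delimited field, e.g. 'protein' from 'tr|protein|protein_species'.
        seq_id = fields[keep_field]
    if strip_suffix and seq_id.endswith(strip_suffix):
        seq_id = seq_id[: -len(strip_suffix)]
    return ">" + seq_id


def simplify_fasta(lines: Iterable[str], keep_field: int = 1, strip_suffix: str = "") -> Iterator[str]:
    """Apply simplify_header to every line of a FASTA file (newlines removed)."""
    for line in lines:
        yield simplify_header(line.rstrip("\n"), keep_field, strip_suffix)


print(simplify_header(">tr|protein|protein_species"))                   # >protein
print(simplify_header(">tr|protein|protein_species", keep_field=2))     # >protein_species
print(simplify_header(">canuscaffold_1_pilon", strip_suffix="_pilon"))  # >canuscaffold_1
```

One caveat: shortened names must stay unique within each file, so it is worth checking for duplicates after rewriting.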

@pvanheus
Owner

Thanks for the report @harish0201. I think I know where the problem is coming from: the tool parses the sequence header to extract MetaEuk alignment information.

I am actively developing the tool at the moment, so I will try to fix this bug in a future release.
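To illustrate why pipe characters in a protein ID can upset this kind of parsing, here is a minimal sketch of a header parser that tolerates them. It is not lukasa's actual code. It assumes the field layout seen in the headers quoted earlier in this thread (target, contig, strand, bit score, E-value, exon count, low and high coordinates, then one coordinate block per exon) and it assumes contig names contain no `|`. The names `MetaEukHit` and `parse_metaeuk_header` are hypothetical. The idea is to read the fields from the right, so that whatever remains on the left is the target ID, pipes included.

```python
import re
from typing import List, NamedTuple

# One exon block, e.g. '7430[7430]:7332[7332]:99[99]'
EXON_RE = re.compile(r"^\d+\[\d+\]:\d+\[\d+\]:\d+\[\d+\]$")


class MetaEukHit(NamedTuple):
    target: str
    contig: str
    strand: str
    bitscore: float
    evalue: float
    n_exons: int
    low: int
    high: int
    exons: List[str]


def parse_metaeuk_header(header: str) -> MetaEukHit:
    """Parse a MetaEuk-style header from the right so that a target ID
    containing '|' (e.g. 'tr|ACC|NAME') is kept in one piece."""
    tokens = header.lstrip(">").split()
    if not tokens:
        raise ValueError("empty header")
    fields = tokens[0].split("|")

    # Peel exon coordinate blocks off the end.
    exons: List[str] = []
    while fields and EXON_RE.match(fields[-1]):
        exons.append(fields.pop())
    exons.reverse()

    # Seven fixed fields must precede the exons, plus at least one target field.
    if not exons or len(fields) < 8:
        raise ValueError(f"unexpected header layout: {header!r}")
    contig, strand, bitscore, evalue, n_exons, low, high = fields[-7:]
    if int(n_exons) != len(exons):
        raise ValueError(f"exon count mismatch in header: {header!r}")

    return MetaEukHit(
        target="|".join(fields[:-7]),
        contig=contig,
        strand=strand,
        bitscore=float(bitscore),
        evalue=float(evalue),
        n_exons=int(n_exons),
        low=int(low),
        high=int(high),
        exons=exons,
    )


# Header taken from the report above.
hit = parse_metaeuk_header(
    "W2RFS7|canuscaffold_137_pilon|-|59|5.917e-10|1|7332|7430|7430[7430]:7332[7332]:99[99]"
)
print(hit.target, hit.contig, hit.low, hit.high)  # W2RFS7 canuscaffold_137_pilon 7332 7430
```

Splitting on `|` from the left and taking the first field would instead return `tr` for a target named `tr|protein|protein_species`, which is one plausible way a header mismatch like the one reported here could arise.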
