Hi,
I've been testing StrainGE and have been quite impressed so far. However, while testing on a complex sample I wanted to include more than one species in the reference DB, and I was confused about the suggested way to do so.
In particular, I'm confused about why running StrainGR requires "StrainGST results on one or more samples". Shouldn't creating a refdb rely only on reference data, i.e. not on metagenomes?
I'm interested in starting with a set of reference genomes and getting StrainGR to call variants from a metagenome de novo. Is there a recommended way of doing that?
I attempted the following without specifying -s or -S, which I believe is allowed going by the -h output, but perhaps I'm wrong:
$ ls references/
GCA_002369315.1_genomic.fna GCA_002412705.1_genomic.fna
$ straingr prepare-ref -p "references/{ref}.fna" -o refs_concat.fna
2023-04-04 13:42:33,826 - INFO:root:Determining which reference strains to include...
2023-04-04 13:42:33,826 - INFO:root:Found 0 reference strains to include.
2023-04-04 13:42:33,826 - INFO:root:Checking file paths...
2023-04-04 13:42:33,826 - INFO:root:Path template: references/{ref}.fna
2023-04-04 13:42:33,826 - INFO:root:Creating concatenated reference...
2023-04-04 13:42:33,828 - INFO:root:Wrote FASTA file to refs_concat.fna
2023-04-04 13:42:33,828 - INFO:root:Analyzing repetitiveness of concatenated reference...
/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/site-packages/skbio/io/registry.py:547: FormatIdentificationWarning: <_io.TextIOWrapper name='refs_concat.fna' mode='r' encoding='UTF-8'> does not look like a fasta file
warn("%r does not look like a %s file"
2023-04-04 13:42:33,829 - WARNING:strainge.variant_caller:Could not find a metadata file for reference %s, and therefore StrainGR has no sense of the repetitiveness of the concatenated reference. Abundance metrics may be skewed.
Traceback (most recent call last):
File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/bin/straingr", line 11, in <module>
sys.exit(straingr_cli())
File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/site-packages/strainge/cli/main.py", line 110, in __call__
self.run(args)
File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/site-packages/strainge/cli/registry.py", line 83, in run
rc = subcommand_func(**args_dict)
File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/site-packages/strainge/cli/straingr.py", line 237, in __call__
repeat_masks = analyze_repetitiveness(str(output), minmatch)
File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/site-packages/strainge/variant_caller.py", line 385, in analyze_repetitiveness
p = subprocess.run(cmd, capture_output=True, text=True)
File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/subprocess.py", line 505, in run
stdout, stderr = process.communicate(input, timeout=timeout)
File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/subprocess.py", line 1154, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/subprocess.py", line 2047, in _communicate
stderr = self._translate_newlines(stderr,
File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/subprocess.py", line 1031, in _translate_newlines
data = data.decode(encoding, errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 115: invalid start byte
I'm using the conda version FWIW.
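In case it helps narrow things down: the final UnicodeDecodeError in the traceback is raised while `subprocess.run(..., text=True)` decodes the child's stderr, so whatever the child printed contained a byte (0xff) that is not valid UTF-8. Below is a minimal sketch that reproduces that failure mode and shows one lenient alternative. It is not StrainGE's code; the child command and the helper names `run_strict` and `run_lenient` are made up for illustration, and I'm assuming a UTF-8 decode as in the traceback.

```python
import subprocess
import sys

# Hypothetical stand-in for the external tool: a child process that writes
# an invalid UTF-8 byte (0xff) to stderr. This is NOT what StrainGR runs.
CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.buffer.write(b'bad byte: \\xff\\n')",
]


def run_strict(cmd):
    # Strict UTF-8 decoding of captured output: raises UnicodeDecodeError
    # if the child emits bytes that are not valid UTF-8.
    return subprocess.run(cmd, capture_output=True, encoding="utf-8")


def run_lenient(cmd):
    # errors="replace" swaps undecodable bytes for U+FFFD, so the captured
    # message can still be logged instead of crashing the caller.
    return subprocess.run(
        cmd, capture_output=True, encoding="utf-8", errors="replace"
    )


if __name__ == "__main__":
    try:
        run_strict(CMD)
    except UnicodeDecodeError as exc:
        print("strict decode failed:", exc)

    result = run_lenient(CMD)
    print("lenient stderr:", repr(result.stderr))
```

With lenient decoding the underlying tool's real error message would at least be visible, which might show why the concatenated reference was rejected.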
Thanks, ben