Preparing a DB for StrainGR #29

Open
wwood opened this issue Apr 4, 2023 · 0 comments

Hi,

I've been testing StrainGE and have been quite impressed so far. However, while testing on a complex sample, I wanted to include more than one species in the reference DB, and I was confused about the suggested way to do so.

In particular, I'm confused about why running StrainGR requires "StrainGST results on one or more samples". Shouldn't creating a refdb rely only on reference data, i.e. not on metagenomes?

I'm interested in starting with a set of reference genomes and getting StrainGR to call variants from a metagenome de novo. Is there a recommended way of doing that?

I attempted the following without specifying -s or -S, which I believe is allowed based on the -h output, but perhaps I'm wrong:

$ ls references/
GCA_002369315.1_genomic.fna  GCA_002412705.1_genomic.fna
$ straingr prepare-ref -p "references/{ref}.fna" -o refs_concat.fna
2023-04-04 13:42:33,826 - INFO:root:Determining which reference strains to include...
2023-04-04 13:42:33,826 - INFO:root:Found 0 reference strains to include.
2023-04-04 13:42:33,826 - INFO:root:Checking file paths...
2023-04-04 13:42:33,826 - INFO:root:Path template: references/{ref}.fna
2023-04-04 13:42:33,826 - INFO:root:Creating concatenated reference...
2023-04-04 13:42:33,828 - INFO:root:Wrote FASTA file to refs_concat.fna
2023-04-04 13:42:33,828 - INFO:root:Analyzing repetitiveness of concatenated reference...
/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/site-packages/skbio/io/registry.py:547: FormatIdentificationWarning: <_io.TextIOWrapper name='refs_concat.fna' mode='r' encoding='UTF-8'> does not look like a fasta file
  warn("%r does not look like a %s file"
2023-04-04 13:42:33,829 - WARNING:strainge.variant_caller:Could not find a metadata file for reference %s, and therefore StrainGR has no sense of the repetitiveness of the concatenated reference. Abundance metrics may be skewed.
Traceback (most recent call last):
  File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/bin/straingr", line 11, in <module>
    sys.exit(straingr_cli())
  File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/site-packages/strainge/cli/main.py", line 110, in __call__
    self.run(args)
  File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/site-packages/strainge/cli/registry.py", line 83, in run
    rc = subcommand_func(**args_dict)
  File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/site-packages/strainge/cli/straingr.py", line 237, in __call__
    repeat_masks = analyze_repetitiveness(str(output), minmatch)
  File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/site-packages/strainge/variant_caller.py", line 385, in analyze_repetitiveness
    p = subprocess.run(cmd, capture_output=True, text=True)
  File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/subprocess.py", line 505, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/subprocess.py", line 1154, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/subprocess.py", line 2047, in _communicate
    stderr = self._translate_newlines(stderr,
  File "/mnt/hpccs01/work/microbiome/lorikeet/mess/1_single_genome_benchmark/.snakemake/conda/2970e699d06b7b43623728aaca037821_/lib/python3.10/subprocess.py", line 1031, in _translate_newlines
    data = data.decode(encoding, errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 115: invalid start byte
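
As far as I can tell, the final UnicodeDecodeError in the traceback above is a general Python behaviour rather than something specific to StrainGE: `subprocess.run(..., text=True)` decodes the child's stderr strictly, so a single non-UTF-8 byte such as 0xff raises. Below is a minimal sketch that reproduces this outside StrainGE. The child command is a hypothetical stand-in (I don't know which external tool StrainGR actually invokes here or what it printed), and `encoding="utf-8"` is set explicitly only to make the example deterministic.

```python
import subprocess
import sys

# Hypothetical stand-in for the external tool: a child process that writes
# a single byte (0xff), which is not valid UTF-8, to its stderr.
child = [sys.executable, "-c", "import sys; sys.stderr.buffer.write(b'\\xff')"]


def run_strict():
    # Same call shape as in the traceback: captured output decoded as text.
    # Strict decoding raises UnicodeDecodeError on the 0xff byte.
    return subprocess.run(child, capture_output=True, text=True, encoding="utf-8")


def run_lenient():
    # errors="replace" substitutes U+FFFD for undecodable bytes, so the
    # child's (possibly binary) stderr can still be captured and inspected.
    return subprocess.run(
        child, capture_output=True, encoding="utf-8", errors="replace"
    )


try:
    run_strict()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

lenient_stderr = run_lenient().stderr
print("strict decoding failed:", strict_failed)
print("lenient stderr:", repr(lenient_stderr))
```

If that is what is happening, the decode error would only be masking whatever the child process wrote to stderr, presumably because the concatenated reference contained no sequences ("Found 0 reference strains to include").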

I'm using the conda version FWIW.

Thanks, ben
