Questions about using cayman #8

Open
Xinpeng021001 opened this issue Sep 10, 2024 · 4 comments

@Xinpeng021001

Hi,

Thank you for the excellent tool! I'm trying to use it based on the bioRxiv paper, but I have some questions:

  1. For the bwa index, the paper mentions that you used the non-human-gut dataset, but on Zenodo I also found the gut dataset.
  2. Creating the index with bwa is quite slow. Should we create an individual index for each dataset, or combine them into a single index?
  3. Could you please provide the code used for the plots in the paper? The link https://git.embl.de/grp-zeller/cazy_gut_microbiome/ can't be opened.

Thank you for your time and help!

Best Regards,
Xinpeng

@cschu
Member

cschu commented Sep 10, 2024

Dear Xinpeng,

Thank you for your interest in cayman!

  1. Indeed, the non-human-gut catalogues were annotated in addition to the human gut one.
  2. Both ways work, but the original idea is to create an individual index for each catalogue. In theory, one could profile against the complete GMGC, but that would not perform very well. If you find the bwa indexing slow, you could use a very large value for the -K parameter (as discussed here); however, that is usually not necessary for smaller catalogues.
  3. Unfortunately, the link in the preprint is out of date. The repo can be found here.

Best,
Christian
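
The "one index per catalogue" approach from point 2 could be scripted roughly as follows. This is only a minimal sketch and not code from the cayman authors: the helper names (`index_prefix`, `bwa_index_command`, `build_indices`) and the `indices` output directory are hypothetical, and the only bwa feature it relies on is `bwa index -p <prefix> <fasta>`. By default it only builds the command lines without running them.

```python
from pathlib import Path
from typing import List
import shutil
import subprocess


def index_prefix(fasta: str) -> str:
    """Derive an index name by dropping a trailing .gz and FASTA extension."""
    name = Path(fasta).name
    for suffix in (".gz", ".fna", ".fa", ".fasta"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def bwa_index_command(fasta: str, out_dir: str = "indices") -> List[str]:
    """Build the `bwa index` command line for a single catalogue."""
    prefix = str(Path(out_dir) / index_prefix(fasta))
    return ["bwa", "index", "-p", prefix, fasta]


def build_indices(fastas: List[str], out_dir: str = "indices",
                  dry_run: bool = True) -> List[List[str]]:
    """Create one separate index per catalogue (hypothetical helper).

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [bwa_index_command(fasta, out_dir) for fasta in fastas]
    if not dry_run:
        if shutil.which("bwa") is None:
            raise RuntimeError("bwa not found on PATH")
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        for command in commands:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in build_indices(
        ["GMGC10.soil.95nr.no-rare.0.5.percent.prevalence.fna.gz"]
    ):
        print(" ".join(command))
```

Keeping each catalogue in its own index means a slow build for one environment does not block the others, and each sample is later profiled against only the catalogue that fits it.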

@Xinpeng021001
Author

Dear Christian,

Thank you for your reply! Should we use the non-human-gut catalogue to build the index, or do you recommend a different catalogue for each environment? For example, if I'm annotating a human gut environment, should I follow the paper and use the non-human-gut catalogue, or just use the annotated human gut catalogue? I have the same question for other environments.

Best Regards,
Xinpeng

@cschu
Member

cschu commented Sep 10, 2024

Dear Xinpeng,

For a human gut environment, you'd use a bwa index created from GMGC10.human-gut.95nr.0.5.percent.prevalence.fna.gz (gene_catalogues.zip) and the CAZy annotations in GMGC10.human-gut.95nr.no-rare.0.5.percent.prevalence_all_v3_FINAL.csv (gene_catalogue_annotations.zip).

For, say, soil, you'd create a bwa index from GMGC10.soil.95nr.no-rare.0.5.percent.prevalence.fna.gz and use the annotations in GMGC10.soil.95nr.no-rare.0.5.percent.prevalence_all_v3_FINAL.csv

And so on. Different environments should be profiled using the closest fitting catalogue.

Best,
Christian
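
The pairing of catalogue and annotation file per environment described in this reply can be written down as a small lookup table. Again, this is just an illustrative sketch: the `Reference` type, the `REFERENCES` table, and `select_reference` are hypothetical names, and only the two environments whose file names appear in this thread are filled in. Other environments would need their file names taken from the Zenodo archives.

```python
from typing import NamedTuple


class Reference(NamedTuple):
    catalogue: str    # FASTA file used to build the bwa index
    annotations: str  # CSV file with the matching annotations


# File names copied from the reply above; other environments are
# deliberately left out rather than guessed.
REFERENCES = {
    "human-gut": Reference(
        catalogue="GMGC10.human-gut.95nr.0.5.percent.prevalence.fna.gz",
        annotations=(
            "GMGC10.human-gut.95nr.no-rare.0.5.percent."
            "prevalence_all_v3_FINAL.csv"
        ),
    ),
    "soil": Reference(
        catalogue="GMGC10.soil.95nr.no-rare.0.5.percent.prevalence.fna.gz",
        annotations=(
            "GMGC10.soil.95nr.no-rare.0.5.percent."
            "prevalence_all_v3_FINAL.csv"
        ),
    ),
}


def select_reference(environment: str) -> Reference:
    """Return the catalogue/annotation pair registered for an environment."""
    key = environment.strip().lower().replace(" ", "-")
    try:
        return REFERENCES[key]
    except KeyError:
        known = ", ".join(sorted(REFERENCES))
        raise ValueError(
            f"No catalogue registered for {environment!r}; known: {known}"
        ) from None


if __name__ == "__main__":
    ref = select_reference("human gut")
    print(ref.catalogue)
    print(ref.annotations)
```

Failing loudly on an unknown environment, rather than falling back to a default catalogue, keeps a sample from being silently profiled against a poorly fitting reference.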

@Xinpeng021001
Author

Dear Christian,

Thank you for your reply! I’ll redo the index part. Thank you!

Best Regards,
Xinpeng
