
very very large dataset recommendations #167

Open
sapuizait opened this issue Dec 11, 2023 · 2 comments

@sapuizait

Hi all

As the title says, I have a very large dataset of 1500 genomes that share 1200 single-copy genes. The plan is to build a concatenated alignment (let's see if it's even possible :D ) and then use raxml to build a global phylogeny.
Do you think this is feasible, or am I daydreaming and should be considering alternative approaches?

Cheers
P

PS: I have access to a cluster where jobs can run for a maximum of 7 days, with 64 nodes and 500 GB RAM.
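
A quick way to sanity-check whether a dataset of this size fits into memory, before committing cluster time, is raxml-ng's --parse mode, which compresses the alignment and prints an estimate of the memory requirements and a recommended number of threads (supermatrix.fasta below is just a placeholder name for the concatenated alignment):

    # dry run: parse and compress the alignment, report estimated memory use and thread count
    raxml-ng --parse --msa supermatrix.fasta --model GTR+G --prefix T1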

@amkozlov
Owner

In principle, it sounds feasible.

We have successfully used raxml-ng for concatenated datasets with ~1400 taxa and ~1000 genes, as well as ~350 taxa and ~64000 loci (unfortunately, neither paper has been published yet).
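
For datasets in this size range, a plain (unpartitioned) ML tree search might look like the sketch below; the file name, thread count, and number of starting trees are illustrative placeholders rather than settings taken from this thread:

    # ML tree search from 10 random and 10 parsimony starting trees
    raxml-ng --search --msa supermatrix.fasta --model GTR+G \
             --tree rand{10},pars{10} --threads 16 --seed 42 --prefix T2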

@sapuizait
Author

That's excellent! Any advice or suggestions on how to do that? Do you use partitions and check models for each partition, etc.? Which algorithm? Thanks in advance for any tips! :)
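
For reference, raxml-ng takes per-partition models via a partition file with one line per gene (model, partition name = coordinates in the supermatrix). The models and coordinates below are made-up placeholders; per-partition models could first be selected with a tool such as ModelTest-NG:

    GTR+G, gene0001 = 1-1520
    GTR+G+I, gene0002 = 1521-2780
    HKY+G, gene0003 = 2781-3905

The partition file is then passed in place of a single model string:

    raxml-ng --search --msa supermatrix.fasta --model partitions.txt \
             --threads 16 --seed 42 --prefix T3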
