BUSTED-MH is running too slowly #38

Open
jinglkang opened this issue Jun 7, 2023 · 6 comments

@jinglkang

Dear hyphy community:

I'm new to HyPhy and plan to use BUSTED-MH for positive selection analysis.

However, it runs too slowly. Are there any ways to speed it up? My data set includes 14 species, and I want to identify which of nearly 7000 orthologous genes are under positive selection, but a single gene takes nearly 40 minutes. Could you give me some suggestions to make it faster?

Thanks so much,
Kang

@spond
Member

spond commented Jun 7, 2023

Dear @jinglkang,

This does seem too long. Could you share one of the files and the command you are using here, so I can benchmark it locally? That will help me determine whether it's a code issue (something I can potentially fix) or whether your system is just relatively slow.

Best

@jinglkang
Author

Hi Spond,

Thanks so much for your response. As you can see from the species tree (spe_hyphy_tre.txt), I hope to detect positively selected genes in Ldin using BUSTED-MH on orthologous genes (such as "final_alignment.fa.txt"). My command is "hyphy BUSTED-MH.bf --alignment final_alignment.fa.txt --tree spe_hyphy_tre.txt --branches Foreground". Is this the correct way to detect positively selected genes with BUSTED-MH? I would be grateful if you could point out any problems with the run and, if it is set up correctly, suggest a way to make it run more quickly. Thanks so much!

spe_hyphy_tre.txt
final_alignment.fa.txt

@spond
Member

spond commented Jun 8, 2023

Dear @jinglkang,

Using the current release of HyPhy on a MacBook Pro with an M1 Max processor, the analysis finishes in ~4 minutes. You could be using an outdated version of HyPhy. Also, multiple-hit support has been integrated into the standard busted command, as in the example below.

Can you check what your HyPhy version is (hyphy --version) and also what type of computer system you are running the analysis on?

Best,
Sergei

$ time hyphy busted --alignment /Users/sergei/Dropbox/Swap/issue-83/final_alignment.fa.txt --tree /Users/sergei/Dropbox/Swap/issue-83/spe_hyphy_tre.txt --multiple-hits Double+Triple --starting-points 5 --branches Foreground

....

### Partition-level rates for multiple-hit substitutions
* rate at which 2 nucleotides are changed instantly within a single codon :   0.1301
* Corresponding fraction of substitutions :  0.000%
* rate at which 3 nucleotides are changed instantly within a single codon :   0.5094
* Corresponding fraction of substitutions :  0.000%

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |    8.884    |                                   |
|        Negative selection         |     0.006     |   86.886    |                                   |
|      Diversifying selection       |    249.316    |    4.230    |                                   |

* For *background* branches, the following rate distribution for branch-site combinations was inferred

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.010     |    0.000    |       Not supported by data       |
|        Negative selection         |     0.018     |   100.000   |                                   |
|        Negative selection         |     0.232     |    0.000    |       Not supported by data       |

* The following rate distribution for site-to-site **synonymous** rate variation was inferred

|               Rate                | Proportion, % |               Notes               |
|-----------------------------------|---------------|-----------------------------------|
|               0.145               |    27.446     |                                   |
|               0.951               |    62.905     |                                   |
|               3.750               |     9.649     |                                   |


### Performing the constrained (dN/dS > 1 not allowed) model fit
* Log(L) = -7972.43, AIC-c = 16055.59 (55 estimated parameters)
* For *test* branches under the null (no dN/dS > 1 model), the following rate distribution for branch-site combinations was inferred

### Partition-level rates for multiple-hit substitutions
* rate at which 2 nucleotides are changed instantly within a single codon :   0.2694
* Corresponding fraction of substitutions :  0.000%
* rate at which 3 nucleotides are changed instantly within a single codon :   0.8239
* Corresponding fraction of substitutions :  0.000%

|          Selection mode           |     dN/dS     |Proportion, %|               Notes               |
|-----------------------------------|---------------|-------------|-----------------------------------|
|        Negative selection         |     0.000     |   22.611    |                                   |
|        Negative selection         |     0.000     |   48.450    |       Collapsed rate class        |
|         Neutral evolution         |     1.000     |   28.939    |                                   |

* The following rate distribution for site-to-site **synonymous** rate variation was inferred

|               Rate                | Proportion, % |               Notes               |
|-----------------------------------|---------------|-----------------------------------|
|               0.133               |    23.502     |                                   |
|               0.857               |    60.028     |                                   |
|               2.759               |    16.470     |                                   |

----
## Branch-site unrestricted statistical test of episodic diversification [BUSTED]
Likelihood ratio test for episodic diversifying positive selection, **p =   0.0000**.

hyphy busted --alignment  --tree  --multiple-hits Double+Triple  5 --branches  1479.93s user 109.15s system 637% cpu 4:09.21 total

....

The multiple-hits option does increase run time by a factor of ~3 compared to the standard option (BUSTED+SRV).

$ time hyphy busted --alignment /Users/sergei/Dropbox/Swap/issue-83/final_alignment.fa.txt --tree /Users/sergei/Dropbox/Swap/issue-83/spe_hyphy_tre.txt --starting-points 5 --branches Foreground

....

## Branch-site unrestricted statistical test of episodic diversification [BUSTED]
Likelihood ratio test for episodic diversifying positive selection, **p =   0.0000**.

hyphy busted --alignment  --tree  --starting-points 5 --branches Foreground  497.08s user 26.93s system 643% cpu 1:21.39 total
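
The "factor of ~3" can be checked against the two wall-clock totals printed by `time` in this post (4:09.21 with `--multiple-hits Double+Triple`, 1:21.39 without). The following is only a small sketch of that arithmetic; the helper names `to_seconds` and `slowdown` are made up for illustration and are not part of HyPhy.

```shell
# Convert a wall-clock time of the form M:SS.ss (as printed by `time`) to seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN { split(t, p, ":"); printf "%.2f\n", p[1] * 60 + p[2] }'
}

# Ratio of the first wall-clock time to the second, rounded to two decimals.
slowdown() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.2f\n", a / b }'
}

# Multiple-hit run vs. standard run, using the totals reported above.
slowdown 4:09.21 1:21.39
```

With these two totals the ratio comes out at about 3.06, consistent with the ~3x estimate for this one alignment on this one machine.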

Best,
Sergei

@spond spond self-assigned this Jun 8, 2023
@jinglkang
Author

Dear Sergei,

Thanks so much for your reply.

The HyPhy version on my own workstation is "HYPHY 2.5.48(MP) for Linux on x86_64", but I ran BUSTED-MH on the university compute cluster, whose HyPhy version is "HYPHY 2.5.42(MP) for Linux on x86_64". It might be slower because that HyPhy is not the latest version.

By the way, is there any difference between my command and yours? Can I use your command for the positive selection analysis? Thanks so much!

Best regards,
Jingliang

@spond
Member

spond commented Jun 8, 2023

Dear @jinglkang,

There is a big difference between 2.5.42 and 2.5.48 (you would notice that). I would recommend updating to the latest version, and using the commands that I provided as examples.
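
A quick way to see whether a given machine is running an older build is to compare the version banner against a minimum. The sketch below assumes the banner has the form quoted in this thread ("HYPHY 2.5.48(MP) for Linux on x86_64") and that `sort -V` is available (GNU coreutils). The function names and the 2.5.48 threshold are illustrative choices, not anything HyPhy itself defines.

```shell
# Pull the dotted version number out of a banner such as
# "HYPHY 2.5.48(MP) for Linux on x86_64" (format taken from this thread).
hyphy_version() {
  printf '%s\n' "$1" | sed -E 's/^HYPHY ([0-9]+(\.[0-9]+)*).*/\1/'
}

# Succeed if version $1 is at least version $2 (relies on `sort -V`).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Illustrative check against the cluster banner quoted above; in practice
# the banner would come from "$(hyphy --version)".
banner="HYPHY 2.5.42(MP) for Linux on x86_64"
if version_at_least "$(hyphy_version "$banner")" 2.5.48; then
  echo "HyPhy is recent enough"
else
  echo "HyPhy $(hyphy_version "$banner") is older than 2.5.48; consider updating"
fi
```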

Best,
Sergei

@jinglkang
Author

Hi Sergei,

Thanks so much for your suggestions. I tried running your command (hyphy busted --alignment paml_input/OG0000065_OG8/final_alignment.fa --tree spe_hyphy.tre --multiple-hits Double+Triple --starting-points 5 --branches Foreground) on the same gene I shared, and it takes around 40 minutes. However, that is much faster than the run with 2.5.42 on the compute cluster (which takes almost 2 h 30 min).
I will ask the administrator to update HyPhy to the latest version.
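
For the full set of orthologous genes, the same command could be wrapped in a loop. The following is a minimal sketch, not a tested pipeline: it assumes every gene lives at `<dir>/<orthogroup>/final_alignment.fa` (the layout of the path quoted above), that all genes share one tree file, and that saving the screen output to `<alignment>.busted.log` is acceptable. The names `run_busted`, `run_all`, `DRY_RUN`, and `HYPHY` are made up for this example.

```shell
# Run (or, with DRY_RUN=1, just print) the BUSTED command from this thread
# for a single alignment; screen output goes to <alignment>.busted.log.
run_busted() {
  aln="$1"; tree="$2"
  set -- "${HYPHY:-hyphy}" busted --alignment "$aln" --tree "$tree" \
    --multiple-hits Double+Triple --starting-points 5 --branches Foreground
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@" > "$aln.busted.log" 2>&1
  fi
}

# Loop over every <dir>/<orthogroup>/final_alignment.fa (assumed layout)
# and keep going if a single gene fails.
run_all() {
  dir="$1"; tree="$2"
  for aln in "$dir"/*/final_alignment.fa; do
    [ -e "$aln" ] || continue
    run_busted "$aln" "$tree" || echo "failed: $aln" >&2
  done
}
```

Calling `DRY_RUN=1 run_all paml_input spe_hyphy.tre` prints the commands without running them, which is a cheap way to check the paths first. At roughly 40 minutes per gene, about 7000 genes run one after another would take on the order of 190 days, so in practice the genes would probably need to be spread across cluster nodes. The runs timed earlier in this thread already used several cores each (about 640% CPU), so running many genes at once on a single machine may not help much.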

Thanks so much!

Jingliang
