Autopeptideml efficiency #14

Open
taylorreiter opened this issue Feb 23, 2024 · 2 comments
@taylorreiter
Member

As @keithchev pointed out over in #10:

the use of autopeptideml here is a bit inefficient because it re-generates the ESM embeddings for each of the 12 named models. For now this is probably okay, but it may be worth optimizing if the dataset of combined peptide predictions that are input to autopeptideml becomes large (I would guess larger than ~10,000 sequences).

I think it would be simple to run all of the autopeptideml models in one script, which would then generate the ESM embeddings only once.

Similarly, Keith mentioned:

we should look into the implications of snakemake parallelizing processes that use the GPU (in this case, all of the autopeptideml models). I assume that this is handled in a sensible way at the level of CUDA or the GPU itself, but I'm not sure.

This will be something to keep an eye out for.

@RaulFD-creator

Hi @taylorreiter, I just found this project and it seems really cool. I've just released an update to AutoPeptideML (0.3.1) to address this issue. You can now calculate the representations once with:

    df_repr = representation_engine.compute_representations(df.sequence, average_pooling=True)

and then run the predictions, passing df_repr as an additional argument:

    predictions = autopeptideml.predict(
        df=df, re=representation_engine, ensemble_path=model_folder, outputdir=tmp_dirname,
        df_repr=df_repr
    )

This should allow you to run the code in a loop of some sort and avoid calculating the embeddings every time.
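
A minimal sketch of such a loop, for illustration only. The helper name `predict_with_shared_embeddings`, the `model_folders` list, and the injected `predict_fn` are hypothetical; the sketch assumes `predict_fn` accepts the same keyword arguments as the `predict` call quoted above and that the engine exposes `compute_representations` as shown.

```python
import tempfile
from pathlib import Path


def predict_with_shared_embeddings(df, representation_engine, predict_fn, model_folders):
    """Compute the embeddings once, then reuse them for every model folder.

    Assumption: `predict_fn` takes the keyword arguments df, re,
    ensemble_path, outputdir and df_repr, like the call quoted above.
    """
    # Expensive step: run it a single time for the whole input table.
    df_repr = representation_engine.compute_representations(
        df.sequence, average_pooling=True
    )

    predictions = {}
    for model_folder in model_folders:
        # Give each model its own scratch directory, removed after the call.
        with tempfile.TemporaryDirectory() as tmp_dirname:
            # Results are keyed by the last path component of the model folder.
            predictions[Path(model_folder).name] = predict_fn(
                df=df,
                re=representation_engine,
                ensemble_path=model_folder,
                outputdir=tmp_dirname,
                df_repr=df_repr,
            )
    return predictions
```

In the pipeline this would presumably be called with `predict_fn=autopeptideml.predict` and one folder per named model, so the embeddings are computed once rather than twelve times.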

@taylorreiter
Member Author

Thank you @RaulFD-creator!
