Autopeptideml efficiency #14

taylorreiter · 2024-02-23T14:47:08Z

As @keithchev pointed out over in #10:

> the use of autopeptideml here is a bit inefficient because it re-generates the ESM embeddings for each of the 12 named models. For now this is probably okay, but it may be worth optimizing if the dataset of combined peptide predictions that are input to autopeptideml becomes large (I would guess larger than ~10,000 sequences).

I think it would be simple to run all of the autopeptideml models in a single script, so that the ESM embeddings are generated only once.

Similarly, @keithchev mentioned:

> we should look into the implications of snakemake parallelizing processes that use the GPU (in this case, all of the autopeptideml models). I assume that this is handled in a sensible way at the level of CUDA or the GPU itself, but I'm not sure.

This will be something to keep an eye out for.

RaulFD-creator · 2024-08-13T14:37:04Z

Hi @taylorreiter, I just found this project and it seems really cool. I've just released an update to AutoPeptideML (0.3.1) to address this issue. You can now compute the representations once with:

    df_repr = representation_engine.compute_representations(df.sequence, average_pooling=True)

and then run the predictions, passing df_repr as an additional argument:

    predictions = autopeptideml.predict(
        df=df, re=representation_engine, ensemble_path=model_folder, outputdir=tmp_dirname,
        df_repr=df_repr
    )

This should allow you to run the code in a loop of some sort and avoid calculating the embeddings every time.
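A rough sketch of such a loop, under some assumptions: the helper name `predict_with_shared_embeddings` and the `model_dirs` and `out_root` parameters are hypothetical and not part of AutoPeptideML. The only library calls used are the two shown above (`compute_representations` and `predict` with `df_repr`), and `engine` and `apml` stand for whatever objects your script already passes as `re` and calls `predict` on. Whether `ensemble_path` and `outputdir` accept plain strings as done here is also an assumption.

```python
from pathlib import Path


def predict_with_shared_embeddings(df, engine, apml, model_dirs, out_root):
    """Run every model folder in `model_dirs` on `df`, embedding the sequences once.

    Hypothetical helper: it only wires together the two calls shown above.
    Returns a dict mapping each model folder name to its predictions.
    """
    # Compute the embeddings a single time, before looping over the models.
    df_repr = engine.compute_representations(df.sequence, average_pooling=True)

    results = {}
    for model_dir in map(Path, model_dirs):
        # One output directory per model, named after the model folder.
        outputdir = Path(out_root) / model_dir.name
        outputdir.mkdir(parents=True, exist_ok=True)
        # Pass the precomputed representations so nothing is re-embedded.
        results[model_dir.name] = apml.predict(
            df=df,
            re=engine,
            ensemble_path=str(model_dir),
            outputdir=str(outputdir),
            df_repr=df_repr,
        )
    return results
```

With 12 named models, this would call `compute_representations` once instead of 12 times; the per-model cost is then only the prediction step.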

taylorreiter · 2024-08-19T15:16:36Z

Thank you @RaulFD-creator!
