
Load Custom Pre-Trained PyTorch MLP as Surrogate Model #2566

Open
vinkyc opened this issue Oct 8, 2024 · 3 comments

Hi everyone,

I have developed a custom Multi-Layer Perceptron (MLP) model using PyTorch, which has been pre-trained for my specific application. The model architecture has an input size of 579 features and produces an output of size 7. Additionally, I've incorporated dropout layers to enable probabilistic outputs. I would like assistance with loading this model and using it as a surrogate model.
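
For reference, a minimal stand-in for a network of this shape might look as follows (the hidden sizes and dropout rate below are placeholders, not the actual architecture):

```python
import torch
from torch import nn


class MLP(nn.Module):
    """Placeholder dropout MLP: 579 input features, 7 outputs."""

    def __init__(self, in_dim: int = 579, out_dim: int = 7, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Dropout(p),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Dropout(p),
            nn.Linear(256, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```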

Any guidance on how to implement this would be appreciated!

Thank you!

Balandat commented Oct 8, 2024

Fundamentally, you need to implement this in a way that's compliant with the basic Model API. You need to implement a [`posterior()`](https://github.com/pytorch/botorch/blob/main/botorch/models/model.py#L81) method that returns an object implementing the [`Posterior`](https://github.com/pytorch/botorch/blob/main/botorch/posteriors/posterior.py#L19) API.

Since you're using dropout, I assume your model will produce a set of samples; for that you could use the `EnsembleModel`, which already has a `posterior` implementation that returns an `EnsemblePosterior`.
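
Roughly, an untested sketch of such a wrapper could look like the following (the class name and sample count are placeholders; the output shape `batch_shape x s x q x m`, with `s` the ensemble dimension, is what `EnsemblePosterior` expects):

```python
import torch
from torch import Tensor, nn
from botorch.models.ensemble import EnsembleModel


class DropoutMLPModel(EnsembleModel):
    """Hypothetical wrapper treating MC-dropout samples as an ensemble."""

    def __init__(self, mlp: nn.Module, num_samples: int = 32, num_outputs: int = 7):
        super().__init__()
        self.mlp = mlp
        self.num_samples = num_samples
        self._num_outputs = num_outputs  # read by EnsembleModel.num_outputs

    def forward(self, X: Tensor) -> Tensor:
        # Keep dropout active so each pass draws a different sample; do not
        # wrap this in no_grad(), since gradients w.r.t. X are needed when
        # optimizing acquisition functions.
        self.mlp.train()
        # Stack the MC-dropout samples along the ensemble dimension (dim=-3),
        # giving a `batch_shape x s x q x m` tensor.
        return torch.stack([self.mlp(X) for _ in range(self.num_samples)], dim=-3)
```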

Once you have that, you should be able to use the model with the standard botorch acquisition function machinery. If you're re-sampling your dropouts in each forward pass, you'll need to be careful not to use deterministic optimization via Sample Average Approximation (see the botorch paper for details) and instead use a stochastic optimizer for optimizing your acquisition function. Alternatively, you can fix the dropouts to generate a batched ensemble model, in which case you can use Sample Average Approximation.
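
As a rough sketch of the stochastic-optimizer route (placeholders: `my_pretrained_mlp` is a trained single-output dropout MLP, e.g. one of your seven outputs or a scalarization of them, and `train_Y` holds the observed objective values; this also assumes a recent botorch where `optimize_acqf` accepts a `gen_candidates` argument):

```python
import torch
from botorch.acquisition import qExpectedImprovement
from botorch.generation.gen import gen_candidates_torch
from botorch.optim import optimize_acqf

# Wrap the pre-trained dropout MLP in the ensemble wrapper sketched above.
model = DropoutMLPModel(mlp=my_pretrained_mlp, num_outputs=1)
acqf = qExpectedImprovement(model=model, best_f=train_Y.max())

# 579-dimensional box bounds, assuming inputs normalized to the unit cube.
bounds = torch.stack([torch.zeros(579), torch.ones(579)])

# Because dropout is re-sampled on every forward pass, the acquisition
# surface is stochastic, so use the torch (Adam-based) candidate generator
# instead of the default SAA + L-BFGS-B path.
candidate, acq_value = optimize_acqf(
    acq_function=acqf,
    bounds=bounds,
    q=1,
    num_restarts=4,
    raw_samples=64,
    gen_candidates=gen_candidates_torch,
)
```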

vinkyc commented Oct 9, 2024

Hi Balandat, thank you so much for the prompt reply! I would like to clarify my main objective: minimizing the time required to retrain the surrogate model, ideally keeping it to just a few minutes.

I have several questions regarding this process:

- **Loading Pretrained Weights:** Can I load pretrained weights of my MLP model specifically in the forward method? Is it possible to overwrite the backpropagation method for retraining?
- **Retraining Process:** How is the surrogate model being retrained? Does it retrain from scratch every time, or can it be fine-tuned? If fine-tuning is possible, are there any recommended strategies to do so over a limited number of epochs to minimize retraining time?
- **Hyperparameters:** During the retraining process, does the surrogate model use the same set of hyperparameters, or are they randomized?

Thanks!

Balandat commented Oct 9, 2024

It appears your questions are mostly about the MLP with dropout and not so much about botorch, so I'm not sure how much I can help you with this.

> Loading Pretrained Weights: Can I load pretrained weights of my MLP model specifically in the forward method? Is it possible to overwrite the backpropagation method for retraining?

I am not sure I understand. Yes, you can always load pretrained weights, but I'm not sure why you'd want to do this in the forward method. The way I would think about this is that you train your MLP outside of the botorch Model API and then stick it into an `EnsembleModel` subclass that you code up, so that the `forward` method essentially draws samples from that model. All the training would happen outside of the botorch setup.
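
Something along these lines, where all the fitting is plain PyTorch and only the final wrapping touches botorch (`train_X`/`train_Y` are placeholders for your data, and `MLP`/`DropoutMLPModel` refer to the sketches above):

```python
import torch
from torch import nn

# Hypothetical pre-training / retraining step, done entirely outside botorch.
mlp = MLP()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

mlp.train()
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(mlp(train_X), train_Y)
    loss.backward()
    optimizer.step()

# Only the trained network gets handed to the botorch-facing wrapper;
# botorch never sees (or cares about) the training loop itself.
surrogate = DropoutMLPModel(mlp=mlp)
```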

> Retraining Process: How is the surrogate model being retrained? Does it retrain from scratch every time, or can it be fine-tuned? If fine-tuning is possible, are there any recommended strategies to do so over a limited number of epochs to minimize retraining time?

It's your surrogate model so you can handle this as you like. It makes sense to me to fine-tune the model, especially later in the optimization process.
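
For example, a warm-started fine-tuning pass could be as simple as continuing training of the existing network for a few epochs with a small learning rate on the newly acquired points (`mlp`, `new_X`, `new_Y` are placeholders):

```python
import torch

# Hypothetical fine-tuning step: warm-start from the current weights instead
# of re-initializing and training from scratch each BO iteration.
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)

mlp.train()
for _ in range(10):  # a handful of epochs keeps retraining to minutes
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(mlp(new_X), new_Y)
    loss.backward()
    optimizer.step()
```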

> Hyperparameters: During the retraining process, does the surrogate model use the same set of hyperparameters, or are they randomized?

I don't understand this one - what hyperparameters are you referring to? Are these for the training of your MLP model?
