
Create a more standard training loop interface for pretraining #8

Open
jason-fries opened this issue May 7, 2021 · 1 comment

@jason-fries (Contributor)

Currently, clmbr_train_model hides the more familiar training-loop structure from users. In most demos and APIs, the boilerplate looks like what's outlined at https://github.com/PyTorchLightning/pytorch-lightning, with this structure:

import os
import pytorch_lightning as pl
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import MNIST

dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
train, val = random_split(dataset, [55000, 5000])

autoencoder = LitAutoEncoder()
trainer = pl.Trainer()
trainer.fit(autoencoder, DataLoader(train), DataLoader(val))

i.e., basically this form:

  • dataloader
  • data splits
  • model architecture
  • training

Specific details around the loss are configured in the model architecture, while the Trainer class handles details like progress bars, the choice of optimizer, and so on.
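
For context, the LitAutoEncoder used in the snippet above is a LightningModule defined along these lines. This is a minimal sketch adapted from the Lightning README; the layer sizes, MSE loss, and Adam learning rate are illustrative choices, not part of any clmbr_train_model proposal:

import torch
from torch import nn
import pytorch_lightning as pl

class LitAutoEncoder(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # The model architecture lives in the module itself.
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))

    def training_step(self, batch, batch_idx):
        # The loss is configured here; the Trainer owns the loop,
        # progress bars, logging, device placement, etc.
        x, _ = batch
        x = x.view(x.size(0), -1)
        x_hat = self.decoder(self.encoder(x))
        return nn.functional.mse_loss(x_hat, x)

    def configure_optimizers(self):
        # The optimizer choice is also declared alongside the model.
        return torch.optim.Adam(self.parameters(), lr=1e-3)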

What is the lift required to provide a demo and refactor to support this type of workflow?

jason-fries added the enhancement (New feature or request) label on May 7, 2021
@woffett (Contributor) commented May 21, 2021

The refactor PR puts pre-training into this kind of API:

model = CLMBRFeaturizer(config, info)
dataset = PatientTimelineDataset(extract_path, ontology_path, info_path)
model.fit(dataset)

The original clmbr_train_model still works; it just uses this API under the hood. I'll leave this issue open until piton_private is updated to reflect these changes.
