Fix custom dataloader registry #2907

Open
canergen opened this issue Jul 23, 2024 · 2 comments · May be fixed by #2932

@canergen
Member

Custom dataloaders currently don't support advanced capabilities such as scArches or cell-type prediction in scANVI. To enable these, we have to create a registry, without calling setup_anndata, that contains the same elements (see below).
https://github.com/chanzuckerberg/cellxgene-census/blob/222efddf2ce82f93f76329aa353962c1dc2400ac/api/python/notebooks/experimental/pytorch_loader_scvi.ipynb is the first working example. It currently uses the following code to save the model:

user_attributes = model._get_user_attributes()
user_attributes = {a[0]: a[1] for a in user_attributes if a[0][-1] == "_"}

user_attributes.update(
    {
        "n_batch": datamodule.n_batch,
        "n_extra_categorical_covs": 0,
        "n_extra_continuous_covs": 0,
        "n_labels": 1,
        "n_vars": datamodule.n_vars,
    }
)

We want to create a new function that fills out the registry and passes it to the model when it is constructed: model = scvi.model.SCVI(n_layers=n_layers, n_latent=n_latent, gene_likelihood="nb", encode_covariates=False). All necessary entries and their structure can be seen with: scvi.adata_manager.get_state_registry(scvi.REGISTRY_KEYS.X_KEY).to_dict().
Once this is fixed, all uses of _module_init_on_train throughout the codebase should be removed, as they will no longer be necessary.
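
A rough sketch of what such a registry-filling function could look like is given here. It only reuses the five summary entries from the snippet above; the function name build_minimal_registry, the ToyDataModule stand-in, and the exact key names and nesting of the returned dictionary are assumptions for illustration, not the actual scvi-tools registry layout, which should be checked against the get_state_registry(...).to_dict() output.

```python
from dataclasses import dataclass


@dataclass
class ToyDataModule:
    """Hypothetical stand-in for a custom datamodule.

    Only the two attributes used in the snippet above are modelled.
    """

    n_batch: int
    n_vars: int


def build_minimal_registry(datamodule, model_name="SCVI"):
    """Assemble a registry-like dict without calling setup_anndata.

    ASSUMPTION: the top-level keys and nesting are illustrative only;
    the real registry structure may differ.
    """
    # Same entries and defaults as the user_attributes.update(...) call above.
    summary_stats = {
        "n_batch": datamodule.n_batch,
        "n_extra_categorical_covs": 0,
        "n_extra_continuous_covs": 0,
        "n_labels": 1,
        "n_vars": datamodule.n_vars,
    }
    return {
        "model_name": model_name,  # hypothetical key
        "summary_stats": summary_stats,  # hypothetical key
    }


registry = build_minimal_registry(ToyDataModule(n_batch=3, n_vars=2000))
print(registry["summary_stats"])
```

How the resulting dictionary would be handed to the model constructor is left open here, since the issue does not specify that interface.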

@gokceneraslan
Contributor

Is there some documentation on what is expected of the custom dataloader's collate function? I can imagine a dict with keys like X, batch and labels just by following up on the different types of exceptions I am getting. But for poor souls like us who are not familiar with the codebase, it'd be amazing to have some documentation of what type of keys a collate function should return in the dictionary to work.
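
To make the guess above concrete, a collate function returning those three keys might look roughly like the sketch below. The key names X, batch and labels come only from this comment; the input field names (counts, batch_index, label_index), the one-column shape of batch and labels, and the use of plain lists instead of tensors are all assumptions, not documented scvi-tools behaviour.

```python
def collate_cells(cells):
    """Merge per-cell records into one minibatch dict.

    ASSUMPTION: the keys "X", "batch" and "labels" are inferred from
    the exceptions mentioned above, not from official documentation.
    """
    return {
        # One row of counts per cell.
        "X": [cell["counts"] for cell in cells],
        # One integer code per cell, kept as a single-element row.
        "batch": [[cell["batch_index"]] for cell in cells],
        "labels": [[cell["label_index"]] for cell in cells],
    }


# Two hypothetical cells with three genes each.
cells = [
    {"counts": [0, 3, 1], "batch_index": 0, "label_index": 0},
    {"counts": [2, 0, 5], "batch_index": 1, "label_index": 0},
]
minibatch = collate_cells(cells)
print(sorted(minibatch))
```

A real dataloader would most likely return tensors rather than nested lists; lists are used here only to keep the sketch dependency-free.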

@canergen
Member Author

Hi, we are still exchanging ideas with lamin and CZI to improve the implementation (and hopefully work towards support across all models; currently scVI works). Overall, the final requirement will be that a registry is created as a dictionary, similar to https://colab.research.google.com/drive/10sXec_TicMKtLA6hMcgfkado-FgoNKxw#scrollTo=e8vZgceklGdH. We use laminlabs/lamindb#1826 as a discussion channel to work together on a better implementation. Happy to connect offline (ideally on the scverse Zulip) to see how we can support your work.
