Feature request: Convert standard dataset into a federated dataset #206

Open
Saipraneet opened this issue Sep 14, 2021 · 5 comments

Comments

@Saipraneet

Synthetic federated datasets can be constructed from standard centralized ones by artificially splitting them among clients. This is usually done using a Dirichlet distribution (e.g. Hsu et al. 2019).
Such synthetic datasets are very useful because they let us explicitly control both the total number of users and the heterogeneity across them.

It would be great to have primitives that automatically convert a standard numpy dataset into a FedJAX dataset.
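To make the request concrete, here is a minimal sketch of a Dirichlet-based split using only numpy. The function name `dirichlet_partition` and its parameters are hypothetical and not part of FedJAX. It uses the per-class variant of the idea (each class is divided among clients with Dirichlet-distributed proportions), which may differ in detail from the sampling procedure in Hsu et al. 2019. Smaller `alpha` gives more heterogeneous clients.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split example indices among clients, class by class.

    Hypothetical helper, not a FedJAX API. Returns a list of
    `num_clients` sorted index arrays that together cover every example
    exactly once. Some clients may end up with no examples.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Fraction of this class assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert the fractions into split points within idx.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(sorted(ix), dtype=np.int64) for ix in client_indices]


# Example: 1000 examples, 10 classes, 20 clients.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, num_clients=20, alpha=0.5)
```

Each entry of `parts` can then be used to index the feature and label arrays for one client.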

@jaehunro
Collaborator

Thanks for filing this! I also think that this will be very useful.

A couple of clarifying questions:

  • What exactly constitutes a "standard numpy dataset"? An iterator of numpy arrays? A tf.data.Dataset? A single numpy array encapsulating the entire dataset (assuming it fits in memory)?

  • When you say "FedJax dataset", does this refer to fedjax.FederatedData?

@Saipraneet
Author

Saipraneet commented Sep 14, 2021

I think supporting an iterator of numpy arrays would be the most general option. A tf.data.Dataset can be converted to one using as_numpy_iterator.

> does this refer to fedjax.FederatedData

Yes. The goal is to be able to use this dataset with the rest of the fedjax framework.
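As a rough illustration of the "iterator of numpy arrays" input, the sketch below gathers such an iterator into one array per feature so it can be split afterwards. It assumes each item is a dict of arrays with a leading batch dimension, which is what `as_numpy_iterator` yields for a batched dataset whose elements are dicts. The helper name `collect_examples` is hypothetical, and the approach only works when the data fits in memory.

```python
import numpy as np


def collect_examples(batches):
    """Concatenate an iterator of {feature_name: array} batches.

    Hypothetical helper. Assumes every batch has the same feature names
    and that arrays are batched along axis 0.
    """
    columns = {}
    for batch in batches:
        for name, value in batch.items():
            columns.setdefault(name, []).append(np.asarray(value))
    return {
        name: np.concatenate(chunks, axis=0)
        for name, chunks in columns.items()
    }


# Example with a plain Python iterator standing in for
# `dataset.as_numpy_iterator()`.
batches = iter([
    {"x": np.zeros((2, 3)), "y": np.array([0, 1])},
    {"x": np.ones((1, 3)), "y": np.array([2])},
])
dataset = collect_examples(batches)
```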

@stheertha added the "contributions welcome" label Sep 15, 2021
@BaselOmari

Hi, has any work been done for this issue? Is there still a need for it?

More generally, what is the state of this repo? Is it still active? Is there work that needs some contribution? I am more than happy to help.

@jaehunro
Collaborator

Hi there. Not much work has been done toward checking in a general implementation of this, but it would be nice to have. We still actively use and maintain this repo and would be more than happy to have you contribute!

@kho
Collaborator

kho commented Oct 26, 2022

> Hi, has any work been done for this issue? Is there still a need for it?
>
> More generally, what is the state of this repo? Is it still active? Is there work that needs some contribution? I am more than happy to help.

Have you checked out InMemoryFederatedData? It should be sufficient for creating synthetic datasets in most cases.
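For readers who have not used it, the following sketch shows roughly how a per-example client assignment could be turned into the mapping that `InMemoryFederatedData` takes. It assumes the constructor accepts a mapping from client id (bytes) to a dict of numpy arrays, so check the FedJAX documentation for the exact signature. The helper `build_client_mapping` and the client id format are hypothetical. The fedjax import is optional here so the grouping logic runs without it.

```python
import numpy as np


def build_client_mapping(features, client_of_example):
    """Group examples by client into {client_id: {feature_name: array}}.

    Hypothetical helper. `client_of_example[i]` is the integer client
    that example i belongs to.
    """
    client_of_example = np.asarray(client_of_example)
    mapping = {}
    for client in np.unique(client_of_example):
        mask = client_of_example == client
        # Client ids are encoded as bytes (an assumption about FedJAX).
        client_id = f"client_{int(client):05d}".encode("utf-8")
        mapping[client_id] = {
            name: array[mask] for name, array in features.items()
        }
    return mapping


x = np.arange(12, dtype=np.float32).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
assignment = np.array([0, 0, 1, 1, 1, 2])
mapping = build_client_mapping({"x": x, "y": y}, assignment)

try:
    import fedjax  # Optional; only needed for the final wrapping step.
    federated_data = fedjax.InMemoryFederatedData(mapping)
except ImportError:
    federated_data = None
```

The `assignment` array could come from any splitting scheme, including a Dirichlet-based one.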
