Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrating download utils to torchdata #7549

Closed
biphasic opened this issue May 1, 2023 · 8 comments
Closed

Migrating download utils to torchdata #7549

biphasic opened this issue May 1, 2023 · 8 comments

Comments

@biphasic
Copy link

biphasic commented May 1, 2023

Hello,
following Patrick's suggestion, I'm opening an issue here to discuss if the torchvision maintainers would be open to move the download utils upstream to torchdata. I'm the maintainer of a similar package which is modeled after torchvision to a large extent and would benefit greatly from such a move, much like torchaudio/text I would assume.

cc @pmeier

@pmeier
Copy link
Collaborator

pmeier commented May 2, 2023

Thanks for the suggestion @biphasic.

be open to move the download utils upstream to torchdata.

We had an initial push for this in #6060. Unfortunately, we hit quite a few roadblocks that would need to be resolved before we could adapt that. In addition, the prototype datasets are currently on hold and thus we currently don't depend on torchdata. Meaning, we would need to add a dependency, albeit from a first-party library, just for this. Not sure if its worth it.

benefit greatly from such a move

Could you explain how exactly you would benefit as a downstream library from this? Our idea was to just switch the backend and thus the users would have no touching points with torchdata.

@biphasic
Copy link
Author

biphasic commented May 2, 2023

prototype datasets are currently on hold

Why is that if I may ask? Is there a place where your team defines a roadmap for torchvision?

benefits

The benefit would be not having to rely on torchvision for download utils and torchdata for datapipes, but just torchdata for all of that functionality. I'm not sure I understand what you mean by saying that the users would have no touching points with torchdata, isn't the goal of torchvision/prototype to use datapipes?

@NicolasHug
Copy link
Member

NicolasHug commented May 2, 2023

Hi @biphasic

Why is that if I may ask? Is there a place where your team defines a roadmap for torchvision?

We came to the conclusion that for now, there are more suitable opportunities to improve data-loading speed in torchvision. We might revisit in the future. Re roadmap: we used to disclose those more actively, but recent events have made that difficult. To give you an idea, the rough roadmap for now is to 1) release the transforms V2 work 2) improve data-loading speed via improvements to Resize() 3) revamp video decoding utilities.

Going back to the original goal of this issue: any additional dependency comes with significant maintenance cost, and there's no immediate plan to take a dependency on torchdata. So as @pmeier already mentioned, it's unlikely that we'll migrate our download functionality to torchdata (on top of the blockers @pmeier already noted).

Also @biphasic please note that while our download utils are effectively public, we really don't recommend that you rely on them in your library: those were introduced a long time ago as public but they should have been private from the beginning. We might deprecate and make them private in the future, so please migrate away from them if you can.

Regardless of whether torchvision relies on torchdata for download functionality, there shouldn't be anything preventing you from relying on torchdata on your side? The download functionalities over there are public, so perhaps this is a better alternative for you

@biphasic
Copy link
Author

biphasic commented May 2, 2023

Thanks a lot for the clarification @NicolasHug . While I understand that datapipes might not bring a big speed improvement, for me there's a big plus in their API. Would you say then there's a risk of torchdata not being actively developed anymore? In that case I might hold off as well then for the time being

@NicolasHug
Copy link
Member

Would you say then there's a risk of torchdata not being actively developed anymore

I don't think this is the case, but let me ping @ejguan and @NivekT who will be better suited to answer that

@vadimkantorov
Copy link

vadimkantorov commented Jun 20, 2023

@NicolasHug
Copy link
Member

@biphasic just making sure you saw it: https://github.com/pytorch/data/#torchdata-see-note-below-on-current-status

Since torchdata development is paused, we won't be migrating to torchdata utils anytime soon, so I'll close this issue. Feel free to provide feedback for torchdata on pytorch/data#1196

@biphasic
Copy link
Author

@NicolasHug thanks a lot for letting me know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants