ModelZoo artifacts: switch to git-lfs #2508
-
Pricing
First, for current pricing information:
Here I have tried to include transfers too, and the pricing is very similar. It is also possible to host GitLab on custom dedicated servers and run a custom LFS backend (I have tried this before), but that is not possible with GitHub; we have to use their solution.
Functionality
We cannot remove some old useless files without rewriting those commits and ruining the git history. @ambroise-arm said:
I don't get this part. git-lfs doesn't have binary delta capability; it hosts all large binary files as they are, so you still download the full files even if the binaries differ only slightly. There is also the issue that git-lfs doesn't work over the SSH protocol, which makes it very annoying when moving repos to a different configuration, like switching from HTTPS to SSH keys. For an online rant on git-lfs: https://news.ycombinator.com/item?id=27134972
I would like to use S3 for hosting the large files, with an interface of our own to keep portability, and not be tied to git-lfs.
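For illustration, fetching an artifact hosted on S3 needs nothing more than a plain HTTPS client, which is the portability argument; the bucket name and object key below are made up:

```bash
# Hypothetical bucket name and object key, shown only to illustrate that
# a plain HTTPS download is enough for S3-hosted artifacts.
curl -fL -o yolov3-x86_64.tar.gz \
  "https://autoware-modelzoo.s3.amazonaws.com/yolov3/1.0/yolov3-x86_64.tar.gz"
tar -xzf yolov3-x86_64.tar.gz
```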
-
cc @JWhitleyWork as you were part of earlier discussions.
-
Why not do what https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md does? Host every model compressed individually and version it that way, outside of git-lfs. We could do the same on an S3 bucket using folders.
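A minimal sketch of what that per-model, per-version folder layout could look like on an S3 bucket (bucket name, versions, and paths below are hypothetical):

```bash
# Hypothetical layout: s3://<bucket>/<model>/<version>/<artifact>.tar.gz
aws s3 ls s3://autoware-modelzoo/yolov3/                        # list available versions
aws s3 cp s3://autoware-modelzoo/yolov3/1.1/yolov3-x86_64.tar.gz .
```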
-
@xmfcx the problem in my opinion is not the storage medium; it's more about how to distribute the files when we have pieces of software that need them, hence the idea of storing them somehow with Git (be it …). If we end up using some kind of script to download large files (instead of a git-based solution), we must make sure that it is consistent across all repos and files; otherwise we are at risk of ending up with different storage solutions, different scripts, etc.
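As a rough sketch of such a shared helper, assuming artifacts are hosted on S3 under a `<model>/<version>/<arch>` layout (all names and URLs below are assumptions, not the actual ModelZoo setup):

```bash
#!/usr/bin/env bash
# Minimal sketch of a download helper that every repo could reuse.
# Usage: download_artifacts.sh <model> <version> <arch> <dest_dir>
set -euo pipefail

model="$1"; version="$2"; arch="$3"; dest="$4"
base_url="https://autoware-modelzoo.s3.amazonaws.com"   # hypothetical bucket URL
archive="${model}-${version}-${arch}.tar.gz"

mkdir -p "${dest}"
curl -fL -o "${dest}/${archive}" "${base_url}/${model}/${version}/${archive}"
tar -xzf "${dest}/${archive}" -C "${dest}"
```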
-
I gave … a try.

Pros: …

Cons: …

In one of my tests, I have pushed the … Overall, my first impressions are good, and it seems to fix the issues that …
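For reference, and without assuming which tool was evaluated above, a minimal git-lfs version of such an experiment would look roughly like this (the tracked file patterns and the `models/` directory are hypothetical):

```bash
git lfs install                     # one-time setup on the machine
git lfs track "*.tvm" "*.onnx"      # records the patterns in .gitattributes
git add .gitattributes models/
git commit -m "Track model artifacts with git-lfs"
git push                            # large files go to the LFS store, not the git history
```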
-
I don't want to speak for other people, but after the messages on this page and discussing with @xmfcx and @ambroise-arm, I think we are in a good position to decide that we want to go ahead with S3.
-
ModelZoo hosts neural networks useful to Autoware (yolov3, apollo_lidar_segmentation, …). They are meant to be compiled with TVM ahead of time, and the resulting artifacts are to be consumed by the relevant packages in Autoware at build and run time. The ModelZoo CI does the compilation step for all supported architectures and backends, and uploads the results to an Amazon S3 bucket.
In Autoware.Auto we have a `neural_networks` package that handles downloading the artifacts and making the raw files available, and a `tvm_utility` package that provides an interface for other packages to access and use those files.
Currently, the download step of `neural_networks` gets the files from the S3 bucket. But it may be better to instead have the artifacts as git-lfs files.

The benefits of switching to git-lfs are: …

And the drawbacks are: …
Notes:

- … the `vcs import` command and possible future other `git lfs pull` usage; or the script could include those.
- … `neural_networks`, which creates a bandwidth concern that can be solved by moving the download to that script.
- `git-lfs pull` will download the files of all supported architectures (currently aarch64 and x86_64), doubling the amount of bandwidth and user storage. We could indicate in the documentation a custom `--include ...` argument to use, but it may become confusing for the user. So having that script could also be useful in that case (see the sketch after this list).
- … `neural_networks`, basically removing the download-related steps, without changing the interface to dependent packages (currently only `tvm_utility`).
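Regarding the per-architecture download mentioned in the notes above, the command a user would need in order to fetch only one architecture would be something like the following (the path pattern depends on the repository layout and is hypothetical here):

```bash
# Pull only the x86_64 artifacts instead of every architecture.
git lfs pull --include="*x86_64*"
# Or make the restriction persistent for future fetches in this clone:
git config lfs.fetchinclude "*x86_64*"
```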
To achieve the transition, I think we will need:

- … the `git lfs pull` step: where it should appear in the documentation, whether it should download the networks by default, and whether a script should be created.

I think it will be cleaner to sort this out first and then port the `neural_networks` and `tvm_utility` packages from Autoware.Auto with the necessary modifications, except if other development efforts depend on those packages, in which case they should probably be ported first "as is" and the modifications done afterwards.

Issue for porting the packages: autowarefoundation/autoware.universe#628