Use arrow format #56
Comments
This issue was opened almost two years ago. What's changed since then:
What hasn't been done:
Updating this old issue: as discussed in a recent team meeting, I think we should make arrow the default, and not use or mention csv.gz in this template. We may also want to test an extension like this one for viewing arrow files and, if it works well, include it in the vscode setup: https://marketplace.visualstudio.com/items?itemName=w568w.datasets-viewer
I'm going to remove this from the team's board: it's on Eli's list to investigate, and I don't think we should do this work until they've investigated.
Feather format is smaller than CSV, i.e. more efficient in both space and processing, and it stores dtypes, which helps avoid some problems when loading the data for further processing.
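To make the dtype point concrete, here is a minimal sketch using only the Python standard library. It shows the CSV side of the problem: values written as integers, booleans and dates all come back as plain strings, so the loader has to guess or be told the types. A Feather/Arrow file carries a schema alongside the data, so that guessing step is not needed. The column names and values below are hypothetical and are not taken from this template.

```python
import csv
import io
from datetime import date

# Hypothetical rows; the column names are illustrative only.
rows = [
    {"patient_id": 1, "registered": True, "index_date": date(2020, 3, 1)},
    {"patient_id": 2, "registered": False, "index_date": date(2020, 3, 2)},
]

# Write the rows to an in-memory CSV "file".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)

# Read them back: CSV has no type information, so every value is a string.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))

original_types = {key: type(value).__name__ for key, value in rows[0].items()}
loaded_types = {key: type(value).__name__ for key, value in loaded[0].items()}

print(original_types)  # int, bool and date
print(loaded_types)    # all str

# A typical pitfall: the string "False" is truthy.
print(bool(loaded[1]["registered"]))
```

With CSV, each consumer has to re-declare or re-infer the types on load (for example by parsing dates explicitly), which is where mismatches tend to creep in.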
We initially moved to .csv.gz, which was an improvement on uncompressed CSVs. However, it uses a significant amount of CPU. We believe that moving to Arrow/Feather would use much less CPU and be an overall improvement.
To do:
.feather/.arrow files