Partial read of (binary) JSON file #2322
Replies: 3 comments 3 replies
-
What do you know about the arrays? That is, how do you decide what to skip? Is there something like a byte offset?
-
@Chrismarsh I don't know if this is answered already, but I think I have a very similar problem. This could be adapted into your problem, just by pre-reading all "small" info (fields
I want to partially read the first array (located at offset 1), but in my case, I don't know where it ends... and it's quite large, so I don't want to parse the whole thing.
I tried to
I know it's ugly, but it works :joy:
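The snippet from this comment is not preserved here, so what follows is not the commenter's code. It is only a minimal sketch of the general idea: read the leading array element by element and stop as soon as enough values have been seen, so the rest of the file is never parsed. It assumes a text (not binary) JSON stream whose first `[` opens a flat array of numbers with no nesting and no strings; `iter_first_array` and `chunk_size` are hypothetical names chosen for illustration.

```python
import io
from itertools import islice


def iter_first_array(stream, chunk_size=65536):
    """Lazily yield the numbers of the first flat JSON array in a text stream.

    Assumption: the array holds only numbers (no nested arrays, objects,
    or strings), and no '[' appears in the text before the array starts.
    """
    in_array = False
    token = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return  # end of input reached before the array was closed
        for ch in chunk:
            if not in_array:
                in_array = ch == "["
                continue
            if ch in ",]":
                text = "".join(token).strip()
                token.clear()
                if text:
                    yield float(text)
                if ch == "]":
                    return  # end of the array: stop without reading further
            else:
                token.append(ch)


# Take only the first three elements; the generator is never resumed after
# that, so the remainder of the document is not read or parsed.
doc = io.StringIO('[1, 2.5, 3e2, 4, 5], {"meta": "not parsed"}')
first_three = list(islice(iter_first_array(doc, chunk_size=4), 3))
print(first_three)  # [1.0, 2.5, 300.0]
```

Because the function is a generator, the caller decides when to stop, which helps when the end of the array is not known in advance.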
-
Ah, interesting! We ended up spinning our wheels on this for a bit. Eventually we punted by moving to an HDF5 data structure, where the binary offsets are clear. Some of our thinking changed, and once we were looking at O(100 GB)+ we decided the HDF5 option was likely better than trying to shoehorn JSON into it. I have other work with GeoJSON files where partial indexing could be quite useful. I will have to come back to that later though!
-
I have text JSON files that have GB of data in them, stored as arrays of numbers. Further, I need to do distributed parallel processing (via MPI) on these data and would really like a way to read a subset of the arrays. That is, I want to read some subset of the data into memory without having to load the entire JSON file on each MPI process.
Is such a feature available or planned for either text or binary JSON files?
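For illustration only, here is a sketch of one workaround that does not rely on any library feature: scan the file once to record the byte offset of every `stride`-th array element, then let each process seek to the nearest recorded offset and read just its own slice. It assumes a text JSON file containing a single flat array of numbers; `build_offset_index`, `read_slice`, and `rank_bounds` are hypothetical helper names, and the rank and size values would come from MPI in a real run rather than from the loop used here.

```python
import io


def build_offset_index(stream, stride):
    """Scan a binary stream once; return (count, offsets).

    offsets[k] is the byte position where element k * stride begins.
    Assumption: one flat array of numbers, no '[' before it.
    """
    offsets, count, pos = [], 0, 0
    in_array = at_start = False
    while True:
        chunk = stream.read(65536)
        if not chunk:
            return count, offsets
        for byte in chunk:
            ch = chr(byte)
            if not in_array:
                if ch == "[":
                    in_array = at_start = True
            elif ch == "]":
                return count, offsets
            elif ch == ",":
                at_start = True
            elif at_start and not ch.isspace():
                if count % stride == 0:
                    offsets.append(pos)
                count += 1
                at_start = False
            pos += 1


def read_slice(stream, offsets, stride, start, stop):
    """Return elements [start, stop) by seeking to the nearest indexed offset."""
    stream.seek(offsets[start // stride])
    skip = start % stride  # elements between the indexed one and `start`
    values, token = [], bytearray()
    while len(values) < stop - start:
        b = stream.read(1)  # byte-at-a-time for clarity, not speed
        if b in (b",", b"]", b""):
            text = token.decode("ascii").strip()
            token.clear()
            if text:
                if skip:
                    skip -= 1
                else:
                    values.append(float(text))
            if b != b",":
                break  # end of array or end of file
        else:
            token += b
    return values


def rank_bounds(n, rank, size):
    """Block-partition n elements over `size` ranks; return (start, stop)."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    return start, start + base + (1 if rank < extra else 0)


data = io.BytesIO(b'{"values": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}')
stride = 4
count, offsets = build_offset_index(data, stride)

size = 3  # stand-in for the number of MPI processes
parts = []
for rank in range(size):  # each iteration plays the role of one process
    start, stop = rank_bounds(count, rank, size)
    parts.append(read_slice(data, offsets, stride, start, stop))
```

The index still costs one full pass over the file, but it only has to be built once and can be stored alongside the data; a larger `stride` makes the index smaller at the price of skipping more elements after each seek.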