Replies: 4 comments
-
Variable-length metadata (aka user metadata) is placed at the end of the file on purpose: you always need room to add more metadata, and keeping it at the beginning, ahead of the actual data, would require rewriting that data every time the metadata grows. Now,
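To illustrate the trade-off described above, here is a minimal sketch of a toy container that keeps its metadata in a trailer. The layout (data, then a metadata blob, then an 8-byte length footer) and the names `write_container`, `replace_meta` and `read_meta` are hypothetical and are not the actual Blosc2 frame format. The sketch only shows that growing trailing metadata leaves the data bytes untouched, and that reading it back needs two small seeks rather than a scan of the data.

```python
import os
import struct
import tempfile

# Hypothetical footer: 8-byte little-endian length of the metadata blob.
FOOTER = struct.Struct("<Q")


def write_container(path, data, meta):
    """Write the data first, then the metadata blob, then the fixed-size footer."""
    with open(path, "wb") as f:
        f.write(data)
        f.write(meta)
        f.write(FOOTER.pack(len(meta)))


def replace_meta(path, new_meta):
    """Swap in new (possibly longer) metadata without rewriting the data bytes."""
    with open(path, "r+b") as f:
        f.seek(-FOOTER.size, os.SEEK_END)
        (old_len,) = FOOTER.unpack(f.read(FOOTER.size))
        data_end = f.seek(0, os.SEEK_END) - FOOTER.size - old_len
        f.seek(data_end)
        f.truncate()  # drop the old trailer only
        f.write(new_meta)
        f.write(FOOTER.pack(len(new_meta)))
    return data_end  # everything before this offset was left untouched


def read_meta(path):
    """Read only the footer and the metadata blob, never the data."""
    with open(path, "rb") as f:
        f.seek(-FOOTER.size, os.SEEK_END)
        (meta_len,) = FOOTER.unpack(f.read(FOOTER.size))
        f.seek(-(FOOTER.size + meta_len), os.SEEK_END)
        return f.read(meta_len)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy.bin")
    write_container(path, b"\x00" * 1000, b'{"shape": [10, 100]}')
    replace_meta(path, b'{"shape": [10, 100], "kind": "dense"}')
    print(read_meta(path))
```

With the metadata at the front instead, `replace_meta` would have to shift every data byte whenever the new blob is longer than the space originally reserved for it.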
-
I'm just trying to work in C++ with the existing .bl2 files that were created in the following way:

    filename = os.path.join(folder, f'{name}.bl2')
    with open(filename, 'wb') as f:
        blosc2.save_array(mat, filename, mode="w")

Therefore, I am given OK,
-
I just wanted to check whether you need to store the kind of tensor, or whether storing and retrieving a multidimensional dataset would be enough. If the latter, you can store your N-dimensional dataset as an NDArray (see e.g. https://www.blosc.org/python-blosc2/getting_started/tutorials/02.ndarray-basics.html) and retrieve it from the C side quite easily too (see e.g. https://github.com/Blosc/c-blosc2/blob/main/examples/b2nd/example_serialize.c). If you still want the additional info about the kind of tensor you are storing and you don't want to do seeks (although in my experience they are very efficient, and you should not need more speed in most cases), you can create your own fixed-length metalayer (e.g. https://www.blosc.org/python-blosc2/getting_started/tutorials/02.ndarray-basics.html#Metalayers-and-variable-length-metalayers) and read it from the C side without the additional seek(s). Mind that this could be a bit too involved for the (small) benefit you would get.
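The fixed-length idea can be sketched generically: reserve a slot of known size at the start of the file, so that shape and tensor kind arrive with the very first read and no seek is needed. The layout below (magic, `ndim`, up to 8 dimensions, a 16-byte label) and the names `pack_header` and `unpack_header` are hypothetical. This is not how Blosc2 metalayers are encoded, only an illustration of why a fixed-length header must cap its size up front.

```python
import struct

# Hypothetical fixed-size header: 4-byte magic, 1-byte ndim,
# 8 uint64 dimension slots, and a 16-byte NUL-padded "kind" label.
MAX_DIMS = 8
LABEL_SIZE = 16
HEADER = struct.Struct(f"<4sB{MAX_DIMS}Q{LABEL_SIZE}s")


def pack_header(shape, kind):
    """Encode shape and kind into the fixed-size slot, rejecting what does not fit."""
    if len(shape) > MAX_DIMS:
        raise ValueError("too many dimensions for the fixed-size header")
    label = kind.encode("utf-8")
    if len(label) > LABEL_SIZE:
        raise ValueError("label does not fit in the reserved slot")
    dims = list(shape) + [0] * (MAX_DIMS - len(shape))  # pad unused slots
    return HEADER.pack(b"TOY1", len(shape), *dims, label)


def unpack_header(buf):
    """Decode shape and kind from the first HEADER.size bytes of a file."""
    magic, ndim, *rest = HEADER.unpack(buf[:HEADER.size])
    if magic != b"TOY1":
        raise ValueError("not a toy container")
    dims, label = rest[:MAX_DIMS], rest[MAX_DIMS]
    return tuple(dims[:ndim]), label.rstrip(b"\0").decode("utf-8")
```

A reader would call `unpack_header(f.read(HEADER.size))` right after opening the file. The price is the hard limit on what can be stored, which is the reason free-form user metadata is better suited to a variable-length trailer.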
-
Oh cool, perhaps ndarray is what I need, will go and study it, thanks a lot! 🙏
-
Hi @FrancescAlted,
I have another concern about `__pack_tensor__`. According to hexedit, the `__pack_tensor__` entry is located at the end of the `.bl2` file. I think this is an inefficient choice for large files. Suppose I have a 10 GiB `.bl2` file. I don't want to read it in its entirety, but knowing its shape is essential for almost any use case. So, in order to read the shape, c-blosc2 would need to `fseek()` to the end of the file. Of course, seeking is much faster than reading the content, but the file I/O would still need to hop over the inodes of the fragmented representation of a big file in the filesystem. So why not eliminate all this extra load on the filesystem by always placing the metadata nodes at the beginning of the file? Is there an industry standard or practice that requires metadata to be placed at the end of the file?