Merge pull request #291 from t20100/update-blosc2
Updated embedded library: c-blosc2 v2.13.0
vasole authored Jan 24, 2024
2 parents b6fdc4d + 2b3d403 commit 4140943
Showing 29 changed files with 861 additions and 84 deletions.
6 changes: 3 additions & 3 deletions doc/information.rst
@@ -76,7 +76,7 @@ HDF5 compression filters and compression libraries sources were obtained from:
* `hdf5-blosc plugin <https://github.com/Blosc/hdf5-blosc>`_ (v1.0.0)
using `c-blosc <https://github.com/Blosc/c-blosc>`_ (v1.21.5), LZ4, Snappy, ZLib and ZStd.
* hdf5-blosc2 plugin (from `PyTables <https://github.com/PyTables/PyTables/>`_ v3.9.2)
using `c-blosc2 <https://github.com/Blosc/c-blosc2>`_ (v2.12.0), LZ4, ZLib and ZStd.
using `c-blosc2 <https://github.com/Blosc/c-blosc2>`_ (v2.13.0), LZ4, ZLib and ZStd.
* `FCIDECOMP plugin <ftp://ftp.eumetsat.int/pub/OPS/out/test-data/Test-data-for-External-Users/MTG_FCI_Test-Data/FCI_Decompression_Software_V1.0.2>`_ (v1.0.2)
using `CharLS <https://github.com/team-charls/charls>`_
(1.x branch, commit `25160a4 <https://github.com/team-charls/charls/tree/25160a42fb62e71e4b0ce081f5cb3f8bb73938b5>`_).
@@ -93,9 +93,9 @@ HDF5 compression filters and compression libraries sources were obtained from:

Sources of compression libraries shared accross multiple filters were obtained from:

* `LZ4 v1.9.4 <https://github.com/Blosc/c-blosc2/tree/v2.12.0/internal-complibs/lz4-1.9.4>`_
* `LZ4 v1.9.4 <https://github.com/Blosc/c-blosc2/tree/v2.13.0/internal-complibs/lz4-1.9.4>`_
* `Snappy v1.1.10 <https://github.com/google/snappy>`_
* `ZStd v1.5.5 <https://github.com/Blosc/c-blosc2/tree/v2.12.0/internal-complibs/zstd-1.5.5>`_
* `ZStd v1.5.5 <https://github.com/Blosc/c-blosc2/tree/v2.13.0/internal-complibs/zstd-1.5.5>`_
* `ZLib v1.2.13 <https://github.com/Blosc/c-blosc/tree/v1.21.5/internal-complibs/zlib-1.2.13>`_

When compiled with Intel IPP, the LZ4 compression library is replaced with `LZ4 v1.9.3 <https://github.com/lz4/lz4/releases/tag/v1.9.3>`_ patched with a patch from Intel IPP 2021.7.0.
10 changes: 5 additions & 5 deletions src/c-blosc2/ANNOUNCE.md
@@ -1,12 +1,12 @@
# Announcing C-Blosc2 2.12.0
# Announcing C-Blosc2 2.13.0
A fast, compressed and persistent binary data store library for C.

## What is new?

Now the `grok` codec is available globally and will be loaded dynamically. See more
info about the codec in our blog post: https://www.blosc.org/posts/blosc2-grok-release/
Furthermore, a new function has been added to get the unidimensional chunk indexes
needed to get the slice of a Blosc2 container.
A new filter for truncating integers has been added. Furthermore, the zstd codec
has been optimized specially when using dicts. And finally, the grok library
will be initialized when loading the plugin. This evicts having to import it in
some use cases.

For more info, please see the release notes in:

13 changes: 13 additions & 0 deletions src/c-blosc2/RELEASE_NOTES.md
@@ -1,6 +1,19 @@
Release notes for C-Blosc2
==========================

Changes from 2.12.0 to 2.13.0
=============================

* Added a new BLOSC_FILTER_INT_TRUNC filter for truncating integers to a
given number of bits. This is useful for compressing integers that are
not using all the bits of the type. See PR #577.

* Optimized zstd, specially when using dicts. See PR #578.

* Initialize grok library when loading the plugin. This is needed for other plugins
to be able to use it without the need of importing the package.


Changes from 2.11.3 to 2.12.0
=============================

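The INT_TRUNC release note describes truncating integers to a given number of bits. As a rough illustration of why that can help, the sketch below zeroes the low-order bits of 32-bit values, so that after shuffling the lowest byte planes become runs of zeros that a codec compresses well. The helper `uint32_trunc_low_bits` and its "bits to zero" parameter are hypothetical: this is not the `BLOSC_FILTER_INT_TRUNC` implementation and does not follow its parameter convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero the `nbits` least significant bits of every value.
 * Illustrates the idea of integer truncation only; not Blosc's filter code. */
static void uint32_trunc_low_bits(uint32_t *data, size_t n, unsigned nbits) {
  assert(nbits < 32);  /* shifting a 32-bit value by 32 would be undefined */
  const uint32_t mask = ~((UINT32_C(1) << nbits) - 1u);
  for (size_t i = 0; i < n; i++) {
    data[i] &= mask;  /* keep the high bits, clear the low ones */
  }
}
```

Truncation is lossy: the zeroed bits cannot be recovered on decompression, so the number of bits removed has to match the precision the data does not need.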
2 changes: 1 addition & 1 deletion src/c-blosc2/bench/trunc_prec_schunk.c
@@ -65,7 +65,7 @@ int main(void) {
// DELTA makes compression ratio quite worse in this case
//cparams.filters[1] = BLOSC_DELTA;
// BLOSC_BITSHUFFLE is not compressing better and it quite slower here
//cparams.filters[BLOSC_LAST_FILTER - 1] = BLOSC_BITSHUFFLE;
//cparams.filters[BLOSC2_MAX_FILTERS - 1] = BLOSC_BITSHUFFLE;
// Good codec params for this dataset
cparams.compcode = BLOSC_BLOSCLZ;
cparams.clevel = 9;
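This hunk, like the matching one in `examples/frame_roundtrip.c`, now indexes the last slot of `cparams.filters` with `BLOSC2_MAX_FILTERS - 1` instead of an expression built on `BLOSC_LAST_FILTER`. The change suggests that the slot index is meant to come from the length of the filters array rather than from the filter-code enumeration. The sketch below models that rule with a hypothetical stand-in struct; `SKETCH_MAX_FILTERS` is assumed to be 6 and the filter code is a placeholder, neither taken from `blosc2.h`.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholders, not the blosc2.h definitions. */
#define SKETCH_MAX_FILTERS 6
#define SKETCH_BITSHUFFLE 2

static_assert(SKETCH_MAX_FILTERS > 0, "filters array must not be empty");

/* Hypothetical stand-in for the filters part of a compression-params struct. */
typedef struct {
  uint8_t filters[SKETCH_MAX_FILTERS];  /* filter pipeline, one code per slot */
} sketch_cparams;

/* Put a filter in the last pipeline slot: the index is derived from the
 * array length, so it stays in bounds however many filter codes exist. */
static void set_last_filter(sketch_cparams *cp, uint8_t filter) {
  cp->filters[SKETCH_MAX_FILTERS - 1] = filter;
}
```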
41 changes: 41 additions & 0 deletions src/c-blosc2/blosc/blosc-private.h
@@ -274,4 +274,45 @@ static inline void* load_lib(char *plugin_name, char *libpath) {
return loaded_lib;
}

static inline void swap_store(void *dest, const void *pa, int size) {
uint8_t *pa_ = (uint8_t *) pa;
uint8_t *pa2_ = (uint8_t*)malloc((size_t) size);
int i = 1; /* for big/little endian detection */
char *p = (char *) &i;

if (p[0] == 1) {
/* little endian */
switch (size) {
case 8:
pa2_[0] = pa_[7];
pa2_[1] = pa_[6];
pa2_[2] = pa_[5];
pa2_[3] = pa_[4];
pa2_[4] = pa_[3];
pa2_[5] = pa_[2];
pa2_[6] = pa_[1];
pa2_[7] = pa_[0];
break;
case 4:
pa2_[0] = pa_[3];
pa2_[1] = pa_[2];
pa2_[2] = pa_[1];
pa2_[3] = pa_[0];
break;
case 2:
pa2_[0] = pa_[1];
pa2_[1] = pa_[0];
break;
case 1:
pa2_[0] = pa_[0];
break;
default:
fprintf(stderr, "Unhandled nitems: %d\n", size);
}
}
memcpy(dest, pa2_, size);
free(pa2_);
}


#endif /* BLOSC_BLOSC_PRIVATE_H */
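As written in this hunk, `swap_store()` appears intended to write values in big-endian byte order, but two details look fragile: the `malloc()` result is not checked, and on a big-endian host the `switch` is skipped entirely, so `memcpy()` copies the uninitialized contents of `pa2_`. A minimal alternative, sketched under the assumption that only 1, 2, 4 and 8-byte values are needed, uses a stack buffer and copies bytes unchanged on big-endian hosts. `store_big_endian` is a hypothetical name, not part of Blosc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write the `size`-byte value at `src` to `dest` in
 * big-endian byte order. Returns 0 on success, -1 for unsupported sizes. */
static int store_big_endian(void *dest, const void *src, int size) {
  assert(dest != NULL && src != NULL);
  if (size != 1 && size != 2 && size != 4 && size != 8) {
    return -1;  /* swap_store() only prints a message in this case */
  }
  const uint8_t *in = (const uint8_t *)src;
  uint8_t tmp[8];  /* stack buffer: no malloc() to check or free() */
  const uint16_t probe = 1;
  const int little_endian = (*(const uint8_t *)&probe == 1);

  for (int i = 0; i < size; i++) {
    /* Reverse the bytes on little-endian hosts; keep them as-is otherwise. */
    tmp[i] = little_endian ? in[size - 1 - i] : in[i];
  }
  memcpy(dest, tmp, (size_t)size);
  return 0;
}
```

Going through the temporary buffer keeps the helper safe when `dest` and `src` point to the same storage, just as the heap copy in `swap_store()` does.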
32 changes: 26 additions & 6 deletions src/c-blosc2/blosc/blosc2.c
@@ -843,6 +843,11 @@ int fill_codec(blosc2_codec *codec) {
dlclose(lib);
return BLOSC2_ERROR_FAILURE;
}
if (codec->compcode == BLOSC_CODEC_GROK) {
// Initialize grok lib
void (*init_func)(uint32_t, bool) = dlsym(lib, "blosc2_grok_init");
(*init_func)(0, false);
}

return BLOSC2_ERROR_SUCCESS;
}
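The new block in `fill_codec()` calls the result of `dlsym(lib, "blosc2_grok_init")` without checking it, so a plugin library that lacks that symbol would lead to a call through a NULL function pointer. A defensive variant could look like the sketch below. `call_optional_init` and `grok_init_fn` are hypothetical names; only the `(uint32_t, bool)` signature and the `(0, false)` arguments come from the hunk. On older glibc versions the program may need to be linked with `-ldl`.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Signature used for blosc2_grok_init in the fill_codec() hunk. */
typedef void (*grok_init_fn)(uint32_t, bool);

/* Hypothetical helper: resolve `symbol` in `lib` and call it with (0, false)
 * only if it was found. Returns 0 on success, -1 if the lookup failed. */
static int call_optional_init(void *lib, const char *symbol) {
  assert(lib != NULL && symbol != NULL);
  dlerror();  /* clear any stale error state before the lookup */
  void *sym = dlsym(lib, symbol);
  const char *err = dlerror();
  if (err != NULL || sym == NULL) {
    fprintf(stderr, "cannot resolve %s: %s\n", symbol,
            err != NULL ? err : "symbol is NULL");
    return -1;
  }
  grok_init_fn init_func = (grok_init_fn)sym;  /* conversion permitted by POSIX */
  init_func(0, false);
  return 0;
}
```

Inside `fill_codec()`, a failure from such a helper would presumably take the same error path as the earlier checks in that function (`dlclose(lib);` followed by `return BLOSC2_ERROR_FAILURE;`).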
@@ -1750,8 +1755,7 @@ static int blosc_d(
}

/* The number of compressed data streams for this block */
if (!dont_split && !leftoverblock && !context->use_dict) {
// We don't want to split when in a training dict state
if (!dont_split && !leftoverblock) {
nstreams = (int32_t)typesize;
}
else {
@@ -2599,8 +2603,20 @@ int blosc2_compress_ctx(blosc2_context* context, const void* src, int32_t srcsiz
dict_maxsize = srcsize / 20;
}
void* samples_buffer = context->dest + context->header_overhead;
unsigned nblocks = 8; // the minimum that accepts zstd as of 1.4.0
unsigned sample_fraction = 1; // 1 allows to use most of the chunk for training
unsigned nblocks = (unsigned)context->nblocks;
int dont_split = (context->header_flags & 0x10) >> 4;
if (!dont_split) {
nblocks = nblocks * context->typesize;
}
if (nblocks < 8) {
nblocks = 8; // the minimum that accepts zstd as of 1.4.0
}

// 1 allows to use most of the chunk for training, but it is slower,
// and it does not always seem to improve compression ratio.
// Let's use 16, which is faster and still gives good results
// on test_dict_schunk.c, but this seems very dependent on the data.
unsigned sample_fraction = 16;
size_t sample_size = context->sourcesize / nblocks / sample_fraction;

// Populate the samples sizes for training the dictionary
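The new sample sizing for zstd dictionary training can be restated as a small pure function, which makes the effect of splitting easier to see. With splitting enabled there is one sample per block stream (`nblocks * typesize`), never fewer than 8, and each sample covers 1/16 of its share of the source. `dict_sample_size` is a hypothetical helper that mirrors the hunk's arithmetic; it is not a Blosc function. As in the hunk, `dont_split` would come from bit 4 (`0x10`) of the header flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the hunk: returns the per-sample size and
 * stores the number of samples in *nsamples_out. */
static size_t dict_sample_size(size_t sourcesize, unsigned nblocks,
                               unsigned typesize, int dont_split,
                               unsigned *nsamples_out) {
  assert(nsamples_out != NULL);
  unsigned nsamples = nblocks;
  if (!dont_split) {
    nsamples *= typesize;  /* each block is split into typesize streams */
  }
  if (nsamples < 8) {
    nsamples = 8;  /* minimum accepted by zstd as of 1.4.0, per the source comment */
  }
  const unsigned sample_fraction = 16;  /* value chosen in the hunk */
  *nsamples_out = nsamples;
  return sourcesize / nsamples / sample_fraction;
}
```

For a 1 MiB chunk with 4 blocks and a 4-byte type, splitting gives 16 samples of 4096 bytes; without splitting, the count is clamped to 8 samples of 8192 bytes.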
@@ -2613,7 +2629,9 @@ int blosc2_compress_ctx(blosc2_context* context, const void* src, int32_t srcsiz
// Train from samples
void* dict_buffer = malloc(dict_maxsize);
BLOSC_ERROR_NULL(dict_buffer, BLOSC2_ERROR_MEMORY_ALLOC);
int32_t dict_actual_size = (int32_t)ZDICT_trainFromBuffer(dict_buffer, dict_maxsize, samples_buffer, samples_sizes, nblocks);
int32_t dict_actual_size = (int32_t)ZDICT_trainFromBuffer(
dict_buffer, dict_maxsize,
samples_buffer, samples_sizes, nblocks);

// TODO: experiment with parameters of low-level fast cover algorithm
// Note that this API is still unstable. See: https://github.com/facebook/zstd/issues/1599
@@ -2622,7 +2640,9 @@ int blosc2_compress_ctx(blosc2_context* context, const void* src, int32_t srcsiz
// fast_cover_params.d = nblocks;
// fast_cover_params.steps = 4;
// fast_cover_params.zParams.compressionLevel = context->clevel;
//size_t dict_actual_size = ZDICT_optimizeTrainFromBuffer_fastCover(dict_buffer, dict_maxsize, samples_buffer, samples_sizes, nblocks, &fast_cover_params);
// size_t dict_actual_size = ZDICT_optimizeTrainFromBuffer_fastCover(
// dict_buffer, dict_maxsize, samples_buffer, samples_sizes, nblocks,
// &fast_cover_params);

if (ZDICT_isError(dict_actual_size) != ZSTD_error_no_error) {
BLOSC_TRACE_ERROR("Error in ZDICT_trainFromBuffer(): '%s'."
7 changes: 5 additions & 2 deletions src/c-blosc2/blosc/stune.c
@@ -198,8 +198,11 @@ int split_block(blosc2_context *context, int32_t typesize, int32_t blocksize) {

int compcode = context->compcode;
return (
// Fast codecs like blosclz and lz4 always prefer to always split
((compcode == BLOSC_BLOSCLZ) || (compcode == BLOSC_LZ4)) &&
// Fast codecs like blosclz, lz4 seems to prefer to split
((compcode == BLOSC_BLOSCLZ) || (compcode == BLOSC_LZ4)
// and low levels of zstd too
|| ((compcode == BLOSC_ZSTD) && (context->clevel <= 5))
) &&
// ...but split seems to harm cratio too much when not using shuffle
(context->filter_flags & BLOSC_DOSHUFFLE) &&
(typesize <= MAX_STREAMS) &&
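The `split_block()` change lets zstd at compression levels up to 5 split blocks into one stream per byte of the type, alongside blosclz and lz4, while the `blosc_d()` hunk no longer excludes dictionary-compressed chunks (`!context->use_dict`) when computing `nstreams`. The visible part of the new heuristic can be sketched as a predicate. The enum values and `SK_MAX_STREAMS` below are placeholders rather than the constants from `blosc2.h`, and the conditions that the diff view collapses after the `typesize` check are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholders for the Blosc codec codes and MAX_STREAMS; not the real values. */
enum sketch_codec { SK_BLOSCLZ, SK_LZ4, SK_ZLIB, SK_ZSTD };
#define SK_MAX_STREAMS 16

/* Visible conditions of split_block() after this change. */
static bool prefers_split(enum sketch_codec codec, int clevel,
                          bool doshuffle, int typesize) {
  assert(typesize > 0);
  bool codec_ok = codec == SK_BLOSCLZ || codec == SK_LZ4 ||
                  (codec == SK_ZSTD && clevel <= 5);  /* new: low zstd levels */
  /* Splitting without shuffle is said to harm the compression ratio. */
  return codec_ok && doshuffle && typesize <= SK_MAX_STREAMS;
}
```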
4 changes: 4 additions & 0 deletions src/c-blosc2/doc/reference/blosc1.rst
@@ -20,6 +20,10 @@ Main API

.. doxygenfunction:: blosc2_set_nthreads

.. doxygentypedef:: blosc_threads_callback

.. doxygenfunction:: blosc2_set_threads_callback

.. doxygenfunction:: blosc1_get_compressor

.. doxygenfunction:: blosc1_set_compressor
3 changes: 2 additions & 1 deletion src/c-blosc2/doc/reference/index.rst
@@ -17,7 +17,8 @@ metainfo to your datasets (metalayers).
:maxdepth: 2
:caption: Contents:

utility
utility_variables
utility_functions
blosc1
context
plugins
4 changes: 4 additions & 0 deletions src/c-blosc2/doc/reference/metalayers.rst
@@ -29,3 +29,7 @@ Variable-length metalayers
.. doxygenfunction:: blosc2_vlmeta_update

.. doxygenfunction:: blosc2_vlmeta_get

.. doxygenfunction:: blosc2_vlmeta_delete

.. doxygenfunction:: blosc2_vlmeta_get_names
10 changes: 10 additions & 0 deletions src/c-blosc2/doc/reference/plugins.rst
@@ -23,6 +23,14 @@ Codecs

.. doxygenfunction:: blosc2_register_codec

Tuners
------

.. doxygenstruct:: blosc2_tuner
:members:

.. doxygenfunction:: blosc2_register_tuner


IO backends
-----------
@@ -43,3 +51,5 @@ IO backends
:members:

.. doxygenfunction:: blosc2_register_io_cb

.. doxygenfunction:: blosc2_get_io_cb
13 changes: 13 additions & 0 deletions src/c-blosc2/doc/reference/schunk.rst
@@ -47,3 +47,16 @@ Dealing with chunks
.. doxygenfunction:: blosc2_schunk_insert_chunk
.. doxygenfunction:: blosc2_schunk_update_chunk
.. doxygenfunction:: blosc2_schunk_delete_chunk

Creating chunks
---------------

.. doxygenfunction:: blosc2_chunk_zeros
.. doxygenfunction:: blosc2_chunk_nans
.. doxygenfunction:: blosc2_chunk_repeatval
.. doxygenfunction:: blosc2_chunk_uninit

Frame specific functions
------------------------

.. doxygenfunction:: blosc2_frame_get_offsets
31 changes: 31 additions & 0 deletions src/c-blosc2/doc/reference/utility_functions.rst
@@ -0,0 +1,31 @@
Utility functions
+++++++++++++++++

Timing functions
----------------

.. doxygenfunction:: blosc_set_timestamp

.. doxygenfunction:: blosc_elapsed_nsecs

.. doxygenfunction:: blosc_elapsed_secs


File and directory utilities
----------------------------

.. doxygenfunction:: blosc2_remove_dir

.. doxygenfunction:: blosc2_remove_urlpath

.. doxygenfunction:: blosc2_rename_urlpath


Slice utilities
---------------

.. doxygenfunction:: blosc2_get_slice_nchunks

.. doxygenfunction:: blosc2_unidim_to_multidim

.. doxygenfunction:: blosc2_multidim_to_unidim
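The new "Slice utilities" section documents `blosc2_unidim_to_multidim()` and `blosc2_multidim_to_unidim()`. The underlying conversion between a flat position and per-dimension indexes can be sketched as follows, assuming C (row-major) order. `flat_to_multidim` and `multidim_to_flat` are hypothetical helpers that do not reproduce the blosc2 signatures; `blosc2.h` is the reference for those.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: flat row-major position -> per-dimension indexes. */
static void flat_to_multidim(int ndim, const int64_t *shape, int64_t flat,
                             int64_t *index) {
  assert(ndim > 0);
  for (int d = ndim - 1; d >= 0; d--) {  /* last dimension varies fastest */
    index[d] = flat % shape[d];
    flat /= shape[d];
  }
}

/* Hypothetical helper: per-dimension indexes -> flat row-major position. */
static int64_t multidim_to_flat(int ndim, const int64_t *shape,
                                const int64_t *index) {
  assert(ndim > 0);
  int64_t flat = 0;
  for (int d = 0; d < ndim; d++) {
    flat = flat * shape[d] + index[d];  /* Horner-style accumulation */
  }
  return flat;
}
```

For a shape of `{2, 3, 4}`, flat position 17 maps to `{1, 1, 1}`, since 1*12 + 1*4 + 1 = 17.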
File renamed without changes.
2 changes: 1 addition & 1 deletion src/c-blosc2/examples/frame_roundtrip.c
@@ -36,7 +36,7 @@ int main(void) {
// Create the original schunk
blosc2_cparams cparams = BLOSC2_CPARAMS_DEFAULTS;
cparams.typesize = sizeof(int32_t);
cparams.filters[BLOSC_LAST_FILTER] = BLOSC_BITSHUFFLE;
cparams.filters[BLOSC2_MAX_FILTERS - 1] = BLOSC_BITSHUFFLE;
cparams.clevel = 9;
// blosc2_remove_dir("/tmp/test.frame");
// blosc2_storage storage = {.cparams=&cparams, .contiguous=false, .urlpath="/tmp/test.frame"};
1 change: 1 addition & 0 deletions src/c-blosc2/include/b2nd.h
@@ -27,6 +27,7 @@ extern "C" {
#endif

#include "blosc2.h"
#include "blosc-private.h"

#include <stdio.h>
#include <stdlib.h>