
Releases: NVIDIA/DALI

DALI v1.5.0

23 Aug 09:28

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Extended decoders.image to support WebP decoding (#3206)
  • Added indexing (NumPy-like) API for tensor slicing (#3200 and #3195)
  • Extended external_source to support source argument in TensorFlow DALI Dataset (#3215, #3193, #3177 and #3176)
  • Added examples:
    • Tensorflow YOLOv4 (#2883)
    • WebDataset usage with external_source (#3153)

Fixed issues

This DALI release includes the following fixes:

  • Fixed include paths that prevented including some parts of DALI in other C/C++ projects (#3210)
  • Fixed a crash when only anchors and no shapes were provided in multi_paste (#3166)
  • In the spectrogram operator, extracted windows are now correctly centered before the FFT calculation when the nfft argument is larger than the window length (#3180)
  • Fixed a minor memory leak in decoders.image (#3148)
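
The window-centering fix in the spectrogram operator can be pictured with a minimal pure-Python sketch. This is an illustration of the idea only, not DALI's implementation: the function name center_window is hypothetical, and placing the spare sample on the right when the padding is odd is an assumption, since the release note does not say how the remainder is split.

```python
def center_window(window, nfft):
    """Zero-pad `window` to length `nfft`, keeping the samples centered."""
    if nfft < len(window):
        raise ValueError("nfft must be at least the window length")
    left = (nfft - len(window)) // 2     # assumption: odd remainder goes right
    right = nfft - len(window) - left
    return [0.0] * left + list(window) + [0.0] * right


padded = center_window([1.0, 2.0, 3.0, 4.0], nfft=8)
print(padded)  # [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
```

Without centering, the four samples would sit at the start of the eight-element buffer, which shifts the phase of the resulting spectrum.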

Improvements

  • Add documentation for indexing. (#3200)
  • Move to CUDA 11.4U1 (#3213)
  • Add WebP support to image decoder (#3206)
  • libtar API implementation (#3198)
  • Tensor indexing (#3195)
  • Make TF graph-mode tests faster (#3204)
  • Add support for ES source in TF DALI Dataset (#3177)
  • Add tensorflow YOLOv4 example (#2883)
  • Refactor Python External Source code (#3176)
  • Update third party dependencies to latest release versions (#3184)
  • Add deferred deallocation to cuda_vm_resource. (#3154)
  • Adjust test scripts and section header for webdataset notebook (#3162)
  • Add Webdataset-ExternalSource Jupyter notebook (#3153)
  • Update PR template (#3150)
  • Update PR template (#3129)

Bug Fixes

  • Fix failing TarArchive tests (#3226)
  • Build custom libtar in conda (#3223)
  • Improve validation in DALIDataset (#3215)
  • Update DALI_DEPS_VERSION to include NVIDIA/DALI_deps#19 (#3224)
  • Fix identity check in _is_generator_function. Add test. (#3216)
  • Fix unused imports in test_utils.py (#3214)
  • Remove the usage of ManagedMemory from the OpticalFlow tests (#3211)
  • Suppress test using unified memory when it is not supported (#3209)
  • Remove include prefix from include paths (#3210)
  • Fix CVE-2021-3246 in libsnd (#3208)
  • Fix pytorch-lightning test (#3196)
  • Fix coverity issues + skip tests involving managed memory when not supported. (#3190)
  • Disable NVJPEG HW decoder for driver < 455 due to performance reason (#3189)
  • Fix compilation with newer GCC (#3188)
  • Disallow some types of sources for parallel ES explicitly (#3193)
  • Center windows when extracting windows to a bigger output window (#3180)
  • Add a compute cap value before running the GDS test (#3185)
  • MultiPaste to adjust the region shape to cover up to the end of the input shape (#3166)
  • Fix wording in docs (#3165)
  • Fix image decode (#3148)
  • Fix LastBatchPolicy doc and update Parallel ES wording (#3152)
  • Fix some errors (#3147)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
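
For the key-frame requirement of the video loader, a rough pre-check might look like the sketch below. It assumes the key-frame indices have already been extracted with an external tool; the function names and the default threshold of 10 (the conservative end of the 10 to 15 range quoted above) are choices made for this illustration, not part of DALI.

```python
def max_keyframe_gap(keyframe_indices):
    """Return the largest distance, in frames, between consecutive key frames."""
    ordered = sorted(keyframe_indices)
    if len(ordered) < 2:
        return 0
    return max(b - a for a, b in zip(ordered, ordered[1:]))


def keyframes_dense_enough(keyframe_indices, max_gap=10):
    """True when key frames are never more than `max_gap` frames apart."""
    return max_keyframe_gap(keyframe_indices) <= max_gap


print(keyframes_dense_enough([0, 10, 20, 30]))  # True
print(keyframes_dense_enough([0, 10, 40]))      # False
```

The check only looks at gaps between key frames, so frames that follow the last key frame are not covered.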

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.5.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.5.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit,
but it can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.5.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.5.0
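
Before choosing the CUDA 11 wheel, it can help to compare the installed driver against the 450.80 minimum quoted above. The following is a small sketch under stated assumptions: the helper names are hypothetical, and the driver string is assumed to have been read beforehand (for example from the output of nvidia-smi).

```python
MIN_CUDA11_DRIVER = (450, 80)  # minimum driver quoted in these release notes


def parse_driver_version(text):
    """Turn a driver string such as '455.23.05' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda110_build(driver_version):
    """Compare the leading (major, minor) fields against the minimum driver."""
    return parse_driver_version(driver_version)[:2] >= MIN_CUDA11_DRIVER


print(driver_supports_cuda110_build("450.80.02"))  # True
print(driver_supports_cuda110_build("440.33.01"))  # False
```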

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.4.0

26 Jul 10:48

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • readers.numpy improvements:
    • Added ROI support in the GPU operator (#3034 and #3040).
    • Parallelized reading in the CPU operator (#3077).
    • Added a tutorial (#3095 and #3139).
  • DALI Dataset improvements:
  • Video reader improvements:
    • Added an option to pad missing frames at the end of sequence (#3002).
    • Added support for the VP8 and MJPEG formats (#3045).
  • Added CPU parallelization to the Slice and SliceFlipNormalizePermutePad kernels. (#3062, #3068, and #3080)
  • Added an option to readers.nemo_asr to return indices of the entries in the manifest (#3085).
  • Improved the performance in the GPU image decoder by optimizing the memory allocations. (#3067).
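
The video reader option to pad missing frames at the end of a sequence can be illustrated with a conceptual sketch over frame indices. Everything here is an assumption made for illustration: the function name, the non-overlapping grouping, and the -1 placeholder for padded frames. The release note does not state what padded frames contain.

```python
def split_into_sequences(num_frames, sequence_length, pad=False, pad_index=-1):
    """Group frame indices 0..num_frames-1 into fixed-length sequences.

    Without padding, a trailing partial sequence is dropped; with padding it
    is kept and filled up with `pad_index` placeholders.
    """
    sequences = []
    for start in range(0, num_frames, sequence_length):
        seq = list(range(start, min(start + sequence_length, num_frames)))
        missing = sequence_length - len(seq)
        if missing and not pad:
            break  # incomplete tail: nothing more to emit
        sequences.append(seq + [pad_index] * missing)
    return sequences


print(split_into_sequences(7, 3))            # [[0, 1, 2], [3, 4, 5]]
print(split_into_sequences(7, 3, pad=True))  # [[0, 1, 2], [3, 4, 5], [6, -1, -1]]
```

With padding enabled, the last frame of a seven-frame video is still returned instead of being silently dropped.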

Fixed issues

This DALI release includes the following fixes:

  • Fixed a crash that happened when a functools.partial result was passed as a source to external_source (#3143).
  • Fixed the hardware image decoder to fall back to the hybrid implementation for unsupported file formats instead of throwing an error (#3086).
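
The functools.partial fix matters when a per-sample callback needs extra arguments bound in advance. The sketch below shows the pattern without requiring DALI to be installed: FakeSampleInfo is a stand-in invented for this example (only the idx_in_epoch field is modelled), and load_sample returns a file name rather than array data purely to keep the example short.

```python
from collections import namedtuple
from functools import partial

# Stand-in for the sample-info object passed to a per-sample callback.
FakeSampleInfo = namedtuple("FakeSampleInfo", ["idx_in_epoch"])


def load_sample(paths, sample_info):
    """Pick the entry for this sample, wrapping around the list."""
    return paths[sample_info.idx_in_epoch % len(paths)]


# Bind the dataset-specific argument up front; the resulting callable takes
# only the sample info, which is the shape a source callback needs.
source = partial(load_sample, ["a.npy", "b.npy", "c.npy"])

print(source(FakeSampleInfo(idx_in_epoch=4)))  # b.npy
```

In a real pipeline, a callable built this way would be passed as the source argument of external_source, and the callback would return the sample data itself.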

Improvements

  • Add NumpyReader tutorial to the rendered documentation page (#3139)
  • Update docs analytics tracking (#3135)
  • VM async_pool - refactoring & tests (#3117)
  • Extend the video loader error message for vfr videos on how to disable the check in case of false positives (#3125)
  • Integer literal suffixes (#3122)
  • SliceCPU kernel to run plain memcpy when applicable (#3110)
  • CUDA VM memory resource (#3114)
  • Add Numpy Reader Tutorial (#3095)
  • Bump TensorFlow version in tests (#3107)
  • Efficient det code drop (#3115)
  • Move to CUDA 11.4 build (#3109)
  • Add batch support to DALI Dataset (#3089)
  • Update third party dependencies (#3093)
  • Add bitmask::append. (#3101)
  • Free list API cleanup. (#3100)
  • NemoAsrReader to optionally return indices of the entries in the manifest. (#3085)
  • Parallelize reading in NumpyReader CPU (#3077)
  • Bit mask utility (#3083)
  • Add ExecutionEngine to SliceFlipNormalizePermutePad CPU kernel, to allow parallel execution (#3080)
  • Add an ability to pad missing frames in the Video reader sequence (#3002)
  • Rework the TF DALIDataset input API (#3063)
  • Add ExecutionEngine to Slice CPU kernel, to allow parallel execution (#3068)
  • Use HW NVJPEG decoder memory pool even if size hint is not set (#3067)
  • CUDA Virtual Memory API wrappers. (#3064)
  • Add information about installing CUDA 10.2 DALI version (#3066)
  • Add image decoder memory hints for nvJPEG in DALI examples (#3029)
  • Add split shape utility (#3062)
  • Add ROI support to NumpyReader GPU (#3034)
  • Enable no_copy mode handling in TF DALI Dataset (#3058)
  • Add support for VP8 and MJPEG videos (#3045)
  • Make pytorch lightning example work with multiple GPUs (#3037)
  • Add override flags for no_copy option of External Source (#3041)
  • Add NumpyFileWrapper to numpy loader (#3054)
  • Add a mention of CPU-only arguments inputs in docs (#3039)
  • Minor changes in Slice GPU kernels, before reusing them in NumpyReader GPU (#3040)

Bug fixes

  • Fix hint handling: (#3145)
  • Add support for functools.partial in ExternalSource. (#3143)
  • Install libcufile (for GDS) as a part of the cuda base build step (#3142)
  • Add check of strerror_r return value in CUFile HandleIOError (#3141)
  • Disable VMAsyncPool CrossStream test on incompatible platforms. (#3140)
  • Fix the lack of execution of variable batch size test (#3134)
  • Throw std::bad_alloc when ordinary host memory runs out + tests for xxx_malloc resources. (#3131)
  • Fix allocation hint handling in CUDA VM resource (#3128)
  • Revert change from python to Python_EXECUTABLE (#3126)
  • Coverity issue fixes - bulk drop, July 2021 (#3124)
  • Make nvJPEG detect corrupted stream before offloading to HW decoder (#3113)
  • Add --no-index option to TL1_tensorflow-dali_test test (#3112)
  • Minor fixes (#3119)
  • DALI TF install tool: Copy files for import check, rather than symlink (#3116)
  • minor fixes (#3108)
  • Dali TF installation: check import before completing the installation (#3104)
  • Remove no longer applicable sed command from RN50 MXNet test (#3103)
  • Use DALI_extra instead of example_audio_file in the spectrogram example (#3106)
  • Unify apt-get invocations (#3094)
  • Make DALI extra download optional in tests (#3102)
  • Remove pre CUDA 10.0 support in TL1_tensorflow-dali_test (#3099)
  • Bug fixes (#3096)
  • MMUtilFixes: (#3098)
  • Fix override no copy flags for External Source C API (#3097)
  • Fix HW decoder fallback to the hybrid decoder (#3086)
  • Fix DALI installation for python 3.9 version (#3092)
  • Fix python test on aarch64 platform (#3091)
  • Move pycocotools to regular pip packages in SSD test (#3090)
  • Use PEP 503 compatible extra url index to install PyTorch (#3079)
  • Remove compiler name subdirectory in the prebuilt DALI TF directory (#3078)
  • Disable MNIST dataset download for DALI pipelines (#3075)
  • Fix known FFmpeg n4.4 vulnerabilities (#3071)
  • Fix DALI TF Plugin build in TF 2.6 (#3074)
  • Fix error handling in Executor (#3069)
  • Fix typo inout -> input (#3070)
  • Fix error message when creating a TensorShape from iterators with more elements than expected (#3060)
  • Add warning about not using external_inputs in proto (#3057)
  • Fix usage of removed _ExternalSource in test (#3059)
  • Make the Python test utilities have local random state (#3055)
  • Fix batch size handling in PermuteBatch. (#3026)
  • Update FFmpeg to address CVE-2021-33815 (#3053)
  • Remove duplicated ExternalSource implementation (#3033)
  • Build the latest clang from source (#3025)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Note: Starting from version 1.4.0, DALI will be providing CUDA 10.2 builds instead of CUDA 10.0

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.4.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.4.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit,
but it can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.4.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.4.0

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.3.0

30 Jun 13:15

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operator:
  • Added experimental support for inputs via external_source in TensorFlow DALIDataset (#2949, #2993, and #2997).
  • Numpy reader improvements:
    • ROI reading for CPU (#3011).
    • intra-sample threading on GPU (#3010).
  • Improved CPU color_space_conversion operator performance (#2987).
  • Improved brightness and contrast operators performance (#2981).
  • Added a C API call to check backend of an operator (#3031 and #3050).
  • Documentation improvements (#2936, #2960, #2979, #2972, #3013, and #3035).

Fixed issues

This DALI release includes the following fixes:

  • Fixed an issue in readers.nemo_asr that caused a system error due to keeping too many open files (#3003).
  • Fixed a bug that caused out of bound memory access in mel_filter_bank (#2986).
  • Fixed a cudaErrorLaunchOutOfResources error that appeared in transpose operator on some GPUs (#2971).
  • Fixed handling of non-existing entries in readers.tfrecord (#2952).

Improvements

  • Rework numpy reader tests (#3036)
  • Extend HW decoder bench tool (#3043)
  • Remove space from file name (#3038)
  • Add experimental input support to TF DALIDataset (#2997)
  • Use BrightnessContrast as implementation of Brightness and Contrast ops (#2981)
  • Add C API call to check backend of an operator (#3031)
  • Fix Video reader documentation (#3035)
  • Enable DALI to build for CUDA 10.2 (#3007)
  • NumpyReader: Add support for ROI (#3016)
  • Add git hooks (#3023)
  • Update third party (#3009)
  • Add channel count checking in Dump Image (#3020)
  • Add parallel chunking support in GPU variant of the numpy reader operator (#3010)
  • NumpyReader to use HostWorkspace (#3011)
  • Update documentation of random.uniform to reflect data type conversion behavior (#3013)
  • Adjust tf code for experimental Dataset with inputs (#2993)
  • Add best-fit free tree. (#2996)
  • Refine torch audio pipeline tests: adding frame splicing, fix sequence length calculation, reflect pad start/end of the signal (#2992)
  • Rename free_tree to coalescing_free_tree. (#2995)
  • Use thread_pool in ColorSpaceConversion (#2987)
  • Move to CUDA 11.3 update 1 (#2990)
  • pool_resource: upstream lock & refactoring (#2988)
  • Add tests to cover OGG Vorbis, and FLAC audio formats (#2980)
  • Add synchronization and deferred deallocation to pool_resource (#2983)
  • Update FFmpeg, fix video container tests (#2918)
  • Add Preemphasis border policy (#2984)
  • Numba function operator, docs update (#2972)
  • Add a link to the DALI roadmap in the main readme and the documentation (#2979)
  • Add BOOL_SWITCH (#2974)
  • Add libopus to the binaries distributed with the wheel (#2969)
  • Add SaltAndPepper GPU operator (#2956)
  • Update documentation about supported TensorFlow versions by DALI (#2960)
  • Guard changes to default resources with a mutex. (#2955)
  • Add Salt and Pepper noise CPU operator (#2889)
  • Core allocation functions - improve alignment handling (#2947)
  • Add portable FP16 type & tests. (#2941)
  • RNGBase: Separate noise generation and application steps (#2934)
  • Add information about Open-CE effort that provides DALI (#2936)

Bug fixes

  • Remove mixed image decoder from GetBackendTest (#3050)
  • Fix pip download folder usage (#3028)
  • Avoid pre-commit hook for merge commits (#3032)
  • Coverity issue fixes. (#3021)
  • Add more connection attempts in setup_packages.py and increase the timeout to 100s (#3024)
  • Add 60s timeout for URL request in setup_packages.py (#3018)
  • Check CUDA API return values in device-side test helper. (#3017)
  • Run baseline pipelines on separate devices (#3012)
  • Multi paste refactor & fix (#3008)
  • Remove outdated warning about not supported ROI HW decoding (#2998)
  • NemoAsrLoader: Close file handles after reading metadata (#3003)
  • Improve Element Extract Op (#3004)
  • Temporarily disable test due to incompatible free list. (#3001)
  • Work around large alignas bug - align manually. (#3000)
  • Lifts the sm limitation that is tested in the numpy reader test (#2994)
  • MultiPaste: Fix in_ids argument type in the schema (#2965)
  • Fix a buffer overrun when the trailing dimension is collapsed. (#2986)
  • Add missing #include (#2985)
  • Enable SaltAndPepper GPU variable batch size tests (#2976)
  • Add missing tests to test_dali_variable_batch_size.py (#2982)
  • Change all reference to the master branch in the documentation (#2977)
  • Add missing tests to test_dali_cpu_only.py (#2964)
  • Add launch bounds to TransposeBatch kernel to avoid cudaErrorLaunchOutOfResources (#2971)
  • Fix deps docker with custom DALI_deps SHA (#2970)
  • Add coverage test for CPU only and variable batch size test (#2962)
  • Enable variable batch size tests (#2957)
  • Fix returning memory to upstream from pool resource (#2961)
  • Fix handling of non_existing entries in TFRecord reader (#2952)
  • Enable pool to return memory to the upstream upon Out-of-Memory. (#2951)
  • Fix mixed indent in tf.py (#2949)
  • Fix bug in default constructed curand_uniform_dist (#2946)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==1.3.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==1.3.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit,
but it can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.3.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.3.0

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.2.0

24 May 12:59

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • noise.shot CPU and GPU operators (#2861)
    • noise.gaussian CPU and GPU operators (#2846)
    • jpeg_compression_distortion CPU and GPU operators (#2823)
  • New mathematical operations (#2853):
    • Square and cubic root (sqrt, rsqrt, and cbrt)
    • Logarithms of different bases (log2 and log10)
    • Power (** operator and pow function)
    • Absolute value (abs and fabs)
    • Roundings (ceil and floor)
    • Trigonometric functions (sin, cos, and tan)
    • Inverse trigonometric functions (asin, acos, atan, and atan2)
    • Hyperbolic functions (sinh, cosh, and tanh)
    • Inverse hyperbolic functions (asinh, acosh, and atanh)
  • Added a Python wrapper for the fn.experimental.numba_function (#2886, #2835, #2903, #2893, and #2887)
  • Image decoder improvements:
    • Enabled ROI decoding in the hardware decoder (#2734).
    • Added support for the alpha channel in PNG and JP2 decoding (#2867).
    • Added support for YCbCr and BGR in JP2 decoding (#2867).
  • Updated the CUDA version to 11.3 (#2870).
  • Improved the documentation (#2915, #2911, #2927, #2862, and #2858).

Fixed issues

This DALI release includes the following fixes:

  • Fixed the readers.numpy cache issue (#2932).
  • Fixed an error in readers.nemo_asr (#2928).
  • Fixed a bug that caused the video reader to hang (#2916).

Improvements

  • Improve Tensors docs (#2915)
  • DALI core allocation functions (#2930)
  • Update FFmpeg build guide and update DALI_deps version (#2911)
  • Default memory resources (#2890)
  • Better error message when insufficient data in cache (#2924)
  • Add a link to the TensorFlow ResNet50 training script in the Readme (#2927)
  • Numba func notebook (#2886)
  • Enable HW decoder ROI support (#2734)
  • Use a custom color space conversion kernel for all conversions (#2907)
  • Update packages used for DALI tests (#2906)
  • Refactor TF Dataset code and lint it (#2909)
  • Add ShotNoise CPU and GPU operators (#2861)
  • Remove workaround for the problem with patchelf changing TLS alignment for CUDA < 10.2 and > 11.1 (#2879)
  • Add dali_data_type_vec (#2887)
  • Composite resource + renaming. (#2891)
  • Update deps in third_party and conda (#2878)
  • Python wrapper for numba (#2835)
  • Image Decoder: Unified behavior across backends, alpha channel support in PNG and JP2, YCbCr support in JP2 (#2867)
  • Better error handling in pipeline.py (#2864)
  • Update DALI deps (#2876)
  • Enable CUDA 11.3 based builds (#2870)
  • Updates MXNet plugin documentation regarding last_batch_policy (#2862)
  • README update with GTC2021 materials (#2860)
  • RNGBase to be used as base for noise augmentations + Add GaussianNoise operator (as an example) (#2846)
  • Pinned async resource (#2858)
  • Add more mathematical operations (#2853)
  • Add JpegCompressionDistortion CPU and GPU operators (#2823)
  • Split Python tests into smaller chunks (#2847)
  • Asynchronous pool memory resource (#2814)

Bug fixes

  • Add missing opencv-python dependency to TL2_FW_iterators_perf test (#2939)
  • Fix numpy reader header cache (#2932)
  • NemoAsrReader: Call Reset() on tensor vector holding the batch, to clear any previous shared data pointer. (#2928)
  • Fix DALI compilation for CUDA 11 pre 11.3 version (#2925)
  • Make dynlink_xxx use statically linked functions to load symbols. (#2931)
  • Fix test_detection_pipeline.py (#2929)
  • Add a missing av_bsf_flush call to a VideoReader seek function (#2916)
  • Run Optical Flow on stream 0 when running driver > 460. (#2914)
  • Fix nvcc warning about unused arguments in ResampleDepth_Channels (#2913)
  • Fix CUDA 10.0 compilation (#2917)
  • Use stream 0 in VideoDecoder when running driver >460 / CUDA >= 11.3. (#2902)
  • Fix docs and rename numba_func to numba_function (#2903)
  • Allow to specify optional args of Python-only types (#2898)
  • DALI TF install tool: Verify that a compatible prebuilt plugin is available for the required TF version before proceeding to attempt installation (#2882)
  • Fix coverity issues by adding lacking CUDA_CALL (#2888)
  • Fix failing test for Numba Func (#2893)
  • Fix double accumulation in horizontal resampling. Add test. (#2871)
  • Add epsilon to math function tests and adjust epsilon for rsqrt. (#2865)
  • Do not schedule any pipeline run when the iterator has prepare_first_batch=False (#2859)
  • Adjust the filenames of decoder test files and update licenses (#2844)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==1.2.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==1.2.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit,
but it can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.2.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.2.0

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.1.0

15 Apr 13:43

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Documentation improvements (#2834, #2824, #2831, #2758, #2820, and #2822).
  • The following operators were added:
    • The experimental numba_func operator that allows the use of Numba functions in the DALI pipeline (#2804).
    • The expand_dims and squeeze operators for shape manipulation (GPU and CPU) (#2800, #2791, #2792).
    • The multi_paste operator (GPU) (#2681).
  • The following kernels were added:
    • JPEG compression distortion (GPU) (#2801, #2830, and #2839).
    • JPEG color conversion and chroma subsampling (GPU) (#2771).
  • Enabled CUDA kernels compression to decrease the DALI binaries size (#2833).
  • Added the src_dims argument to the reshape operator (#2788).
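
The shape manipulation done by the expand_dims and squeeze operators can be sketched on plain shape tuples. This is a conceptual illustration that follows NumPy's convention, where the axes given to expand_dims are positions in the output shape; whether DALI interprets the axes identically is an assumption here, and both function names are hypothetical.

```python
def expand_dims_shape(shape, axes):
    """Insert extents of 1 so that they end up at the given output positions."""
    result = list(shape)
    for axis in sorted(axes):
        if not 0 <= axis <= len(result):
            raise ValueError(f"axis {axis} out of range")
        result.insert(axis, 1)
    return tuple(result)


def squeeze_shape(shape, axes):
    """Drop the listed axes, each of which must have extent 1."""
    dropped = set(axes)
    for axis in dropped:
        if shape[axis] != 1:
            raise ValueError(f"axis {axis} has extent {shape[axis]}, not 1")
    return tuple(e for i, e in enumerate(shape) if i not in dropped)


print(expand_dims_shape((480, 640, 3), [0]))   # (1, 480, 640, 3)
print(squeeze_shape((1, 480, 640, 1), [0, 3]))  # (480, 640)
```

Neither operation changes the number of elements, which is why both can be implemented without copying data.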

Fixed issues

This DALI release includes the following fixes:

  • Fixed a race condition in readers.nemo_asr when pad_last_batch is set to True (#2828).
  • Fixed the optical flow initialization issue (#2816).
  • Fixed a race condition in the data loader (#2773).

Improvements

  • Remove 0 default value from mean/std arguments of normalize. (#2834)
  • Add JpegCompressionDistortionGPU kernel (#2830)
  • Updates the pipeline docs page (#2824)
  • Enable CUDA kernels compression in the final binary (#2833)
  • Updates build documentation (#2831)
  • Update key visual (#2822)
  • Add NumbaFunc operator (#2804)
  • Add JPEG distortion kernel (#2801)
  • Add AddArg overloads for enum types (#2819)
  • Update third party dependencies to latest release versions (#2811)
  • Add an ability to provide a custom DALI_extra sha via env variable (#2810)
  • Move all deps into subrepos (#2756)
  • Reshape, Reinterpret, Squeeze and ExpandDims tutorial. (#2791)
  • Separate dependency creation and CUDA installation (#2786)
  • Remove intermediate stage from CUDA toolkit dockerfile (#2803)
  • Add Expand dims operator (#2800)
  • Update TensorFlow ResNet50 example to the latest horovod 21.03 (#2793)
  • Add squeeze operator (#2792)
  • Add JPEG color conversion and chroma subsampling kernel (#2771)
  • Add src_dims to reshape operator (#2788)
  • GPU MultiPaste (#2681)
  • Add --upgrade to pip install commands in documentation (#2758)
  • Use flattened view of the array for copying to shared memory. (#2783)

Bug fixes

  • Fix JPEG distortion kernel quality parameter handling (#2839)
  • Fix typo "funcions" -> "functions" in math doc (#2820)
  • Update DALI_deps to include FLAC security patch (#2826)
  • Fix coverity issues (#2812)
  • Fix optical flow parameter initialization. (#2816)
  • Add host fallback when nvjpegDecodeJpegDevice and nvjpegDecodeJpegHost fail (#2805)
  • ExternalSource - discard data from all callbacks when one raises StopIteration (#2784)
  • Exclude PyTorch-lightning test with MNIST (#2785)
  • Fix iteration number tracking with pipeline.reset (#2777)
  • Fix a race when the loader starts reading even though the metadata is not ready yet (#2773)
  • Fix race condition in NemoAsrReader when pad_last_batch is set to True (#2828)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==1.1.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==1.1.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit,
but it can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.1.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.1.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.0.0

24 Mar 15:26

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • The API documentation has been improved:
  • New operators:
    • A GridMask GPU operator for GridMask data augmentation (#2652).
    • A RandomObjectBBox operator with caching to randomly select a bounding box (#2718, #2696, #2677, and #2657).
    • A MultiPaste operator, which is required to implement Mosaic augmentation (#2583).
  • External Source can now run the per-sample callbacks in parallel (#2543).
  • Added the pipeline_def decorator, which is an easier way to define a pipeline with the functional API (#2757 and #2629).
  • Moved all decoders to a dedicated Python module (#2741, #2743, and #2725).
  • Moved all readers to a dedicated Python module (#2720, #2721, #2717, #2715, and #2722).
  • Exposed the pipeline output names in the C API (#2665).
  • Introduced the following named Slice operator arguments (#2625):
    • start/rel_start
    • end/rel_end
    • shape/rel_shape
  • Enabled additional codecs and demuxers in FFmpeg (#2651).
  • Added an option to disable the first batch preparation during the iterator construction (#2664).
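
The named Slice arguments listed above come in absolute and relative pairs. The following is a plain-Python sketch of how such a pair could resolve to a window on a single axis. It is not DALI code: the helper name resolve_window, the treatment of rel_* values as fractions of the axis extent, and the rounding rule are all assumptions made for illustration.

```python
def resolve_window(extent, start=None, rel_start=None,
                   end=None, rel_end=None, shape=None, rel_shape=None):
    """Return (begin, stop) in absolute coordinates for one axis.

    Hypothetical helper: it only mirrors the argument names from the
    release note and is not part of DALI.
    """
    if start is not None and rel_start is not None:
        raise ValueError("give either start or rel_start, not both")
    if sum(arg is not None for arg in (end, rel_end, shape, rel_shape)) > 1:
        raise ValueError("give only one of end, rel_end, shape, rel_shape")

    # Assumption: relative values are fractions of the axis extent.
    if rel_start is not None:
        begin = int(round(rel_start * extent))
    else:
        begin = start if start is not None else 0

    if end is not None:
        stop = end
    elif rel_end is not None:
        stop = int(round(rel_end * extent))
    elif shape is not None:
        stop = begin + shape
    elif rel_shape is not None:
        stop = begin + int(round(rel_shape * extent))
    else:
        stop = extent  # no end or shape given: slice to the end of the axis
    return begin, stop


row = list(range(10))
begin, stop = resolve_window(len(row), rel_start=0.2, rel_shape=0.5)
window = row[begin:stop]
```

Under these assumptions, the relative form (rel_start=0.2, rel_shape=0.5) and the absolute form (start=2, end=7) select the same window on an axis of extent 10.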

Fixed issues

This DALI release includes the following fixes:

  • Fixed the JPEG 2000 ROI decoding (#2692).
  • Fixed the layout length check in Transpose (#2693).
  • Fixed the .gpu() usage detection and error for CPU-only pipelines (#2682).

Improvements

  • Rework frameworks notebooks to fn API (#2761)
  • Bump up OpenCV-python version in tests (#2749)
  • Enhance deprecated argument documentation (#2755)
  • Convert notebooks to fn API: audio_processing, custom_operator, serialization (#2744)
  • Expose all pipeline constructor arguments as properties. (#2757)
  • Convert notebooks to fn API: sequence_processing (#2748)
  • GridMask GPU (#2652)
  • Run external source callback in parallel (#2543)
  • Bump up nvidia-tensorflow version to 1.15.5 21.02 (#2738)
  • Rewrite image processing examples to fn api. (#2745)
  • Update augmentation gallery (#2716)
  • Remove dynlink CUDA libs from the build image (#2739)
  • Rework getting started (#2729)
  • Adjust Python decoders tests to decoders module (#2741)
  • Adjust notebooks to new decoder module (#2743)
  • Update memory resource interfaces. (#2742)
  • Move decoders to decoders module (#2725)
  • Add Examples and Tutorials metadata title (#2730)
  • Adjust test to new readers module (#2720)
  • Adjust examples to new readers module (#2721)
  • Documentation home update (#2713)
  • Move tfrecord reader to readers module (#2722)
  • Move readers to dedicated submodule (#2717)
  • Add hash-based caching to RandomObjectBBox. (#2718)
  • Add break of VideoReader loop when keyframe past requested has been reached (#2706)
  • Improve set_outputs to accept list or tuple of data nodes as well (#2698)
  • Documentation: New layout of Examples and Tutorials section (#2710)
  • Rename test files for readers (#2715)
  • Add error checking if provided shape to tfrecord can house underlying data (#2705)
  • Documentation editorial changes: Init caps for all headings, Copyright update (#2703)
  • Add documentation to functional API (all fn.*) + New documentation layout (#2653)
  • Parallel random object BBox (#2677)
  • Rework ThreadPool and spinlock (#2696)
  • Improvements in Dockerfile.deps so that RUN commands are easily run in a non-docker environment (#2686)
  • Fix formatting of Resnet-N with Tensorflow example (#2694)
  • Operator RandomObjectBBox (#2657)
  • MultiPaste operator (#2583)
  • Add better exception granularity to memory::alloc_shared and memory::alloc_unique (#2683)
  • Make DALI pipeline use default seed (-1) when None is set to seed (#2676)
  • Make preparation of the first batch during the iterator construction optional (#2664)
  • Parallelize commands in bundle-wheel.sh (#2672)
  • Pipeline decorator (#2629)
  • Move to CUDA 11.2 update 1 (#2668)
  • Make sure that OpenCV decoding fallback follows EXIF information handling (#2666)
  • Expose names of Pipeline outputs in C API (#2665)
  • Enable named Slice arguments: start/rel_start, end/rel_end, shape/rel_shape (#2625)
  • Update nvidia-tensorflow in qa scripts to 20.12 (#2654)
  • Enable more codecs and demuxers in FFmpeg (#2651)

Bug fixes

  • Fix paddle ssd (#2765)
  • Fix Gluon example (#2764)
  • Remove redundant dimension from Optical Flow example. (#2762)
  • Fix 403 error when downloading MNIST dataset in PyTorch Lightning example (#2759)
  • Fix documentation instances of deprecated fn.image_decoder (#2754)
  • Shutdown executor when an error occurs in the executor itself, not in one of operators. (#2750)
  • Fix libcufile.so name to have *.0 suffix (#2735)
  • Fix test exclude pattern for Xavier (#2731)
  • Fix auto replacement of deprecated args for schema inheritance (#2733)
  • Fix constant input promotion for mixed backend. (#2726)
  • Fix type of slice's rel_shape argument (#2714)
  • Fix a regression in RandomObjectBBox: weights not set to default. (#2719)
  • Update TensorFlow ResNet50 example to work with the latest TF 2.4.x version (#2704)
  • Add auto generated docs files to .gitignore (#2711)
  • Update DALI PyTorch Lightning example to work with the newest Lightning (#2697)
  • Fix JPEG2K fused decoding (with ROI), add native tests for JP2k decoding (#2692)
  • Fix TL1_tensorflow-dali_test (#2687)
  • Remove unnecessary cuda runtime dependency from alloc.h (#2691)
  • Fix layout length check in Transpose. (#2693)
  • Replace eval with safer ast.literal_eval (#2690)
  • Fix .gpu usage detection and error for CPU only pipelines (#2682)
  • Add support for TensorFlow 2.4.1 in tests and for TF plugin (#2679)
  • Fix wrong early exit in function inside bundle-wheel.sh (#2675)
  • Fix apex compilation on Ubuntu 20.04 in TL1_ssd_training (#2671)
  • Fix cmake installation in TL1 for Ubuntu 20.04 (#2669)
  • Remove the split stages implementation of the hybrid image decoder (#2753)
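
One of the fixes above replaces eval with ast.literal_eval (#2690). The release notes do not say where in DALI the call sits, so the following is only a generic sketch of why the swap is safer: ast.literal_eval accepts Python literals and rejects anything that would execute code. The helper name parse_literal is hypothetical.

```python
import ast


def parse_literal(text):
    """Parse a Python literal from text without executing any code."""
    return ast.literal_eval(text)


# A plain literal (list, tuple, number, string, dict, ...) parses normally.
shape = parse_literal("[224, 224, 3]")

# An expression that calls a function is rejected instead of being run.
try:
    parse_literal("__import__('os').getcwd()")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True
```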

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

  • fn.audio_decoder / ops.AudioDecoder has been renamed to fn.decoders.audio / ops.decoders.Audio.
  • fn.image_decoder / ops.ImageDecoder has been renamed to fn.decoders.image / ops.decoders.Image.
  • fn.image_decoder_crop / ops.ImageDecoderCrop has been renamed to fn.decoders.image_crop / ops.decoders.ImageCrop.
  • fn.image_decoder_random_crop / ops.ImageDecoderRandomCrop has been renamed to fn.decoders.image_random_crop / ops.decoders.ImageRandomCrop.
  • fn.image_decoder_slice / ops.ImageDecoderSlice has been renamed to fn.decoders.image_slice / ops.decoders.ImageSlice.
  • fn.caffe2_reader / ops.Caffe2Reader has been renamed to fn.readers.caffe2 / ops.readers.Caffe2.
  • fn.caffe_reader / ops.CaffeReader has been renamed to fn.readers.caffe / ops.readers.Caffe.
  • fn.coco_reader / ops.CocoReader has been renamed to fn.readers.coco / ops.readers.Coco.
  • fn.file_reader / ops.FileReader has been renamed to fn.readers.file / ops.readers.File.
  • fn.mxnet_reader / ops.MXNetReader has been renamed to fn.readers.mxnet / ops.readers.MXNet.
  • fn.nemo_asr_reader / ops.NemoAsrReader has been renamed to fn.readers.nemo_asr / ops.readers.NemoAsr.
  • fn.numpy_reader / ops.NumpyReader has been renamed to fn.readers.numpy / ops.readers.Numpy.
  • fn.sequence_reader / ops.SequenceReader has been renamed to fn.readers.sequence / ops.readers.Sequence.
  • fn.tfrecord_reader / ops.TFRecordReader has been renamed to fn.readers.tfrecord / ops.readers.TFRecord.
  • fn.video_reader / ops.VideoReader has been renamed to fn.readers.video / ops.readers.Video.
  • fn.video_reader_resize / ops.VideoReaderResize has been renamed to fn.readers.video_resize / ops.readers.VideoResize.
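
When migrating existing pipelines, the fn.* renames listed above can be collected into a lookup table. This is a minimal sketch: the table is copied from the list, while the names FN_RENAMES and new_fn_name are hypothetical and not part of DALI.

```python
# Old fn.* name -> new fn.* name, copied from the deprecation list above.
FN_RENAMES = {
    "audio_decoder": "decoders.audio",
    "image_decoder": "decoders.image",
    "image_decoder_crop": "decoders.image_crop",
    "image_decoder_random_crop": "decoders.image_random_crop",
    "image_decoder_slice": "decoders.image_slice",
    "caffe2_reader": "readers.caffe2",
    "caffe_reader": "readers.caffe",
    "coco_reader": "readers.coco",
    "file_reader": "readers.file",
    "mxnet_reader": "readers.mxnet",
    "nemo_asr_reader": "readers.nemo_asr",
    "numpy_reader": "readers.numpy",
    "sequence_reader": "readers.sequence",
    "tfrecord_reader": "readers.tfrecord",
    "video_reader": "readers.video",
    "video_reader_resize": "readers.video_resize",
}


def new_fn_name(old_name):
    """Return the new dotted fn.* name, or the input unchanged if not renamed."""
    return FN_RENAMES.get(old_name, old_name)


migrated = [new_fn_name(name) for name in ("file_reader", "image_decoder", "resize")]
```

Names that are not in the table, such as resize here, pass through unchanged, so the helper can be applied to every operator name in a script.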

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==1.0.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==1.0.0

or for CUDA 11:

CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later). 
Using the latest driver may enable additional functionality. 
More details can be found in enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.0.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.0.0

DALI v0.31.0

25 Feb 14:29

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • GridMask CPU, which implements GridMask data augmentation (https://arxiv.org/abs/2001.04086) and is useful for the EfficientNet pipeline (#2582).
    • ROIRandomCrop CPU, an operator required to perform biased random crops in segmentation applications (#2638).
  • Added support for the variable batch size in ExternalSource (#2481, #2641).
  • Added support for the time-major layout in the following spectrogram processing operators:
  • Refactored and unified the following RNG operators:
  • Reworked the custom operators documentation (#2568).
  • Applied performance improvements in the JPEG decoder (#2655, #2610).
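
Variable batch size means that consecutive batches fed through ExternalSource may differ in length. The following plain-Python generator sketches what such a source could look like. It does not call DALI, and the name variable_batches and the sample data are hypothetical.

```python
def variable_batches(samples, batch_sizes):
    """Yield consecutive batches whose lengths follow batch_sizes."""
    position = 0
    for size in batch_sizes:
        batch = samples[position:position + size]
        if not batch:
            return  # ran out of samples before using every requested size
        yield batch
        position += size


# Ten samples split into batches of 4, 1, 3 and 2 elements.
batches = list(variable_batches(list(range(10)), [4, 1, 3, 2]))
batch_lengths = [len(batch) for batch in batches]
```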

Fixed issues

  • Fixed the length that was reported by DALI FW iterators when the DROP policy is used (#2611)
  • Provided a workaround for a compiler problem that caused an Invalid device function error. (#2656)
  • Fixed RandomBBoxCrop errors while using the crop_shape argument (#2605)
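
The length fix for the DROP policy (#2611) concerns how many batches an iterator reports per epoch. As a rough sketch, assuming a single pipeline and that the other last-batch policies (named FILL and PARTIAL here) keep the trailing partial batch, the expected count could be computed as follows. The function name and the string policy names are illustrative and do not reproduce the iterator code.

```python
import math


def expected_iterations(dataset_size, batch_size, policy):
    """Number of batches an iterator would be expected to report per epoch."""
    if policy == "DROP":
        # The trailing partial batch is discarded.
        return dataset_size // batch_size
    if policy in ("FILL", "PARTIAL"):
        # The trailing partial batch is kept (padded or returned short).
        return math.ceil(dataset_size / batch_size)
    raise ValueError(f"unknown policy: {policy}")


# Ten samples with a batch size of four leave a partial batch of two.
lengths = {p: expected_iterations(10, 4, p) for p in ("FILL", "DROP", "PARTIAL")}
```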

Improvements

  • Use pinned memory for staging buffer for HW nvJPEG decoder (#2655)
  • Find bounding boxes of multiple labels (#2650)
  • Add ROIRandomCrop operator (#2638)
  • Add FW iterators handling of variable batch size and improve ES examples (#2641)
  • Connected components (#2640)
  • GridMask CPU (#2582)
  • Iter-to-iter variable batch size (#2481)
  • Enable support for different layouts in the MelFilterBank (#2620)
  • Rework ops.random.CoinFlip (#2577)
  • Enable time-major layout in Spectrogram CPU (#2619)
  • Update clang format (#2524)
  • Improve Optical Flow error verbosity (#2618)
  • TF dataset tests rework (#2539)
  • Time major Spectrogram (GPU-only) (#2617)
  • Integrate RMM (#2609)
  • Propagate scalar in transform.scale (#2581)
  • Remove redundant JPEG decoder initialization from peeking shape function (#2610)
  • Rework ops.random.Uniform (#2531)
  • Rework custom operator docs (#2568)

Bug fixes

  • Workaround a compiler problem that caused Invalid device function error. (#2656)
  • Python fixes: argument inputs, external source, docs (#2646)
  • Fix SeparateQueuePolicy handling of the CPU stage (#2636)
  • Fix variable batch size for list of tensors. Make constants constant again. (#2637)
  • Fix Uniform discrete distribution (#2635)
  • Fix a double set of preserve schema arg and uninitialized var (#2632)
  • Add handling of empty inputs and tiny outputs in Resize op and Resampling kernels. (#2634)
  • Refactor functions that extract a range of samples from TLS and TLV. (#2628)
  • Fix RandomBBoxCrop errors while using crop_shape argument (#2605)
  • Update ResNet50 example to work with TensorFlow 2.x (#2537)
  • Keep reference to owner of data in Python Tensor and TensorList (#2606)
  • Enable nvJPEG2K for CUDA 11.2 builds (#2614)
  • Disable mmap based test for Xavier (#2612)
  • Fix length reported by DALI FW iterators when DROP policy is used (#2611)
  • Use smaller block in Warp (#2613)

Breaking API changes

Deprecated features

  • ops.Uniform was moved to ops.random.Uniform
  • ops.CoinFlip was moved to ops.random.CoinFlip

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.31.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.31.0

or for CUDA 11:

CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later). 
Using the latest driver may enable additional functionality. 
More details can be found in enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.31.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.31.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v0.30.0

27 Jan 17:28

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Optimized CPU resampling (#2540).
  • Added the following mathematical expressions:
    • Disallowed unwanted __bool__ conversions (#2538).
    • Added exp and log math functions (#2555).
  • Added the images argument for the COCOReader, which allows for the custom ordering of images and fixed a bug in the segmentation data parsing (#2548, #2597).
  • Added support for the nvJPEG preallocate API for a batched hardware decoder (#2544).
  • Added support for surfaces with strides over 2G (#2600).
  • Enabled CUDA 11.2 builds (#2553).
  • Documentation improvements:
    • Added a support matrix to the documentation (#2519).
    • Added a geometric transform tutorial. (#2530).
  • Allowed DALI to be compiled with Clang (#2416).
  • Added CUDA API checks in utility functions (#2517) and tests (#2516).

Fixed issues

  • Fixed the autoreset option in the iterator for the DROP policy (#2567).

Improvements

  • Make Nvjpeg2kTest more verbose (#2509)
  • Compile DALI with Clang (#2416)
  • Try to actually find the library instead of arbitrarily deciding it can't be there (#2511)
  • Enable GDS for conda build by default (#2515)
  • Pool memory resource (#2518)
  • Add GTest Event Listener with CUDA validation after TEST (#2516)
  • Disable GPU numpy reader test for sm < 6.0 (#2514)
  • Mention WarpAffine in transforms.* documentation (#2527)
  • Ops rework to prepare iter-to-iter batch size variability (#2408)
  • Fix unchecked CUDA API calls in utility functions (#2517)
  • Bump up nvidia-tensorflow version in tests (#2526)
  • Cleanup warnings in CUDA code (#2523)
  • Add debug info to RN50 pipeline (#2522)
  • Add a supported matrix to the documentation (#2519)
  • Add ArgValue utility (#2528)
  • Remove pinning numpy version in TL1_ssd_training test (#2536)
  • Remove unreachable return statement (#2541)
  • Vectorize CPU resampling (#2540)
  • Remove constraint on input type for RandomResizedCrop. Update tests. (#2549)
  • Hide ArithmeticGenericOp doc and disallow bool (#2538)
  • Support for nvJPEG preallocate API for batched HW decoder (#2544)
  • Add exp and log math functions (#2555)
  • Add COCOReader files arg support and fix bug in the segmentation data parsing (#2548)
  • Event pool (#2520)
  • Rework random number generators. RNGBase operator template and NormalDistribution. (#2513)
  • Enable CUDA 11.2 builds (#2553)
  • Adjust range of tested log inputs (#2564)
  • Add geometric transform tutorial. (#2530)
  • Add synchronization after randomizer construction. (#2565)
  • Move to the upstream version of PaddlePaddle (#2561)
  • Move examples to fn api (#2566)
  • Remove legacy API based nvJPEG decoder implementation (#2591)
  • Support surfaces with strides over 2G (#2600)
  • COCOReader images argument can be used to provide a custom order of images (#2597)

Bug fixes

  • Fix build for Jetson platform (#2512)
  • Fix aarch64 build errors (#2529)
  • Fix broken uniform operator python tests (#2556)
  • Fix Clang build (#2560)
  • Fix Xavier test crash caused by NumPy faulty build (#2596)
  • Fix autoreset option in iterator for DROP policy (#2567)
  • Fix uniform distribution test expectations (#2589)

Breaking API changes

Deprecated features

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.30.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.30.0

or for CUDA 11:

CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later). 
Using the latest driver may enable additional functionality. 
More details can be found in enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.30.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.30.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v0.29.0

30 Dec 12:56

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • NumpyReader GPU Operator with the support of GPU Direct Storage (#2477)
    • NvJpeg2K decoding was enabled in ImageDecoder operator (#2501)
    • segmentation.RandomMaskPixel operator for creating random masks containing foreground pixels (#2445)
    • OneHot for GPU (#2436)
  • Move all NVTX infrastructure into core and create DALI domain (#2472)
  • New Examples:
    • Add mask processing to COCO Reader with Augmentations example (#2426)
    • Add reductions example (#2457)
    • Example of random_mask_pixel to perform biased random crop (#2474)
    • Update ExternalSource framework examples (#2482)
  • Operator Improvements:
    • Pad: Add support for per-sample shape and alignment requirements (#2432)
    • RandomResizedCrop: enable channel-first and video support + add tests (#2430)
    • PythonFunction Operator: support for output layouts (#2486)
    • Optimize the DCT GPU kernel. (#2471)
    • COCOReader: Support for uncompressed RLE masks (#2478)
    • transforms.Rotation to accept scalar inputs (#2494)
  • Move to CUDA 11.1 update 1 (#2419)

Fixed issues

  • NumpyReader: Replace std::regex with custom implementation to fix ABI incompatibility issues (#2489)
  • Fix the dimensionality of labels in SSDRandomCrop. (#2488)

Improvements

  • Move to CUDA 11.1 update 1 (#2419)
  • RandomResizedCrop: enable channel-first and video support + add tests (#2430)
  • Pad operator: Add support for per-sample shape and alignment requirements (#2432)
  • Update clang to 10.0 (#2424)
  • Add mask processing to COCO Reader with Augmentations example (#2426)
  • Make custom nvJPEG allocator return a relevant allocation status (#2438)
  • Make the custom nvJPEG allocator not throw and return only the status (#2443)
  • Add SearchableRLEMask utility (#2441)
  • Add GPU support to OneHot operator (#2436)
  • Reduce axes names (#2425)
  • Remove CUDA headers and generate stubs in runtime (#2420)
  • TensorVector update for iter-to-iter variable batch size (#2435)
  • Fix build with all options off, relax libclang required version (#2455)
  • Add support for UINT8 and INT8 outputs in CMN + scale and shift arguments (#2458)
  • CocoReader: Parse RLE masks only when pixelwise masks are requested (#2462)
  • Add reductions example (#2457)
  • Enables direct linking with libcuda.so instead of dlopen (#2459)
  • Add segmentation.RandomMaskPixel operator (#2445)
  • Skips the building of prebuilt DALI package for nvidia-tensorflow (#2451)
  • Pad to square tests (#2442)
  • Enable compile time generation of dynlink wrappers for nvml (#2463)
  • Deprecate squeeze_labels option from MXNet iterator and enhance .squeeze function to match numpy style interface (#2450)
  • Hide hidden ops and improve Enum docs quality (#2470)
  • Enforce uniform rank and type of the outputs read by CPU DataReader. (#2476)
  • Move all NVTX infrastructure into core and create DALI domain (#2472)
  • MXNet Iterator: Revert to squeeze_labels=True behavior by default (#2479)
  • Example of random_mask_pixel to perform biased random crop (#2474)
  • Update DALI dependency (#2483)
  • Update ExternalSource framework examples (#2482)
  • Optimize the DCT GPU kernel. (#2471)
  • Support the output layouts in the PythonFunction Operator (#2486)
  • transforms.Rotation to accept scalar inputs (#2494)
  • Rework tutorials general (#2480)
  • Add support for GPU based numpy reader (#2477)
  • Per sample ExternalSource (#2469)
  • Use atol instead of rtol (#2499)
  • Lifts the restriction and enables enable_frame_num and enable_timestamps for filenames (#2468)
  • Reenable nvJPEG2000 (#2501)
  • Disables GDS for the default build configuration (#2502)
  • COCOReader: Support for uncompressed RLE masks (#2478)
  • Memory manager - interfaces, utilities, monotonic resources, malloc resource (#2497)
  • Update Jetson compilation guide (#2508)
  • Makes sure that cuFile and nvJPEG2k are not possible to set when not supported (#2510)

Bug fixes

  • Fix seed in RandomResizedCrop test. (#2437)
  • QNX build fix (#2440)
  • Fix lack of proper loading of best_prec1 from the checkpoint (#2466)
  • Fix the dimensionality of labels in SSDRandomCrop. (#2488)
  • NumpyReader : Replace std::regex with custom implementation (#2489)
  • Fix CPU only mode in C API (#2496)
  • Fix bugs reported by static analysis (#2491)
  • Fix typo in STYLE_GUIDE.md (#2503)
  • Fix NVJPEG2K_ENABLED test macros (#2504)

Breaking API changes

Deprecated features

  • Deprecate squeeze_labels option from MXNet iterator and enhance .squeeze function to match numpy style interface (#2450)

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.29.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.29.0

or for CUDA 11:

CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later). Using the latest driver may enable additional functionality. More details can be found in enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.29.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.29.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v0.28.0

30 Nov 15:46

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • Affine transform generators, which are operators that generate scale, rotate, shear, translate, crop transform matrices (#2309).
      • You can use the transforms.Combine operator to combine these matrices (#2317).
      • These transformations can be applied to data by using the CoordTransform operator.
    • Added min, max, and clamp arithmetic operators (#2298).
    • Cat and Stack Operators to concatenate and stack Tensors for the CPU and the GPU (#2301, #2339, #2350).
    • The following reductions for the CPU and the GPU (#2342, #2379, #2395):
      • Min, Max, Sum, Mean, MeanSquare, RootMeanSquare, Std, Variance
    • The MFCC operator for the GPU (#2423).
    • The SelectMasks operator (#2381).
    • Add operators for batch reordering:
      • BatchPermutation for generating random reordering of the batch.
      • PermuteBatch, which reorders tensors in a batch, based on a list of provided indices (#2417).
    • Operator Compose: PyTorch-style API to compose the operators (#2393).
  • Improvements in existing operators:
    • Added SeekFrames to the audio decoder. The redesign allows you to decide the decoded data type at runtime (#2334).
    • Added the ability to handle UTF8 text to the NemoAsrReader (#2358).
    • Added explicit file list support to the FileReader (#2389).
    • Improvements in the COCO reader API (#2406).
      • The COCOReader API now outputs relative mask polygon coordinates when the option ratio is set to True (#2375).
    • RandomBBoxCrop now optionally outputs the indices of the bounding boxes that passed the centroid filter (#2374).
  • Added late initialization of torch_gpu_device in the PyTorch plugin (#2411).
  • Added automatic constant-to-input promotion (#2361) and generalized handling of operator arguments (#2393).
  • Added an MNIST example for DALI and PyTorch Lightning (#2360).
  • Added the last_batch_policy to the framework iterator (#2269).
  • New builds:
    • Python 3.9 is now enabled (#2333).
    • The DALI wheels for CUDA 11 are built with CUDA 11.1 and use Enhanced Compatibility to work with CUDA 11.0 (#2302, #2356, #2367, and #2413).
    • Added support for the SM_86 architecture (#2364).
    • Added the ability to cross-build Python wheels for Jetson (#2313).
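
The affine transform generators above produce matrices that can be combined and then applied to coordinates. The following plain-Python sketch shows the underlying idea for 2D points with 3x3 homogeneous matrices. It does not use the DALI operators: the function names are hypothetical, and the choice that combine applies its arguments in the order listed is an assumption of this sketch.

```python
def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def combine(*transforms):
    """Combine transforms so that they are applied in the order listed."""
    result = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for transform in transforms:
        # Left-multiplying makes each later transform act after the earlier ones.
        result = matmul(transform, result)
    return result


def apply(matrix, point):
    """Apply a 3x3 homogeneous transform to a 2D point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# Scale by 2 first, then shift by 1 along x.
combined = combine(scale(2, 2), translate(1, 0))
moved = apply(combined, (1, 1))
```

Combining once and applying the single resulting matrix gives the same point as applying the scale and the translation one after the other.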

Bug fixes

  • Fix error when VideoReader is prematurely terminated (#2336)
  • Fix failure in affine transforms tests (#2337)
  • Fix the problem of output outliving the pipeline in python (#2341)
  • Fix lack of proper layout setting in the VideoReader (#2346)
  • Fix uniform generator operator (#2352)
  • Bugfixes: Default nfft value and to_snake_case implementation (#2353)
  • Fixes problems in the weekly build (#2372)
  • Fix a problem with reference to "incomplete" type (error in Clang/CUDA). (#2377)
  • Fix how DALI handles StopIteration from the ExternalSource (#2373)
  • Fix TL1_nodeps_build and TL0_cpu_only (#2391)
  • Fix CPU only mode for arithm operators (#2400)
  • Preserve shape of pseudoscalars in arithmetic ops. (#2359)

Improvements

  • Add affine transform generators: TransformScale, TransformRotation, TransformShear, TransformCrop (#2309)
  • Change code/docs language to be more inclusive (#2322)
  • Update nvidia-tensorflow test package to 20.9 and bump tensorflow-gpu minor versions (#2320)
  • Update example usage of DALIClassificationIterator in docs strings (#2306)
  • Reduce video reader memory consumption (#2308)
  • TensorJoin kernel for CPU (#2301)
  • Enable automatic python modules for operator (#2329)
  • Split GaussianBlur Python test (#2332)
  • Add CombineTransforms operator (#2317)
  • Append TensorListShapes (#2291)
  • Enable CUDA 11.1 builds (#2302)
  • Add min, max and clamp arithmetic ops (#2298)
  • Update TensorFlow plugin documentation (#2328)
  • Remove Python 3.5 support, enable Python 3.9 (#2333)
  • Enable nvJPEG2k build for CUDA 11.1 (#2343)
  • Add BUILD_DALI_NODEPS to allow building dali_core and dali_kernels without extra third party libraries present in the system (#2321)
  • Add SeekFrames to audio decoder. Redesign to allow deciding decoded data type at runtime. (#2334)
  • Add discrete mode to Uniform operator (#2340)
  • Test for utility CMake function (find_dali) (#2325)
  • Propagate new build options to other build utilities (#2349)
  • Add support for N-dim tensors to OneHot (#2345)
  • Add a separate option to preallocate nvJPEG2k memory (#2347)
  • Tensor join GPU (#2339)
  • Reductions: min, max (#2342)
  • Tensor concatenation and stacking (#2350)
  • Use inverse (source-to-destination) matrix in WarpAffine operator (#2338)
  • Disable more dependencies for nodeps build (#2355)
  • Update DALI trademark information (#2351)
  • Reduce GPU memory fraction in TF tests to 0.5. (#2357)
  • Automatic constant-to-input promotion. (#2361)
  • Add support for SM_86 architecture (#2364)
  • Use current class next implementation in init, to avoid special handling of first batch in child classes (#2363)
  • Add ability to cross-build Python wheels for Jetson (#2313)
  • Add NemoAsrReader handling of UTF8 text (#2358)
  • Enable CUDA 11 compatibility mode (#2356)
  • Add MNIST example for DALI and PyTorch Lightning (#2360)
  • Add last_batch_policy to the framework iterator (#2269)
  • COCOReader to output relative mask polygon coordinates when the option ratio is set to True (#2375)
  • RandomBBoxCrop to optionally output the indices of the bounding boxes that passed the centroid filter (#2374)
  • Enable compatibility layer in tests for CUDA 11 (#2367)
  • Reduce Sum Op (#2379)
  • Install DALI license, copyright and acknowledgments explicitly (#2392)
  • Add layout support to OneHot operator (#2388)
  • Generalized handling of operator arguments + operator Compose. (#2393)
  • GPU DCT kernel (#2398)
  • Bump up Nvidia TF version to 20.10 (#2397)
  • More reductions (#2395)
  • Late initialization of torch_gpu_device in pytorch plugin (#2411)
  • Add a link to CUDA Enhanced Compatibility Across Minor Releases guide (#2410)
  • Add explicit file list support to FileReader. (#2389)
  • Add TransformTranslation deprecation placeholder Op (#2412)
  • Bump up the CuPy to one that supports CUDA 11.0 (#2413)
  • Add a missing include in filesystem.cc (#2414)
  • Add a warning about the Python function incompatibility with TensorFlow (#2415)
  • Improvements in COCO reader API (#2406)
  • Add operators for batch reordering (#2417)
  • Add SelectMasks operator (#2381)
  • GPU MFCC operator. (#2423)
  • Make base image for dockers customizable at the build time (#2427)
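The COCOReader entry above (#2375) mentions relative mask polygon coordinates when ratio is set to True. The sketch below illustrates one plausible meaning of "relative": each vertex is divided by the image width and height. It is plain Python, not DALI code; the function name to_relative_polygon and the exact normalization are assumptions made for illustration, not the reader's actual implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_relative_polygon(
    vertices: List[Point], image_width: int, image_height: int
) -> List[Point]:
    """Scale absolute (x, y) pixel vertices by the image dimensions.

    Hypothetical helper: assumes "relative" means x / width and y / height.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    return [(x / image_width, y / image_height) for x, y in vertices]


# A triangle in a 640x480 image, given in pixel coordinates.
polygon = [(0.0, 0.0), (320.0, 0.0), (320.0, 240.0)]
print(to_relative_polygon(polygon, image_width=640, image_height=480))
```

Relative coordinates stay valid when the image is later resized, which is the usual reason to prefer them in augmentation pipelines.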

Breaking API changes

  • Python 3.5 is no longer supported by the official DALI wheels.

Deprecated feature

Known issues:

  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that was used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
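The key frame requirement in the first known issue can be checked ahead of time if the key frame positions of a video are already known. The sketch below is plain Python and does not use DALI. It assumes the key frame indices were obtained elsewhere, and it takes 10 frames as a conservative default because the note gives a range of 10 to 15. Both function names are hypothetical.

```python
from typing import Iterable


def max_key_frame_gap(key_frame_indices: Iterable[int]) -> int:
    """Return the largest distance, in frames, between consecutive key frames."""
    ordered = sorted(key_frame_indices)
    if len(ordered) < 2:
        raise ValueError("need at least two key frame indices")
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def key_frames_dense_enough(key_frame_indices: Iterable[int], max_gap: int = 10) -> bool:
    """Check that no two consecutive key frames are more than max_gap frames apart.

    max_gap=10 is an assumed, conservative reading of "every 10 to 15 frames".
    """
    return max_key_frame_gap(key_frame_indices) <= max_gap


# Key frames every 10 frames pass the check; every 30 frames do not.
print(key_frames_dense_enough([0, 10, 20, 30]))
print(key_frames_dense_enough([0, 30, 60]))
```

The check only looks at gaps between key frames. It does not cover the distance from the last key frame to the end of the stream.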

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.28.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.28.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later). Using the latest driver may enable additional functionality. More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.28.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.28.0
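The driver requirement stated above (450.80 or later) can be checked before installing the CUDA 11 wheels. The sketch below is plain Python and only compares version strings. It assumes the driver version is already available as a dotted string such as "450.80.02". How that string is obtained is left out, and both function names are hypothetical.

```python
from typing import Tuple

# Minimum driver taken from the note above: "450.80 or later".
MIN_DRIVER = (450, 80)


def parse_driver_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted driver version string into a tuple of integers."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_supports_cuda110_wheel(version: str) -> bool:
    """Compare the major and minor driver numbers against MIN_DRIVER."""
    return parse_driver_version(version)[:2] >= MIN_DRIVER


for candidate in ("440.33.01", "450.80.02", "455.23.05"):
    print(candidate, driver_supports_cuda110_wheel(candidate))
```

Comparing integer tuples instead of raw strings avoids lexicographic mistakes, such as "1000.0" sorting before "450.80".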

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code: