Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.18.0] - XXX. XX, 2024

Added

Change

Fixed

[0.17.0] - May. XX, 2024

This release features updated documentation web-page https://intelpython.github.io/dpctl/latest/index.html, adds cumulative reductions, and complies with revision 2023.12 of Python Array API specification.

Added

Added pybind11 caster for sycl::half to map to/from Python float to "dpctl4pybind11.hpp" header: gh-1655
Added support for DLPack data interchange per Python Array API 2023.12 specification: gh-1667
Implemented tensor.cumulative_sum, tensor.cumulative_prod and tensor.cumulative_logsumexp: gh-1602

Changed

Expanded documentation for dpctl: gh-1619
Expanded utils.intel_device_info functionality: gh-1656
Improved performance of elementwise operations: gh-1651
Efficiency improvement by avoiding unnecessary copying of sycl::queue: gh-1645
dpctl uses pybind11 2.12.0: gh-1640
Improved performance of tensor.reshape operation with order="F" when copying is needed, or requested: gh-1677

Fixed

Fixed initialization of byte type constants in dpctl_capi Python/C API loader class in "dpctl4pybind11.hpp": gh-1665
Fixed crash in tensor.sort reported for a CPU device and a CUDA device: gh-1676
Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
Support use of index arrays of different integral types in indexing operations: gh-47
Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: gh-1630
Corrected tensor.tile for scalar inputs and empty repetitions: gh-1628
Fixed support for out keyword in tensor.matmul: gh-1610
Fixed bug in basic slicing of empty arrays: gh-1680
Fixed bug in tensor.bitwise_invert for boolean input array: gh-1681
Fixed bug in tensor.repeat on zero-size input arrays: gh-1682

[0.16.1] - Apr. 10, 2024

This is a bug-fix release, which also provides a change needed by numba_dpex project to support dispatching kernels consuming instances of sycl::local_accessor template type.

Changed

Changed behavior of dpctl.tensor.usm_ndarray.__dlpack_device__ method to return device id of the parent unpartitioned device if array is allocated on a sub-device instead of raising an exception: #1604
Array creation functions and the usm_ndarray constructor in dpctl.tensor submodule now use cached default-selected device to improve performance: #1606
Changed treatment of axis keyword for dpctl.tensor.tensordot and dpctl.tensor.vecdot to align with Python Array API 2023.12 specification: #1608
Changed implementation of DPCTLQueue_SubmitRange, DPCTLQueue_SubmitNDRange in DPCTLSyclInterface library to support sycl::local_accessor arguments needed by numba_dpex; the enum DPCTLKernelArgType to correspond to C++ disjoint types: #1609, #1611, #1612

Fixed

Fixed a crash on Windows platform during execution of getter of dpctl.SyclPlatfom.default_context property: : #1604
Fixed kernel submission error on NVidia CUDA GPUs during dpctl.tensor.matmul operation: #1605
Fixed corruption of context cache table entries: #1607
Fixed incorrect result from dpctl.tensor.tensordot reported in issue #1570: #1608
Fixed library name output by python -m dpctl --library: #1615

[0.16.0] - Feb. 16, 2024

This release will require DPC++ 2024.1.0, which no longer supports Intel Gen9 integrated GPUs found in Intel CPUs of 10th generation and older. Featurewise, this release is identical to 0.15.1.

[0.15.1] - Feb. 10, 2024

This release reaches milestone of 100% compliance of dpctl.tensor functions with Python Array API 2022.12 standard for the main namespace.

Added

Added reduction functions dpctl.tensor.min, dpctl.tensor.max, dpctl.tensor.argmin, dpctl.tensor.argmax, and dpctl.tensor.prod per Python Array API specifications: #1399
Added dedicated in-place operations for binary elementwise operations and deployed them in Python operators of dpctl.tensor.usm_ndarray type: #1431, #1447
Added new elementwise functions dpctl.tensor.cbrt, dpctl.tensor.rsqrt, dpctl.tensor.exp2, dpctl.tensor.copysign, dpctl.tensor.angle, and dpctl.tensor.reciprocal: #1443, #1474
Added statistical functions dpctl.tensor.mean, dpctl.tensor.std, dpctl.tensor.var per Python Array API specifications: #1465
Added sorting functions dpctl.tensor.sort and dpctl.tensor.argsort, and set functions dpctl.tensor.unique_values, dpctl.tensor.unique_counts, dpctl.tensor.unique_inverse, dpctl.tensor.unique_all: #1483
Added linear algebra functions from the Array API namespace dpctl.tensor.matrix_transpose, dpctl.tensor.matmul, dpctl.tensor.vecdot, and dpctl.tensor.tensordot: #1490, #1525, #1541
Added dpctl.tensor.clip function: #1444, #1505
Added custom reduction functions dpt.logsumexp (reduction using binary function dpctl.tensor.logaddexp), dpt.reduce_hypot (reduction using binary function dpctl.tensor.hypot): #1446
Added inspection API to query capabilities of Python Array API specification implementation: #1469
Support for compilation for NVIDIA(R) sycl target with use of CodePlay oneAPI plug-in: #1411, #1124
Added dpctl.utils.intel_device_info function to query additional information about Intel(R) GPU devices: gh-1428 and gh-1445
Added support for two new device descriptors, dpctl.SyclDevice.max_mem_alloc_size and dpctl.SyclDevice.max_clock_frequency: #1530

Changed

Functions dpctl.tensor.result_type and dpctl.tensor.can_cast became device-aware: #1488, #1473
Implementation of method dpctl.SyclEvent.wait_for changed to use sycl::event::wait instead of sycl::event::wait_and_throw: gh-1436
dpctl.tensor.astype was changed to support device keyword as per Python Array API specification: #1511
C++ header files in libtensor/include/kernels containing implementations of SYCL kernels no longer depends on "pybind11.h": #1516

Fixed

Fixed issues with dpctl.tensor.repeat support for axis keyword: #1427, #1433
Fix for gh-1503 for bug usm_ndarray.__setitem__: #1504
Other bug fixes: #1485, #1477, #1512

[0.15.0] - Sep. 29, 2023

Added

Added dpctl.tensor.floor, dpctl.tensor.ceil, dpctl.tensor.trunc elementwise functions.
Added dpctl.tensor.hypot, dpctl.tensor.logaddexp elementwise functions.
Added trigonometric (dpctl.tensor.sin, dpctl.tensor.cos, dpctl.tensor.tan) and hyperbolic (dpctl.tensor.sinh, dpctl.tensor.cosh, dpctl.tensor.tanh) elementwise functions and their inverses (dpctl.tensor.asin, dpctl.tensor.asinh, dpctl.tensor.acos, dpctl.tensor.acosh, dpctl.tensor.atan, dpctl.tensor.atanh).
Added dpctl.tensor.round function.
Added dpctl.tensor.sign and dpctl.tensor.remainder elementwise functions.
Added bitwise elementwise functions dpctl.tensor.bitwise_and, dpctl.tensor.bitwise_xor, dpctl.tensor.bitwise_or, dpctl.tensor.bitwise_invert
Added bitwise shift functions dpctl.tensor.bitwise_left_shift and dpctl.tensor.bitwise_right_shift.
Added dpctl.tensor.atan2 and dpctl.tensor.signbit elementwise functions.
Added dpctl.tensor.minumum and dpctl.tensor.maximum binary elementwise functions.
Supported equality checking and hashing for dpctl.SyclPlatform.
Implemented types property for all unary and binary elementwise functions #1361
Added dpctl.tensor.repeat and dpctl.tensor.tile functions.
Added dpctl.tensor.matrix_transpose function.

Changed

Enabled support for Python arithmetic, in-place arithmetic, reflexive arithmetic, comparison, and bitwise operators for dpctl.tensor.usm_ndarray type #1324.
Removed dpctl.tensor.numpy_usm_shared obsolete class and associated tests which were being skipped #1310
Transitioned dpctl codebase to Cython 3.
Improved performance of boolean reduction functions dpctl.tensor.all and dpctl.tensor.any.
Improved performance of summation function dpctl.tensor.sum.
Improved in-place arithmetic operations for addition, subtraction and multiplication.
Updated codebase per SYCL-2020 intel/llvm compiler deprecation warnings.
Improved performance of advanced boolean indexing for arrays whose size fits in 32-bit signed integer type.
Removed deprecated DPCTLDevice_GetMaxWorkItemSizes function from the SyclInterface library.
Improved performance of dpctl.tensor.reshape in the case when a copy is being made.
Improved performance of dpctl.tensor.roll function.

Fixed

Fixed issues identified by Coverity security scans.
Fixed issues #1279, #1350, #1344, #1327, #1241, #1250, #1293.

[0.14.5] - 07/17/2023

Added

Added dpctl.tensor.log2 and dpctl.tensor.log10: #1267
Added dpctl.tensor.negative, dpctl.tensor.positive, dpctl.tensor.square #1268
Added dpctl.tensor.logical_not, dpctl.tensor.logical_and, dpctl.tensor.logical_or, dpctl.tensor.logical_xor #1270

Changed

dpctl.tensor.astype behavior for newdtype=None changes #1261
dpctl.tensor.usm_ndaray constructor default value of dtype keyword argument changed to None: #1265
Support for out arguments that overlap with inputs for unary elementwise functions#1281
Copying from one array to another a no-op if both arrays view into the same memory #1284

[0.14.4] - 06/14/2023

Added

Added dpctl.tensor.less_equal, dpctl.tensor.greater, dpctl.tensor.greater_equal: #1239

Changed

Optimized in-place arithmetic operations for updating matrix with rows/columns via broadcasting: #1244

Fixed

Fixed handling of 0d arrays in dpctl.tensor.sum: #1238

[0.14.3] - 06/13/2023

Added

Added support of axis=None in dpctl.tensor.concat #1125
Added caching for dpctl.SyclDevice.filter_string property #1127
Added dpctl.tensor.isdtype from array API #1133
Added dpctl.tensor.unstack, dpctl.tensor.moveaxis, dpctl.tensor.swapaxes #1137, #1174
Allow for mutation of dpctl.tensor.usm_ndarray.flags.writable #1141
Added dpctl.tensor.where from array API #1147
Include libtensor headers in dpctl installation layout #1185
Added new properties of dpctl.tensor.usm_ndarray object #1199
Added a list of unary and binary elementwise functions from array API:
- #1203: dpctl.tensor.add, dpctl.tensor.divide, dpctl.tensor.isnan, dpctl.tensor.isinf, dpctl.tensor.isfinite, dpctl.tensor.cos, dpctl.tensor.abs, dpctl.tensor.equal
- #1205: dpctl.tensor.sqrt
- #1209: implements out keyword argument
- #1211: dpctl.tensor.multiply, dpctl.tensor.subtract
- #1214: dpctl.tensor.not_equal
- #1216: dpctl.tensor.exp, dpctl.tensor.sin
- #1217: dpctl.tensor.real, dpctl.tensor.imag, dpctl.tensor.proj
- #1218: dpctl.tensor.log, dpctl.tensor.log1p, dpctl.tensor.expm1
- #1221: dpctl.tensor.floor_divide
- #1235: dpctl.tensor.less
- #1237: in-place support for addition, multiplication and subtraction
Added dpctl.tensor.all and dpctl.tensor.any #1204
Added dpctl.tensor.sum #1210

Changed

Updated examples of native Python extensions built using dpctl #1108
Used security flags to compile and link native extensions of dpctl #1109
Changed types of dpctl.tensor.finfo and dpctl.tensor.iinfo output structure per array API spec #1110
Consolidated multiple USM temporaries life-time management host_tasks to improve test suite stability #1111
MAINT: Improved cmake target dependency tracking #1112
MAINT: Improved docstrings for existing dpctl.tensor functions #1123
Changed default value of mode keyword in dpctl.tensor.take and dpctl.take.put from clip to wrap #1132
Added support for (nested) sequence of dpctl.tensor.usm_ndarray objects in dpctl.tensor.asarray #1139
Improved exception handling in dpctl.tensor.usm_ndarray.__setitem__ special method #1146
Simplified implementation of copy-and-cast kernels and removed special casing for 2D arrays to conserve binary size #1165
Improved speed of dpctl.tensor.usm_ndarray printing functionality #1187
Require DPC++ RT 2023.1 to build and run dpctl #1195
Compile offloading native extensions with -fno-sycl-id-queries-fit-in-int fixing gh-1184, #1200
Transition to conda-forge ecosystem #1213

Fixed

Fix to add empty values check for dpctl.tensor.place #1105, #1106
Fixed gh-1089 by improving dpctl.tensor.asarray handling of NumPy arrays viewing into host-accessible USM allocation objects.
MAINT: Fixed build break with newer GCC and SYCLOS #1118
Fixed a bug in basic indexing of dpctl.tensor.usm_ndarray #1136

[0.14.2] - 03/07/2023

Fixed

Fixed a bug with boolean advanced indexing #1103

[0.14.1] - 03/06/2023

Added

Added dpctl.SyclDevice.partition_max_sub_devices property #1005
Added dpctl.program.SyclKernel.max_sub_group_size property #1028
Implemented printing of usm_ndarray #1013, #1043, #1060
Implemented support for advanced indexing for dpctl.tensor.usm_ndarray #1095, #1097, #1099, #1101
Implemented support for platform listing in dpctl.__main__ script #1014
Improved performance of dpctl.tensor.asnumpy #1026
Added UsmNDArray_Make* C-API for constructing dpctl.tensor.usm_ndarray from native allocations #1050, #1067
Added support for dpctl.SyclDevice.native_vector_width_* device descriptors #1075
Added dpctl::tensor::usm_ndarray::get_shape_vector and dpctl::tensor::usm_ndarray::get_strides_vector methods #1090

Changed

Removed dpctl.select_host_device, dpctl.has_host_device, dpctl.SyclDevice.is_host, and dpctl.SyclDevice.has_aspect_host since support for host device has been removed in DPC++ 2023 and from SYCL 2020 spec #1028
usm_ndarrayis made writable by default #1012, and writable flag is now checked by __setitem__.
Added convenience signature for C++ utility function in "dpctl4pybind11.hpp" #1016
Improved error reported when attempting to submit kernel that uses a data-type unsupported by target device #1018, #1040
Updated C++ code to require DPC++ 2023.0.0 or newer #1028, #1066
The dpctl.tensor.Device class supports print_device_info method #1029, equality comparison, and hashing #1048
Updated version of pybind11 used to 2.10.2 #1031
Improved internal utility responsible for reduction of iteration space dimensionality #1044, #1054
Changed return type of DCPCTLUSM_GetPointerType function in SyclInterface library #1061, #1065
Updated supported version of DLPack to 0.8 #1073
Implemented queue cache per context/device pair and deployed it in dpctl.memory, dpctl.tensor.from_dlpack and dpctl.tensor array creation functions #1076, #1079
Maintainance, CI work: #1001, #1009, #1011, #1024, #1030, #1032, #1035, #1037, #1039, #1041, #1045, #1047, #1055, #1057, #1059, #1068, #1070, #1074, #1077, #1078, #1081, #1084, #1085, #1088, #1086, #1092, #1093

Fixed

Fixed error gh-998 in forming Python exception, #999.
A small memory leak fixed, #1000
Improved dtype support in dpctl.tensor.full, PR #1002
Added missing header file #1008 fixing gh-1007
Fixed a typo in device-specific dtype mapping #1015
Fixed default device integer type to align with NumPy's behavior on Windows #1017
Fixed unexpected overflow in dpctl.tensor.linspace when one of the parameters is the largest floating point value #1034
Constructors dpctl.tensor.empty, dpctl.tensor.zeros, and usm_ndarray constructor itself no longer allow to create array with data-types not supported by targeted device #1042
Fixed parameter validation in dpctl.SyclQueue constructor #1052
Fixed usm_type of the resulting array in dpctl.tensor.tril and dpctl.tensor.triu functions #1062
Used DPC++ configuration files to ensure correct use of conda compiler toolchain on Linux #1072
Fixed issue with empty argument of dpctl.tensor.meshgrid function #1080
Fixed linking problem on Windows enabling dpctl to be functional on Windows for devices not supporting some data types #1083

[0.14.0] - 11/18/2022

Added

Implemented dpctl.tensor.linspace function from array-API #875.
Implemented dpctl.tensor.eye function from array-API #896.
Implemented dpctl.tensor.tril and dpctl.tensor.triu functions from array-API #910.
Added data type objects to dpctl.tensor namespace, finfo, iinfo, can_cast, and result_type functions #913.
Implemented dpctl.tensor.meshgrid creation function from array-API #920.
Implemented convenience class to represent output of dpctl.tensor.usm_ndarray.flags property #921.
Added new device attributes and kernel's device-specific attributes #894.
Added dpctl.utils.onetrace_enabled context manager for targeted trace collection #903.
Added support for stream keyword in __dlpack__ method, enabling support for sending usm_ndarray using mpi4py #906.
dpctl.tensor.asarray can now transition data between incompatible devices, #951.
Introduced "syclinterface/dpctl_sycl_types_casters.hpp" header file with declaration of conversion routines between SYCL type pointers and SyclInterface library opaque pointers #960.
Added C-API to dpctl.program.SyclKernel and dpctl.program.SyclProgram. Added type casters for new types to "dpctl4pybind11" and added an example demonstrating its use #970.
Introduced "dpctl/sycl.pxd" Cython declaration file to streamline use of SYCL functions from Cython, and added an example demonstrating its use #981.
Added experimental support for sharing data allocated on sub-devices via dlpack #984.
Added dpctl.SyclDevice.sub_group_sizes property to retrieve supported sizes of sub-group by the device #985.

Changed

Improved queue compatibility testing in dpctl.tensor's implementation module #900.
Added automatic measurement of array-API conformance test suite in CI #901.
Improved performance of array metadata transfer from host to device #912.
Used os.add_dll_directory on Windows to ensure that DPCTLSyclInterface library can be found #918.
Refactored dpctl.tensor's implementation module #941 to streamline adding new functionality. Streamlined dpctl::tensor::usm_ndarray class implementation.
Added debugging messaging in case when DPCTLDynamicLib::getSymbol encounters errors #956.
Updated code base according to changes in DPC++ compiler #952, #957, #958.
Changed dpctl to use pybind11 2.10.1 #967.
Extended dpctl.tensor.full to accept 0d and higher dimensional arrays for fill-value parameter #982 and #995.

Fixed

Improved SyclDevice constructor error message #893.
Fixed issue gh-890 about dpctl.tensor.reshape function #915.
Fixed unexpected UnboundLocalError exception in #922.
Fixed bugs in dpctl.tensor.arange in #945.
Fixed issue with type inferencing in dpctl.tensor.asarray in #949.
Added missing docstrings for dpctl.SyclDevice properties #964.

[0.13.0] - 07/28/2022

Added

Implemented and deployed dedicated kernels for copying with casting #781, used in __setitem__, implementaion of asarray, dpctl.tensor.copy functions.
Implemented dedicated copying kernel for dpctl.tensor.reshape function #810, added support for copy keyword #807.
Implemented dedicated kernel to copy with casting from numpy.ndarray into dpctl.tensor.usm_ndarray #817.
Implemented dpctl.tensor.permute_dims function from array-API #787.
Implemented dpctl.tensor.expand_dims function from array-API #788.
Implemented dpctl.tensor.squeeze function from array-API #790.
Implemented dpctl.tensor.broadcast_to function from array-API #791.
Implemented dpctl.tensor.broadcast_arrays function from array-API #798.
Implemented dpctl.tensor.flip function from array-API #801.
Implemented dpctl.tensor.usm_ndarray.mT property per array-API #805.
Implemented dpctl.tensor.roll function from array-API #809.
Implemented dpctl.tensor.arange function from array-API #814.
Implemented dpctl.tensor.zeros function from array-API #816.
Implemented dpctl.tensor.zeros function from array-API #816.
Implemented dpctl.tensor.ones, dpctl.tensor.full, dpctl.tensor.empty_like, dpctl.tensor.zeros_like, dpctl.tensor.ones_like, dpctl.tensor.full_like functions from array-API #822.
Implemented DPCTLQueue_Memset function in SyclInterface library #812, and exposed it for dpctl.memory.MemoryUSM* classes #815.
Implemented dpctl.utils.get_coerced_usm_type to deduced usm type of the output array from types of input arrays in compute-follows-data execution model #797.
Added dpctl.SyclDevice.profiling_timer_resolution property #825.
Added dpctl.SyclDevice.platform and dpctl.SyclPlatform.default_context properties #827.
Provided pybind11 example for functions working on dpctl.tensor.usm_ndarray container applying oneMKL functions #780, #793, #819. The example was expanded to demonstrate implementing iterative linear solvers (Chebyshev solver, and Conjugate-Gradient solver) by asynchronously submitting individual SYCL kernels from Python #821, #833, #838.
Wrote manual page about working with dpctl.SyclQueue #829.
Added cmake scripts to dpctl package layout and a way to query the location #853.
Implemented dpctl.tensor.concat function from array-API #867.
Implemented dpctl.tensor.stack function from array-API #872.

Changed

Enhanced coverage collection for SyclInterface library by also collecting it during pytest run and combining traces with those collected during C-test run #818. This change also allows to not rebuild SyclInterface library when building C-test executable.
Exported keep_args_alive utility in dpctl4pybind11.hpp header #820. The utility uses sycl::handler::host_task to keep given Python arguments alive until eac sycl::event from the given vector of events is complete. The host task is scheduled on the SYCL queue provided as the first argument.
Changed the size of struct underlying dpctl.SyclEvent to avoid storing Python object previously used to keep kernel arguments scheduled with dpctl.SyclQueue.submit #823.
Fixed docstring for dpctl.SyclTimer #824.
Changed type of exceptions raised on failure to create dpctl.SyclDevice from ValueError to dpctl.SyclDeviceCreationError #826.
Improved performance of pybind11 type casters #837.
Changed implementation of dpctl.SyclProgram from using deprecated sycl::program to sycl::kernel_bundle #845.
Removed deprecated device aspects, added new supported aspects #844.
Updated vendored dlpack.h to version 0.7 #847.

Fixed

Fixed dpctl.lsplatform() to work correctly when used from within Jupyter notebook #800.
Fixed script to drive debug build #835 and fixed code to compile in debug mode #836.
Fixed filter selector string produced in outputs of dpctl.lsplatform(verbosity=2) and dpctl.SyclDevice.print_device_info #866.
Fixed issue with slicing reported in gh-870 in #871.

[0.12.0] - 03/01/2022

Added

Properties added to MemoryUSM* objects. #647
Added dpctl.tensor.asarray #646
Implemented DLPack support for usm_ndarray #682
Exported dpctl.tensor.Device class #708 #718
Added testing of examples in CI #722
Added user manuals to dpctl documentation #712 #773

Changed

Folder dpctl-capi/ renamed to libsyclinterface/ in sources and documentation. #666 #768
Added workflow to publish rendered documentation on PRs #673 #753 #726
Synchronization functions and USM allocation functions release GIL #736 #766
dpctl.SyclEvent destructor is made non-blocking #751

Fixed

Fixed for issue in code of dpctl.tensor.usm_ndarray.T #653
Fixed issue with dpctl.tensor.reshape's affect on contiguity flags of usm_ndarray #695
Fixed handling of empty list by dpctl.tensor.asarray #694
Fixed type inference with array of empty arrays in dpctl.tensor.asarray #697
Fixed issue gh-698 with dpctl.tensr.asarray #709
Fixed performance of item assignment from numpy array #724
DPCTLDeviceMgr_GetNumDevices should not operate on rejected devices #737
Fixed issue gh-729 for dpctl.tensor.reshape applied to 0-element usm_ndarray #756
Fixed issue gh-728 with dpctl.tensor.astype #757
Fixed type in memory overlapping test #770
Fixed issue with operator.pos for dpctl.tensor.usm_ndarray #783
Only call PyThread_Ensure from host_task if the main-thread interpreter is initialized and not finalizing #776 #778 #721

Full Changelog: https://github.com/IntelPython/dpctl/compare/0.11.4...0.12.0

[0.11.4] - 12/03/2021

Fixed

Fix tests for nested context factories expecting for integration environment by @PokhodenkoSA in IntelPython#705

[0.11.3] - 11/30/2021

Fixed

Set the last byte in allocated char array to zero [cherry picked from #650] #699

[0.11.2] - 11/29/2021

Added

Extending dpctl.device_context with nested contexts #678

Fixed

Fixed issue #649 about incorrect behavior of .T method on sliced arrays #653

[0.11.1] - 11/10/2021

Changed

Replaced uses of clang compiler with icx executable #665

[0.11.0] - 11/01/2021

Added

Use Python 3.9 in public CI #599
Add a new C API utility function (DPCTLDeviceMgr_GetDeviceInfoStr) to return the device info as a C string object #620
New Github workflow to build dpclt with nightly Intel llvm/sycl + drivers #621
Always raise SubDeviceCreationError even when sub-device counts are zero #622
Updated OpenCL interoprability code to fix build with Intel llvm/sycl bundle #625
Enabled use of default platform context extension in SYCL compilers that implement this extension #627
Implemented dpctl.utils.get_execution_queue(queue_seq) utility to help implementing "compute-follows data" convention for offload target #632 #631

Changed

Replaced host_device device type with host in tests #616
Rework the logic in dpctl.memory's copy_from_device method to work correctly with host device #618
Use dpctl.device_type.host instead of dpctl.device_type.host_device #626
Reinstate deprecated sycl::program and that was conditionally removed from open source DPC++ toolchain #633
Use LoadLibraryExA instead of LoadLibraryA to mitigate a possible DLL injection issue when we load the Level zero DLL on windows #636
Github coverage workflow is changed to use oneAPI 2021.3 instead of latest to work around broken profiling instrumentation in DPC++ 2021.4 #614
Update build dependencies for NumPy #641
Use "readelf" on SYCL's pi_level_zero library to find out and use the exact name of ze_loader.so in SyclInterface library #617

Removed

Removed use of DPC++ features deprecated in 2021.4 and open source Intel llvm/sycl compiler #603

Fixed

Suppress errant CMake log #610
Fixes to compile dpctl using Intel llvm/sycl compiler #603
Fix for the hang is to avoid passing nullptr argument to sycl::queue::prefetch #612
Fixed the logic to return device count #623
Enabled building of C extensions with dpctl by including header defining bool type for C compilers #604

[0.10.0] - 09/28/2021

Added

Added methods bool, float, int, index, and complex to usm_ndarray #578
Added data-API required special methods to usm_ndarray class, as well as to_numpy/from_numpy, astype, reshape functions #586
Added methods to query dpctl.SyclDevice for size of global/local memory #589
Added tests for constructors with invalid capsules #577
Improved test coverage of dpctl.SyclQueue implementation #574
Added a test to exercise API exported function (get_event_ref). #570
Expanded tests in test_sycl_context to improve coverage #571
Tweaks to test_sycl_event to improve coverage #567
Improved coverage of dpctl.init file and other service functions #563
Added test for repr and test for default argument to constructor #565
Added some tests to involve capsule #564
Added workflow for Public CI on Windows #534
DPCTLQueue_Memcpy, _Prefetch, _Memadvise become asynchronous #557
Added device aspect selector, dpctl.select_device_with_aspects #558
Added test based on example from #583

Changed

Parametrized tests for executing OpenCL kernels compiled from source in types of arguments #581
Temporary disabled self-hosted CI jobs runner #559
Changed static method SyclQueue._create_from_context_and_device #579
Transitioned all Python API to use pytest over unittest, improved coverage in dpctl/memory #575
Changed dpctl.SyclEvent.profiling_info_submit from method to a property #573
Simplified arg parsing in SyclDevice constructor #572
Used tag with alignment attribute set in README #562
Moved sycl timer into dpctl.SyclTimer #555
Used clang-format off, clang-format on to avoid include reordering in pybind11 example #588

Fixed

Implemented a workaround for running conda-build using Klocwork #566
Separated pipelines for Linux and Windows #582
Fixed inconsistency in __sycl_usm_array_interface__ of usm_ndarray instance #584
Fixed memory leak: Capsule deleters now free resources for renamed capsules too #568
Fixed version test to allow for semantic versioning #569
Improved coverage of _types.pxi #556
Fixed UnboundLocalError when default queue could not be created #554

[0.9.0] - 08/25/2021

Added

Improvements to logic for working with custom DPC++ toolchain #481
Add SyclContext unit test cases #488
Consolidate configurations of tools that support PEP 518 into pyproject.toml #486
Added C-API hash function, used them in Python interface #491
Add missing extra checks to ensure unwrapped pointer is not Null
Add error messages to L0 program creation routine
Improve test coverage for dpctl_sycl_queue_interface #492
Use pytest.warns in test_lsplatform3 #495
Added test class to test DRef=nullptr case #496
Extend parameterized test in test_sycl_queue_interface #497
Use Memcpy, memadvise in tests
Expanded types tests by TestQueueSubmitRange
Added a test that retrieved DPCPP compiled kernel and submits them via DPCTLQueue_SubmitRange #499 , DPCTLEvent_GetCommandExecutionStatus #516, , DPCTLEvent_GetWaitList #510 functions
Propagate compile flags #512
Add conda package CI pipeline on GitHub Actions #515
Run tests on GPU #518
Add 3 wrapper func for event::get_profiling_info #519
Changes to build_backend.py to enable sycl-compiler-prefix on Windows
dtype keyword of usm_ndarray now supports np.double and other types #526
Implemented DPCTLQueue_SubmitBarrier, DPCTLQueue_SubmitBarrierForEvents, SyclQueue.submit_barrier #524
Added C-API DPCTLQueue_HasEnableProfiling
Added Python API SyclQueue.has_enable_profiling
Use public for data owning class definitions
Queue has enable profiling #531
Use public for data owning class definitions #533
Added logic to verify that all bits of property integer were recognized and used #494
Added support for some properties/methods of underluing device
A test for properties, method of q mirroring that of device
Conda build scripts should build wheels in the same setup invocation as install #538
Added install_requires keyword to setup call
Added requirements.txt files in dpctl/ and in dpctl/docs #540
Improved C-API for dpctl Cython classes, added example of using them in Pybind11 extension. #550
dpctl.SyclEvent acquired ability to get command status and get profiling information. #553

Changed

Moved DPCLSyclInterface library from MANIFEST.in #482
Refactored tests
Use dpcpp compiler package for Linux #514
Update conda-package.yml
Static methods _init_helper made into functions and removed from PXD files #532

Removed

Remove imports from future #485

Fixed

Fix sub devices #479
Fix addressof_ref function in SyclContext #488
Follow DPCTLDevice_CreateFromSelector which passes the check #487
Fix a typo in the pytest configuration #490
Fixed dbg_build.sh script for Linux to use L0
Reuse IntelSycl_LIBRARY_DIR variable in cmake
CXX, dpcpp used on Windows too
Update conda-recipe/bld.bat
Change to SyclQueue.repr to reflect properties #531
Static methods _init_helper made into functions and removed from PXD files #532
Fixed typo in pip installation instruction #536
Fixed dpctl_config.h, added dpctl_service.h, .cpp #539
Fixed __sycl_usm_array_interface__ output for 0d arrays #547

[0.8.0] - 05/26/2021

Added

Implemented support for constructing MemoryUSM* from object with sycl_usm_array_interface when array-info is not contiguous #400
Print the backend as part of SyclDevice.print_device_info function #409
Added dpctl/tensor/_usmarray submodule #427
Added arg checking to functions in dpctl_sycl_usm_interface.cpp #430
A static method of _Memory to create from external allocation #430
Added usm_ndarray accessors #435
Added Device class representing Data-API notion of device #440
Added free Python function as_usm_memory(obj) #443 and associated unit tests #449
Dependency for numpy 1.17 #445
Add a flag to make doxygen HTML generation optional #450
Added a feature to get the filter string for a device from Python using the new dpctl.SyclDevice.get_filter_string method. Also added the corresponding DPCTLDeviceMgr_GetPositionInDevices(DRef, device_mask) C API function #453
New options to setup.py to specify which dpcpp compiler to use, if L0 program creation is to be supported, and to generate code coverage #426
Github action to check Python code quality #422
Github action to auto-publish Sphinx docs for master #446
Github action to generate coverage report and publish to coveralls.io #459

Changed

Rename dpctl.dptensor to dpctl.tensor #407
Changed repr for Memory objects #442
Used dpctl.SyclQueue instead of manager and get current queue in tests for SyclProgram #448

Fixed

Issue #189 dpctl.memory.MemoryUSMShared(np.int64(16)) should work #392
Use size_t instead of Py_ssize_t to fit device USM pointer #405
Various code quality issues identified by flake8 (#417, #419, #420, #422)
Fixed issues in slicing and array construction #441
Fixed an issue #447 where dpctl.get_devices does not return devices in the same order as sycl::device::get_devices #451
L0 program creation support on Windows #319

Removed

Removing public keyword to get_current_queue Cython declaration #437

[0.7.0] - 05/03/2021

Added

Complete support for sycl::ONEAPI::filter_selector in dpctl. , and sycl::platform #298 creation using opaque pointers.
A DPCTLDeviceMgr module in C API that caches a default context for root devices #277.
DPCTLSyclBackendType and DPCTLSyclDeviceType have a new member ALL #287.
C API now provides helper functions to convert between dpctl and SYCL enum values #296.
Macros to help create opaque vector classes for opaque SYCL types #297. , SyclContext #334, SyclPlatform (#336, #298), SyclQueue #323 have constructors that recognize filter selectors and closely follow DPC++ interface.
Add API to get a PyCapsule from SyclQueue, SyclContext instances #350.
Added get_queue_ref_from_ptr_and_syclobj(ptr, syclobj) that creates DPCTLSyclQueueRef from a USM pointer and Python object syclobj from __sycl_usm_array_interface__ #380.
Support for SYCL sub-devices, including sub-device creation, queue, and context creation using sub-devices #343.
SyclDevice.parent_device property to indicate if an instance has a parent device #366.
Several new getter functions for device info descriptors to device interface (#300, #335, #318, #315, #308).
Support for SYCL device aspects #307.
Properties for every sycl::device info and aspect that we support in SyclDevice #324.
Support handling async errors inside SylQueue instances #346.
get_backend, get_platform, get_device_type to Python SyclDevice class #300
A _sycl_device_factory.pyx module providing SyclDevice constructors using standard sycl::device_selector classes (previously in _sycl_device.pyx) and a new get_devices #277 function to enumerate all devices.
_sycl_device_factory.pyx implements get_num_devices and has_*_device(s) functions #320.
Enable Python coverage in CI for Linux #369.
Use public keyword in _sycl_*.pxd to generate header files allowing non-Cython centric native extensions to work with dpctl's Python objects #218.
Documentation improvements #341.

Changed

Rename dpCtl to dpctl in all comments, license headers, and docs. #342
dpctl.memory.MemoryUSM* constructors now use dpctl.SyclQueue() instead of dpctl.get_current_queue() when the queue keyword argument is None (default) #382.
dpctl.set_default_queue has been renamed to dpctl.set_global_queue() #323.
Changed dpctl.dump to dpctl.lsplatform #336.
Various SyclDevice methods related to querying sycl::info::device were converted to properties #324.
Various C API functions names were changed.

Fixed

Possible crashes when a SYCL platform is not available #349.
Fix tests which fail if GPU is not available (only CPU is available) #359.
Fix breaking C API tests #358.
Bandit warning about "subprocess.check_call(shell=True)" for Windows #306.

Removed

Removed get_num_platforms, has_cpu_queues, has_gpu_queues, get_num_queues, has_sycl_platforms #320.

[0.6.1] - 2021-03-01

Fixed

Do not use POP_FRONT in FindDPCPP.cmake so that we can use a cmake version older that 3.15.

[0.6.0] - 2021-03-01

Added

Documentation improvements.
Cmake improvements and Coverage for C API, Cython and Python.
Added support for Level Zero devices and queues.
Added support for SYCL standard device_selector classes.
SyclDevice instances can now be constructed using filter selector strings.
Code of conduct.
Building wheels.
Queue manager improvements.
Adding __array_function__ so that Numpy calls with dparrays work.
Using clang-format for C/C++ code formatting.
Using pytest for running tests.
Add python and cython file coverage.
Using Bandit for finding common security issues in Python code.
Add instructions about file headers formats.

Changed

Changed compiler name usage from clang++ to dpcpp.
Reformat backend.pxd to be closer to black style.

Fixed

Remove cython from install_requires. It allows use dpCtl in numba extensions.
Incorrect import in example.
Consistency of file headers.
Klocwork issues.

[0.5.0] - 2020-12-17

Added

_Memory.get_pointer_type static method which returns kind of USM pointer.
Utility functions to transform string to device type and back.
New dpctl.dptensor.numpy_usm_shared module containing USM array. USM array extends NumPy ndarray.
A lot of new examples. Including examples of building Cython extensions with DPC++ compiler that interoperate with dpCtl.
Mechanism for registering a callback function to look and see if the object supports USM.

Changed

setup.py builds C++ backend for develop and install commands.
Building wheels.
Use DPC++ runtime from package dpcpp_cpp_rt.
All usage of DPPL in C-API functions was changed to DPCTL, e.g., DPPLQueueMgr_GetCurrentQueue to DPCTLQueueMgr_GetCurrentQueue.
Renamed the C-API directory is now called dpctl-capi instead of backends.
Refactoring the dpctl-capi functions to prepare for changes to add Level Zero program creation.
SyclProgram and SyclKernel classes were moved out of dpctl into the dpctl.program sub-module.

Fixed

Klockwork static code analysis warnings.

[0.4.0] - 2020-11-04

Added

Device descriptors "max_compute_units", "max_work_item_dimensions", "max_work_item_sizes", "max_work_group_size", "max_num_sub_groups" and "aspects" for int64 atomics inside dpctl C API and inside the dpctl.SyclDevice class.
MemoryUSM* classes moved to dpctl.memory module, added support for aligned allocation, added support for prefetch and mem_advise (sychronous) methods, implemented copy_to_host, copy_from_host and copy_from_device methods, pickling support, and zero-copy interoperability with Python objects which implement __sycl_usm_array_inerface__ protocol.
Helper scripts to generate API documentation for both C API and Python.

Fixed

Compiler warnings when building libDPPLSyclInterface and the Cython extensions.

Removed

The Legacy OpenCL interface.

[0.3.8] - 2020-10-08

Changed

How the initial active queue is populated inside DPPLQueueMgr.
dpctl.SyclQueueManager only reports the number of non-host platform.
dpctl.SyclQueueManager now raises an exception if DPCTL C API returns a nullptr instead of a valid Sycl queue.

Fixed

Several crashes in cases where an OpenCL or Level Zero platform is not available.
Fix failing platform test case. #116
Properly skip tests when no OpenCL devices are available.
Add skip tests to test_sycl_usm.py
Fix Gtests configuration.

[0.3.7] - 2020-10-08

Fixed

A crash on Windows due a Level Zero driver problem. Each device was getting enumerated twice. To handle the issue, we added a temporary fix to use only first device for each device type and backend #118.

[0.3.6] - 2020-10-06

Added

Changelog was added for dpctl.

Fixed

Windows build was fixed.

[0.3.5] - 2020-10-06

Added

Add a helper function to all Python SyclXXX classes to get the address of the base C API pointer as a long.

Changed

Rename PyDPPL to dpCtl in comments (function name renaming to come later)

Fixed

Fix bugs highlighted by tools.
Various code clean ups.

[0.3.4] - 2020-10-05

Added

Dump functions were enhanced to print back-end information.
dpctl gained support for unint_8 and unsigned long data types.
oneAPI Beta 10 tool chain support was added.

Changed

dpctl is now aware of DPC++ Sycl PI back-ends. The functionality is now exposed via the context interface.
C API's queue manager was refactored to require back-end.
dpct's device_context now requires back-end, device-type, and device-id to be provided in a string format, e.g. opencl:gpu:0.

Fixed

Fixed some important bugs found by static analysis.

[0.3.3] - 2020-10-02

Added

Add dpctl.get_curent_device_type().

[0.3.2] - 2020-09-29

Changed

Set _cpu_device and _gpu_device to None by default.

[0.3.1] - 2020-09-28

Added

Add get include and include headers.

Changed

DPPL shared objects are installed into dpctl.

Fixed

Refactor unit tests.

[0.3.0] - 2020-09-23

Added

Adds C and Cython API for portions of Sycl queue, device, context interfaces.
Implementing USM memory management.

Changed

Refactored API to expose a minimal sycl::queue interface.
Modify cpu_queues, gpu_queues and active_queues to functions.
Change static vectors to static pointers to verctors. It disables call for destructors. Destructors are also call in undefined order.
Rename package PyDPPL to dpCtl.
Use dpcpp.exe on Windows instead of dpcpp-cl.exe deleted in oneAPI beta08.

Fixed

Correct use ERRORLEVEL in conda scripts for Windows.
Fix using dppl.has_sycl_platforms() and dppl.has_gpu_queues() functions in skipIf

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

Changelog

[0.18.0] - XXX. XX, 2024

Added

Change

Fixed

[0.17.0] - May. XX, 2024

Added

Changed

Fixed

[0.16.1] - Apr. 10, 2024

Changed

Fixed

[0.16.0] - Feb. 16, 2024

[0.15.1] - Feb. 10, 2024

Added

Changed

Fixed

[0.15.0] - Sep. 29, 2023

Added

Changed

Fixed

[0.14.5] - 07/17/2023

Added

Changed

[0.14.4] - 06/14/2023

Added

Changed

Fixed

[0.14.3] - 06/13/2023

Added

Changed

Fixed

[0.14.2] - 03/07/2023

Fixed

[0.14.1] - 03/06/2023

Added

Changed

Fixed

[0.14.0] - 11/18/2022

Added

Changed

Fixed

[0.13.0] - 07/28/2022

Added

Changed

Fixed

[0.12.0] - 03/01/2022

Added

Changed

Fixed

[0.11.4] - 12/03/2021

Fixed

[0.11.3] - 11/30/2021

Fixed

[0.11.2] - 11/29/2021

Added

Fixed

[0.11.1] - 11/10/2021

Changed

[0.11.0] - 11/01/2021

Added

Changed

Removed

Fixed

[0.10.0] - 09/28/2021

Added

Changed

Fixed

[0.9.0] - 08/25/2021

Added

Changed

Removed

Fixed

[0.8.0] - 05/26/2021