Releases: ROCm/Tensile

Tensile 4.31.0 for ROCm 5.0.1

16 Feb 22:17

Tensile code for ROCm 5.0.1 is unchanged from Tensile for ROCm 5.0.0. The library was rebuilt for the updated ROCm 5.0.1 stack.

Tensile 4.31.0 for ROCm 5.0.0

09 Feb 20:34

Added

  • DirectToLds support (x2/x4)
  • DirectToVgpr support for DGEMM
  • Parameter to control the number of files that kernels are merged into, to better parallelize kernel compilation
  • FP16 alternate implementation for HPA HGEMM on aldebaran

Optimized

  • Add DGEMM NN custom kernel for HPL on aldebaran

Changed

  • Update tensile_client executable to std=c++14

Removed

  • Remove unused old Tensile client code

Fixed

  • Fix hipErrorInvalidHandle during benchmarks
  • Fix addrVgpr for atomic GSU
  • Fix for Python 3.8: add case for Constant nodeType
  • Fix architecture mapping for gfx1011 and gfx1012
  • Fix PrintSolutionRejectionReason verbiage in KernelWriter.py
  • Fix vgpr alignment problem when enabling flat buffer load

Tensile 4.30.0 for ROCm 4.5.2

10 Dec 19:20
bb19eec

Tensile code for ROCm 4.5.2 is unchanged from Tensile for ROCm 4.5.0. The library was rebuilt for the updated ROCm 4.5.2 stack.

Tensile 4.30.0 for ROCm 4.5.0

27 Oct 21:30
bb19eec

Added

  • Custom Kernel mechanism for adding custom assembly kernels to Tensile
  • New assertions for problem sizes, alpha/beta values, and C equals D
  • Support setting VectorWidth in M dimension in MFMA SourceSwap configuration

Fixed

  • Fix merge.py keeping duplicate solutions
  • Fix ScheduleIterAlg 2,3 cases for aldebaran

Tensile 4.28.0 for ROCm 4.3.1

27 Aug 17:41
9cbabb0

No changes made for ROCm 4.3.1.

Tensile 4.28.0 for ROCm 4.3.0

30 Jul 22:53
9cbabb0

Added

  • TensileRetuneLibrary for updating existing library logic files
  • Support GFX1030
  • Support NHWC

Fixed

  • TensileCreateLibrary crash with relative output and --merge-files

Changed

  • Change cmake_minimum_required to VERSION 3.13

Tensile-4.27.0 for ROCm 4.2.0

10 May 23:17
3438af2

Added

  • Benchmarking and library support for CU efficiency vs. overall speed
  • Support general batch GEMM
  • Support offset for each input/output buffer in Tensile
  • Support ldc != ldd for all GEMM kernels
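
To illustrate what ldc != ldd means, here is a minimal reference sketch of D = alpha*A*B + beta*C in which the input C and the output D have different leading dimensions. It is not Tensile kernel code. The function name `gemm_ldc_ldd`, the column-major flat-buffer layout, and the sample values are assumptions chosen for illustration only.

```python
def gemm_ldc_ldd(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc, d, ldd):
    """Reference D = alpha*A*B + beta*C on column-major flat buffers.

    Hypothetical helper: C is read with stride ldc, D is written with
    stride ldd, and the two strides may differ.
    """
    for j in range(n):          # column of D
        for i in range(m):      # row of D
            acc = 0.0
            for p in range(k):  # summation index
                acc += a[i + p * lda] * b[p + j * ldb]
            d[i + j * ldd] = alpha * acc + beta * c[i + j * ldc]
    return d


# A = [[1, 2], [3, 4]] and B = identity, both stored column-major.
a = [1, 3, 2, 4]
b = [1, 0, 0, 1]
# C = [[10, 30], [20, 40]] stored with ldc = 3 (one padding element per column).
c = [10, 20, -1, 30, 40, -1]
# D is stored with ldd = 4 (two padding elements per column).
d = [0] * 8

result = gemm_ldc_ldd(2, 2, 2, 1.0, a, 2, b, 2, 1.0, c, 3, d, 4)
```

Only the first `m` entries of each column are touched; the padding elements of `d` stay as they were, which is why C and D can live in buffers with different strides.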

Optimizations

  • Refactor ConvolutionVsContraction

Fixed

  • Fixed MasterSolutionLibrary having duplicated hardware rows
  • Fix incorrect channel stride when converting a conv problem into a tensor contraction problem

Tensile-4.26.0 for ROCm 4.1.0

23 Mar 01:18
47dd2c4

Added

  • Make messagepack python dependency optional
  • TensileCreateLibraryFiles: auto create target for build time lib generation
  • Tensile cluster tuning tool
  • Framework for filtering solutions
  • Workflow for manually editing Kernels
  • Tuning client design doc
  • MatrixInstruction for general int8
  • Tensile integration test for TensileCreateLibrary
  • Trig float and random narrow init patterns for new client
  • Summation dimension mirroring (contributed by timlathy & Slimakanzer)
  • ROCm 4.1 TargetID support in Tensile; source kernels force xnack=OFF
  • Tensile/Utilities/merge.py revamp for merging logic yaml files
    • now merge.py requires python3
    • add -v verbosity levels (up to 2)
    • add --notrim to retain leading dimensions in sizes
  • New BoundsCheck design: Access guard page will trigger memory fault
  • Solution fitness metric
  • Auto-tuning documentation and build script improvements
  • Support for High Precision Accumulate FP16/BF16 In FP32 Out
  • CHANGELOG.md

Optimizations

  • Refine PersistentKernel: support PKn1, EPS, optimize LW-vmcnt and sMagicDiv2

Fixed

  • Targets passed to clang-offload-bundler now use the hipv4 prefix when appropriate
  • Fix bugs in tail-loop branch label and LR addr restore
  • locateExe in Tensile/Common.py looks in defaultPath first
  • Honor $ENV{ROCM_PATH} to support relocatable ROCm location

Tensile 4.24.0 for ROCm 3.10.0

18 Dec 15:28
ab44bf4

New Features

  • No new features

Known Issues

  • None

Tensile 4.24.0 for ROCm 3.10.0

30 Nov 17:05
ab44bf4

Known Issues

  • None