[DML] Optimizations to improve DML memory utilization #221

Open
bbernhar opened this issue Mar 4, 2022 · 0 comments

bbernhar (Contributor) commented Mar 4, 2022

The following flame graphs are based on traces of MobileNetV2, ResNet50V2, and DirectMLSuperResolution.

Issue 1. Creating a large tensor resource takes 2x the wall-time due to a re-attempt (intel/GPGMM#180).

(flame graph: large tensor allocation, sub-allocation attempt followed by direct allocation)

The first attempt sub-allocates (blue), then fails and is followed by a direct allocation (reddish), making the whole operation about 2x slower. We should check the tensor size ahead of time and directly allocate when it cannot fit in a shared heap, rather than attempting sub-allocation and falling back; see the sketch below.
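A minimal sketch of the idea, assuming a fixed resource heap size; `kResourceHeapSize` and `ChooseAllocationMethod` are hypothetical names for illustration, not GPGMM API:

```cpp
#include <cstdint>

// Assumed: the sub-allocator manages fixed 4 MB resource heaps.
constexpr uint64_t kResourceHeapSize = 4ull * 1024 * 1024;

enum class AllocationMethod { kSubAllocate, kDirect };

// Decide the allocation path up front so a large tensor never pays for a
// sub-allocation attempt that is guaranteed to fail.
AllocationMethod ChooseAllocationMethod(uint64_t tensorSizeInBytes) {
    return (tensorSizeInBytes > kResourceHeapSize) ? AllocationMethod::kDirect
                                                   : AllocationMethod::kSubAllocate;
}
```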

Issue 2. Heaps backing resources are created on-demand (intel/GPGMM#110).

(flame graph: heap creation dominating tensor creation cost)

The biggest cost of tensor creation is creating the heap. We should pre-fetch the next heap so a subsequent request can be fulfilled without waiting on heap creation; see the sketch below.
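A minimal sketch of pre-fetching, assuming heap creation can run on a background thread; `Heap`, `CreateHeap`, and `HeapPool` are stand-ins for illustration (the real expensive call would be `ID3D12Device::CreateHeap`):

```cpp
#include <cstdint>
#include <future>
#include <memory>

// Stand-in for the real heap type, which would wrap an ID3D12Heap.
struct Heap {
    uint64_t sizeInBytes;
};

// Stand-in for the expensive creation call.
std::unique_ptr<Heap> CreateHeap(uint64_t sizeInBytes) {
    return std::make_unique<Heap>(Heap{sizeInBytes});
}

class HeapPool {
  public:
    explicit HeapPool(uint64_t heapSize) : mHeapSize(heapSize) {
        Prefetch();  // Warm the pool so the first request does not block.
    }

    // Hands out the pre-created heap and immediately starts creating the
    // next one in the background.
    std::unique_ptr<Heap> AcquireHeap() {
        std::unique_ptr<Heap> heap = mNextHeap.get();
        Prefetch();
        return heap;
    }

  private:
    void Prefetch() {
        mNextHeap = std::async(std::launch::async,
                               [size = mHeapSize] { return CreateHeap(size); });
    }

    uint64_t mHeapSize;
    std::future<std::unique_ptr<Heap>> mNextHeap;
};
```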

Issue 3. Tensors under-utilize memory.

Tensor allocations are 1 MB, whereas the heaps they occupy are 4 MB (4x waste). My goal is to move away from fixed/default heap sizes like we used in Dawn (now called PreferredResourceHeapSize) and instead grow them dynamically (intel/GPGMM#182); see the sketch below.

(screenshot: 1 MB tensor allocations inside 4 MB heaps)
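A minimal sketch of a geometric growth policy, assuming heap size should track demand rather than stay fixed; the floor, cap, and `GrowingHeapSizer` name are illustrative, not GPGMM API:

```cpp
#include <algorithm>
#include <cstdint>

// Assumed growth bounds: start small and double up to a cap, instead of
// always allocating a fixed PreferredResourceHeapSize.
constexpr uint64_t kMinHeapSize = 1ull * 1024 * 1024;   // 1 MB
constexpr uint64_t kMaxHeapSize = 64ull * 1024 * 1024;  // 64 MB

class GrowingHeapSizer {
  public:
    // Returns the size for the next heap, doubling on each call so heap
    // size tracks demand rather than wasting 4 MB on 1 MB of tensors.
    uint64_t NextHeapSize(uint64_t requestSizeInBytes) {
        // A heap must at least fit the request that triggered it.
        uint64_t size = std::max(mCurrentHeapSize, requestSizeInBytes);
        mCurrentHeapSize = std::min(mCurrentHeapSize * 2, kMaxHeapSize);
        return size;
    }

  private:
    uint64_t mCurrentHeapSize = kMinHeapSize;
};
```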

FYI @fujunwei, @huningxin, @RafaelCintron
