
Is it better to have a module-wise device tag for each multi-device supported class? #5231

Open
maki49 opened this issue Oct 14, 2024 · 3 comments
Labels: GPU & DCU & HPC (GPU, DCU, and HPC related issues)

Comments

maki49 commented Oct 14, 2024

Currently, device is an input parameter treated as a global variable. When device=gpu, all calculations are performed on GPUs.
However, "the machine has GPUs" does not imply that "every module should calculate on GPUs".
For example, @dzzz2001 found that putting the FFT module on the GPU consumes a lot of memory while providing little acceleration when the LCAO basis is used.

A possible solution is a global mapping that gives the device for each module, using the class name as a key-like template parameter. Here is a demo:

#include <string>
#include <iostream>
class FFT;
class HSolver;

// Default: a module follows the global device setting.
template<typename Tclass>
std::string modulewise_device(const std::string global_dev, const std::string basis) { return global_dev; }
// Specialization for FFT: use the GPU only with the pw basis.
template<> std::string modulewise_device<FFT>(const std::string global_dev, const std::string basis) { return basis == "pw" ? global_dev : "cpu"; }

int main()
{
    for (auto device : { "gpu", "cpu" })
    {
        std::cout << "global device=" << device << std::endl;
        for (auto basis : { "pw", "lcao" })
        {
            std::cout << "FFT[" << basis << "]:" << modulewise_device<FFT>(device, basis) << std::endl;
        }
    }
    return 0;
}
Output:

global device=gpu
FFT[pw]:gpu
FFT[lcao]:cpu
global device=cpu
FFT[pw]:cpu
FFT[lcao]:cpu
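
For illustration, here is a minimal sketch of how a module could consume such a mapping at construction time, building on the demo above; the class body, the device_ member, and the constructor signature are hypothetical and not taken from the existing code:

// Sketch only: give the forward-declared FFT class from the demo a body
// that records the device it should actually run on.
class FFT
{
public:
    FFT(const std::string& global_dev, const std::string& basis)
        : device_(modulewise_device<FFT>(global_dev, basis))
    {
        // allocate FFT plans/buffers on device_ ("gpu" or "cpu") here
    }
    const std::string& device() const { return device_; }
private:
    std::string device_; // the device this module actually uses
};

// Hypothetical usage: FFT fft(global_device, basis_type);
// where global_device and basis_type come from the input parameters.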
@dzzz2001

The issue of a large amount of irrelevant GPU memory being allocated under the LCAO basis set when the device is set to GPU is documented in issue #4442.

@Cstandardlib

The kernel ops currently used by hsolver are undergoing a round of refactoring.
They face the same issue of supporting heterogeneous computing across multiple devices while tracking the devices actually in use.
The interface is a crucial matter, and we should standardize it across all functionality that uses multiple devices, such as FFT and the kernel operators.
I suggest a further discussion to align on a unified standard that ensures scalability and facilitates portability across various devices, including supercomputers.
@Critsium-xy
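
To make the "unified standard" more concrete, one possible shape is sketched below; DeviceType and the op signature are assumptions for discussion, not the current hsolver/FFT interface. Every multi-device functionality would accept an explicit device tag instead of reading the global device input:

#include <stdexcept>

enum class DeviceType { CPU, GPU };  // hypothetical unified device tag

// A kernel op that takes the tag explicitly; callers pass the per-module
// device resolved elsewhere (e.g. by something like modulewise_device above).
template <typename T>
void vector_add_op(DeviceType dev, const T* a, const T* b, T* c, int n)
{
    switch (dev)
    {
    case DeviceType::CPU:
        for (int i = 0; i < n; ++i) { c[i] = a[i] + b[i]; }  // plain CPU loop
        break;
    case DeviceType::GPU:
        // launch the CUDA/HIP kernel here, guarded by the build flags
        throw std::runtime_error("GPU path not built in this sketch");
    }
}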

@Critsium-xy

@mohanchen proposed an idea: implement heterogeneous computing directly in blas_connector. blas_connector.h currently both links the cblas kernels and encapsulates the BLAS kernels, and he wants to separate these two parts. For example, we can keep the declaration of BlasConnector::gemm in blas_connector.h and implement the function in blas_connector.cpp. He also added that, to make BlasConnector::gemm support different platforms, we can add a device-flag parameter whose default value represents the CPU, and use it to decide exactly which kernel to call (a cuBLAS kernel? a CBLAS kernel? or a hipBLAS kernel?). This may have a performance cost, but I expect it is not huge (I haven't tested it yet). After finishing this, we could drop the other BLAS encapsulations, such as the ops in module_hsolver or in @denghuilu's tensor. But in fact I don't know whether it is really a good idea.
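
A rough sketch of what this could look like; the device flag, its default value, and the gemm signature here are illustrative assumptions, not the actual BlasConnector interface:

// blas_connector.h: declaration only, with an extra device flag (sketch)
enum class BlasDevice { CPU, GPU };  // hypothetical device flag, CPU by default

struct BlasConnector
{
    static void gemm(char transa, char transb, int m, int n, int k,
                     double alpha, const double* a, int lda,
                     const double* b, int ldb,
                     double beta, double* c, int ldc,
                     BlasDevice dev = BlasDevice::CPU);
};

// blas_connector.cpp: the implementation dispatches on the device flag
#include <cblas.h>

void BlasConnector::gemm(char transa, char transb, int m, int n, int k,
                         double alpha, const double* a, int lda,
                         const double* b, int ldb,
                         double beta, double* c, int ldc,
                         BlasDevice dev)
{
    if (dev == BlasDevice::CPU)
    {
        cblas_dgemm(CblasColMajor,
                    transa == 'N' ? CblasNoTrans : CblasTrans,
                    transb == 'N' ? CblasNoTrans : CblasTrans,
                    m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    }
    else
    {
        // dispatch to cublasDgemm / hipblasDgemm here, behind the
        // corresponding build flags; omitted in this sketch
    }
}

Whether the per-call branch on the flag has a measurable cost could then be checked with a quick benchmark, as mentioned above.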
