
Is it better to have a module-wise device tag for each multi-device supported class? #5231

Open
maki49 opened this issue Oct 14, 2024 · 3 comments
Labels: GPU & DCU & HPC (GPU, DCU, and HPC related issues)

Comments

maki49 commented Oct 14, 2024

Currently, device is an input parameter treated as a global variable. When device=gpu, all calculations are performed on GPUs.
However, "the machine has GPUs" does not imply that "every module should calculate on GPUs".
For example, @dzzz2001 found that putting the FFT module on the GPU consumes a lot of memory while providing little acceleration when the LCAO basis is used.

A possible solution is a global mapping that gives the device for each module, using the class name as a key-like template parameter. Here is a demo:

#include <string>
#include <iostream>
class FFT;
class HSolver;

// Default: a module follows the global device setting.
template<typename Tclass>
std::string modulewise_device(const std::string global_dev, const std::string basis) { return global_dev; }
// Specialization for FFT: use the GPU only with the pw basis.
template<> std::string modulewise_device<FFT>(const std::string global_dev, const std::string basis) { return basis == "pw" ? global_dev : "cpu"; }

int main()
{
    for (auto device : { "gpu", "cpu" })
    {
        std::cout << "global device=" << device << std::endl;
        for (auto basis : { "pw", "lcao" })
        {
            std::cout << "FFT[" << basis << "]:" << modulewise_device<FFT>(device, basis) << std::endl;
        }
    }
    return 0;
}
Output:

global device=gpu
FFT[pw]:gpu
FFT[lcao]:cpu
global device=cpu
FFT[pw]:cpu
FFT[lcao]:cpu
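
For illustration, here is a minimal sketch of how a module could consume such a mapping at construction time, building on the demo above; the class body, the device_ member, and the constructor signature are hypothetical and not taken from the existing code:

// Sketch only: give the forward-declared FFT class from the demo a body
// that records the device it should actually run on.
class FFT
{
public:
    FFT(const std::string& global_dev, const std::string& basis)
        : device_(modulewise_device<FFT>(global_dev, basis))
    {
        // allocate FFT plans/buffers on device_ ("gpu" or "cpu") here
    }
    const std::string& device() const { return device_; }
private:
    std::string device_; // the device this module actually uses
};

// Hypothetical usage: FFT fft(global_device, basis_type);
// where global_device and basis_type come from the input parameters.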
@dzzz2001

The issue of a large amount of irrelevant GPU memory being allocated under the LCAO basis set when the device is set to GPU is documented in issue #4442.

@Cstandardlib

The kernel ops currently used by hsolver are undergoing a round of refactoring.
They face the same issue of supporting heterogeneous computing across multiple devices while tracking the devices actually in use.
The interface is a crucial matter, and we should standardize it across all functionality that uses multiple devices, such as FFT and the kernel operators.
I suggest a further discussion to align on a unified standard that ensures scalability and facilitates portability across various devices, including supercomputers.
@Critsium-xy
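
To make the "unified standard" more concrete, one possible shape is sketched below; DeviceType and the op signature are assumptions for discussion, not the current hsolver/FFT interface. Every multi-device functionality would accept an explicit device tag instead of reading the global device input:

#include <stdexcept>

enum class DeviceType { CPU, GPU };  // hypothetical unified device tag

// A kernel op that takes the tag explicitly; callers pass the per-module
// device resolved elsewhere (e.g. by something like modulewise_device above).
template <typename T>
void vector_add_op(DeviceType dev, const T* a, const T* b, T* c, int n)
{
    switch (dev)
    {
    case DeviceType::CPU:
        for (int i = 0; i < n; ++i) { c[i] = a[i] + b[i]; }  // plain CPU loop
        break;
    case DeviceType::GPU:
        // launch the CUDA/HIP kernel here, guarded by the build flags
        throw std::runtime_error("GPU path not built in this sketch");
    }
}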

@Critsium-xy

@mohanchen proposed an idea: implement heterogeneous computing directly in blas_connector. blas_connector.h currently both links the cblas kernels and encapsulates the BLAS kernels, and he wants to separate these two parts. For example, we can keep the declaration of BlasConnector::gemm in blas_connector.h and implement the function in blas_connector.cpp. He also added that, to make BlasConnector::gemm support different platforms, we can add a device-flag parameter whose default value represents the CPU, and use it to decide exactly which kernel to call (a cuBLAS kernel? a CBLAS kernel? or a hipBLAS kernel?). This may have a performance cost, but I expect it is not huge (I haven't tested it yet). After finishing this, we could drop the other BLAS encapsulations, such as the ops in module_hsolver or in @denghuilu's tensor. But in fact I don't know whether it is really a good idea.
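
A rough sketch of what this could look like; the device flag, its default value, and the gemm signature here are illustrative assumptions, not the actual BlasConnector interface:

// blas_connector.h: declaration only, with an extra device flag (sketch)
enum class BlasDevice { CPU, GPU };  // hypothetical device flag, CPU by default

struct BlasConnector
{
    static void gemm(char transa, char transb, int m, int n, int k,
                     double alpha, const double* a, int lda,
                     const double* b, int ldb,
                     double beta, double* c, int ldc,
                     BlasDevice dev = BlasDevice::CPU);
};

// blas_connector.cpp: the implementation dispatches on the device flag
#include <cblas.h>

void BlasConnector::gemm(char transa, char transb, int m, int n, int k,
                         double alpha, const double* a, int lda,
                         const double* b, int ldb,
                         double beta, double* c, int ldc,
                         BlasDevice dev)
{
    if (dev == BlasDevice::CPU)
    {
        cblas_dgemm(CblasColMajor,
                    transa == 'N' ? CblasNoTrans : CblasTrans,
                    transb == 'N' ? CblasNoTrans : CblasTrans,
                    m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
    }
    else
    {
        // dispatch to cublasDgemm / hipblasDgemm here, behind the
        // corresponding build flags; omitted in this sketch
    }
}

Whether the per-call branch on the flag has a measurable cost could then be checked with a quick benchmark, as mentioned above.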
