hwloc_get_proc_cpubind not associated on Windows Server 2022 with two processor groups #697

Open
VincentDarrigrand opened this issue Nov 15, 2024 · 1 comment

Comments

@VincentDarrigrand

Hello,

I am facing an issue when calling hwloc_get_proc_cpubind on a Windows Server 2022 machine that has two processor groups.

The hook topology->binding_hooks.get_proc_cpubind is a null pointer because it does not get associated during the call to hwloc_set_windows_hooks.

As the code below shows, the hook is not set when nr_processor_groups is not 1:
https://github.com/open-mpi/hwloc/blob/0474e06f6cc7c9020517fee223b1a73c1e6b2af4/hwloc/topology-windows.c#L1330C3-L1339C4

if (nr_processor_groups == 1) {
    hooks->set_proc_cpubind = hwloc_win_set_proc_cpubind;
    hooks->get_proc_cpubind = hwloc_win_get_proc_cpubind;
    hooks->set_thisproc_cpubind = hwloc_win_set_thisproc_cpubind;
    hooks->get_thisproc_cpubind = hwloc_win_get_thisproc_cpubind;
    hooks->set_proc_membind = hwloc_win_set_proc_membind;
    hooks->get_proc_membind = hwloc_win_get_proc_membind;
    hooks->set_thisproc_membind = hwloc_win_set_thisproc_membind;
    hooks->get_thisproc_membind = hwloc_win_get_thisproc_membind;
}
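
To make the consequence of that condition concrete, here is a minimal, self-contained model of the pattern in plain C. It is not hwloc's actual code: `model_hooks`, `model_set_hooks`, `model_win_get_proc_cpubind` and `model_get_proc_cpubind` are hypothetical names, and the dispatcher's behavior (returning -1 with errno set to ENOSYS when the hook is missing) is an assumption about how such a wrapper typically reacts to an unregistered hook.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a table of binding hooks (not hwloc's real struct). */
struct model_hooks {
    int (*get_proc_cpubind)(int pid, unsigned long *mask);
};

/* Hypothetical backend: pretends the process is bound to PUs 0-3. */
static int model_win_get_proc_cpubind(int pid, unsigned long *mask)
{
    (void)pid;
    *mask = 0xFUL;
    return 0;
}

/* Mirrors the quoted condition: the hook is only registered
 * when there is exactly one processor group. */
static void model_set_hooks(struct model_hooks *hooks, unsigned nr_processor_groups)
{
    hooks->get_proc_cpubind = NULL;
    if (nr_processor_groups == 1)
        hooks->get_proc_cpubind = model_win_get_proc_cpubind;
}

/* Assumed dispatcher behavior: call the hook if present,
 * otherwise report "not supported" through errno. */
static int model_get_proc_cpubind(const struct model_hooks *hooks, int pid,
                                  unsigned long *mask)
{
    if (hooks->get_proc_cpubind)
        return hooks->get_proc_cpubind(pid, mask);
    errno = ENOSYS;
    return -1;
}
```

In this model, calling `model_set_hooks(&h, 2)` leaves the hook unset, so any later `model_get_proc_cpubind(&h, ...)` call fails, which matches the situation described for a machine with two processor groups.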

This came as a surprise, since the rest of this function seems to be equipped for such cases (the release notes for version 2.7.0 mention that support for such machines has been implemented).

Is this fixable? Or is it due to some hardware/software limitations?

Thanks in advance,
Best regards.

Vincent Darrigrand

@bgoglin
Contributor

bgoglin commented Nov 15, 2024

Hello
Process binding on Windows is limited by the Windows API itself. That API was designed for machines with a single processor group and was never extended (while the thread binding API was extended). Basically you only have access to the first processor group. There are some other APIs (job objects and cpusets iirc), but each of these also has its own limitations, especially when used inside an intermediate library like hwloc, which doesn't know whether another piece of software used one of these APIs earlier. I asked Microsoft multiple times about this but I don't expect any good solution anymore.
IIRC hwloc 2.7.0 improvements on Windows were on the topology discovery side (being able to query objects that are larger than a single processor group).
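
Since the thread binding API is the one that understands processor groups, a per-thread workaround has to express each target as a (group, mask) pair, which is the shape of the Windows GROUP_AFFINITY structure passed to SetThreadGroupAffinity. The sketch below only models that translation in portable C. The names `model_group_affinity` and `model_pu_to_group_affinity` are hypothetical, and so is the assumption that PUs are numbered consecutively across groups with the per-group counts supplied by the caller (on Windows these could come from GetActiveProcessorCount).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, portable stand-in for a (group, mask) affinity pair. */
struct model_group_affinity {
    uint16_t group; /* processor group number */
    uint64_t mask;  /* one bit per PU inside that group */
};

/* Translate a global PU index into a (group, mask) pair.
 * group_sizes[g] is the number of PUs in group g (at most 64 each).
 * Returns 0 on success, -1 if the index is out of range or a
 * group size is invalid. */
static int model_pu_to_group_affinity(const unsigned *group_sizes, size_t nr_groups,
                                      unsigned pu_index,
                                      struct model_group_affinity *out)
{
    for (size_t g = 0; g < nr_groups; g++) {
        if (group_sizes[g] > 64)
            return -1; /* a group mask cannot describe more than 64 PUs */
        if (pu_index < group_sizes[g]) {
            out->group = (uint16_t)g;
            out->mask = (uint64_t)1 << pu_index;
            return 0;
        }
        pu_index -= group_sizes[g]; /* skip past this group */
    }
    return -1;
}
```

With group sizes {48, 48}, global PU index 50 maps to group 1 with mask 0x4. A single thread can only be bound inside one group this way, so covering both groups means binding different threads to different groups.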
