rocminfo fails when amdgpu is built into the kernel #42
I'm not sure what your question is. Do you want to find out whether amdgpu loaded successfully, or are you asking whether rocminfo could use some alternative way to detect amdgpu? For your own troubleshooting, check dmesg or "journalctl -k -b".
Sorry, I should have been clearer: rocminfo doesn't get past https://github.com/RadeonOpenCompute/rocminfo/blob/10da0a71da6700c91e8cd204927cca0d9461b586/rocminfo.cc#L1041
rocminfo tests for a working kfd and installation very cautiously, since many unrelated things can go wrong. If you've built amdgpu into your kernel, you could skip the lsmod check, since you know you attempted to load the driver. If the driver fails to initialize for some reason, /dev/kfd will not be present and the next check will detect that. If you're looking for assistance with your local custom build, you could simply remove the lsmod check. On the other hand, if you're looking to contribute a rocminfo PR, then recording the lsmod failure and continuing is probably the right direction. This way rocminfo can print every possible cause of failure it encountered along the way, yet remain quiet if hsa_init actually succeeds despite the failed checks.
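A minimal sketch of the "record the failure and continue" approach suggested above, written in Python for brevity rather than as a patch to rocminfo.cc. The helper names `collect_warnings` and `init_with_diagnostics` are hypothetical, `init` is a placeholder for whatever actually calls hsa_init, and the lsmod result is hard-coded for illustration.

```python
import os
from typing import Callable, List, Tuple

# A check is a human-readable failure message plus a predicate that
# returns True when the check passes.
Check = Tuple[str, Callable[[], bool]]


def collect_warnings(checks: List[Check]) -> List[str]:
    """Run every check and remember the messages of the ones that fail.

    No failure is fatal here; each one is only recorded so it can be
    reported later if the runtime itself cannot start.
    """
    return [message for message, passes in checks if not passes()]


def init_with_diagnostics(warnings: List[str], init: Callable[[], bool]) -> List[str]:
    """Call init (a stand-in for hsa_init) and decide what to report.

    Returns the lines to print: nothing when init succeeds, even if some
    checks failed; otherwise a header followed by every recorded warning.
    """
    if init():
        return []
    return ["Runtime initialization failed. Possible causes:"] + [
        f"  - {w}" for w in warnings
    ]


if __name__ == "__main__":
    checks: List[Check] = [
        # Hypothetical placeholder result: a real tool would consult lsmod here.
        ("amdgpu not listed by lsmod (it may be built into the kernel)", lambda: False),
        ("/dev/kfd is missing", lambda: os.path.exists("/dev/kfd")),
    ]
    # Simulate a failing hsa_init so the collected warnings get printed.
    for line in init_with_diagnostics(collect_warnings(checks), init=lambda: False):
        print(line)
```

The point of the design is that the heuristic checks never abort on their own; only a real initialization failure makes them visible.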
I'll think of another way of detecting whether amdgpu / amdkfd is available.
Gets rocminfo working locally for now.
Is it enough to check that /sys/module/amdgpu exists?
Same issue here.
I think it's a better way. On my machine, both a loaded module and a built-in module provide /sys/module/amdgpu (linux-5.13), while on another machine without amdgpu this path doesn't exist (5.10).
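The /sys/module test proposed here could look like the following minimal Python sketch; `amdgpu_present` is a hypothetical helper name. It only encodes the observation in this comment (both a loaded and a built-in amdgpu produced /sys/module/amdgpu on that machine) and is not guaranteed to hold on every kernel. The directory is a parameter so the logic can be exercised without real hardware.

```python
from pathlib import Path


def amdgpu_present(sysfs_module_dir: str = "/sys/module") -> bool:
    """Return True if a sysfs directory named amdgpu exists.

    Assumption (from the observation above): the kernel creates this
    directory whether amdgpu is a loaded module or built in, unlike
    lsmod output, which only lists loadable modules. It says nothing
    about whether the driver initialized; /dev/kfd still has to be
    checked separately.
    """
    return (Path(sysfs_module_dir) / "amdgpu").is_dir()
```

Calling `amdgpu_present()` with no argument inspects the real /sys/module.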
I created a PR to implement this: #43
Closes: ROCm#42 Signed-off-by: YiyangWu <[email protected]>
Fixed in 94b4b3f.
I think this commit does not fix the issue. The built-in amdgpu kernel module does not have
Hm. I will need to compile the kernel with the driver built in to test whether this approach works. Thank you @littlewu2508 for being insistent :)
@dmitrii-galantsev Any update on this issue? Thanks!
So it works on Gentoo because Gentoo includes this patch: https://gitweb.gentoo.org/repo/gentoo.git/tree/dev-util/rocminfo/files/rocminfo-6.0.0-detect-builtin-amdgpu.patch #65
@ppanchad-amd The issue persists. #43 can still be applied and fixes this issue, so I rebased it onto amd-staging. Please reopen the PR and review it.
Apologies, all. rocminfo fell off my radar. All effort on https://github.com/ROCm/amdsmi...
@FireBurn Thanks for that, I will try to apply it.
@littlewu2508 I'm on a modern kernel (6.8.0-1-default+) and was able to rebuild it with amdgpu built in. However, it crashed on boot, which is fine. Even though it crashed, the
On your system, could you please check whether amdgpu works at all? Specifically, is there
@littlewu2508
Is there another way of detecting whether amdgpu is loaded, other than running lsmod?
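One possible answer, sketched under stated assumptions: lsmod reports what is listed in /proc/modules, which never includes drivers compiled into the kernel, whereas the kernel's modules.builtin file (conventionally /lib/modules/<release>/modules.builtin) lists every built-in driver by its .ko path. The helpers below are hypothetical and simply combine the two sources. modules.builtin reflects the kernel configuration, not whether the driver initialized, so the /dev/kfd check is still needed afterwards.

```python
import os
from pathlib import Path


def amdgpu_in_proc_modules(proc_modules: str = "/proc/modules") -> bool:
    """True if amdgpu is a currently loaded *loadable* module.

    /proc/modules is the file lsmod reads; the first whitespace-separated
    field of each line is the module name. Built-in drivers never appear.
    """
    try:
        with open(proc_modules) as f:
            return any(line.split()[:1] == ["amdgpu"] for line in f)
    except OSError:
        return False


def amdgpu_is_builtin(modules_builtin: str) -> bool:
    """True if a modules.builtin list names amdgpu.ko (e.g. .../amdgpu/amdgpu.ko)."""
    try:
        with open(modules_builtin) as f:
            return any(Path(line.strip()).name == "amdgpu.ko" for line in f)
    except OSError:
        return False


def default_modules_builtin() -> str:
    # Conventional location (assumption); distributions may install it elsewhere.
    return f"/lib/modules/{os.uname().release}/modules.builtin"


def amdgpu_available() -> bool:
    """Loaded as a module, or compiled into the running kernel."""
    return amdgpu_in_proc_modules() or amdgpu_is_builtin(default_modules_builtin())
```

Matching on the exact first field avoids false positives from modules whose names merely start with "amdgpu".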