I have recently been facing power issues on some machines with 8 Nvidia RTX 3090 GPUs.
Running the TensorFlow unit tests (when compiling TensorFlow by hand) consistently leads to kernel panics on this machine (because the power consumption is so high that one of the GPUs dies and the kernel doesn't know what to do).
I just ran the gpu-burn test for a couple of hours straight, with and without doubles, with and without tensor cores. The machine runs just fine. When I run TensorFlow's tests again, the machine draws more power and goes dead again.
Other than that, great test for the throughput and temperature. Thanks!
Yep -- I have found that with the P40 and P100 I'm not able to reach their rated power consumption using this tool. The K80, M10, M40, and M60 are all able to get to their TDP.
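For anyone who wants to compare how close each workload gets to the rated power, here is a minimal sketch that reads per-GPU power draw and power limit from `nvidia-smi`. It is not part of gpu-burn. The helper names (`parse_power_csv`, `fraction_of_limit`, `read_power`) are made up for illustration, and it assumes `nvidia-smi` is on the PATH and that the driver reports `power.draw` and `power.limit` for the card; fields the driver does not report are returned as `None`.

```python
import subprocess

# Query order puts the name last so that splitting on the first three
# commas is safe even if a device name ever contained a comma.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,power.draw,power.limit,name",
    "--format=csv,noheader,nounits",
]


def _to_float(field):
    """Return the field as a float, or None for values such as '[N/A]'."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_power_csv(text):
    """Parse the CSV text produced by QUERY into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        index, draw, limit, name = [f.strip() for f in line.split(",", 3)]
        rows.append({
            "index": int(index),
            "draw_w": _to_float(draw),
            "limit_w": _to_float(limit),
            "name": name,
        })
    return rows


def fraction_of_limit(row):
    """Power draw as a fraction of the power limit, or None if unknown."""
    if row["draw_w"] is None or not row["limit_w"]:
        return None
    return row["draw_w"] / row["limit_w"]


def read_power():
    """Run nvidia-smi once and return the parsed per-GPU readings."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_power_csv(out.stdout)
```

Calling `read_power()` in a loop (for example once per second) while gpu-burn or the TensorFlow tests are running, and logging `fraction_of_limit` for each GPU, should show whether one workload pushes the cards closer to their limit than the other. Note that the power limit reported by the driver can be configured and may differ from the card's nominal TDP.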