From 0018650c31110005cd3323e1d911129546d0ce2f Mon Sep 17 00:00:00 2001
From: Michael Godfrey
Date: Tue, 11 Jun 2024 14:14:44 +0000
Subject: [PATCH 1/8] updated 'deploy-gpu-node-pool.md' with instuctions for using Headless OS

---
 AKS-Hybrid/deploy-gpu-node-pool.md | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/AKS-Hybrid/deploy-gpu-node-pool.md b/AKS-Hybrid/deploy-gpu-node-pool.md
index a5f86e7637..59f7e0c887 100644
--- a/AKS-Hybrid/deploy-gpu-node-pool.md
+++ b/AKS-Hybrid/deploy-gpu-node-pool.md
@@ -62,11 +62,36 @@ Install the Azure Stack HCI, version 23H2 operating system locally on each serve
 
 ### Step 2: uninstall the NVIDIA host driver
 
-On each host machine, navigate to **Control Panel > Add or Remove programs**, uninstall the NVIDIA host driver, then reboot the machine. After the machine reboots, confirm that the driver was successfully uninstalled. Open an elevated PowerShell terminal and run the following command:
+Open a remote powershell session to each host, or run the following local in Powershell.You will need to start by uninstalling the NVIDIA host driver, then reboot the machine. After the machine reboots, confirm that the driver was successfully uninstalled.
+
+```powershell
+PNPUTIL /enum-drivers
+```
+
+
+ Open an elevated PowerShell terminal and run the following command:
 
 ```powershell
 Get-PnpDevice | select status, class, friendlyname, instanceid | findstr /i /c:"3d video"
 ```
+You should see the Installed Drivers in the PNPUTIL output, if the "Provider Name" is listed as "NVIDIA Corporation" that is the driver you need to target for uninstall, note the "Published Name" you will be using that in the next command.
+
+```output
+Published Name: oem15.inf
+Original Name: nvlwswi.inf
+Provider Name: NVIDIA
+Class Name: Display
+Class GUID: {4d36e968-e325-11ce-bfc1-08002be10318}
+Driver Version: 03/05/2024 31.0.15.5178
+Signer Name: Microsoft Windows Hardware Compatibility Publisher
+```
+
+Run the following in your Powershell session, replacing the ".\oem1.inf" with the value in "Published Name" from the "PNPUTIL Enum-Devices" output from earlier.
+
+```powershell
+pnputil /delete-driver .\oem1.inf /uninstall /reboot
+```
+After the reboot is complete, reconnect via Powershell or RDP Session.
 
 You should see the GPU devices appear in an error state as shown in this example output:
 
@@ -82,7 +107,8 @@ When you uninstall the host driver, the physical GPU goes into an error state. Y
 For each GPU (3D Video Controller) device, run the following commands in PowerShell. Copy the instance ID; for example, `PCI\VEN_10DE&DEV_1EB8&SUBSYS_12A210DE&REV_A1\4&32EEF88F&0&0000` from the previous command output:
 
 ```powershell
-$id1 = ""
+$gpu=Get-PnpDevice -FriendlyName "3D Video Controller"
+$id1 =$gpu[0].InstanceId
 $lp1 = (Get-PnpDeviceProperty -KeyName DEVPKEY_Device_LocationPaths -InstanceId $id1).Data[0]
 Disable-PnpDevice -InstanceId $id1 -Confirm:$false
 Dismount-VMHostAssignableDevice -LocationPath $lp1 -Force

From 412c5835844bd5456cbb8b0225528cf5be35a75a Mon Sep 17 00:00:00 2001
From: Seth Manheim
Date: Tue, 11 Jun 2024 08:47:22 -0700
Subject: [PATCH 2/8] Update deploy-gpu-node-pool.md

---
 AKS-Hybrid/deploy-gpu-node-pool.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/AKS-Hybrid/deploy-gpu-node-pool.md b/AKS-Hybrid/deploy-gpu-node-pool.md
index 59f7e0c887..fb11046ddd 100644
--- a/AKS-Hybrid/deploy-gpu-node-pool.md
+++ b/AKS-Hybrid/deploy-gpu-node-pool.md
@@ -62,19 +62,19 @@ Install the Azure Stack HCI, version 23H2 operating system locally on each serve
 
 ### Step 2: uninstall the NVIDIA host driver
 
-Open a remote powershell session to each host, or run the following local in Powershell.You will need to start by uninstalling the NVIDIA host driver, then reboot the machine. After the machine reboots, confirm that the driver was successfully uninstalled.
+Open a remote powershell session to each host, or run the following local in Powershell. Start by uninstalling the NVIDIA host driver, then reboot the machine. After the machine reboots, confirm that the driver was successfully uninstalled:
 
 ```powershell
 PNPUTIL /enum-drivers
 ```
 
-
- Open an elevated PowerShell terminal and run the following command:
+Open an elevated PowerShell prompt and run the following command:
 
 ```powershell
 Get-PnpDevice | select status, class, friendlyname, instanceid | findstr /i /c:"3d video"
 ```
-You should see the Installed Drivers in the PNPUTIL output, if the "Provider Name" is listed as "NVIDIA Corporation" that is the driver you need to target for uninstall, note the "Published Name" you will be using that in the next command.
+
+You should see the installed drivers in the **PNPUTIL** output. If the **Provider Name** is listed as **NVIDIA Corporation**, that is the driver you need to target for uninstalling. Make a note of the **Published Name**, as you must use that in the next command:
 
 ```output
 Published Name: oem15.inf
@@ -86,12 +86,12 @@ Driver Version: 03/05/2024 31.0.15.5178
 Signer Name: Microsoft Windows Hardware Compatibility Publisher
 ```
 
-Run the following in your Powershell session, replacing the ".\oem1.inf" with the value in "Published Name" from the "PNPUTIL Enum-Devices" output from earlier.
+Run the following command in your Powershell session, and replace `.\oem1.inf` with the value in **Published Name** from the previous **PNPUTIL Enum-Devices** output.
 
 ```powershell
 pnputil /delete-driver .\oem1.inf /uninstall /reboot
 ```
-After the reboot is complete, reconnect via Powershell or RDP Session.
+After the reboot is complete, reconnect via Powershell or an RDP Session.
 
 You should see the GPU devices appear in an error state as shown in this example output:
 

From 03d48f096c113aea21dfb2b73efb2dff3311375d Mon Sep 17 00:00:00 2001
From: Seth Manheim
Date: Tue, 11 Jun 2024 08:47:55 -0700
Subject: [PATCH 3/8] Update deploy-gpu-node-pool.md

---
 AKS-Hybrid/deploy-gpu-node-pool.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/AKS-Hybrid/deploy-gpu-node-pool.md b/AKS-Hybrid/deploy-gpu-node-pool.md
index fb11046ddd..233b8841c5 100644
--- a/AKS-Hybrid/deploy-gpu-node-pool.md
+++ b/AKS-Hybrid/deploy-gpu-node-pool.md
@@ -107,7 +107,7 @@ When you uninstall the host driver, the physical GPU goes into an error state. Y
 For each GPU (3D Video Controller) device, run the following commands in PowerShell. Copy the instance ID; for example, `PCI\VEN_10DE&DEV_1EB8&SUBSYS_12A210DE&REV_A1\4&32EEF88F&0&0000` from the previous command output:
 
 ```powershell
-$gpu=Get-PnpDevice -FriendlyName "3D Video Controller"
+$gpu=Get-PnpDevice -FriendlyName "3D Video Controller"
 $id1 =$gpu[0].InstanceId
 $lp1 = (Get-PnpDeviceProperty -KeyName DEVPKEY_Device_LocationPaths -InstanceId $id1).Data[0]
 Disable-PnpDevice -InstanceId $id1 -Confirm:$false

From 0c066635e87eb1ce8c30d525da3fbb8ebd1128dc Mon Sep 17 00:00:00 2001
From: Seth Manheim
Date: Tue, 11 Jun 2024 08:48:29 -0700
Subject: [PATCH 4/8] Update deploy-gpu-node-pool.md

---
 AKS-Hybrid/deploy-gpu-node-pool.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/AKS-Hybrid/deploy-gpu-node-pool.md b/AKS-Hybrid/deploy-gpu-node-pool.md
index 233b8841c5..a6d65ffd65 100644
--- a/AKS-Hybrid/deploy-gpu-node-pool.md
+++ b/AKS-Hybrid/deploy-gpu-node-pool.md
@@ -86,7 +86,7 @@ Driver Version: 03/05/2024 31.0.15.5178
 Signer Name: Microsoft Windows Hardware Compatibility Publisher
 ```
 
-Run the following command in your Powershell session, and replace `.\oem1.inf` with the value in **Published Name** from the previous **PNPUTIL Enum-Devices** output.
+Run the following command in your Powershell session, and replace `.\oem1.inf` with the value in **Published Name** from the previous **PNPUTIL Enum-Devices** output:
 
 ```powershell
 pnputil /delete-driver .\oem1.inf /uninstall /reboot

From ca8cf705019aa20ba327b78b2f98f9c38e15122b Mon Sep 17 00:00:00 2001
From: Michael Godfrey
Date: Fri, 21 Jun 2024 17:43:19 +0000
Subject: [PATCH 5/8] made suggested changes

---
 AKS-Hybrid/deploy-gpu-node-pool.md | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/AKS-Hybrid/deploy-gpu-node-pool.md b/AKS-Hybrid/deploy-gpu-node-pool.md
index a6d65ffd65..535f4c2d86 100644
--- a/AKS-Hybrid/deploy-gpu-node-pool.md
+++ b/AKS-Hybrid/deploy-gpu-node-pool.md
@@ -71,7 +71,7 @@ PNPUTIL /enum-drivers
 Open an elevated PowerShell prompt and run the following command:
 
 ```powershell
-Get-PnpDevice | select status, class, friendlyname, instanceid | findstr /i /c:"3d video"
+Get-PnpDevice | Where-Object FriendlyName -Like '3D Video*' | Select-Object Status, FriendlyName, InstanceId
 ```
 
 You should see the installed drivers in the **PNPUTIL** output. If the **Provider Name** is listed as **NVIDIA Corporation**, that is the driver you need to target for uninstalling. Make a note of the **Published Name**, as you must use that in the next command:
@@ -104,11 +104,21 @@ Error 3D Video Controller PCI\VEN_10DE&DEV_1EB8&SUBSYS_1
 
 When you uninstall the host driver, the physical GPU goes into an error state. You must dismount all the GPU devices from the host.
 
-For each GPU (3D Video Controller) device, run the following commands in PowerShell. Copy the instance ID; for example, `PCI\VEN_10DE&DEV_1EB8&SUBSYS_12A210DE&REV_A1\4&32EEF88F&0&0000` from the previous command output:
+For each GPU (3D Video Controller) device, run the following commands in PowerShell. This command will create a variable named "ID1" and "lp1" and populate the instance ID of the first GPU; for example, `PCI\VEN_10DE&DEV_1EB8&SUBSYS_12A210DE&REV_A1\4&32EEF88F&0&0000` from the previous command output.
 
 ```powershell
 $gpu=Get-PnpDevice -FriendlyName "3D Video Controller"
-$id1 =$gpu[0].InstanceId
+$id0 =$gpu[0].InstanceId
+$lp0 = (Get-PnpDeviceProperty -KeyName DEVPKEY_Device_LocationPaths -InstanceId $id1).Data[0]
+Disable-PnpDevice -InstanceId $id1 -Confirm:$false
+Dismount-VMHostAssignableDevice -LocationPath $lp1 -Force
+```
+
+IF you have more then one GPU, please use the additonal command:
+
+```powershell
+$gpu=Get-PnpDevice -FriendlyName "3D Video Controller"
+$id1 =$gpu[1].InstanceId
 $lp1 = (Get-PnpDeviceProperty -KeyName DEVPKEY_Device_LocationPaths -InstanceId $id1).Data[0]
 Disable-PnpDevice -InstanceId $id1 -Confirm:$false
 Dismount-VMHostAssignableDevice -LocationPath $lp1 -Force
@@ -117,7 +127,7 @@ Dismount-VMHostAssignableDevice -LocationPath $lp1 -Force
 To confirm that the GPUs were correctly dismounted from the host, run the following command. You should put GPUs in an `Unknown` state:
 
 ```powershell
-Get-PnpDevice | select status, class, friendlyname, instanceid | findstr /i /c:"3d video"
+Get-PnpDevice | Where-Object FriendlyName -Like '3D Video*' | Select-Object Status, FriendlyName, InstanceId
 ```
 
 ```output

From 8da0cdd9eb6d01f4cd39c463495dfec9575e82cc Mon Sep 17 00:00:00 2001
From: Michael Godfrey
Date: Fri, 21 Jun 2024 13:51:01 -0400
Subject: [PATCH 6/8] Update AKS-Hybrid/deploy-gpu-node-pool.md

Co-authored-by: Seth Manheim
---
 AKS-Hybrid/deploy-gpu-node-pool.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/AKS-Hybrid/deploy-gpu-node-pool.md b/AKS-Hybrid/deploy-gpu-node-pool.md
index 535f4c2d86..69da0171e5 100644
--- a/AKS-Hybrid/deploy-gpu-node-pool.md
+++ b/AKS-Hybrid/deploy-gpu-node-pool.md
@@ -114,7 +114,7 @@ Disable-PnpDevice -InstanceId $id1 -Confirm:$false
 Dismount-VMHostAssignableDevice -LocationPath $lp1 -Force
 ```
 
-IF you have more then one GPU, please use the additonal command:
+If you have more than one GPU, use the following additional command:
 
 ```powershell
 $gpu=Get-PnpDevice -FriendlyName "3D Video Controller"

From 0557b31780adb7c4f3a8753c0a0acfbe4fd5a8aa Mon Sep 17 00:00:00 2001
From: Michael Godfrey
Date: Fri, 21 Jun 2024 17:56:55 +0000
Subject: [PATCH 7/8] made suggestion from JameLiang29

---
 AKS-Hybrid/deploy-gpu-node-pool.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/AKS-Hybrid/deploy-gpu-node-pool.md b/AKS-Hybrid/deploy-gpu-node-pool.md
index 69da0171e5..dccb116314 100644
--- a/AKS-Hybrid/deploy-gpu-node-pool.md
+++ b/AKS-Hybrid/deploy-gpu-node-pool.md
@@ -93,6 +93,10 @@ pnputil /delete-driver .\oem1.inf /uninstall /reboot
 ```
 After the reboot is complete, reconnect via Powershell or an RDP Session.
 
+```powershell
+Get-PnpDevice | Where-Object FriendlyName -Like '3D Video*' | Select-Object Status, FriendlyName, InstanceId
+```
+
 You should see the GPU devices appear in an error state as shown in this example output:
 
 ```output

From b447080bb04ba9bd00ee72938aca84b45d5d0e6b Mon Sep 17 00:00:00 2001
From: Michael Godfrey
Date: Fri, 21 Jun 2024 18:05:06 +0000
Subject: [PATCH 8/8] corrected lines 113-118

---
 AKS-Hybrid/deploy-gpu-node-pool.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/AKS-Hybrid/deploy-gpu-node-pool.md b/AKS-Hybrid/deploy-gpu-node-pool.md
index dccb116314..5c57dc8311 100644
--- a/AKS-Hybrid/deploy-gpu-node-pool.md
+++ b/AKS-Hybrid/deploy-gpu-node-pool.md
@@ -113,9 +113,9 @@ For each GPU (3D Video Controller) device, run the following commands in PowerSh
 ```powershell
 $gpu=Get-PnpDevice -FriendlyName "3D Video Controller"
 $id0 =$gpu[0].InstanceId
-$lp0 = (Get-PnpDeviceProperty -KeyName DEVPKEY_Device_LocationPaths -InstanceId $id1).Data[0]
-Disable-PnpDevice -InstanceId $id1 -Confirm:$false
-Dismount-VMHostAssignableDevice -LocationPath $lp1 -Force
+$lp0 = (Get-PnpDeviceProperty -KeyName DEVPKEY_Device_LocationPaths -InstanceId $id0).Data[0]
+Disable-PnpDevice -InstanceId $id0 -Confirm:$false
+Dismount-VMHostAssignableDevice -LocationPath $lp0 -Force
 ```
 
 If you have more than one GPU, use the following additional command: