Deep investigation: GPU passthrough stopped working after upgrading from ESXi 6.5 to 6.7. What changed in PCIe resetting?

I have spent a month investigating the GPU passthrough issue on ESXi 6.7. Here is what I found.

 

Motherboard: MX32-L40 (a Gigabyte server board that officially supports ESXi 6.5; it meets all ESXi passthrough requirements)

VM OS: Windows 10 1809 Oct

ESXi version: ESXi 6.5 U2 (with latest patch), ESXi 6.7 U1 (with latest patch)

GPU: I tried both an AMD RX 590 and an Nvidia 1660 Ti

Passed-through devices: all sub-devices of the GPU, including the HDMI audio and the related bus devices.

 

Issue:

Basically:

The first time I start the VM after the ESXi host boots, the GPU works like a charm.

If I restart or stop/start the VM, the GPU stops working and Device Manager shows a warning with error code 43.

If I disable the GPU in Device Manager before restarting or stopping/starting the VM, I can re-enable it after the VM reboots.

 

First, I’m pretty sure that none of the following tweaks help:

 

  1. UEFI or Legacy boot of ESXi host
  2. UEFI or BIOS boot of Windows 10 VM
  3. ESXi 6.5 (with latest patch) or ESXi 6.7 (with latest patch)
  4. AMD RX 590 or Nvidia 1660 Ti
  5. pciPassthru.use64bitMMIO
  6. hypervisor.cpuid.v0
  7. pciHole.start/end
  8. svga.present

 

I tried them one by one, in ALL combinations, which took me several days because server motherboards are really slow to boot.

The conclusion was always the same:

The first time the VM starts after ESXi boots, the GPU just works. If I reboot or stop/start the VM, the GPU stops working with error code 43.

 

Then I realized it is a PCIe reset issue, so I tried the following combinations of entries in /etc/vmware/passthru.map:

 

# NVIDIA

 

10de  ffff  link   false

10de  ffff  bridge   false

10de  ffff  d3d0   false

10de  2182  link   false

10de  2182  bridge   false

10de  2182  d3d0   false

 

# AMD Video Card

 

1002 ffff link false

1002 ffff bridge false

1002 ffff d3d0 false
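
For anyone who wants to sanity-check entries like the ones above before rebooting a slow server: as far as I understand the file format, each line has four whitespace-separated fields (vendor ID in hex, device ID in hex with ffff as a wildcard, reset method, and the fptShareable flag). Below is a minimal Python sketch that parses and validates such lines. The helper name `parse_passthru_entries` and the `PassthruEntry` type are my own, not part of any VMware tool, and the list of reset methods is an assumption taken from the comments in the stock file.

```python
from typing import NamedTuple

# Assumption: reset methods as listed in the comment header of the stock file.
RESET_METHODS = {"flr", "d3d0", "link", "bridge", "default"}


class PassthruEntry(NamedTuple):
    vendor_id: int      # PCI vendor ID, e.g. 0x10de for NVIDIA
    device_id: int      # PCI device ID, 0xffff is treated as a wildcard
    reset_method: str   # one of RESET_METHODS
    fpt_shareable: str  # kept as text: "true", "false" or "default"


def parse_passthru_entries(text):
    """Parse passthrough map lines, skipping comments and blank lines."""
    entries = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {number}: expected 4 fields, got {len(fields)}")
        vendor, device, method, shareable = fields
        if method not in RESET_METHODS:
            raise ValueError(f"line {number}: unknown reset method {method!r}")
        entries.append(
            PassthruEntry(int(vendor, 16), int(device, 16), method, shareable)
        )
    return entries


SAMPLE = """\
# NVIDIA
10de  ffff  link   false
10de  2182  d3d0   false
# AMD Video Card
1002 ffff bridge false
"""

entries = parse_passthru_entries(SAMPLE)
for entry in entries:
    print(f"{entry.vendor_id:04x}:{entry.device_id:04x} -> {entry.reset_method}")
```

This only checks the syntax of each line. It says nothing about whether a given reset method actually works for a given card.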

 

It took me a whole week to try ALL of those combinations. In the end, ONLY ONE combination worked for me:

  • ESXi 6.5
  • 10de  2182  d3d0   false

 

Then I upgraded ESXi to 6.7 U1 with the SAME settings, and it stopped working.

 

I found something interesting in the log. When resetting the PCIe devices,

 

ESXi 6.5 resets them ONE BY ONE, at 4-second intervals:

 

2019-03-07T05:56:29.586Z| vcpu-0| I125: UHCI: HCReset

2019-03-07T05:56:29.593Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.0    // This is my GPU

2019-03-07T05:56:33.603Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.1    // This is my GPU

2019-03-07T05:56:37.613Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.2    // This is my GPU

2019-03-07T05:56:41.622Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.3    // This is my GPU

2019-03-07T05:56:45.632Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:72:00.0

2019-03-07T05:56:49.692Z| vcpu-0| I125: NVME-PCI: PCI reset on controller nvme0.

 

while ESXi 6.7 resets them in a batch, all within a few milliseconds:

 

2019-03-07T09:08:05.219Z| vcpu-0| I125: UHCI: HCReset

2019-03-07T09:08:05.223Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.0    // This is my GPU

2019-03-07T09:08:05.224Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.1    // This is my GPU

2019-03-07T09:08:05.225Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.2    // This is my GPU

2019-03-07T09:08:05.225Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.3    // This is my GPU

2019-03-07T09:08:05.227Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:72:00.0

2019-03-07T09:08:09.258Z| vcpu-0| I125: NVME-PCI: PCI reset on controller nvme0.
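
To compare the two behaviours without eyeballing timestamps, here is a small Python sketch that pulls the "PCIPassthru: Resetting Device" lines out of a vmware.log excerpt and prints the gap between consecutive resets. The function name `reset_gaps` is my own, and the regular expression assumes the line layout shown in the excerpts above; other ESXi builds may log differently.

```python
import re
from datetime import datetime

# Matches "<timestamp>| ... PCIPassthru: Resetting Device at <address>"
RESET_RE = re.compile(r"^(\S+)\|.*PCIPassthru: Resetting Device at (\S+)")


def reset_gaps(log_text):
    """Return (device, seconds since the previous reset) for each reset after the first."""
    events = []
    for line in log_text.splitlines():
        match = RESET_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")
            events.append((stamp, match.group(2)))
    return [
        (device, (stamp - prev_stamp).total_seconds())
        for (prev_stamp, _), (stamp, device) in zip(events, events[1:])
    ]


LOG_65 = """\
2019-03-07T05:56:29.593Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.0
2019-03-07T05:56:33.603Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.1
2019-03-07T05:56:37.613Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.2
"""

LOG_67 = """\
2019-03-07T09:08:05.223Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.0
2019-03-07T09:08:05.224Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.1
2019-03-07T09:08:05.225Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.2
"""

gaps_65 = reset_gaps(LOG_65)
gaps_67 = reset_gaps(LOG_67)

for label, gaps in (("6.5", gaps_65), ("6.7", gaps_67)):
    for device, gap in gaps:
        print(f"ESXi {label}  {device}  +{gap:.3f}s")
```

Running it over a full vmware.log from each version should make it obvious whether the resets are spaced out or fired back to back.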

 

There must be some difference between 6.5 and 6.7 in the way they reset PCIe devices.

Does anyone know what the difference is and how to make it work on 6.7?
