I have spent a month investigating the GPU passthrough issue on ESXi 6.7. Here is what I found.
Motherboard: MX32-L40 (a Gigabyte server board that officially supports ESXi 6.5; it meets all ESXi passthrough requirements)
VM OS: Windows 10 1809 Oct
ESXi Version: ESXi6.5u2(with latest patch), ESXi6.7u1(with latest patch)
GPU: I tried both AMD RX590 and Nvidia 1660Ti
Passed-through devices: all sub-devices of the GPU, including HDMI audio and the related bus.
Issue:
Basically:
If I start the VM for the first time after the ESXi host boots, the GPU works like a charm.
If I restart or stop/start the VM, the GPU stops working and Device Manager shows a warning with error code 43.
If I disable the GPU in Device Manager before restarting or stopping/starting the VM, I can re-enable it after the VM reboots.
First, I’m pretty sure none of the following tweaks help:
- UEFI or Legacy boot of ESXi host
- UEFI or BIOS boot of Windows 10 VM
- ESXi 6.5(with latest patch) or ESXi 6.7(with latest patch)
- AMD Rx590 or Nvidia 1660 Ti
- pciPassthru.use64bitMMIO
- hypervisor.cpuid.v0
- pciHole.start/end
- svga.present
I tried them one by one, in ALL combinations, which took several days because server motherboards are really slow to boot.
The conclusion is always the same:
The first time the VM starts after ESXi boots, the GPU just works. If I reboot or stop/start the VM, the GPU stops working with error code 43.
Then I realized it’s a PCIe reset issue, so I tried the following /etc/vmware/passthru.map combinations:
# NVIDIA
10de ffff link false
10de ffff bridge false
10de ffff d3d0 false
10de 2182 link false
10de 2182 bridge false
10de 2182 d3d0 false
# AMD Video Card
1002 ffff link false
1002 ffff bridge false
1002 ffff d3d0 false
It took me a whole week to try ALL of those combinations. In the end, ONLY ONE combination works for me:
- ESXi 6.5
- 10de 2182 d3d0 false
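For anyone reading along, here is a minimal Python sketch of how I understand those entries, not an official tool. It assumes each non-comment line has four whitespace-separated fields (vendor ID, device ID, reset method, and a true/false flag that I believe is the fptShareable setting) and that a device ID of ffff acts as a wildcard for the whole vendor. The names PassthruEntry, parse_entries and entries_for are made up for this illustration.

```python
from typing import List, NamedTuple


class PassthruEntry(NamedTuple):
    # Field names are my own labels, assumed from the four-column layout.
    vendor_id: str
    device_id: str
    reset_method: str
    fpt_shareable: str


def parse_entries(text: str) -> List[PassthruEntry]:
    """Parse 'vendor device method flag' lines, skipping comments and blanks."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '# ...' comments
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {raw!r}")
        entries.append(PassthruEntry(*fields))
    return entries


def entries_for(entries: List[PassthruEntry], vendor: str, device: str) -> List[PassthruEntry]:
    """Return the entries that apply to one device (ffff assumed to be a wildcard)."""
    return [
        e for e in entries
        if e.vendor_id.lower() == vendor.lower()
        and e.device_id.lower() in ("ffff", device.lower())
    ]


SAMPLE = """\
# NVIDIA
10de ffff link false
10de 2182 d3d0 false
# AMD Video Card
1002 ffff bridge false
"""

if __name__ == "__main__":
    for entry in entries_for(parse_entries(SAMPLE), "10de", "2182"):
        print(entry)
```

Running it against a copy of the file makes it easy to spot when a wildcard line and a device-specific line both apply to the same card, which is worth ruling out when testing one reset method at a time.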
Then I upgraded ESXi to 6.7u1 with the SAME settings, and it no longer works.
I found something interesting in the log. When resetting the PCIe devices,
ESXi 6.5 resets them ONE BY ONE, at 4-second intervals:
2019-03-07T05:56:29.586Z| vcpu-0| I125: UHCI: HCReset
2019-03-07T05:56:29.593Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.0 // This is my GPU
2019-03-07T05:56:33.603Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.1 // This is my GPU
2019-03-07T05:56:37.613Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.2 // This is my GPU
2019-03-07T05:56:41.622Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.3 // This is my GPU
2019-03-07T05:56:45.632Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:72:00.0
2019-03-07T05:56:49.692Z| vcpu-0| I125: NVME-PCI: PCI reset on controller nvme0.
while ESXi 6.7 resets them in a batch, with only a millisecond or two between them:
2019-03-07T09:08:05.219Z| vcpu-0| I125: UHCI: HCReset
2019-03-07T09:08:05.223Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.0 // This is my GPU
2019-03-07T09:08:05.224Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.1 // This is my GPU
2019-03-07T09:08:05.225Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.2 // This is my GPU
2019-03-07T09:08:05.225Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.3 // This is my GPU
2019-03-07T09:08:05.227Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:72:00.0
2019-03-07T09:08:09.258Z| vcpu-0| I125: NVME-PCI: PCI reset on controller nvme0.
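To compare the two logs without eyeballing timestamps, the gaps between consecutive reset lines can be computed with a short script. This is only a sketch: it assumes the log lines look exactly like the excerpts above, and the function name reset_intervals is hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2019-03-07T05:56:29.593Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.0
RESET_RE = re.compile(
    r"^(\S+)Z\| vcpu-\d+\| I\d+: PCIPassthru: Resetting Device at (\S+)"
)


def reset_intervals(log_text):
    """Return (device, seconds since the previous reset line) pairs."""
    events = []
    for line in log_text.splitlines():
        m = RESET_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            events.append((ts, m.group(2)))
    return [
        (dev, round((ts - prev_ts).total_seconds(), 3))
        for (prev_ts, _), (ts, dev) in zip(events, events[1:])
    ]


# First three reset lines from the ESXi 6.5 excerpt above.
SAMPLE_65 = """\
2019-03-07T05:56:29.593Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.0 // This is my GPU
2019-03-07T05:56:33.603Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.1 // This is my GPU
2019-03-07T05:56:37.613Z| vcpu-0| I125: PCIPassthru: Resetting Device at 0000:50:00.2 // This is my GPU
"""

if __name__ == "__main__":
    for device, gap in reset_intervals(SAMPLE_65):
        print(f"{device}: {gap:.3f} s after previous reset")
```

On the 6.5 excerpt the gaps come out at about 4.01 seconds each, while the same calculation on the 6.7 excerpt gives gaps of 0.002 seconds or less.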
There must be some difference between 6.5 and 6.7 in the way they reset PCIe devices.
Does anyone know what the difference is and how to make it work in 6.7?