Description of problem:

A VM with an NVIDIA vGPU, configured with a vCPU topology of 1 virtual socket, 16 cores, 1 thread, fails to start with:

~~~
2021-10-13 11:59:19,288-0400 ERROR (vm/3b9aacbe) [virt.vm] (vmId='3b9aacbe-25de-4f47-a9c3-dc2e81b85980') The vm start process failed (vm:992)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 919, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2967, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1385, in createWithFlags
    raise libvirtError('virDomainCreateWithFlags() failed')
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2021-10-13T15:59:14.416352Z qemu-kvm: -device vfio-pci-nohotplug,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/b7abb3ae-382b-4bdc-a524-08f89b484e52,display=on,bus=pci.5,addr=0x0,ramfb=on: warning: vfio b7abb3ae-382b-4bdc-a524-08f89b484e52: Could not enable error recovery for the device
2021-10-13T15:59:14.452831Z qemu-kvm: We need to set caching-mode=on for intel-iommu to enable device assignment with IOMMU protection.
2021-10-13 11:59:19,289-0400 INFO (vm/3b9aacbe) [virt.vm] (vmId='3b9aacbe-25de-4f47-a9c3-dc2e81b85980') Changed state to Down: internal error: qemu unexpectedly closed the monitor: 2021-10-13T15:59:14.416352Z qemu-kvm: -device vfio-pci-nohotplug,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/b7abb3ae-382b-4bdc-a524-08f89b484e52,display=on,bus=pci.5,addr=0x0,ramfb=on: warning: vfio b7abb3ae-382b-4bdc-a524-08f89b484e52: Could not enable error recovery for the device
~~~

The same error appears in /var/log/libvirt/qemu.

- The VM fails to start with this configuration (sockets:cores:threads):
  1:16:1   nvidia-258   Q35 chipset with BIOS     Fails to run

- The VM starts fine with these configurations:
  1:16:1   nvidia-258   I440FX chipset with BIOS  Runs
  1:15:1   nvidia-258   Q35 chipset with BIOS     Runs
  16:15:1  nvidia-258   Q35 chipset with BIOS     Runs
  1:16:1   none         Q35 chipset with BIOS     Runs

- The issue began after upgrading from RHV-M 4.4.5 to 4.4.8.
- Even a 4.4.5 host managed by a 4.4.8 engine fails, which indicates the engine is sending something different in 4.4.8 compared to 4.4.5.
- The current workaround is to use the i440FX machine type instead of Q35.

Version-Release number of selected component (if applicable):
4.4.8

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with 1 socket, 16 cores per socket, 1 thread.
2. Assign a vGPU device (see the hostdev sketch after this description).
3. Start the VM.

Actual results:
VM fails to start.

Expected results:
VM starts.

Logs will be attached soon.
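For reference, the vGPU is passed through as a mediated device. A minimal sketch of what the corresponding hostdev element in the generated libvirt domain XML looks like is below; the UUID is taken from the log above, the other attributes are illustrative and may differ from the actual generated XML:

~~~
<hostdev mode='subsystem' type='mdev' model='vfio-pci' display='on' ramfb='on'>
  <!-- UUID of the mdev instance created for the nvidia-258 vGPU profile -->
  <source>
    <address uuid='b7abb3ae-382b-4bdc-a524-08f89b484e52'/>
  </source>
</hostdev>
~~~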
According to https://wiki.qemu.org/Features/VT-d, caching-mode="on" should indeed be set on the <iommu> <driver> element when vfio-pci devices are present. I could reproduce the problem, and the VM starts for me once the caching-mode option is added. I don't know why the error occurs only with some CPU topologies; most likely it is a matter of luck. I'll prepare a patch to add the option.
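A minimal sketch of the resulting <iommu> element in the domain XML, assuming the vIOMMU is already generated with interrupt remapping by the earlier change (the exact attribute set the engine emits may differ):

~~~
<devices>
  <iommu model='intel'>
    <!-- caching_mode maps to the QEMU intel-iommu caching-mode=on option -->
    <driver intremap='on' caching_mode='on'/>
  </iommu>
</devices>
~~~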
Makes sense. That would also explain why it happens with a 4.4.8 engine and a 4.4.5 host, as the <iommu> element was added as part of the fix for BZ 1946231.
Verified:
ovirt-engine-4.4.9.4-0.1.el8ev
qemu-kvm-6.0.0-33.module+el8.5.0+13041+05be2dc6.x86_64
libvirt-daemon-7.6.0-6.module+el8.5.0+13051+7ddbe958.x86_64
vdsm-4.40.90.4-1.el8ev.x86_64
Nvidia Driver Version: 460.73.02

Verification scenario:
1. Reproduce issue (try to run VM with 1 socket and 16 cores per socket).
2. Upgrade ovirt-engine and RHV host.
3. Run VM again.
Verify VM is running with Nvidia vGPU instance.
(In reply to Nisim Simsolo from comment #10)
> Verified:
> ovirt-engine-4.4.9.4-0.1.el8ev
> qemu-kvm-6.0.0-33.module+el8.5.0+13041+05be2dc6.x86_64
> libvirt-daemon-7.6.0-6.module+el8.5.0+13051+7ddbe958.x86_64
> vdsm-4.40.90.4-1.el8ev.x86_64
> Nvidia Driver Version: 460.73.02
>
> Verification scenario:
> 1. Reproduce issue (try to run VM with 1 socket and 16 cores per socket).
> 2. Upgrade ovirt-engine and RHV host.
> 3. Run VM again.
> Verify VM is running with Nvidia vGPU instance.

Looks good to me, thanks.
Not sure if it's fixed or not, but I'm still seeing it on vdsm-4.40.90.4-1.el8.x86_64.
(In reply to Gilboa Davara from comment #12)
> Not sure if it's fixed or not, but I'm still seeing it on
> vdsm-4.40.90.4-1.el8.x86_64.

The fix is on the engine side - what's the version of ovirt-engine?
$ rpm -q ovirt-engine
ovirt-engine-4.4.9.4-1.el8.noarch
(In reply to Gilboa Davara from comment #14)
> $ rpm -q ovirt-engine
> ovirt-engine-4.4.9.4-1.el8.noarch

OK, interesting. Can you please provide engine.log?
Created attachment 1841255 [details]
Engine log

Please note that I attempted to run the VM in 3 different ways:
- Q35/BIOS with all pass-through devices (GPU, audio, USB): memory allocation failure (NUMA?).
- Q35/BIOS with one audio device: IOMMU caching-mode error.
- i440FX with all pass-through devices (GPU, audio, USB): works out of the box, including the NVIDIA GPU driver.

- Gilboa
I could reproduce the problem with a passthrough audio device as well. QEMU apparently requires caching mode for any vfio-pci device, so we should enable it whenever any host device is present.
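For illustration, a plain PCI host device passthrough also results in a vfio-pci device on the QEMU command line, so the same caching_mode setting would apply whenever the domain XML contains an element like the following (the PCI address values are hypothetical):

~~~
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <!-- host PCI address of the passed-through audio/GPU/USB function -->
    <address domain='0x0000' bus='0x3b' slot='0x00' function='0x1'/>
  </source>
</hostdev>
~~~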
While I do agree that having sane defaults is preferable, I would suggest you expose the iommu and vfio flags in the UI instead.

Out of 5 machines (4 Intel, one AMD) in 3 different oVirt clusters that use GPU/audio/USB pass-through devices, only one machine requires caching mode; the other 4 simply work as advertised. I wonder whether enabling it by default might break existing setups.
(In reply to Gilboa Davara from comment #18)
> While I do agree that having sane defaults is preferable, I would suggest
> you expose the iommu and vfio flags in the UI instead.

This would add complexity that would most likely be useful only as a workaround when something gets broken.

> Out of 5 machines (4 Intel, one AMD) in 3 different oVirt clusters that use
> GPU/audio/USB pass-through devices, only one machine requires caching mode;
> the other 4 simply work as advertised.

The problem is currently known to exhibit only if the VM's maximum number of vCPUs is >= 256. Depending on the cluster level, the VM CPU topology, and the VM firmware type, the limit may or may not be reached. There can also be different QEMU versions on different hosts, and the problem may perhaps occur only on certain hardware.

> I wonder whether enabling it by default might break existing setups.

The QEMU documentation says:

~~~
``caching-mode=on|off`` (default: off)
    This enables caching mode for the VT-d emulated device. When
    caching-mode is enabled, each guest DMA buffer mapping will generate an
    IOTLB invalidation from the guest IOMMU driver to the vIOMMU device in
    a synchronous way. It is required for ``-device vfio-pci`` to work with
    the VT-d device, because host assigned devices requires to setup the
    DMA mapping on the host before guest DMA starts.
~~~

Which means enabling the flag is non-optional with vfio-pci, and it works without the flag only due to some tolerance or coincidence.
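As a rough illustration of why the reported topologies behave differently, assuming the engine pads the socket count up to the cluster's CPU hot-plug maximum (commonly 16) when computing the maximum vCPU count in the domain XML:

~~~
<!-- Illustrative only; the 16-socket hot-plug maximum is an assumption. -->
<!-- 1 socket x 16 cores x 1 thread -> padded to 16 x 16 x 1 = 256 max vCPUs (limit reached, vIOMMU added) -->
<!-- 1 socket x 15 cores x 1 thread -> padded to 16 x 15 x 1 = 240 max vCPUs (below the limit)             -->
<vcpu current='16'>256</vcpu>
<cpu match='exact'>
  <topology sockets='16' cores='16' threads='1'/>
</cpu>
~~~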
Many thanks for the detailed response. If you need QA services to test the fix, please let me know.
Since it's too late to handle the additional problem in 4.4.9, I opened a new bug for PCI host devices: BZ 2023313
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (0-day RHV Manager (ovirt-engine) [ovirt-4.4.9]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4699