Bug 2013752 - VM with vGPU and vCPU config of 1 socket 16 cores fails to start
Summary: VM with vGPU and vCPU config of 1 socket 16 cores fails to start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.9-1
Target Release: ---
Assignee: Milan Zamazal
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-10-13 16:14 UTC by amashah
Modified: 2022-05-31 14:54 UTC (History)
10 users

Fixed In Version: ovirt-engine-4.4.9.4
Doc Type: Bug Fix
Doc Text:
Previously, certain CPU topologies caused virtual machines with a vGPU to fail to start. The current release fixes this issue.
Clone Of:
Environment:
Last Closed: 2021-11-16 13:54:29 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:
mavital: needinfo+


Attachments
Engine log (672.44 KB, application/octet-stream)
2021-11-11 16:51 UTC, Gilboa Davara


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43804 0 None None None 2021-10-13 16:14:41 UTC
Red Hat Knowledge Base (Solution) 6414771 0 None None None 2021-10-13 16:24:17 UTC
Red Hat Product Errata RHBA-2021:4699 0 None None None 2021-11-16 13:54:33 UTC
oVirt gerrit 117225 0 master MERGED core: Enable IOMMU caching_mode when mdev devices are present 2021-10-21 09:27:31 UTC
oVirt gerrit 117247 0 ovirt-engine-4.4 MERGED core: Enable IOMMU caching_mode when mdev devices are present 2021-10-25 10:40:12 UTC
oVirt gerrit 117264 0 ovirt-engine-4.4.9.z MERGED core: Enable IOMMU caching_mode when mdev devices are present 2021-10-25 10:41:24 UTC

Description amashah 2021-10-13 16:14:14 UTC
Description of problem:

VM with an NVIDIA vGPU and a vCPU configuration of:
1 Virtual Socket
16 cores
1 thread

Fails to start with:

~~~
2021-10-13 11:59:19,288-0400 ERROR (vm/3b9aacbe) [virt.vm] (vmId='3b9aacbe-25de-4f47-a9c3-dc2e81b85980') The vm start process failed (vm:992)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 919, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2967, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1385, in createWithFlags
    raise libvirtError('virDomainCreateWithFlags() failed')
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2021-10-13T15:59:14.416352Z qemu-kvm: -device vfio-pci-nohotplug,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/b7abb3ae-382b-4bdc-a524-08f89b484e52,display=on,bus=pci.5,addr=0x0,ramfb=on: warning: vfio b7abb3ae-382b-4bdc-a524-08f89b484e52: Could not enable error recovery for the device
2021-10-13T15:59:14.452831Z qemu-kvm: We need to set caching-mode=on for intel-iommu to enable device assignment with IOMMU protection.
2021-10-13 11:59:19,289-0400 INFO  (vm/3b9aacbe) [virt.vm] (vmId='3b9aacbe-25de-4f47-a9c3-dc2e81b85980') Changed state to Down: internal error: qemu unexpectedly closed the monitor: 2021-10-13T15:59:14.416352Z qemu-kvm: -device vfio-pci-nohotplug,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/b7abb3ae-382b-4bdc-a524-08f89b484e52,display=on,bus=pci.5,addr=0x0,ramfb=on: warning: vfio b7abb3ae-382b-4bdc-a524-08f89b484e52: Could not enable error recovery for the device
~~~



In /var/log/libvirt/qemu we see the same as above.


- VM fails to start with this configuration (sockets:cores:threads, vGPU profile, chipset, result):
1:16:1    nvidia-258    Q35 Chipset with BIOS       Fails to run


- VM starts fine with these configs:
1:16:1    nvidia-258    I440FX Chipset with BIOS    Runs
1:15:1    nvidia-258    Q35 Chipset with BIOS       Runs
16:15:1   nvidia-258    Q35 Chipset with BIOS       Runs
1:16:1    none          Q35 Chipset with BIOS       Runs


- The issue began after upgrading from RHV-M 4.4.5 -> 4.4.8.

- Even a host with 4.4.5 and a manager of 4.4.8 fails, which seems to indicate the engine is sending something different in 4.4.8 compared to 4.4.5.

- Current workaround is to use the i440FX machine type instead of Q35.


Version-Release number of selected component (if applicable):
4.4.8

How reproducible:
100%

Steps to Reproduce:
1. Create VM with 1 socket, 16 cores per socket, 1 thread 
2. assign vGPU device 
3. start VM
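
For reference, a minimal sketch of how this configuration might be rendered in the libvirt domain XML (the mdev UUID is copied from the error log above; the <vcpu> maximum and the socket count in <topology> are illustrative assumptions about how the engine leaves room for CPU hotplug, which relates to the 256 vCPU threshold discussed later in comment 19):

~~~
<!-- Illustrative: the engine raises the <vcpu> maximum above the current count
     to allow CPU hotplug; with 16 cores per socket, 16 hot-pluggable sockets
     would give a maximum of 256 vCPUs (assumption, not taken from the logs). -->
<vcpu current='16'>256</vcpu>
<cpu match='exact'>
  <topology sockets='16' cores='16' threads='1'/>
</cpu>
<devices>
  <!-- NVIDIA vGPU mediated device; UUID copied from the qemu-kvm error above -->
  <hostdev mode='subsystem' type='mdev' model='vfio-pci' display='on' ramfb='on'>
    <source>
      <address uuid='b7abb3ae-382b-4bdc-a524-08f89b484e52'/>
    </source>
  </hostdev>
</devices>
~~~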

Actual results:
VM fails to start

Expected results:
VM starts.


Logs will be attached soon.

Comment 3 Milan Zamazal 2021-10-19 13:07:49 UTC
According to https://wiki.qemu.org/Features/VT-d, caching-mode="on" should indeed be set in <iommu> <driver> when vfio-pci devices are present. I could reproduce the problem, and the VM starts for me when I add the caching-mode option. I don't know why the error occurs only with some CPU topologies, most likely due to luck. I'll prepare a patch to add the option.
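
For illustration, the fix amounts to adding an element like the following to the generated domain XML when mdev devices are present (a sketch based on libvirt's <iommu> syntax; attributes other than caching_mode are illustrative):

~~~
<devices>
  <!-- Emulated Intel vIOMMU; caching_mode='on' is what QEMU requires for
       vfio-pci device assignment (intremap shown here only for context) -->
  <iommu model='intel'>
    <driver intremap='on' caching_mode='on'/>
  </iommu>
</devices>
~~~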

Comment 4 Arik 2021-10-20 07:23:52 UTC
Makes sense; that would also explain why it happens with a 4.4.8 engine and a 4.4.5 host, as the IOMMU was added as part of the fix for bz 1946231.

Comment 10 Nisim Simsolo 2021-11-04 12:25:20 UTC
Verified:
ovirt-engine-4.4.9.4-0.1.el8ev
qemu-kvm-6.0.0-33.module+el8.5.0+13041+05be2dc6.x86_64
libvirt-daemon-7.6.0-6.module+el8.5.0+13051+7ddbe958.x86_64
vdsm-4.40.90.4-1.el8ev.x86_64
Nvidia  Driver Version: 460.73.02 

Verification scenario:
1. Reproduce issue (try to run VM with 1 socket and 16 cores per socket).
2. Upgrade ovirt-engine and RHV host.
3. run VM again
   Verify VM is running with Nvidia vGPU instance.

Comment 11 Arik 2021-11-09 08:01:32 UTC
(In reply to Nisim Simsolo from comment #10)
> Verified:
> ovirt-engine-4.4.9.4-0.1.el8ev
> qemu-kvm-6.0.0-33.module+el8.5.0+13041+05be2dc6.x86_64
> libvirt-daemon-7.6.0-6.module+el8.5.0+13051+7ddbe958.x86_64
> vdsm-4.40.90.4-1.el8ev.x86_64
> Nvidia  Driver Version: 460.73.02 
> 
> Verification scenario:
> 1. Reproduce issue (try to run VM with 1 socket and 16 cores per socket).
> 2. Upgrade ovirt-engine and RHV host.
> 3. run VM again
>    Verify VM is running with Nvidia vGPU instance.

looks good to me, thanks

Comment 12 Gilboa Davara 2021-11-09 15:57:34 UTC
Not sure if it's fixed or not, but I'm still seeing it on vdsm-4.40.90.4-1.el8.x86_64.

Comment 13 Arik 2021-11-09 17:04:18 UTC
(In reply to Gilboa Davara from comment #12)
> Not sure if its fixed or not, but still seeing it on
> vdsm-4.40.90.4-1.el8.x86_64.

The fix is on the engine side - what's the version of ovirt-engine?

Comment 14 Gilboa Davara 2021-11-10 07:47:19 UTC
$ rpm -q ovirt-engine
ovirt-engine-4.4.9.4-1.el8.noarch

Comment 15 Arik 2021-11-10 15:30:48 UTC
(In reply to Gilboa Davara from comment #14)
> $ rpm -q ovirt-engine
> ovirt-engine-4.4.9.4-1.el8.noarch

ok interesting, can you please provide engine.log?

Comment 16 Gilboa Davara 2021-11-11 16:51:02 UTC
Created attachment 1841255 [details]
Engine log

Please note that I attempted to run the VM in 3 different ways:
- Q35/BIOS with all pass-through devices (GPU, audio, USB): Memory allocation failure (NUMA?).
- Q35/BIOS with one audio device:  IOMMU caching mode error.
- i440FX with all pass-through devices (GPU, audio, USB): Works out of the box, including nVidia GPU driver.

- Gilboa

Comment 17 Milan Zamazal 2021-11-12 10:58:23 UTC
I could reproduce the problem with a passthrough audio device. QEMU apparently requires the caching mode for any vfio-pci device, and we should enable it if any host device is present.
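
For example, a plain PCI passthrough host device (a generic sketch, not taken from this bug's configuration) hits the same vfio-pci requirement:

~~~
<hostdev mode='subsystem' type='pci' managed='yes'>
  <!-- Host PCI device assigned via vfio-pci; the address values are placeholders -->
  <source>
    <address domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
  </source>
</hostdev>
~~~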

Comment 18 Gilboa Davara 2021-11-12 13:49:38 UTC
While I do agree that having sane defaults is preferable, I would suggest you expose the iommu and vfio flags to the UI instead.
Out of 5 machines (4 Intel, one AMD) in 3 different oVirt clusters that export GPU/Audio/USB pass-through devices, only one machine requires caching mode; the other 4 simply work as advertised.
I wonder if enabling it by default won't break existing setups.

Comment 19 Milan Zamazal 2021-11-15 06:36:59 UTC
(In reply to Gilboa Davara from comment #18)
> While I do agree that having sane defaults is preferable, I would suggest
> you expose the iommu and vfio flags to the UI instead.

This would add complexity that would most likely be useful only as a workaround when something gets broken.

> Out of 5 machines (4 Intel, one AMD) in 3 different oVirt clusters that
> export GPU/Audio/USB pass-through devices only one machine requires caching
> mode. 4 others simply work as advertised.

The problem is currently known to manifest only if the VM's maximum number of vCPUs is >= 256. Depending on the cluster level, the VM CPU topology and the VM firmware type, the limit may or may not be reached. There can also be different QEMU versions on different hosts, and the problem may perhaps occur only on certain hardware.

> I wonder if enabling it by default won't break existing setups.

QEMU documentation says:

  ``caching-mode=on|off`` (default: off)
      This enables caching mode for the VT-d emulated device.  When
      caching-mode is enabled, each guest DMA buffer mapping will generate an
      IOTLB invalidation from the guest IOMMU driver to the vIOMMU device in
      a synchronous way.  It is required for ``-device vfio-pci`` to work
      with the VT-d device, because host assigned devices requires to setup
      the DMA mapping on the host before guest DMA starts.

This means enabling the flag is mandatory with vfio-pci, and it works without the flag only due to some tolerance or coincidence.

Comment 20 Gilboa Davara 2021-11-15 10:10:08 UTC
Many thanks for the detailed response.
If you need QA services to test the fix, please let me know.

Comment 21 Milan Zamazal 2021-11-15 12:53:16 UTC
Since it's too late to handle the additional problem in 4.4.9, I opened a new bug for PCI host devices: BZ 2023313.

Comment 25 errata-xmlrpc 2021-11-16 13:54:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (0-day RHV Manager (ovirt-engine) [ovirt-4.4.9]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4699

