Currently, only one mdev_type can be assigned to a VM from the oVirt web UI. To support multiple NVIDIA vGPU devices assigned to a single VM, we should be able to assign multiple mdev_types to a VM from the web UI. Please note that the NVIDIA drivers are not supported by NVIDIA for oVirt.
Please note that the current Vdsm code doesn't support using multiple NVIDIA mdev types on a single host, with the explanation that this is not supported by NVIDIA. Multiple instances of the same type can be run, though. That should be clarified, and the Vdsm code should be adjusted if needed. Other than that, the current Vdsm code should work with multiple mdev devices, but it will of course need some testing.
OK, the limitation applies only to a single device; it should work fine with multiple devices.
Milan, are you sure vdsm supports this? I've tried a couple of different permutations of the XML, and the VDSM testcases fail in all instances. I need another patch to the engine side in any case, since the engine also doesn't expect to have duplicate custom properties, but I'd like to either confirm how VDSM expects it to look, or to know whether I should submit a patch to VDSM to support this.
Ryan, when I try to start a VM with two vGPU devices, it passes Vdsm preparation, but the VM fails to start at the QEMU level:

qemu-kvm: -device vfio-pci,id=hostdev1,sysfsdev=/sys/bus/mdev/devices/f68f492a-be98-4581-8453-21a2772d87ad,bus=pci.0,addr=0x7: vfio error: f68f492a-be98-4581-8453-21a2772d87ad: error getting device from group 1: Operation not permitted
Verify all devices in group 1 are bound to vfio-<bus> or pci-stub and not already in use

So Vdsm looks ready, but it doesn't work. I don't know whether I'm doing something wrong in my testing or whether there is something wrong in Vdsm or elsewhere -- it would require further investigation.
Thanks. Can you share an example of the VM XML? I haven't found one which passes the prep tests. The ElementTree search appears to be looking for a single ovirt-vm:custom/mdevType node.
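For what it's worth, a search using find() would only ever return the first matching node; collecting every mdevType entry needs findall(). A minimal sketch of the difference (the ovirt-vm namespace URI and element layout below are assumptions based on this thread, not the actual Vdsm code):

```python
# Minimal sketch, not actual Vdsm code: collecting multiple mdevType
# nodes from VM metadata with ElementTree. The namespace URI and
# element nesting here are assumptions for illustration.
import xml.etree.ElementTree as ET

NS = {'ovirt-vm': 'http://ovirt.org/vm/1.0'}  # assumed namespace URI

metadata = """
<metadata xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
  <ovirt-vm:vm>
    <ovirt-vm:custom>
      <ovirt-vm:mdevType>nvidia-22</ovirt-vm:mdevType>
      <ovirt-vm:mdevType>nvidia-22</ovirt-vm:mdevType>
    </ovirt-vm:custom>
  </ovirt-vm:vm>
</metadata>
"""

root = ET.fromstring(metadata)
# find() would stop at the first node; findall() returns all of them
nodes = root.findall('.//ovirt-vm:custom/ovirt-vm:mdevType', NS)
types = [n.text for n in nodes]
print(types)  # ['nvidia-22', 'nvidia-22']
```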
Created attachment 1481074 [details] Example domain XML
Attached. It uses the same mdev type for both devices, but the result is the same if I use a different mdev type for the added device.
(In reply to Milan Zamazal from comment #8)
> Attached. It uses the same mdev type for both devices, but the result is
> the same if I use a different mdev type for the added device.

In qemu-kvm-rhev testing, multiple vGPUs inside one guest are only supported with the M60-8Q vGPU type.
Verification builds:
kernel-3.10.0-954.el7.x86_64
ovirt-engine-4.2.7.2-0.1.el7ev
vdsm-4.20.42-1.el7ev.x86_64
qemu-kvm-rhev-2.12.0-18.el7.x86_64
libvirt-client-4.5.0-10.el7.x86_64

Host: NVIDIA-vGPU-rhel-7.6-410.62.x86_64
VMs: GRID6.3-GA-392.05-Windows-Guest-Drivers, NVIDIA-Linux-x86_64-390.96-grid
Hardware: 2 x Tesla M60 under the same host.

Verification scenario:
1. Browse Webadmin -> Compute -> VMs -> edit a RHEL 7 VM -> Custom Properties -> select mdev_type and assign 4 mdev devices to the VM, for example: nvidia-22,nvidia-22,nvidia-22,nvidia-22
2. Run the VM.
3. Observe nvidia-smi on the host and verify that 4 GPUs are attached to the VM with the same PID.
4. Open the VM and verify there are 4 M60 PCI devices.
5. Install the NVIDIA drivers on the VM and verify driver functionality.
6. Repeat steps 1-5 using different Windows and Linux VM OS types.
7. Power off the VMs and verify the mdev devices are removed from /sys/class/mdev_bus/0000\:8X\:00.0/

Polarion test case added to external trackers.
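The mdev_type custom property in step 1 is entered as a single comma-separated string, so somewhere it has to be split into individual type names before one device is created per entry. A minimal sketch of that parsing (illustrative only, not actual oVirt engine or Vdsm code):

```python
# Illustrative sketch only, not actual oVirt code: splitting the
# mdev_type custom property value into individual mdev type names,
# allowing duplicates so the same type can be assigned repeatedly.
def parse_mdev_types(value):
    """Split a comma-separated mdev_type property into a list of types."""
    return [t.strip() for t in value.split(',') if t.strip()]

print(parse_mdev_types('nvidia-22,nvidia-22,nvidia-22,nvidia-22'))
# ['nvidia-22', 'nvidia-22', 'nvidia-22', 'nvidia-22']
```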
This bugzilla is included in oVirt 4.2.7 release, published on November 2nd 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.