Bug 1620599 - [RFE] Assign more than one mdev device to a VM from RHV web UI
Summary: [RFE] Assign more than one mdev device to a VM from RHV web UI
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.3.0
Hardware: x86_64
OS: All
Priority: high
Severity: high
Target Milestone: ovirt-4.2.7
Target Release: ---
Assignee: Ryan Barry
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-23 09:49 UTC by Martin Tessun
Modified: 2018-11-02 14:38 UTC
CC: 7 users

Fixed In Version: ovirt-engine-4.2.7.2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1609139
Environment:
Last Closed: 2018-11-02 14:38:21 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.2+
mtessun: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+


Attachments
Example domain XML (6.74 KB, application/xml)
2018-09-05 10:50 UTC, Milan Zamazal


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 94217 0 master MERGED backend: allow multiple values for mdev_type 2021-02-08 14:45:45 UTC
oVirt gerrit 94218 0 master ABANDONED [DRAFT] backend: extend VM properties to show repeatable keys in the UI 2021-02-08 14:45:45 UTC
oVirt gerrit 94575 0 ovirt-engine-4.2 MERGED backend: allow multiple values for mdev_type 2021-02-08 14:45:45 UTC

Description Martin Tessun 2018-08-23 09:49:40 UTC
Currently we can assign only one mdev_type to a VM from the oVirt web UI.

To support multiple Nvidia vGPU devices assigned to one VM, we should be able to assign multiple mdev_type values to a VM from the web UI.

Please note that the NVIDIA drivers are not supported by NVIDIA for oVirt.
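To make the request concrete, here is a minimal sketch (Python, illustrative only -- not the actual engine or Vdsm code; the function name is made up) of the comma-separated parsing this RFE implies for the mdev_type custom property:

  def parse_mdev_types(custom_property):
      """Split an mdev_type custom property value such as
      'nvidia-22,nvidia-22' into one entry per requested device."""
      return [t.strip() for t in custom_property.split(",") if t.strip()]

  print(parse_mdev_types("nvidia-22,nvidia-22,nvidia-22,nvidia-22"))
  # ['nvidia-22', 'nvidia-22', 'nvidia-22', 'nvidia-22']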

Comment 1 Milan Zamazal 2018-08-27 12:33:08 UTC
Please note that the current Vdsm code doesn't support using multiple Nvidia mdev types on a single host, with the explanation that this is not supported by Nvidia. Multiple instances of the same type can be run, though.

That should be clarified, and the Vdsm code adjusted if needed. Other than that, the current Vdsm code should work with multiple mdev devices, but it will of course need some testing.

Comment 2 Milan Zamazal 2018-08-27 13:34:26 UTC
OK, the limitation applies only to a single device; it should work fine with multiple devices.

Comment 4 Ryan Barry 2018-09-05 01:56:37 UTC
Milan, are you sure vdsm supports this? I've tried a couple of different permutations of the XML, and the VDSM testcases fail in all instances.

I need another patch on the engine side in any case, since the engine also doesn't expect duplicate custom properties, but I'd like to either confirm how VDSM expects the XML to look or find out whether I should submit a patch to VDSM to support this.

Comment 5 Milan Zamazal 2018-09-05 10:27:29 UTC
Ryan, when I try to start a VM with two vGPU devices, it passes Vdsm preparation, but the VM fails to start on QEMU level:

  qemu-kvm: -device vfio-pci,id=hostdev1,sysfsdev=/sys/bus/mdev/devices/f68f492a-be98-4581-8453-21a2772d87ad,bus=pci.0,addr=0x7: vfio error: f68f492a-be98-4581-8453-21a2772d87ad: error getting device from group 1: Operation not permitted
  Verify all devices in group 1 are bound to vfio-<bus> or pci-stub and not already in use

So Vdsm looks ready, but it doesn't work. I don't know whether I'm doing something wrong in my testing or whether there is something wrong in Vdsm or elsewhere -- it would require further investigation.
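For reference, here is a minimal diagnostic sketch (Python, standalone -- not Vdsm code) for this kind of vfio failure: it lists every device in the named IOMMU group and the driver each one is bound to, since "Operation not permitted" typically means some device in the group is bound to something other than vfio-pci/pci-stub, or is already in use:

  import os

  def group_bindings(group):
      """Map each device in an IOMMU group to its bound driver."""
      base = "/sys/kernel/iommu_groups/%s/devices" % group
      bindings = {}
      for dev in os.listdir(base):
          driver = os.path.join(base, dev, "driver")
          # The 'driver' symlink is absent when nothing is bound.
          bindings[dev] = (os.path.basename(os.readlink(driver))
                           if os.path.islink(driver) else None)
      return bindings

  print(group_bindings("1"))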

Comment 6 Ryan Barry 2018-09-05 10:37:51 UTC
Thanks. Can you share an example of the VM XML? I haven't found one which passes the prep tests. The elementtree search appears to be looking for a single ovirt-vm:custom/mdevType node.
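To illustrate the parsing question (a sketch only, with made-up metadata -- not the actual Vdsm code or the attached XML): ElementTree's find() returns just the first matching node, so a search written for a single ovirt-vm:custom/mdevType element would silently ignore a repeated one, while findall() sees them all:

  import xml.etree.ElementTree as ET

  NS = {"ovirt-vm": "http://ovirt.org/vm/1.0"}
  METADATA = """
  <metadata>
    <ovirt-vm:vm xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
      <ovirt-vm:custom>
        <ovirt-vm:mdevType>nvidia-22</ovirt-vm:mdevType>
        <ovirt-vm:mdevType>nvidia-22</ovirt-vm:mdevType>
      </ovirt-vm:custom>
    </ovirt-vm:vm>
  </metadata>
  """
  root = ET.fromstring(METADATA)
  first = root.find(".//ovirt-vm:custom/ovirt-vm:mdevType", NS)
  every = root.findall(".//ovirt-vm:custom/ovirt-vm:mdevType", NS)
  print(first.text, len(every))  # nvidia-22 2

(Per comment 14, the fix that was eventually merged takes comma-separated values in a single property rather than repeated elements.)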

Comment 7 Milan Zamazal 2018-09-05 10:50:38 UTC
Created attachment 1481074 [details]
Example domain XML

Comment 8 Milan Zamazal 2018-09-05 10:51:53 UTC
Attached. It uses the same mdev type for both devices, but the result is the same if I use a different mdev type for the added device.
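The attachment itself is not inlined in this report; for readers without access to it, the following is a hedged reconstruction (UUIDs are invented for illustration) of what two vfio-pci mdev hostdev entries look like in a libvirt domain XML <devices> section, matching the sysfsdev paths in the qemu error from comment 5, with an ElementTree count check:

  import xml.etree.ElementTree as ET

  DEVICES = """
  <devices>
    <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
      <source><address uuid='11111111-1111-1111-1111-111111111111'/></source>
    </hostdev>
    <hostdev mode='subsystem' type='mdev' model='vfio-pci'>
      <source><address uuid='22222222-2222-2222-2222-222222222222'/></source>
    </hostdev>
  </devices>
  """
  root = ET.fromstring(DEVICES)
  print(len(root.findall("./hostdev[@type='mdev']")))  # 2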

Comment 9 Guo, Zhiyi 2018-09-06 05:21:30 UTC
(In reply to Milan Zamazal from comment #8)
> Attached. It uses the same mdev type for both the devices, but the result is
> the same if I use a different mdev type for the added device.

In qemu-kvm-rhev testing, multiple vGPUs inside one guest are only supported with the M60-8Q vGPU type.

Comment 14 Nisim Simsolo 2018-10-11 10:32:36 UTC
Verification builds: 
kernel-3.10.0-954.el7.x86_64
ovirt-engine-4.2.7.2-0.1.el7ev
vdsm-4.20.42-1.el7ev.x86_64
qemu-kvm-rhev-2.12.0-18.el7.x86_64
libvirt-client-4.5.0-10.el7.x86_64
Host: NVIDIA-vGPU-rhel-7.6-410.62.x86_64
VMs: GRID6.3-GA-392.05-Windows-Guest-Drivers
     NVIDIA-Linux-x86_64-390.96-grid
Hardware: 2 x Tesla M60 on the same host.

Verification scenario: 
1. Browse Webadmin -> Compute -> VMs -> edit a RHEL 7 VM -> Custom Properties -> select mdev_type and assign 4 mdev devices to the VM, for example:
nvidia-22,nvidia-22,nvidia-22,nvidia-22
2. Run the VM.
3. Observe nvidia-smi on the host and verify that 4 GPUs are attached to the VM with the same PID.
4. Open the VM and verify that there are 4 M60 PCI devices.
5. Install the Nvidia drivers on the VM and verify driver functionality.
6. Repeat steps 1-5 using different Windows and Linux VM OS types.
7. Power off the VMs and verify the mdev devices are removed from /sys/class/mdev_bus/0000\:8X\:00.0/ (see the sketch below).
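The following is a small helper sketch for the step 7 check (an illustration, not part of the actual test tooling; the PCI address in the usage comment is a made-up example): it lists any mdev instances, which appear as UUID-named directories, still present under a parent GPU in /sys/class/mdev_bus/:

  import os
  import re

  UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")

  def active_mdevs(parent_pci):
      """Return the mdev instance UUIDs still present under a parent
      device such as a Tesla M60 GPU; expect [] after VM power-off."""
      base = "/sys/class/mdev_bus/%s" % parent_pci
      return [d for d in os.listdir(base) if UUID_RE.match(d)]

  # Example (hypothetical PCI address):
  # print(active_mdevs("0000:86:00.0"))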

Polarion test case added to external trackers.

Comment 15 Sandro Bonazzola 2018-11-02 14:38:21 UTC
This bug is included in the oVirt 4.2.7 release, published on November 2nd 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

