Bug 1844274 - [vGPU] Virtual Machine is left without video device after removing nvidia-xxx mdev custom property
Summary: [vGPU] Virtual Machine is left without video device after removing nvidia-xxx mdev custom property
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.1
Target Release: ---
Assignee: Liran Rotenberg
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Duplicates: 1819346
Depends On:
Blocks:
 
Reported: 2020-06-04 23:40 UTC by Germano Veit Michel
Modified: 2023-12-15 18:05 UTC
CC List: 7 users

Fixed In Version: rhv-4.4.1-10
Doc Type: Bug Fix
Doc Text:
Previously, when a user added a vGPU custom property to a VM and later removed it, the VM was left without a video device, preventing use of the VM console. This was fixed in libvirt-5.9.0-1.el8, and the video device is now added as expected.
Clone Of:
Environment:
Last Closed: 2020-09-23 16:15:11 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments


Links
Red Hat Issue Tracker RHV-37543 (last updated 2022-05-31 12:25:35 UTC)
Red Hat Knowledge Base (Solution) 5133791 (last updated 2020-06-05 03:17:13 UTC)
Red Hat Product Errata RHBA-2020:3820 (last updated 2020-09-23 16:15:31 UTC)

Description Germano Veit Michel 2020-06-04 23:40:21 UTC
Description of problem:

When using vGPU, if the user removes the custom property to disable the vGPU for the VM, the engine creates an XML without a video device, so the VM runs with no video device at all. The VNC/SPICE graphics definitions are still present, but they are of little use without a video device.

1. Add mdev custom property "nvidia-XXX" for vGPU.
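For reference, a minimal sketch of setting this custom property through the oVirt Python SDK (ovirt-engine-sdk4). The engine URL, credentials, VM name and the nvidia-XXX type below are placeholders; in practice this is normally done from the Admin Portal (Edit VM -> Custom Properties):

# Sketch only: set the mdev_type custom property on a VM (all names are placeholders).
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',   # placeholder engine URL
    username='admin@internal',
    password='password',
    insecure=True,                                        # use ca_file=... in a real setup
)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=vgpu-vm')[0]           # placeholder VM name
vms_service.vm_service(vm.id).update(
    types.Vm(custom_properties=[
        types.CustomProperty(name='mdev_type', value='nvidia-XXX'),
    ])
)
connection.close()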

2. Start the VM, engine creates XML with:

<devices>
  ...
  <video>
    <model type="none"/>
  </video>
  ...
  <hostdev mode="subsystem" type="mdev" model="vfio-pci" display="on">
    <source>
      <address uuid="bd50bf7d-4c9d-441d-9caa-c5d5cf311b6d"/>
    </source>
  </hostdev>
  ...
</devices>

3. Remove the custom property
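A matching sketch with the Python SDK, reusing the connection and VM lookup from the snippet under step 1; this assumes that updating the VM with an empty custom_properties list clears the property (otherwise remove it from the Admin Portal):

# Sketch only: clear the VM's custom properties again.
vms_service.vm_service(vm.id).update(
    types.Vm(custom_properties=[])
)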

4. Start the VM again. Where is my video device?

<devices>
  ...
  <video>
    <model type="none"/>
  </video>
  ...
</devices>

5. The console doesn't work; the VM is effectively headless.

6. It looks like the VM is left with an unplugged video device:

engine=# select type,device,address,is_managed,is_plugged,is_readonly from vm_device where vm_id = '71d97df8-e74f-4a92-8037-03519a619936' and type='video';
 type  | device | address | is_managed | is_plugged | is_readonly 
-------+--------+---------+------------+------------+-------------
 video | qxl    |         | t          | f          | f

I can make the proper device come back by editing the VM and switching the video type from VGA to QXL or from QXL to VGA.

How reproducible:
Always

Steps to Reproduce:
As above

Comment 1 Ryan Barry 2020-06-05 01:16:46 UTC
As discussed on the other bug, this is kind of intended behavior. The additional video device is used only for installation; after that the vGPU drivers handle the display.

Comment 2 Germano Veit Michel 2020-06-05 01:20:00 UTC
(In reply to Ryan Barry from comment #1)
> As discussed on the other bug, this is kind of intended behavior. The additional
> video device is used only for installation; after that the vGPU drivers handle the display.

Actually, this is a little bit different.

The problem here is that if one removes the vGPU so the VM goes back to
non-accelerated graphics, it ends up without a vGPU and without QXL/VGA;
there is no video device at all.

Comment 3 Michal Skrivanek 2020-06-05 09:30:20 UTC
so the "design" is supposed to be:

VM without mdev: <video> set to qxl or vga, <graphics> set to spice, vnc, or both.
VM with mdev property: on start, replace <video> with <model type="none"/> and add the corresponding mdev hostdev.
VM with mdev,nodisplay: keep <video> as configured and add the mdev. This results in the mdev being available as a secondary card.

VMs are supposed to have the graphics protocol set to VNC or VNC+SPICE (it shouldn't really matter which); they shouldn't be headless or SPICE-only (otherwise VNC won't be available).
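For illustration only, a minimal Python SDK sketch (engine URL, credentials and VM name are placeholders) for checking that a VM has a VNC console and adding one if it is missing:

# Sketch only: make sure a VM has a VNC graphics console.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(url='https://engine.example.com/ovirt-engine/api',  # placeholder
                            username='admin@internal', password='password', insecure=True)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=vgpu-vm')[0]                                  # placeholder VM name
consoles_service = vms_service.vm_service(vm.id).graphics_consoles_service()
if types.GraphicsType.VNC not in [c.protocol for c in consoles_service.list()]:
    consoles_service.add(types.GraphicsConsole(protocol=types.GraphicsType.VNC))
connection.close()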


I guess it could be a problem with the hack we had to do in https://gerrit.ovirt.org/#/c/100525/6/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/libvirt/VmDevicesConverter.java

Comment 4 Arik 2020-06-09 15:05:56 UTC
*** Bug 1819346 has been marked as a duplicate of this bug. ***

Comment 5 Liran Rotenberg 2020-06-11 14:52:30 UTC
The engine did use the hack pointed out in comment #3.
Some time ago this code was changed: https://gerrit.ovirt.org/#/c/103302/4/backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/libvirt/VmDevicesConverter.java
As a result the hack was skipped, which exposed the current bug.

However, this hack exists because of BZ 1720612.
I tried to reproduce the issue and the video device did have the alias, so it stayed in the engine DB as managed and plugged (tested with libvirt-daemon-6.0.0-22.module+el8.2.1+6815+1c792dc8.x86_64).
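For context, the kind of correlation involved can be sketched conceptually; this is illustrative Python only, not the actual VmDevicesConverter Java code, and the field names are made up:

# Conceptual sketch: devices reported back by libvirt are matched to the
# engine's device list by alias; a device that cannot be matched ends up
# marked unplugged, which is how the video device got lost here.
def correlate(libvirt_devices, db_devices):
    # libvirt_devices / db_devices: dicts like {'type': 'video', 'alias': 'ua-...'}
    reported_aliases = {d.get('alias') for d in libvirt_devices if d.get('alias')}
    for db_dev in db_devices:
        db_dev['is_plugged'] = db_dev.get('alias') in reported_aliases
        # On old libvirt the video device came back without a usable alias,
        # so the hack had to fall back to matching by device type instead.
    return db_devices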

Judging by the libvirt bugs, this should be fine since libvirt-5.9.0-1.el8. What version of libvirt does the customer use?

Comment 6 Germano Veit Michel 2020-06-11 23:00:06 UTC
(In reply to Liran Rotenberg from comment #5)
> From the libvirt bugs it should be fine since libvirt-5.9.0-1.el8, what
> version of libvirt does the customer use?

RHEL 7.8, libvirt-4.5.0-33.el7_8.1.x86_64
I've reproduced this on a 4.4 Beta RHV-M with a RHEL 7.8 host running that same libvirt, under the 4.3 compatibility level.

So are you saying this won't happen on 4.4? If so feel free to close on 4.4 GA.

Comment 8 Arik 2020-06-15 16:16:04 UTC
Per comment 5, it's worth checking whether this still happens on 4.4.1.

Comment 12 Nisim Simsolo 2020-07-13 17:03:27 UTC
Verified:
ovirt-engine-4.4.1.8-0.7.el8ev
vdsm-4.40.22-1.el8ev.x86_64
libvirt-daemon-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
Nvidia GRID 11.0 GA drivers

Verification scenario:
1. Run a VM with mdev_type - nvidia-xx and the Nvidia drivers installed inside the VM.
   Verify the VM console shows the screen properly.
   Check the VM and verify the qxl video device is plugged:
engine=# select type,device,address,is_managed,is_plugged,is_readonly from vm_device where vm_id = 'c35c7c86-448b-4700-8d36-304edb0f0079' and type='video';
 type  | device |                           address                            | is_managed | is_plugged | is_readonly 
-------+--------+--------------------------------------------------------------+------------+------------+-------------
 video | qxl    | {type=pci, slot=0x01, bus=0x00, domain=0x0000, function=0x0} | t          | t          | f

2. Power off the VM, remove the mdev_type custom property, and run the VM again.
   Verify the VM console shows the screen properly.
   Check the VM and verify the qxl video device is plugged:
engine=# select type,device,address,is_managed,is_plugged,is_readonly from vm_device where vm_id = 'c35c7c86-448b-4700-8d36-304edb0f0079' and type='video';
 type  | device |                           address                            | is_managed | is_plugged | is_readonly 
-------+--------+--------------------------------------------------------------+------------+------------+-------------
 video | qxl    | {type=pci, slot=0x01, bus=0x00, domain=0x0000, function=0x0} | t          | t          | f

Comment 13 Nisim Simsolo 2020-07-13 17:15:27 UTC
Correction to step 1 from https://bugzilla.redhat.com/show_bug.cgi?id=1844274#c12: the video device should have an empty address unless the nodisplay custom property is in use:

When using mdev_type: nvidia-xx
engine=# select type,device,address,is_managed,is_plugged,is_readonly from vm_device where vm_id = 'c35c7c86-448b-4700-8d36-304edb0f0079' and type='video';
 type  | device | address | is_managed | is_plugged | is_readonly 
-------+--------+---------+------------+------------+-------------
 video | qxl    |         | t          | t          | f

When using mdev_type: nodisplay,nvidia-xx
engine=# select type,device,address,is_managed,is_plugged,is_readonly from vm_device where vm_id = 'c35c7c86-448b-4700-8d36-304edb0f0079' and type='video';
 type  | device |                           address                            | is_managed | is_plugged | is_readonly 
-------+--------+--------------------------------------------------------------+------------+------------+-------------
 video | qxl    | {type=pci, slot=0x01, bus=0x00, domain=0x0000, function=0x0} | t          | t          | f


Step 2: the result should be the same after removing either mdev_type: nvidia-xx or mdev_type: nodisplay,nvidia-xx:
engine=# select type,device,address,is_managed,is_plugged,is_readonly from vm_device where vm_id = 'c35c7c86-448b-4700-8d36-304edb0f0079' and type='video';
 type  | device |                           address                            | is_managed | is_plugged | is_readonly 
-------+--------+--------------------------------------------------------------+------------+------------+-------------
 video | qxl    | {type=pci, slot=0x01, bus=0x00, domain=0x0000, function=0x0} | t          | t          | f

Comment 14 Robert McSwain 2020-07-15 16:58:56 UTC
One other addition from the customer about this:

I think the only remaining issue for us from this ticket is that the nodisplay option isn't working for the Windows VMs we have. We're able to RDP into the Windows VM, but the console on the vGPU-enabled Windows VM does not work even with the nodisplay option. From within the VM I can see the second QXL display adapter, but the console doesn't work.

Comment 21 errata-xmlrpc 2020-09-23 16:15:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Engine and Host Common Packages 4.4.z [ovirt-4.4.2]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3820

Comment 22 Red Hat Bugzilla 2023-09-15 01:29:49 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

