Bug 2052557

Summary: RHV fails to release mdev vGPU device after VM shutdown
Product: Red Hat Enterprise Virtualization Manager Reporter: Sam Wachira <swachira>
Component: ovirt-engine Assignee: Arik <ahadas>
Status: CLOSED ERRATA QA Contact: Nisim Simsolo <nsimsolo>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.4.8 CC: ahadas, emarcus, nsimsolo
Target Milestone: ovirt-4.5.0   
Target Release: 4.5.0   
Hardware: x86_64   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, vGPU devices were not released when stateless VMs or VMs that were started in run-once mode were shut down. This sometimes caused the system to forbid running the VMs again, although the vGPU devices were available. In this release, vGPU devices are properly released when stateless VMs or VMs that were started in run-once mode are shut down.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-26 16:23:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1987121    
Bug Blocks:    

Comment 5 Arik 2022-02-14 12:36:55 UTC
At this point, we would like to try to reproduce it in QE's (Nisim's) environment.
What doesn't add up here is that the output from vdsm changes upon restart of ovirt-engine.

Comment 12 Sam Wachira 2022-02-28 13:31:41 UTC
Thanks Arik and Nisim for reproducing the issue and posting a fix so quickly.
The workaround to refresh host capabilities seems better than restarting the ovirt-engine every time.

Comment 13 Arik 2022-03-22 11:56:32 UTC
There is no point in testing this before the fix for bz 1987121 gets in.

Comment 14 RHEL Program Management 2022-03-22 11:56:40 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 18 Nisim Simsolo 2022-04-04 15:23:10 UTC
Verified:
ovirt-engine-4.5.0-0.237.el8ev
vdsm-4.50.0.10-1.el8ev.x86_64
qemu-kvm-6.2.0-10.module+el8.6.0+14540+5dcf03db.x86_64
libvirt-daemon-8.0.0-5.module+el8.6.0+14480+c0a3aa0f.x86_64
NVIDIA-vGPU-rhel-8.5-470.103.02.x86_64

Verification scenario:
1. Run 3 VMs with nvidia-22 instance (Max. available nvidia-22 in this host is 4).
2. Run another VM using "Run once" option.
3. Shut down the VM and run it again.
Verify that the VM is running properly with a vGPU instance.
(When reproducing the bug, we saw the following error: Running VM failed with "unavailableMDevs")
4. Repeat steps 2-3 a few more times.
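
The arithmetic behind the scenario above (4 nvidia-22 instances, 3 regular VMs, 1 run-once VM that is repeatedly stopped and started) can be sketched with a toy model. This is only an illustration of the symptom, not ovirt-engine or vdsm code: the VgpuPool class, the release_on_shutdown flag and the UnavailableMDevs exception are hypothetical names invented for this sketch, and the real bookkeeping in the engine is assumed to be more involved.

```python
class UnavailableMDevs(RuntimeError):
    """Raised by the toy model when no vGPU instance appears to be free."""


class VgpuPool:
    """Hypothetical bookkeeping for one vGPU type on one host (not engine code)."""

    def __init__(self, capacity, release_on_shutdown):
        self.capacity = capacity                        # e.g. 4 nvidia-22 instances
        self.release_on_shutdown = release_on_shutdown  # False models the bug
        self.in_use = 0                                 # instances believed to be taken

    def run(self, vm_name):
        # Refuse to start the VM if every instance is believed to be in use.
        if self.in_use >= self.capacity:
            raise UnavailableMDevs(f"{vm_name}: unavailableMDevs")
        self.in_use += 1

    def shut_down(self, vm_name):
        # Assumption: the buggy behaviour is modelled as a skipped release.
        if self.release_on_shutdown:
            self.in_use -= 1


def verification_scenario(pool, repeats=3):
    # Step 1: run 3 regular VMs.
    for i in range(1, 4):
        pool.run(f"vm{i}")
    # Steps 2-4: run a fourth VM, shut it down, and repeat.
    for _ in range(repeats):
        pool.run("run-once-vm")
        pool.shut_down("run-once-vm")
    return pool.in_use


if __name__ == "__main__":
    print(verification_scenario(VgpuPool(4, release_on_shutdown=True)))
    try:
        verification_scenario(VgpuPool(4, release_on_shutdown=False))
    except UnavailableMDevs as err:
        print("failed:", err)
```

With release_on_shutdown=False the second start of the run-once VM fails in this model even though only 3 VMs are actually running, which mirrors the "unavailableMDevs" error seen during reproduction.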

Comment 23 errata-xmlrpc 2022-05-26 16:23:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4711