It is important to know that the separate placement is not smart: it doesn't attempt any balancing, and if it cannot find a completely empty physical card, it falls back to the compact placement. For example, with two physical cards, the first vGPU is placed on one of them, the second vGPU on the other one, and all additional vGPUs may be placed on either of them arbitrarily (assuming nothing is detached in the meantime). According to the provided nvidia-smi output, there is something running on each of the cards, which is consistent with the separate placement. The provided Vdsm log snippet doesn't contain the VM start; I'd expect to see "Separate mdev placement failed, trying compact placement." messages during VM startups once all the available cards have some vGPU assigned. Frank, can you confirm whether this is the case (and thus not a bug)?
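To illustrate the behavior described above, here is a minimal sketch of the placement fallback; the function and variable names are hypothetical and do not reflect Vdsm's actual internals:

```python
def place_vgpu(cards):
    """Pick a physical card for a new vGPU.

    `cards` maps a card id to the number of vGPUs already placed on it.
    Separate placement simply takes any completely empty card; it does
    no balancing among the non-empty ones. (Illustrative sketch only,
    not Vdsm's real implementation.)
    """
    for card, count in cards.items():
        if count == 0:
            # Separate placement succeeded: a free card was found.
            return card
    # No empty card left: fall back to compact placement, here modeled
    # as packing onto the most-used card.
    return max(cards, key=cards.get)


# With one empty card available, separate placement picks it:
print(place_vgpu({"gpu0": 1, "gpu1": 0}))  # gpu1
# With no empty card, any card may be chosen (compact fallback):
print(place_vgpu({"gpu0": 2, "gpu1": 1}))  # gpu0
```

This matches the scenario in the report: once every card hosts at least one vGPU, the separate strategy can no longer apply, and the fallback is expected rather than a bug.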
I cannot reproduce the problem, and QE confirms they have run it many times without experiencing any such problem. Frank, we need the Vdsm logs from starting the VMs. Even better, it would be great to run a VM, with Vdsm debug logging enabled, in a situation where a free physical card (i.e. one with nothing running on it) is available but is not used for the VM's vGPU. The Vdsm log would then very likely point us to the particular problem. (With the default logging level, we can only see whether Vdsm switches to the compact placement, but if it does, we won't know why.)
This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align the relevant severity, flags and keywords, or use one of the Bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords, to raise its PM_Score above the verification threshold (1000).
The bug is not reproducible and the customer case was closed, so I'm closing the bug. Please reopen if it still needs attention, providing the additional information requested in Comment 6.