Bug 1641125 - [RFE] add a configuration policy for vGPU placement
Summary: [RFE] add a configuration policy for vGPU placement
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: ---
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.3.0
Assignee: Milan Zamazal
QA Contact: Nisim Simsolo
Docs Contact: Rolfe Dlugy-Hegwer
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-19 17:01 UTC by Michael Shen (NVIDIA)
Modified: 2023-03-24 14:18 UTC
CC List: 9 users

Fixed In Version: vdsm-4.30.5, ovirt-engine-4.3.0_rc
Doc Type: Enhancement
Doc Text:
Previously, version 4.2.0 added support for vGPUs and used a Consolidated ("depth-first") allocation policy. The current release adds support for a Separated ("breadth-first") allocation policy. The default policy is the Consolidated allocation policy.
Clone Of:
Environment:
Last Closed: 2019-02-13 07:43:17 UTC
oVirt Team: Virt
Embargoed:
rdlugyhe: needinfo+
rule-engine: ovirt-4.3+
nsimsolo: testing_plan_complete+
mtessun: planning_ack+
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3669861 0 None None None 2018-10-30 10:38:51 UTC
oVirt gerrit 95237 0 master MERGED tests: Add vGPU placement test 2021-02-11 06:10:41 UTC
oVirt gerrit 95238 0 master MERGED virt: Support for vGPU placement 2021-02-11 06:10:41 UTC
oVirt gerrit 95440 0 master MERGED core: Support for vGPU placement 2021-02-11 06:10:41 UTC
oVirt gerrit 95441 0 master MERGED webadmin: Support for vGPU placement 2021-02-11 06:10:41 UTC
oVirt gerrit 95551 0 master MERGED restapi: vGPU placement added 2021-02-11 06:10:41 UTC
oVirt gerrit 95552 0 master MERGED Add vGPU placement to Host 2021-02-11 06:10:41 UTC
oVirt gerrit 95970 0 master MERGED restapi: Update to model 4.3.19 2021-02-11 06:10:41 UTC

Description Michael Shen (NVIDIA) 2018-10-19 17:01:25 UTC
Description of problem:
vGPU assignment is currently fixed - it is always breadth first when there are multiple physical GPUs. Breadth first can benefit performance, but it can also decrease flexibility - for example, assigning a 2Q profile twice in a dual-GPU system leaves both GPUs unable to serve a 1Q profile, because both GPUs are now only ready for 2Q.

Version-Release number of selected component (if applicable):


How reproducible:
Just assign vGPUs in RHV. 100% reproducible.

Steps to Reproduce:
1. Assign 2Q vGPU profile twice in a dual physical GPU system
2. Try to assign 1Q vGPU profile

Actual results:
1Q cannot be assigned

Expected results:
An admin can configure a depth-first policy so that vGPU assignment always happens on the first available physical GPU until it runs out of instances.
With this policy, the dual-GPU system can still be provisioned with a 1Q profile after two 2Q assignments.

Additional info:
The depth-first policy can hurt performance. We recommend comparing performance data for both policies once the new one is in place.
The reference vendor feature can be found at: https://docs.nvidia.com/grid/6.0/grid-vgpu-user-guide/index.html#modify-gpu-assignment-gpu-enabled-vms-vmware-vsphere

Comment 1 Ryan Barry 2018-10-26 13:08:47 UTC
Just to be clear, Michael:

You'd like RHV to try to maximally utilize the profiles on a single card before assignment to additional cards?

Comment 2 Milan Zamazal 2018-10-26 15:57:42 UTC
Michael, how do you assign the profiles in RHV -- by setting mdev_type custom property for a VM and starting the VM? And what RHV version do you use?

The current RHV implementation should use depth-first rather than breadth-first assignment and should be deterministic. The case you present as a failing reproducer works fine for me, and all 3 VMs are assigned to 2 cards.

Of course, that doesn't change anything on the request to support two different profile assignment policies. I'm just wondering why the observed current policy is different and I'd like to be sure that we understand each other correctly.
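
For reference, a minimal sketch of assigning a vGPU through the mdev_type custom property mentioned above, using the oVirt Python SDK. The VM name, connection details, and the assumption that the engine is configured to accept the mdev_type custom property for VMs are placeholders, not taken from this bug:

    # Hypothetical sketch: request an nvidia-22 vGPU for a VM by setting the
    # mdev_type custom property via the oVirt Python SDK (ovirtsdk4).
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='password',
        ca_file='ca.pem',
    )

    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=gpu-vm')[0]

    # Update only the custom properties; the vGPU is created when the VM starts.
    vms_service.vm_service(vm.id).update(
        types.Vm(
            custom_properties=[
                types.CustomProperty(name='mdev_type', value='nvidia-22'),
            ],
        ),
    )

    connection.close()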

Comment 3 Michael Shen (NVIDIA) 2018-10-26 16:08:27 UTC
Milan, I have not tested this case myself. This comes directly from the customer. I don't believe they changed any settings. They are using the latest RHV 4.2.

Your observation could be further support for this requirement if the two of you see different behaviors on a multi-GPU system. In that case, we really need a way to specify depth first or breadth first.

The requirement is simple - provide a way (either a setting or a policy) for admins to decide whether they want to go depth first or breadth first in a multiple-GPU system.
Depth first means that when assigning a certain vGPU profile, RHV always attempts to find the physical GPU that already has the most instances of the same profile assigned, and only then falls back to the first available GPU.
Breadth first means that when assigning a certain vGPU profile, RHV always attempts to find the first eligible physical GPU with the fewest instances assigned.
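
To make the two policies concrete, here is a simplified illustration - not the actual vdsm code - of how a placement routine might pick a physical GPU under each policy:

    # Illustrative sketch only, not the vdsm implementation: choosing a
    # physical GPU for a new vGPU instance under the two policies.
    # Each GPU is described by the mdev type already placed on it (if any),
    # how many instances it holds, and how many it can still accept.

    def eligible(gpu, mdev_type):
        # A GPU can take the new instance if it has free capacity and is
        # either empty or already serving the same mdev type (NVIDIA does
        # not support mixing types on one physical GPU).
        return gpu['free'] > 0 and gpu['type'] in (None, mdev_type)

    def pick_gpu(gpus, mdev_type, policy):
        candidates = [g for g in gpus if eligible(g, mdev_type)]
        if not candidates:
            raise RuntimeError('No device with type %s is available' % mdev_type)
        if policy == 'depth-first':   # "Consolidated": fill one card before the next
            return max(candidates, key=lambda g: g['used'])
        return min(candidates, key=lambda g: g['used'])  # "breadth-first"/"Separated"

    gpus = [
        {'name': 'GPU0', 'type': 'nvidia-18', 'used': 2, 'free': 2},
        {'name': 'GPU1', 'type': None, 'used': 0, 'free': 4},
    ]
    print(pick_gpu(gpus, 'nvidia-18', 'depth-first')['name'])    # GPU0
    print(pick_gpu(gpus, 'nvidia-18', 'breadth-first')['name'])  # GPU1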
 

(In reply to Milan Zamazal from comment #2)
> Michael, how do you assign the profiles in RHV -- by setting mdev_type
> custom property for a VM and starting the VM? And what RHV version do you
> use?
> 
> Current RHV implementation should use depth first rather than breadth first
> assignment and should be deterministic. The case you present as a failing
> reproducer works for me fine and all 3 VMs are assigned to 2 cards.
> 
> Of course, that doesn't change anything on the request to support two
> different profile assignment policies. I'm just wondering why the observed
> current policy is different and I'd like to be sure that we understand each
> other correctly.

Comment 4 Milan Zamazal 2018-10-26 18:04:52 UTC
Thank you, Michael, for the clarification; the requirement is fully clear now.

Comment 6 Milan Zamazal 2018-12-13 12:30:54 UTC
The only missing bit is Ansible support. The corresponding pull request has been posted (https://github.com/ansible/ansible/pull/49718), but it can be merged only after RHV 4.3 is released since it requires 4.3 SDK.

Comment 7 Nisim Simsolo 2019-01-02 13:55:28 UTC
What is the expected behavior when only 1 physical GPU is available on a host with Separated vGPU placement?
Currently, after assigning a 2Q profile to the GPU, trying to add a 1Q profile fails with the following in vdsm.log:
 ERROR (jsonrpc/1) [virt.vm] (vmId='210d538b-05b2-4f7c-bfea-5aa33baa3c3f') Failed to tear down device 1ae01fc9-df70-40b3-b1be-59c30bd1b3eb, device in inconsistent state (vm:2380)
raise exception.ResourceUnavailable('vgpu: No mdev found')
ResourceUnavailable: Resource unavailable

HW in use: 
2 X Tesla M60

Scenario:
1. Run 3 VMs with nvidia-22 mdev_type (8Q) on each VM (max_instance=1, so it leaves only 1 available physical GPU).
2. Run 2 VMs with nvidia-18 mdev_type (2Q) on each VM.
3. Try to run VM with nvidia-15 (1Q).
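
For reference, one way to check how many instances of each mdev type are still available per physical GPU is to read the kernel's mdev sysfs attributes; a rough sketch (the paths follow the standard mdev sysfs layout, not anything specific to this setup):

    # Rough sketch: list remaining mdev capacity per physical GPU using the
    # standard kernel mdev sysfs layout (exact paths may vary by driver/host).
    import glob
    import os

    for type_dir in sorted(glob.glob(
            '/sys/class/mdev_bus/*/mdev_supported_types/*')):
        pci_addr = type_dir.split('/')[4]
        mdev_type = os.path.basename(type_dir)
        with open(os.path.join(type_dir, 'available_instances')) as f:
            available = f.read().strip()
        print('%s %s available_instances=%s' % (pci_addr, mdev_type, available))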

Comment 8 Milan Zamazal 2019-01-02 16:20:50 UTC
Nvidia doesn't support mixing different mdev_type's on the same physical card. If I understand it correctly, you have 4 physical GPUs. Then after steps 1. + 2. you have only one physical GPU available, but it's already assigned with two instances of nvidia-18. So you could add only another nvidia-18 to it, and not nvidia-15.

An attempt to add nvidia-15 should fail with the error message "vgpu: No device with type nvidia-15 is available" in vdsm.log, before the log excerpt above.

Thus what you observe is expected.

Comment 9 Michael Shen (NVIDIA) 2019-01-02 18:40:47 UTC
Just to confirm what Milan said in comment #8. 

NVIDIA vGPU does not support heterogeneous instances on 1 physical GPU, meaning 1 physical GPU can only be assigned a single vGPU profile at a time.

This is the main driver of this feature - in a multiple-GPU system, an admin may want to assign as many instances of a certain vGPU profile to 1 physical GPU as possible, so that a second vGPU profile can be assigned to a second physical GPU for greater system versatility.

Comment 10 Nisim Simsolo 2019-01-08 12:48:31 UTC
Verification builds:
ovirt-engine-4.3.0-0.4.master.20181230173049.gitef04cb4.el7
vdsm-4.30.4-81.gitad6147e.el7.x86_64
libvirt-client-4.5.0-10.el7_6.3.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64
Nvidia M60, Driver Version: 410.62

Verification scenario:
Polarion test case added to external trackers.

Comment 11 Raz Tamir 2019-01-10 14:31:48 UTC
QE verification bot: the bug was verified upstream

Comment 12 Rolfe Dlugy-Hegwer 2019-02-07 18:05:17 UTC
Hi Michael,

I would like to confirm that I have understood NVIDIA's support for vGPU and passthrough mode.

Reading NVIDIA's release notes for Red Hat support at https://docs.nvidia.com/grid/6.0/grid-vgpu-release-notes-red-hat-el-kvm/index.html, I see:
 
"Since 6.1: Red Hat Virtualization (RHV)	4.1, 4.2	All NVIDIA GPUs that support NVIDIA vGPU software are supported with vGPU and in pass-through mode."

The NVIDIA doc you linked to in the Description above shows users how to select between:
* Shared (i.e., vGPU)
* Shared Direct (i.e., Passthrough)  

And, if the user picks Shared Direct (Passthrough), how to select between:
* Spread VMs across GPUs
* Group VMs on GPU until full

This topic in the NVIDIA documentation, https://docs.nvidia.com/grid/6.0/grid-vgpu-user-guide/index.html#pass-through-gpu-use-introduction, also states:

"In GPU pass-through mode, an entire physical GPU is directly assigned to one VM, bypassing the NVIDA Virtual GPU Manager. In this mode of operation, the GPU is accessed exclusively by the NVIDIA driver running in the VM to which it is assigned. The GPU is not shared among VMs."

Clarifying questions to help me better understand these features:

* Where it says "pass-through mode [...] bypass[es] the NVIDIA Virtual GPU Manager," is "Virtual GPU Manager" the name of the GUI or is it the same thing as vGPU?

* Where it says "the GPU is not shared among VMs," does this contradict the first topic which says "passthrough spreads VMs across GPUs"?

Thank you,

Rolfe

Comment 15 Rolfe Dlugy-Hegwer 2019-02-07 18:40:57 UTC
(In reply to Rolfe Dlugy-Hegwer from comment #12)

> * Where it says "the GPU is not shared among VMs," does this contradict the
> first topic which says "passthrough spreads VMs across GPUs"?


I think I understand this now. Given multiple GPUs and multiple VMs, passthrough mode allocates each VM to a unique GPU. 

Sorry for the confusion.

Comment 16 Rolfe Dlugy-Hegwer 2019-02-07 19:30:52 UTC
Hi Milan, 

Would you please review the updated content in the Doc Text field and suggest additional content to replace the <do xyz or see topic abc> placeholder?

Thank you,

Rolfe

Comment 17 Milan Zamazal 2019-02-08 08:36:31 UTC
Hi Rolfe, some corrections:

> Previously, version 4.2.0 added support for vGPUs and used the "breadth-first" allocation policy.

Actually, "depth-first" allocation policy was used. Yes, it's to the contrary what's written in this RFE, but I don't have information why "breadth-first" was applied in the given case, it's likely to be a different problem.

> It assigns each new vGPUs to the physical GPU with most vGPUs already on it.

It's a bit too strong a claim. While the essence is correct, there are additional restrictions, such as that there must still be free space on the physical GPU and that the vGPUs already there must be of the same type. Something like

  It assigns each new vGPU to a physical GPU that already has vGPUs on it, if possible.

would be both more vague and more accurate. :-)

> The physical GPU must also support the new vGPU type.

And the vGPUs already assigned to it, if any, may be required to be of the same type, depending on the GPU vendor's restrictions.

> To configure vGPU to use either allocation policy, <do xyz or see topic abc>.

It's not configured per vGPU but per host. The reason is that different hosts can have different GPU hardware and topologies and the preferred placement policy may depend on that.

In "Console and GPU" tab of Host Edit dialog, there is "vGPU Placement" selector, where the user can choose between "Consolidated" ("depth-first") and "Separated" ("breadth-first") placement. The default is "Consolidated", consistent with the former (4.2) behavior. The policy of the host the VM runs on is used. If the user wants to use a specific placement policy for a given VM when there are different policies on the hosts in the given cluster, then the user can pin the VM to host(s) with the desired policy.

Comment 25 Rolfe Dlugy-Hegwer 2019-02-11 11:55:27 UTC
Ryan, would you review the content of the Doc Text field above? Thanks.

Comment 26 Sandro Bonazzola 2019-02-13 07:43:17 UTC
This bug is included in the oVirt 4.3.0 release, published on February 4th 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

