Bug 1641125 - [RFE] add a configuration policy for vGPU placement
Summary: [RFE] add a configuration policy for vGPU placement
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: ---
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.3.0
Assignee: Milan Zamazal
QA Contact: Nisim Simsolo
Docs Contact: Rolfe Dlugy-Hegwer
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-19 17:01 UTC by Michael Shen (NVIDIA)
Modified: 2023-03-24 14:18 UTC
CC List: 9 users

Fixed In Version: vdsm-4.30.5, ovirt-engine-4.3.0_rc
Doc Type: Enhancement
Doc Text:
Previously, version 4.2.0 added support for vGPUs and used a Consolidated ("depth-first") allocation policy. The current release adds support for a Separated ("breadth-first") allocation policy. The default policy is the Consolidated allocation policy.
Clone Of:
Environment:
Last Closed: 2019-02-13 07:43:17 UTC
oVirt Team: Virt
Embargoed:
rdlugyhe: needinfo+
rule-engine: ovirt-4.3+
nsimsolo: testing_plan_complete+
mtessun: planning_ack+
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3669861 0 None None None 2018-10-30 10:38:51 UTC
oVirt gerrit 95237 0 master MERGED tests: Add vGPU placement test 2021-02-11 06:10:41 UTC
oVirt gerrit 95238 0 master MERGED virt: Support for vGPU placement 2021-02-11 06:10:41 UTC
oVirt gerrit 95440 0 master MERGED core: Support for vGPU placement 2021-02-11 06:10:41 UTC
oVirt gerrit 95441 0 master MERGED webadmin: Support for vGPU placement 2021-02-11 06:10:41 UTC
oVirt gerrit 95551 0 master MERGED restapi: vGPU placement added 2021-02-11 06:10:41 UTC
oVirt gerrit 95552 0 master MERGED Add vGPU placement to Host 2021-02-11 06:10:41 UTC
oVirt gerrit 95970 0 master MERGED restapi: Update to model 4.3.19 2021-02-11 06:10:41 UTC

Description Michael Shen (NVIDIA) 2018-10-19 17:01:25 UTC
Description of problem:
vGPU assignment is currently fixed - it is always breadth first when there are multiple physical GPUs. Breadth first can benefit performance, but it can also decrease flexibility - for example, assigning a 2Q profile twice in a dual-GPU system leaves both GPUs unable to serve a 1Q profile, because both GPUs are now only ready for 2Q.

Version-Release number of selected component (if applicable):


How reproducible:
Just assign vGPUs in RHV. 100% reproducible.

Steps to Reproduce:
1. Assign 2Q vGPU profile twice in a dual physical GPU system
2. Try to assign 1Q vGPU profile

Actual results:
1Q cannot be assigned

Expected results:
An admin can configure a depth-first policy so that vGPU assignment always happens on the first available physical GPU until it runs out of instances.
With this policy, the dual-GPU system can still be provisioned with a 1Q profile after two 2Q assignments.

Additional info:
The depth-first policy can hurt performance. We recommend comparing performance data for both policies once the new one is in place.
The reference vendor feature can be found at: https://docs.nvidia.com/grid/6.0/grid-vgpu-user-guide/index.html#modify-gpu-assignment-gpu-enabled-vms-vmware-vsphere

Comment 1 Ryan Barry 2018-10-26 13:08:47 UTC
Just to be clear, Michael:

You'd like RHV to try to maximally utilize the profiles on a single card before assignment to additional cards?

Comment 2 Milan Zamazal 2018-10-26 15:57:42 UTC
Michael, how do you assign the profiles in RHV -- by setting mdev_type custom property for a VM and starting the VM? And what RHV version do you use?

The current RHV implementation should use depth-first rather than breadth-first assignment and should be deterministic. The case you present as a failing reproducer works fine for me, and all 3 VMs are assigned to 2 cards.

Of course, that doesn't change anything on the request to support two different profile assignment policies. I'm just wondering why the observed current policy is different and I'd like to be sure that we understand each other correctly.
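
For reference, a minimal sketch of assigning a vGPU through the mdev_type custom property mentioned above, using the oVirt Python SDK. The VM name, connection details, and the assumption that the engine is configured to accept the mdev_type custom property for VMs are placeholders, not taken from this bug:

    # Hypothetical sketch: request an nvidia-22 vGPU for a VM by setting the
    # mdev_type custom property via the oVirt Python SDK (ovirtsdk4).
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='password',
        ca_file='ca.pem',
    )

    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=gpu-vm')[0]

    # Update only the custom properties; the vGPU is created when the VM starts.
    vms_service.vm_service(vm.id).update(
        types.Vm(
            custom_properties=[
                types.CustomProperty(name='mdev_type', value='nvidia-22'),
            ],
        ),
    )

    connection.close()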

Comment 3 Michael Shen (NVIDIA) 2018-10-26 16:08:27 UTC
Milan, I have not tested this case myself. This comes directly from the customer. I don't believe they changed any settings. They are using the latest RHV 4.2.

Your observation could be further support for this requirement if the two of you see different behaviors on a multi-GPU system. In that case, we really need a way to specify depth first or breadth first.

The requirement is simple - provide a way (either a setting or a policy) for admins to decide whether they want to go depth first or breadth first in a multiple-GPU system.
Depth first means that when assigning a certain vGPU profile, RHV always attempts to find the physical GPU that already has the most instances of the same profile assigned, and only then falls back to the first available GPU.
Breadth first means that when assigning a certain vGPU profile, RHV always attempts to find the first eligible physical GPU with the fewest instances assigned.
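
To make the two policies concrete, here is a simplified illustration - not the actual vdsm code - of how a placement routine might pick a physical GPU under each policy:

    # Illustrative sketch only, not the vdsm implementation: choosing a
    # physical GPU for a new vGPU instance under the two policies.
    # Each GPU is described by the mdev type already placed on it (if any),
    # how many instances it holds, and how many it can still accept.

    def eligible(gpu, mdev_type):
        # A GPU can take the new instance if it has free capacity and is
        # either empty or already serving the same mdev type (NVIDIA does
        # not support mixing types on one physical GPU).
        return gpu['free'] > 0 and gpu['type'] in (None, mdev_type)

    def pick_gpu(gpus, mdev_type, policy):
        candidates = [g for g in gpus if eligible(g, mdev_type)]
        if not candidates:
            raise RuntimeError('No device with type %s is available' % mdev_type)
        if policy == 'depth-first':   # "Consolidated": fill one card before the next
            return max(candidates, key=lambda g: g['used'])
        return min(candidates, key=lambda g: g['used'])  # "breadth-first"/"Separated"

    gpus = [
        {'name': 'GPU0', 'type': 'nvidia-18', 'used': 2, 'free': 2},
        {'name': 'GPU1', 'type': None, 'used': 0, 'free': 4},
    ]
    print(pick_gpu(gpus, 'nvidia-18', 'depth-first')['name'])    # GPU0
    print(pick_gpu(gpus, 'nvidia-18', 'breadth-first')['name'])  # GPU1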
 

(In reply to Milan Zamazal from comment #2)
> Michael, how do you assign the profiles in RHV -- by setting mdev_type
> custom property for a VM and starting the VM? And what RHV version do you
> use?
> 
> Current RHV implementation should use depth first rather than breadth first
> assignment and should be deterministic. The case you present as a failing
> reproducer works for me fine and all 3 VMs are assigned to 2 cards.
> 
> Of course, that doesn't change anything on the request to support two
> different profile assignment policies. I'm just wondering why the observed
> current policy is different and I'd like to be sure that we understand each
> other correctly.

Comment 4 Milan Zamazal 2018-10-26 18:04:52 UTC
Thank you, Michael, for the clarification; the requirement is fully clear now.

Comment 6 Milan Zamazal 2018-12-13 12:30:54 UTC
The only missing bit is Ansible support. The corresponding pull request has been posted (https://github.com/ansible/ansible/pull/49718), but it can be merged only after RHV 4.3 is released since it requires 4.3 SDK.

Comment 7 Nisim Simsolo 2019-01-02 13:55:28 UTC
What is the expected behavior when only 1 physical GPU is available on a host with Separated vGPU placement?
Currently, after assigning a 2Q profile to the GPU, trying to add a 1Q profile fails with the following in vdsm.log:
 ERROR (jsonrpc/1) [virt.vm] (vmId='210d538b-05b2-4f7c-bfea-5aa33baa3c3f') Failed to tear down device 1ae01fc9-df70-40b3-b1be-59c30bd1b3eb, device in inconsistent state (vm:2380)
raise exception.ResourceUnavailable('vgpu: No mdev found')
ResourceUnavailable: Resource unavailable

HW in use: 
2 X Tesla M60

Scenario:
1. Run 3 VMs with nvidia-22 mdev_type (8Q) on each VM (max_instance=1, so it leaves only 1 available physical GPU).
2. Run 2 VMs with nvidia-18 mdev_type (2Q) on each VM.
3. Try to run VM with nvidia-15 (1Q).
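
For reference, one way to check how many instances of each mdev type are still available per physical GPU is to read the kernel's mdev sysfs attributes; a rough sketch (the paths follow the standard mdev sysfs layout, not anything specific to this setup):

    # Rough sketch: list remaining mdev capacity per physical GPU using the
    # standard kernel mdev sysfs layout (exact paths may vary by driver/host).
    import glob
    import os

    for type_dir in sorted(glob.glob(
            '/sys/class/mdev_bus/*/mdev_supported_types/*')):
        pci_addr = type_dir.split('/')[4]
        mdev_type = os.path.basename(type_dir)
        with open(os.path.join(type_dir, 'available_instances')) as f:
            available = f.read().strip()
        print('%s %s available_instances=%s' % (pci_addr, mdev_type, available))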

Comment 8 Milan Zamazal 2019-01-02 16:20:50 UTC
Nvidia doesn't support mixing different mdev_type's on the same physical card. If I understand it correctly, you have 4 physical GPUs. Then after steps 1. + 2. you have only one physical GPU available, but it's already assigned with two instances of nvidia-18. So you could add only another nvidia-18 to it, and not nvidia-15.

An attempt to add nvidia-15 should fail with the error message "vgpu: No device with type nvidia-15 is available" in vdsm.log, before the log excerpt above.

Thus what you observe is expected.

Comment 9 Michael Shen (NVIDIA) 2019-01-02 18:40:47 UTC
Just to confirm what Milan said in comment #8. 

NVIDIA vGPU does not support heterogeneous instances on 1 physical GPU, meaning 1 physical GPU can only be assigned a single vGPU profile at a time.

This is the main driver of this feature - in a multiple-GPU system, an admin may want to assign as many instances of a certain vGPU profile to 1 physical GPU as possible, so that a second vGPU profile can be assigned to a second physical GPU for greater system versatility.

Comment 10 Nisim Simsolo 2019-01-08 12:48:31 UTC
Verification builds:
ovirt-engine-4.3.0-0.4.master.20181230173049.gitef04cb4.el7
vdsm-4.30.4-81.gitad6147e.el7.x86_64
libvirt-client-4.5.0-10.el7_6.3.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64
Nvidia M60, Driver Version: 410.62

Verification scenario:
Polarion test case added to external trackers.

Comment 11 Raz Tamir 2019-01-10 14:31:48 UTC
QE verification bot: the bug was verified upstream

Comment 12 Rolfe Dlugy-Hegwer 2019-02-07 18:05:17 UTC
Hi Michael,

I would like to confirm that I have understood NVIDIA's support for vGPU and passthrough mode.

Reading NVIDIA's release notes for Red Hat support at https://docs.nvidia.com/grid/6.0/grid-vgpu-release-notes-red-hat-el-kvm/index.html, I see:
 
"Since 6.1: Red Hat Virtualization (RHV)	4.1, 4.2	All NVIDIA GPUs that support NVIDIA vGPU software are supported with vGPU and in pass-through mode."

The NVIDIA doc you linked to in the Description above shows users how to select between:
* Shared (i.e., vGPU)
* Shared Direct (i.e., Passthrough)  

And, if the user picks Shared Direct (Passthrough), how to select between:
* Spread VMs across GPUs
* Group VMs on GPU until full

This topic in the NVIDIA documentation, https://docs.nvidia.com/grid/6.0/grid-vgpu-user-guide/index.html#pass-through-gpu-use-introduction, also states:

"In GPU pass-through mode, an entire physical GPU is directly assigned to one VM, bypassing the NVIDA Virtual GPU Manager. In this mode of operation, the GPU is accessed exclusively by the NVIDIA driver running in the VM to which it is assigned. The GPU is not shared among VMs."

Clarifying questions to help me better understand these features:

* Where it says "pass-through mode [...] bypass[es] the NVIDIA Virtual GPU Manager," is "Virtual GPU Manager" the name of the GUI or is it the same thing as vGPU?

* Where it says "the GPU is not shared among VMs," does this contradict the first topic which says "passthrough spreads VMs across GPUs"?

Thank you,

Rolfe

Comment 15 Rolfe Dlugy-Hegwer 2019-02-07 18:40:57 UTC
(In reply to Rolfe Dlugy-Hegwer from comment #12)

> * Where it says "the GPU is not shared among VMs," does this contradict the
> first topic which says "passthrough spreads VMs across GPUs"?


I think I understand this now. Given multiple GPUs and multiple VMs, passthrough mode allocates each VM to a unique GPU. 

Sorry for the confusion.

Comment 16 Rolfe Dlugy-Hegwer 2019-02-07 19:30:52 UTC
Hi Milan, 

Would you please review the updated content in the Doc Text field and suggest additional content to replace the <do xyz or see topic abc> placeholder?

Thank you,

Rolfe

Comment 17 Milan Zamazal 2019-02-08 08:36:31 UTC
Hi Rolfe, some corrections:

> Previously, version 4.2.0 added support for vGPUs and used the "breadth-first" allocation policy.

Actually, "depth-first" allocation policy was used. Yes, it's to the contrary what's written in this RFE, but I don't have information why "breadth-first" was applied in the given case, it's likely to be a different problem.

> It assigns each new vGPUs to the physical GPU with most vGPUs already on it.

It's a bit too strong a claim. While the essence is correct, there are additional restrictions, such as that there must still be free space on the physical GPU and that the vGPUs already there must be of the same type. Something like

  It assigns each new vGPU to a physical GPU that already has vGPUs on it, if possible.

would be both more vague and more accurate. :-)

> The physical GPU must also support the new vGPU type.

And the vGPUs already assigned to it, if any, may be required to be of the same type, depending on the GPU vendor's restrictions.

> To configure vGPU to use either allocation policy, <do xyz or see topic abc>.

It's not configured per vGPU but per host. The reason is that different hosts can have different GPU hardware and topologies and the preferred placement policy may depend on that.

In "Console and GPU" tab of Host Edit dialog, there is "vGPU Placement" selector, where the user can choose between "Consolidated" ("depth-first") and "Separated" ("breadth-first") placement. The default is "Consolidated", consistent with the former (4.2) behavior. The policy of the host the VM runs on is used. If the user wants to use a specific placement policy for a given VM when there are different policies on the hosts in the given cluster, then the user can pin the VM to host(s) with the desired policy.

Comment 25 Rolfe Dlugy-Hegwer 2019-02-11 11:55:27 UTC
Ryan, would you review the content of the Doc Text field above? Thanks.

Comment 26 Sandro Bonazzola 2019-02-13 07:43:17 UTC
This bug is included in the oVirt 4.3.0 release, published on February 4th 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

