Bug 1362557 - [RFE] Use the improved VM overhead calculation in VM scheduling
Summary: [RFE] Use the improved VM overhead calculation in VM scheduling
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: 4.2.0
Assignee: Martin Sivák
QA Contact: Artyom
URL:
Whiteboard: FutureFeature
Depends On: 1304346
Blocks:
 
Reported: 2016-08-02 13:43 UTC by Martin Sivák
Modified: 2019-04-28 13:15 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The accuracy of the algorithm that estimates the overhead of QEMU and VM devices has been improved. The old algorithm allowed the system to run in a hidden overcommit mode. Now that VM memory requirements are properly tracked, the cluster may appear not to support as many VMs as in the past. It is possible to get the same behavior (properly tracked) by enabling a slight memory overcommit using the REST API or the UI.
Clone Of:
Environment:
Last Closed: 2017-12-22 06:50:45 UTC
oVirt Team: SLA
Embargoed:
rule-engine: ovirt-4.2+
gklein: testing_plan_complete-
mgoldboi: planning_ack+
msivak: devel_ack+
mavital: testing_ack+




Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 57390 0 master MERGED Use the new VmOverheadCalculator in scheduler 2017-10-04 13:45:26 UTC
oVirt gerrit 72141 0 master MERGED Use the overhead calculator to get commited memory in HostMonitoring 2017-10-04 15:35:56 UTC

Description Martin Sivák 2016-08-02 13:43:38 UTC
Description of problem:

The virt team added a new method of computing VM memory overhead (the extra memory needed by QEMU and the host on top of the VM's configured memory).

We should start utilizing it.


This requires some QA and maybe even community involvement to make sure we are not rejecting VMs when there is still plenty of RAM.
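
To make the intent concrete, here is a minimal sketch of the kind of per-VM overhead estimate that would be added on top of the configured memory during scheduling. The class name, constants and fields below are illustrative assumptions for this bug discussion, not the actual VmOverheadCalculator implementation:

// Illustrative sketch only -- the real logic lives in ovirt-engine's
// VmOverheadCalculator; the constants and names here are assumptions.
public final class VmOverheadSketch {

    /** Estimated extra memory (MiB) QEMU needs on top of the VM's configured memory. */
    public static long estimateOverheadMb(long configuredMemMb,
                                          long maxMemMb,        // memory hotplug ceiling
                                          int numDisks,
                                          int numDisplays) {
        long fixedQemuOverhead = 64;              // assumed per-process baseline
        long diskOverhead = 8L * numDisks;        // each attached disk adds a bit
        long displayOverhead = 32L * numDisplays; // video RAM / display buffers
        // Memory hotplug: bookkeeping scales with the maximum memory,
        // not the currently configured memory (especially noticeable on PPC).
        long hotplugOverhead = (maxMemMb - configuredMemMb) / 128;
        return fixedQemuOverhead + diskOverhead + displayOverhead + hotplugOverhead;
    }

    /** Memory the scheduler should reserve for the VM on the host. */
    public static long requiredHostMemMb(long configuredMemMb, long overheadMb) {
        return configuredMemMb + overheadMb;
    }
}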

Comment 1 Martin Sivák 2016-12-20 08:52:56 UTC
We have the patch ready (well it will probably have to be rebased), but we need to see whether we have QE capacity for normal and scale testing of the updated behaviour.

Comment 4 eberman 2017-01-05 13:24:36 UTC
Before I can ack, I need to understand exactly what we need to do here, for example:

- What should the VM template hold?
- What are the VM template parameters?
- What exactly should I monitor?
- What limit is expected on which host hardware?

...and so on.

Thanks.

Comment 5 Martin Sivák 2017-01-05 13:29:32 UTC
Test scenario (both before and after the fix):

- Disable host swap (optional)
- Disable overcommit, KSM and balloon
- Start as many VMs (1) as possible to fill up the cluster
- Consume all possible memory inside many (or all) of the VMs
- Check for host memory and swap consumption


(1) Make the VM configuration slightly non-trivial:
- add multiple disks (can be small, the count is more important than size)
- configure multiple displays and/or high resolution display device
- enable memory hotplug (especially on PPC!)


Now, the assumption is that all VMs can use their memory fully when there is no overcommit (memory utilization 100%, no KSM and no balloon). I believe that won't be possible before this bug is fixed, because the QEMU overhead will consume some of that memory. Swap will probably be used if it is enabled.

You (probably) won't be able to start as many VMs once this is fixed, but all of those VMs should be able to utilize their full memory allocation. There should be no significant swapping in this case.
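
To make the expected behaviour concrete, the memory check this scenario exercises can be sketched as below. This is a simplified model with assumed names (no overcommit, KSM or ballooning), not the engine's actual scheduling filter; it only illustrates why ignoring the overhead amounts to a hidden overcommit:

// Simplified model of the memory admission check; names are illustrative.
public final class MemoryFilterSketch {

    /** Old behaviour: only the configured memory is counted. */
    public static boolean fitsIgnoringOverhead(long hostFreeMb, long vmMemMb) {
        return hostFreeMb >= vmMemMb; // QEMU overhead later eats into host RAM or swap
    }

    /** New behaviour: configured memory plus the estimated overhead must fit. */
    public static boolean fitsWithOverhead(long hostFreeMb, long vmMemMb, long overheadMb) {
        return hostFreeMb >= vmMemMb + overheadMb;
    }
}

With the old check a host can accept VMs whose combined QEMU overhead pushes it into swap once the guests actually touch their memory; with the new check fewer VMs are admitted, but each should be able to use its full allocation.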

Comment 6 Moran Goldboim 2017-06-26 11:07:20 UTC
This implementation is very relevant for vGPU use cases.

Comment 7 Ilan Zuckerman 2017-08-29 10:39:36 UTC
(In reply to Martin Sivák from comment #5)
> Test scenario (both before and after the fix):
> 
> - Disable host swap (optional)
> - Disable overcommit, KSM and balloon
> - Start as many VMs (1) as possible to fill up the cluster
> - Consume all possible memory inside many (or all) of the VMs
> - Check for host memory and swap consumption
> 
> 
> (1) Make the VM configuration slightly non trivial
> - add multiple disks (can be small, the count is more important than size)
> - configure multiple displays and/or high resolution display device
> - enable memory hotplug (especially on PPC!)
> 
> 
> Now, the assumption is that all VMs can use their memory fully when there is
> no overcommit (memory utilization 100%, no KSM and no balloon). I believe it
> won't be possible before this bug is fixed, because the Qemu overhead will
> consume some of that memory. The swap will probably be used if it is enabled.
> 
> You (probably) won't be able to start as many VMs once this is fixed, but
> all of those VMs should be able to utilize their full memory allocation.
> There should be no significant swapping in this case.

Hi Martin, just a few questions regarding this issue:
1. Which two engine versions does this need to be validated on (before and after the fix)?
2. Do we need to manually enable the fix, or is it applied automatically in the fixed engine version?

Comment 8 Martin Sivák 2017-08-29 12:18:08 UTC
The patches are not merged yet.

Comment 9 Ilan Zuckerman 2017-10-03 13:05:45 UTC
(In reply to Martin Sivák from comment #8)
> The patches are not merged yet.

Hi Martin, can you please address the questions I asked in comment 7?
We need this information to validate the scenario, and we cannot plan testing without it.

Comment 10 Martin Sivák 2017-10-04 08:38:40 UTC
Ilan, I can't give you the engine version (other than saying it will be 4.2.x), because the patches are not merged yet (they are ready, but we want to do some preliminary tests first). As for the second question: yes, it is applied automatically and does not need to be enabled.

Comment 11 Ilan Zuckerman 2017-10-16 14:41:35 UTC
(In reply to Martin Sivák from comment #5)
Hi Martin, please see my questions inline.
> 
> (1) Make the VM configuration slightly non trivial
> - add multiple disks (can be small, the count is more important than size)
Which disks? What size? What provisioning (iSCSI/NFS/thin)?

> - configure multiple displays and/or high resolution display device
Can you elaborate? I am not familiar with this feature.

> - enable memory hotplug (especially on PPC!)
Same here. Please elaborate or point me to the relevant documentation.


Currently I have the following environment up and ready; please tell me whether it is suitable:
- 50 nested hosts
- 200 VMs on one physical host, so 250 VMs overall (including the nested hosts)
- Each VM has one 100 GB thin-provisioned disk

Comment 13 Martin Sivák 2017-10-17 13:43:58 UTC
> > - add multiple disks (can be small, the count is more important than size)
> Which disks? size? provision (iscsi/nfs/thin)?

Not actually important. As I said, the number of attached disks can influence memory consumption; the type and size do not.

> > - configure multiple displays and/or high resolution display device
> Can you elaborate? i am not familiar with this feature.

Just configure the number of displays for the VMs to be greater than 1.

> > - enable memory hotplug (especially on PPC!)
> Same thing here. Please elaborate / refer me.

Set the maximum memory to 4x the configured memory of a VM (do not touch the minimum guaranteed memory). This is probably the default anyway.

> Currently i have the following environment up and ready. please tell me
> whether it is suitable:
> - 50 nested hosts
> - 200 vms on one actual host. So overall 250 vms (including nested hosts)
> - Each vm has one Thin disk 100GB

The goal is to find out what VMs-per-host density we can achieve and to compare the numbers before and after the change (both with memory overcommit disabled).
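
As a rough way to reason about the expected density difference, the comparison can be sketched like this (purely illustrative numbers and names; the actual overhead depends on the device configuration of each VM):

// Rough density comparison, assuming identical VMs; all numbers are illustrative.
public final class DensitySketch {
    public static void main(String[] args) {
        long hostSchedulingMemMb = 256 * 1024; // assumed 256 GiB of scheduling memory
        long vmMemMb = 4 * 1024;               // 4 GiB configured per VM
        long overheadMb = 150;                 // assumed per-VM QEMU/device overhead

        long densityBefore = hostSchedulingMemMb / vmMemMb;                // overhead ignored
        long densityAfter  = hostSchedulingMemMb / (vmMemMb + overheadMb); // overhead reserved

        System.out.printf("VMs per host: before=%d, after=%d%n", densityBefore, densityAfter);
    }
}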

Comment 16 Martin Sivák 2017-10-30 12:17:54 UTC
Some additional info I got from the libvirt folks:

For example, the QEMU process consumes some memory for every active VNC connection, and we cannot really tell how many users will be connected to a VM when we start it, but each connection increases the overhead.

Comment 17 Artyom 2017-11-09 07:03:59 UTC
Can you please provide the feature page?

Comment 18 Artyom 2017-12-21 05:56:09 UTC
Verified on rhvm-4.2.0.2-0.1.el7.noarch

1) Create 4 VMs, where at least one VM has 10 additional disks and 4 monitors, and the total VM memory consumes all of the host's scheduling memory.
2) Start a memory load on all VMs (VM memory consumption near 100%; you can use memtester for this purpose).

4.1 - one of the VMs crashed after 1-2 minutes.
4.2 - I ran memtester for 6 hours while the host was at 97% memory consumption; no VMs crashed.

Comment 19 Sandro Bonazzola 2017-12-22 06:50:45 UTC
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

