Bug 1228543

Summary: [RFE] hot-unplug memory
Product: [oVirt] ovirt-engine
Reporter: Michal Skrivanek <michal.skrivanek>
Component: RFEs
Assignee: Milan Zamazal <mzamazal>
Status: CLOSED CURRENTRELEASE
QA Contact: Israel Pinto <ipinto>
Severity: high
Docs Contact:
Priority: high
Version: ---
CC: bugs, eheftman, jniederm, lsurette, mavital, mgoldboi, michal.skrivanek, msivak, mtessun, mzamazal, pdwyer, pstehlik, rbalakri, s.kieske, srevivo, tjelinek
Target Milestone: ovirt-4.2.0
Keywords: FutureFeature
Target Release: 4.2.0
Flags: rule-engine: ovirt-4.2+
       rule-engine: exception+
       ipinto: testing_plan_complete+
       mtessun: planning_ack+
       michal.skrivanek: devel_ack+
       mavital: testing_ack+
Hardware: All
OS: Linux
URL: http://www.ovirt.org/Features/Memory_Hotplug
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Previously, it was only possible to add memory to a running virtual machine (VM). To remove memory, the VM had to be shut down. In this release, it is now possible to hot unplug memory from a running VM, provided that certain requirements are met and limitations are considered.
Story Points: ---
Clone Of: 1224886
Environment:
Last Closed: 2018-02-12 10:11:07 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 515840, 822996, 1224886, 1245892, 1265880, 1314306, 1320447, 1320534, 1323417, 1325121, 1402880, 1482042, 1482076, 1482474, 1563532
Bug Blocks: 515839, 962053, 1502671

Description Michal Skrivanek 2015-06-05 07:21:06 UTC
+++ This bug was initially created as a clone of Bug #1224886 +++

Add support for dynamically plugging and unplugging memory.

--- Additional comment from Michal Skrivanek on 2015-06-05 09:19:41 CEST ---

splitting as unplug is delayed

Comment 2 Tomas Jelinek 2015-12-16 09:09:54 UTC
*** Bug 1228546 has been marked as a duplicate of this bug. ***

Comment 3 Red Hat Bugzilla Rules Engine 2015-12-16 21:35:14 UTC
This request has been proposed for two releases. This is invalid flag usage. The ovirt-future release flag has been cleared. If you wish to change the release flag, you must clear one release flag and then set the other release flag to ?.

Comment 4 Sven Kieske 2016-01-25 11:37:27 UTC
Will this make it into oVirt 4?

Comment 5 Sven Kieske 2016-02-11 09:59:08 UTC
BZ 1245892 is restricted; I cannot view it.

Would it be possible to open this BZ to the public?

Thank you in advance.

Comment 6 Michal Skrivanek 2016-02-12 09:57:19 UTC
(In reply to Sven Kieske from comment #5)
> BZ 1245892 is restricted, I can not view it.
> 
> Would it be possible to open this BZ to the public?

sure, seems like it was private by mistake. done

Comment 7 Michal Skrivanek 2016-02-17 16:29:41 UTC
still looking at feasibility as it seems unplug doesn't work that well in Linux in general. There might be some annoying constraints. Let's see...

Comment 8 Milan Zamazal 2016-03-18 15:25:24 UTC
Here is a summary of currently known issues:

- In order to make reasonable guarantees about being able to remove previously hotplugged memory, the memory must be onlined as `online_movable' instead of just `online' (see the sketch after this list).  However, that is possible only when the plugged memory blocks are onlined in a particular order, and it doesn't work with the current udev rule for memory onlining.  See https://bugzilla.redhat.com/1314306; that kernel bug must be fixed to make memory hotplug usable.

- Currently, the udev rule for memory hotplug (/usr/lib/udev/rules.d/40-redhat.rules) onlines the hotplugged memory as `online' instead of `online_movable', see the bug above.  This often results in the inserted memory blocks being used for kernel (non-movable) memory, preventing hotunplug of memory devices containing any of those blocks.  That must be changed, but only after the kernel bug mentioned above is fixed.

- When I try to remove more memory than can be freed, it results in OOM kills, not in a failure.

- I'm not sure whether memory onlined as `online_movable' is guaranteed to be removable, and I doubt anybody knows for sure.  I once (and only once so far) encountered a situation where even `online_movable' memory wasn't removable.  I couldn't reproduce it later, but we should probably be prepared for occasional hotunplug failures.

- As with other hotplug/hotunplug devices, libvirt doesn't report hotunplug failure and we have to rely on timeouts.  I was able to make memory hotunplug take a few seconds even in my simple testing environment (when swapping out the used memory was involved).

- An additional issue is that the kernel reports invalid information about used and free memory (e.g. claiming there's more available memory than physical memory) under some hotplug-hotunplug scenarios, see https://bugzilla.redhat.com/1265880.
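
To illustrate the onlining issue, here is a minimal sketch (not Vdsm code; it only assumes the standard sysfs memory-hotplug interface and root privileges) that onlines offline memory blocks as movable, highest block number first, which is the ordering that movable onlining reportedly requires:

  import glob
  import os

  def online_movable_blocks():
      # Enumerate memory blocks in the standard sysfs layout.
      blocks = glob.glob('/sys/devices/system/memory/memory[0-9]*')
      # Online from the highest block number down, so the movable zone
      # stays contiguous at the top of the address space.
      blocks.sort(key=lambda p: int(os.path.basename(p)[len('memory'):]),
                  reverse=True)
      for block in blocks:
          state_path = os.path.join(block, 'state')
          with open(state_path) as f:
              if f.read().strip() != 'offline':
                  continue
          try:
              with open(state_path, 'w') as f:
                  # 'online_movable' puts the block in ZONE_MOVABLE, which
                  # keeps kernel (non-movable) allocations out of it so it
                  # can be offlined again later.
                  f.write('online_movable')
          except OSError:
              # The write fails when the kernel cannot online the block
              # as movable, e.g. due to the ordering problem above.
              pass

The udev rule in /usr/lib/udev/rules.d/40-redhat.rules performs essentially the same state write with `online' instead of `online_movable', which is what the second item above proposes to change.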

Comment 9 Sven Kieske 2016-03-21 13:55:47 UTC
Hi,

Thanks for all those details.

https://bugzilla.redhat.com/show_bug.cgi?id=1314306 is also marked as private; would you mind opening it to the public?

Thank you!

Comment 10 Moran Goldboim 2016-03-24 10:48:59 UTC
Postponing to 4.1 due to the lack of production readiness in the lower levels of the stack.

Comment 11 Milan Zamazal 2016-05-04 11:17:35 UTC
As explained in https://bugzilla.redhat.com/show_bug.cgi?id=1314306#c15, reliable memory hotunplug functionality in the kernel is a complicated matter and is unlikely to be available in the foreseeable future.  It is suggested to use the memory ballooning mechanism instead, which should be more reliable, more flexible, and better performing.  We therefore discussed possible alternatives to DIMM-device-based memory hotunplug and are considering the proposal described below.

The VM memory sizes can be defined by the following values:

- Minimum guaranteed memory ("minimum").
- Maximum memory ("maximum").
- Maximum memory actually assigned to the guest in libvirt ("assigned").
- Absolute memory limit ("limit").
- Free memory as reported by the guest operating system ("free"), not including caches and buffers.

The values must satisfy the following constraint:

  minimum <= maximum <= assigned <= limit

"Limit" is a hardcoded value and total guest RAM may not exceed it.

"Maximum" and "minimum" are already present in Engine UI as Memory Size and Physical Memory Guaranteed respectively.  User can currently change the values, but the only supported operation in runtime is increasing Memory Size (i.e. performing memory hotplug).

"Assigned" is a newly introduced value.  Wrt. current situation there is no distinction between "assigned" and "maximum".  But we want to emulate memory hotunplug by decreasing "maximum" below "assigned".  This doesn't change the current memory size set in libvirt/QEMU, which we track as "assigned". "Assigned" can't be set by the user, it's only changed indirectly by actual memory hotplug.

With this concept, "maximum" and "minimum" could be modified at runtime in either direction.  The following rules would apply (a code sketch follows the list):

- "Assigned" is initially set to "maximum" as defined in the VM.
- When the user increases "maximum" above "assigned", the amount of memory above "assigned" must be hotplugged and "assigned" is set to "maximum" or some higher value (depending on hotplug granularity) on success.
- Otherwise, when the user adjusts either the "maximum" or the "minimum" value, the balloon must be adjusted accordingly (with the help of MoM).
- "Minimum" is guaranteed memory and can't be taken by the balloon.
- The memory between "maximum" and "minimum" is not guaranteed memory and may be taken by the balloon.
- The memory between "assigned" and "maximum" is always taken by the balloon.
- If the user decreases the "maximum" value, the balloon driver in the guest is instructed to react accordingly.  If it doesn't fulfill the request, we may kill the guest, as it refuses to cooperate.
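
A rough sketch of how these rules could translate into a decision procedure (the helper callbacks are hypothetical, not Engine/Vdsm API):

  def apply_maximum_change(vm, new_maximum, hotplug, set_balloon_taken):
      # hotplug(mib) plugs additional memory and returns the new assigned
      # size (which may exceed the request due to hotplug granularity).
      # set_balloon_taken(mib) sets how much memory the balloon holds.
      if new_maximum > vm.assigned:
          vm.assigned = hotplug(new_maximum - vm.assigned)
      vm.maximum = new_maximum
      # The memory between "assigned" and "maximum" is always taken by
      # the balloon, so the guest effectively sees at most "maximum".
      set_balloon_taken(vm.assigned - vm.maximum)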

We must be careful about MoM.  Vdsm should report the "minimum", "maximum" and "free" values to MoM, but the actual balloon value as set by MoM must be increased by "assigned" minus "maximum" in Vdsm.  We must check whether this trick is safe with respect to MoM operation and its interaction with the host.
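
In other words (sketch; balloon values in MiB, counted as memory taken by the balloon):

  def effective_balloon(mom_balloon, assigned, maximum):
      # Vdsm adds the emulated-unplug share on top of MoM's decision.
      return mom_balloon + (assigned - maximum)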

To know "free", oVirt guest agent must be running within the guest.  This is an extra requirement on the VMs so it would be better if we could obtain the value even without oVirt guest agent (could libvirt provide it?).

Comment 12 Sven Kieske 2016-05-09 13:56:50 UTC
(In reply to Milan Zamazal from comment #11)
> As explained in https://bugzilla.redhat.com/show_bug.cgi?id=1314306#c15,
> reliable memory hotunplug functionality in the kernel is a complicated
> matter and is unlikely to be available in a foreseeable future.

It seems that BZ 1314306 is not available to the public. Could you mark it as not private, if possible?

kind regards

Sven

Comment 15 Michal Skrivanek 2016-12-16 15:26:51 UTC
likely will be ready in 4.1.z

Comment 20 Milan Zamazal 2017-02-14 10:19:45 UTC
The Engine part is missing.

Comment 22 Israel Pinto 2017-08-17 10:59:34 UTC
Failed QA.
See BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1482076 and
https://bugzilla.redhat.com/show_bug.cgi?id=1482042

Reasons:
1. Defined memory is not updated after each hot unplug of a memory device.
2. After rebooting the VM, the memory is restored to the value from before the memory unplug.

Comment 23 Red Hat Bugzilla Rules Engine 2017-08-17 10:59:43 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 24 Michal Skrivanek 2017-09-07 12:33:28 UTC
The related bugs are ON_QA. Next time, please either open the new bugs and close this one, or reopen this one and do not open new bugs.

Comment 27 Israel Pinto 2017-10-01 06:09:48 UTC
Tested with:
4.2.0-0.0.master.20170917124606.gita804ef7.el7.centos
Memory hot unplug status:
I opened the following BZs:
1. BZ1496395
[Memory hot unplug] After commit snapshot with memory hot unplug failed since device not found
2. BZ1496366
[Memory hotplug] [UI] The Memory size in edit vm dialog is not updated after failing hotplug the 17th device
3. RFE for REST API BZ1496382
[RFE][REST API] Add support for VM devices under VM resource

Test run summary:
https://polarion.engineering.redhat.com/polarion/#/project/RHEVM3/testrun?id=42-123

Comment 34 Sandro Bonazzola 2018-02-12 10:11:07 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.