Bug 1578831 - RFE: Enhance qemu to support freeing memory before exit when using memory-backend-file
Summary: RFE: Enhance qemu to support freeing memory before exit when using memory-bac...
Keywords:
Status: CLOSED DUPLICATE of bug 1541570
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: ---
Assignee: Sahid Ferdjaoui
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On: 1480668 1541570 1594272 1795933
Blocks: 1558125
TreeView+ depends on / blocked
 
Reported: 2018-05-16 12:52 UTC by Jaroslav Suchanek
Modified: 2023-03-21 18:50 UTC (History)
28 users (show)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of: 1480668
Environment:
Last Closed: 2018-05-31 10:59:07 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 563704 0 None None None 2018-05-18 14:11:24 UTC

Description Jaroslav Suchanek 2018-05-16 12:52:40 UTC
+++ This bug was initially created as a clone of Bug #1480668 +++

This is a clone of Bug #1460848, to include in libvirt the ability to specify in Domain XML the new flag or option introduced in Bug #1460848

+++ This bug was initially created as a clone of Bug #1460848 +++

Description of problem:

This is a request to enhance qemu to support optionally freeing memory provided by memory-backend-file prior to exit.

Currently, when using memory via memory-backend-file, qemu does not free the memory at exit, leaving the guest's memory on disk after exit and forcing any dirty pages to be written to the backing store.

In some situations, it may be advantageous to clear the memory from the backing store before exit, and prevent the memory from being flushed to the backing store at exit time. 

I believe this can be implemented as an additional flag for memory-backend-file which calls madvise(MADV_REMOVE) on the memory prior to exiting.

--- Additional comment from Eduardo Habkost on 2017-06-13 14:16:07 EDT ---

(In reply to Zack Cornelius from comment #0)
> Description of problem:
> 
> This is a request to enhance qemu to support optionally freeing memory
> provided by memory-backend-file prior to exit.
> 
> Currently, when using memory via memory-backend-file, qemu does not free the
> memory at exit, leaving the guest's memory on disk after exit and forcing
> any dirty pages to be written to the backing store.
> 
> In some situations, it may be advantageous to clear the memory from the
> backing store before exit, and prevent the memory from being flushed to the
> backing store at exit time. 
> 
> I believe this can be implemented as an additional flag for
> memory-backend-file which calls madvise(MADV_REMOVE) on the memory prior to
> exiting.

We might have a mechanism that already has a similar effect in QEMU: the "share=on|off" option in memory-backend-file.  This enables the MAP_PRIVATE mmap() flag, and it is supposed ensure the kernel don't try to keep page contents even if QEMU is terminated before calling madvise(MADV_REMOVE).  I believe the "share" option is already enabled by default.

Would mmap(..., MAP_PRIVATE) have the the desired effect on your use case?

--- Additional comment from Zack Cornelius on 2017-06-13 14:58:52 EDT ---

In our use case, we need the memory to be backed by the file while the guest is running, but do not need or want the data to stay around after the guest has exited. 

Because of this, we'll need the MAP_SHARED flag to ensure the file is used as the backing for the memory.

--- Additional comment from Eduardo Habkost on 2017-06-14 16:59:21 EDT ---

Series submitted to qemu-devel:

Subject: [PATCH 0/5] hostmem-file: Add "persistent" option
Date: Wed, 14 Jun 2017 17:29:55 -0300
Message-Id: <20170614203000.19984-1-ehabkost>

--- Additional comment from Eduardo Habkost on 2017-07-13 11:27:35 EDT ---

Implementing this unfortunately would take more effort than expected: we don't have a working mechanism to ensure memory region data is freed before QEMU quits, so the machine state initialization/finalization code would need to be refactored first.

More info at the upstream discussion thread:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg462900.html

--- Additional comment from Michal Privoznik on 2017-10-14 14:27:58 CEST ---

Nobody mentioned this in the linked discussion but we have to be cautious about qemu crashing too, i.e. we cannot rely on qemu calling some syscalls just before it exits, because if qemu crashes those syscall wouldn't be called and this feature would not work.


--- Additional comment from Michal Privoznik on 2018-05-14 09:48:39 CEST ---

I've just pushed patches upstream:

commit 2c4affd57e1746183ff410c023face80866bbe0f
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Apr 12 17:16:40 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:21 2018 +0200

    qemu: Implement memoryBacking/discard
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1480668
    
    QEMU has this new feature memory-backend-file.discard-data=yes
    which is a nifty optimization. Basically, when qemu is quitting
    or on memory hotplug it calls munmap() and close() on the file
    that is backing the memory. However, this does not mean kernel
    won't stop touching that part of memory. It still might. With
    this feature enabled we tell kernel: "we don't need this memory
    nor data stored in it". This makes kernel drop the memory
    immediately without trying to sync memory with the mapped file.
    
    Unfortunately, this cannot be turned on by default because we
    can't be sure when users really don't care about what happens to
    data after qemu dies. So it has to be opt-in. As usual, there are
    three places where one can configure memory attributes. This
    patch adds the feature to all of them.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>

commit 2300c92fe0d596258ed2f7da1b867d0ce2112f90
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri May 11 15:08:53 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:20 2018 +0200

    conf: Introduce memoryBacking/discard
    
    QEMU has possibility to call madvise(.., MADV_REMOVE) in some
    cases. Expose this feature to users by new element/attribute
    discard.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>

commit 0329075733839a51ff9b61606dc720f726d03fb0
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri May 11 14:48:59 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:20 2018 +0200

    conf: Move virDomainMemtune formatting into a separate function
    
    At the same time convert the code to use virXMLFormatElement.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>

commit 72c1770aa0545034d020dd564af12a9c8262d088
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Apr 20 10:25:49 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:20 2018 +0200

    qemu_capabilities: Introduce QEMU_CAPS_OBJECT_MEMORY_FILE_DISCARD
    
    This capability tracks if memory-backend-file has discard-data
    attribute or not.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>

commit 8a94501e8cb66ccd57babb128af9ed2715f2d412
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Apr 20 10:31:54 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:20 2018 +0200

    qemu_capabilities: Introduce QEMU_CAPS_QOM_LIST_PROPERTIES
    
    This capability tracks if qemu has "qom-list-properties" monitor
    command.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>

commit d81cf424cabe08d9af8d6628943dd6e05afbb911
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Apr 12 15:04:07 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:20 2018 +0200

    qemu_monitor: Introduce qemuMonitorGetObjectProps
    
    Now that we've gotten rid of misleading names we can introduce
    qemuMonitorGetObjectProps() function which queries -object
    properties. Again, some parts of code can be reused.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>

commit 036120209b60d6974cd5bb912ee5f890a873cb45
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Apr 20 09:20:36 2018 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:20 2018 +0200

    qemuMonitorJSONGetDeviceProps: Separate props processing
    
    The code that processes list of device properties is going to be
    reused. Therefore put it into a separate function.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>

v4.3.0-169-g2c4affd57e

Comment 1 Matthew Booth 2018-05-17 15:22:55 UTC
Would Nova have to do anything to enable this? If so, can you say what?

Comment 2 Kashyap Chamarthy 2018-05-18 09:34:37 UTC
(In reply to Matthew Booth from comment #1)
> Would Nova have to do anything to enable this? If so, can you say what?

Hi Matt,

Near as I see, there is some Nova work to be done here to expose (but 
Nova shouldn't enable it by default) the just-added libvirt tunable
'discard' for the 'MemoryBacking' element[1].

Broader context: It is related to this in-progress specification:

    https://review.openstack.org/#/c/563704 -- Libvirt file backed memory

FWIW, talking to the libvirt developer Michal Privoznik, he said they
have implemented it based on a requirement that came in via OpenStack.

What does 'discard' do? From[1]:

    "When set and supported by hypervisor the memory content is discarded
    just before guest shuts down (or when DIMM module is unplugged). 
    Please note that this is just an optimization and is not guaranteed
    to work in all cases (e.g. when hypervisor crashes). Since 4.4.0
    (QEMU/KVM only)"

In QEMU parlance, libvirt has wired up the following part of the QEMMU
command-line "memory-backend-file.discard-data=yes":

    "The new option can be used to indicate that the file contents can
    be destroyed and don't need to be flushed to disk when QEMU exits
    or when the memory backend object is removed."

[1] https://libvirt.org/formatdomain.html#elementsMemoryBacking
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg04899.html

Comment 3 Jaroslav Suchanek 2018-05-24 12:14:29 UTC
(In reply to Matthew Booth from comment #1)
> Would Nova have to do anything to enable this? If so, can you say what?

Kashyap summarized it in comment 2. Thanks!

Comment 4 Matthew Booth 2018-05-25 09:36:10 UTC
As we don't support memoryBacking at all yet, this is best handled along with that feature.

*** This bug has been marked as a duplicate of bug 1541570 ***

Comment 5 Sahid Ferdjaoui 2018-05-25 09:46:07 UTC
We do support memory backend in Nova to handle hugepages. discard will will be available for libvirt 4.4.0.

Comment 6 Sahid Ferdjaoui 2018-05-25 11:54:01 UTC
We need to understand whether discard attribute is only available for source file or if in future that will be enhanced for all kind of sources.

If not we should probably put that BZ dependent of bug 1541570

Comment 7 Sahid Ferdjaoui 2018-05-25 12:18:41 UTC
The blueprint registered on lauchpad.

  https://blueprints.launchpad.net/nova/+spec/memory-backend-file-discard

Comment 8 Artom Lifshitz 2018-05-31 10:59:07 UTC
From Mel in the upstream blueprint:

As discussed in the #openstack-nova IRC channel today: because libvirt 4.4.0 is soon to be released (during the rocky cycle), we can just roll the discard change into the original blueprint libvirt-file-backed-memory once libvirt 4.4.0 is available. Just tag the patch with something like "Part of blueprint libvirt-file-backed-memory" when libvirt 4.4.0 is out. -- melwitt 20180530

*** This bug has been marked as a duplicate of bug 1541570 ***


Note You need to log in before you can comment on or make changes to this bug.