+++ This bug was initially created as a clone of Bug #1480668 +++ This is a clone of Bug #1460848, to include in libvirt the ability to specify in Domain XML the new flag or option introduced in Bug #1460848 +++ This bug was initially created as a clone of Bug #1460848 +++ Description of problem: This is a request to enhance qemu to support optionally freeing memory provided by memory-backend-file prior to exit. Currently, when using memory via memory-backend-file, qemu does not free the memory at exit, leaving the guest's memory on disk after exit and forcing any dirty pages to be written to the backing store. In some situations, it may be advantageous to clear the memory from the backing store before exit, and prevent the memory from being flushed to the backing store at exit time. I believe this can be implemented as an additional flag for memory-backend-file which calls madvise(MADV_REMOVE) on the memory prior to exiting. --- Additional comment from Eduardo Habkost on 2017-06-13 14:16:07 EDT --- (In reply to Zack Cornelius from comment #0) > Description of problem: > > This is a request to enhance qemu to support optionally freeing memory > provided by memory-backend-file prior to exit. > > Currently, when using memory via memory-backend-file, qemu does not free the > memory at exit, leaving the guest's memory on disk after exit and forcing > any dirty pages to be written to the backing store. > > In some situations, it may be advantageous to clear the memory from the > backing store before exit, and prevent the memory from being flushed to the > backing store at exit time. > > I believe this can be implemented as an additional flag for > memory-backend-file which calls madvise(MADV_REMOVE) on the memory prior to > exiting. We might have a mechanism that already has a similar effect in QEMU: the "share=on|off" option in memory-backend-file. This enables the MAP_PRIVATE mmap() flag, and it is supposed ensure the kernel don't try to keep page contents even if QEMU is terminated before calling madvise(MADV_REMOVE). I believe the "share" option is already enabled by default. Would mmap(..., MAP_PRIVATE) have the the desired effect on your use case? --- Additional comment from Zack Cornelius on 2017-06-13 14:58:52 EDT --- In our use case, we need the memory to be backed by the file while the guest is running, but do not need or want the data to stay around after the guest has exited. Because of this, we'll need the MAP_SHARED flag to ensure the file is used as the backing for the memory. --- Additional comment from Eduardo Habkost on 2017-06-14 16:59:21 EDT --- Series submitted to qemu-devel: Subject: [PATCH 0/5] hostmem-file: Add "persistent" option Date: Wed, 14 Jun 2017 17:29:55 -0300 Message-Id: <20170614203000.19984-1-ehabkost> --- Additional comment from Eduardo Habkost on 2017-07-13 11:27:35 EDT --- Implementing this unfortunately would take more effort than expected: we don't have a working mechanism to ensure memory region data is freed before QEMU quits, so the machine state initialization/finalization code would need to be refactored first. More info at the upstream discussion thread: https://www.mail-archive.com/qemu-devel@nongnu.org/msg462900.html --- Additional comment from Michal Privoznik on 2017-10-14 14:27:58 CEST --- Nobody mentioned this in the linked discussion but we have to be cautious about qemu crashing too, i.e. we cannot rely on qemu calling some syscalls just before it exits, because if qemu crashes those syscall wouldn't be called and this feature would not work. --- Additional comment from Michal Privoznik on 2018-05-14 09:48:39 CEST --- I've just pushed patches upstream: commit 2c4affd57e1746183ff410c023face80866bbe0f Author: Michal Privoznik <mprivozn> AuthorDate: Thu Apr 12 17:16:40 2018 +0200 Commit: Michal Privoznik <mprivozn> CommitDate: Mon May 14 09:42:21 2018 +0200 qemu: Implement memoryBacking/discard https://bugzilla.redhat.com/show_bug.cgi?id=1480668 QEMU has this new feature memory-backend-file.discard-data=yes which is a nifty optimization. Basically, when qemu is quitting or on memory hotplug it calls munmap() and close() on the file that is backing the memory. However, this does not mean kernel won't stop touching that part of memory. It still might. With this feature enabled we tell kernel: "we don't need this memory nor data stored in it". This makes kernel drop the memory immediately without trying to sync memory with the mapped file. Unfortunately, this cannot be turned on by default because we can't be sure when users really don't care about what happens to data after qemu dies. So it has to be opt-in. As usual, there are three places where one can configure memory attributes. This patch adds the feature to all of them. Signed-off-by: Michal Privoznik <mprivozn> Reviewed-by: Ján Tomko <jtomko> commit 2300c92fe0d596258ed2f7da1b867d0ce2112f90 Author: Michal Privoznik <mprivozn> AuthorDate: Fri May 11 15:08:53 2018 +0200 Commit: Michal Privoznik <mprivozn> CommitDate: Mon May 14 09:42:20 2018 +0200 conf: Introduce memoryBacking/discard QEMU has possibility to call madvise(.., MADV_REMOVE) in some cases. Expose this feature to users by new element/attribute discard. Signed-off-by: Michal Privoznik <mprivozn> Reviewed-by: Ján Tomko <jtomko> commit 0329075733839a51ff9b61606dc720f726d03fb0 Author: Michal Privoznik <mprivozn> AuthorDate: Fri May 11 14:48:59 2018 +0200 Commit: Michal Privoznik <mprivozn> CommitDate: Mon May 14 09:42:20 2018 +0200 conf: Move virDomainMemtune formatting into a separate function At the same time convert the code to use virXMLFormatElement. Signed-off-by: Michal Privoznik <mprivozn> Reviewed-by: Ján Tomko <jtomko> commit 72c1770aa0545034d020dd564af12a9c8262d088 Author: Michal Privoznik <mprivozn> AuthorDate: Fri Apr 20 10:25:49 2018 +0200 Commit: Michal Privoznik <mprivozn> CommitDate: Mon May 14 09:42:20 2018 +0200 qemu_capabilities: Introduce QEMU_CAPS_OBJECT_MEMORY_FILE_DISCARD This capability tracks if memory-backend-file has discard-data attribute or not. Signed-off-by: Michal Privoznik <mprivozn> Reviewed-by: Ján Tomko <jtomko> commit 8a94501e8cb66ccd57babb128af9ed2715f2d412 Author: Michal Privoznik <mprivozn> AuthorDate: Fri Apr 20 10:31:54 2018 +0200 Commit: Michal Privoznik <mprivozn> CommitDate: Mon May 14 09:42:20 2018 +0200 qemu_capabilities: Introduce QEMU_CAPS_QOM_LIST_PROPERTIES This capability tracks if qemu has "qom-list-properties" monitor command. Signed-off-by: Michal Privoznik <mprivozn> Reviewed-by: Ján Tomko <jtomko> commit d81cf424cabe08d9af8d6628943dd6e05afbb911 Author: Michal Privoznik <mprivozn> AuthorDate: Thu Apr 12 15:04:07 2018 +0200 Commit: Michal Privoznik <mprivozn> CommitDate: Mon May 14 09:42:20 2018 +0200 qemu_monitor: Introduce qemuMonitorGetObjectProps Now that we've gotten rid of misleading names we can introduce qemuMonitorGetObjectProps() function which queries -object properties. Again, some parts of code can be reused. Signed-off-by: Michal Privoznik <mprivozn> Reviewed-by: Ján Tomko <jtomko> commit 036120209b60d6974cd5bb912ee5f890a873cb45 Author: Michal Privoznik <mprivozn> AuthorDate: Fri Apr 20 09:20:36 2018 +0200 Commit: Michal Privoznik <mprivozn> CommitDate: Mon May 14 09:42:20 2018 +0200 qemuMonitorJSONGetDeviceProps: Separate props processing The code that processes list of device properties is going to be reused. Therefore put it into a separate function. Signed-off-by: Michal Privoznik <mprivozn> Reviewed-by: Ján Tomko <jtomko> v4.3.0-169-g2c4affd57e
Would Nova have to do anything to enable this? If so, can you say what?
(In reply to Matthew Booth from comment #1) > Would Nova have to do anything to enable this? If so, can you say what? Hi Matt, Near as I see, there is some Nova work to be done here to expose (but Nova shouldn't enable it by default) the just-added libvirt tunable 'discard' for the 'MemoryBacking' element[1]. Broader context: It is related to this in-progress specification: https://review.openstack.org/#/c/563704 -- Libvirt file backed memory FWIW, talking to the libvirt developer Michal Privoznik, he said they have implemented it based on a requirement that came in via OpenStack. What does 'discard' do? From[1]: "When set and supported by hypervisor the memory content is discarded just before guest shuts down (or when DIMM module is unplugged). Please note that this is just an optimization and is not guaranteed to work in all cases (e.g. when hypervisor crashes). Since 4.4.0 (QEMU/KVM only)" In QEMU parlance, libvirt has wired up the following part of the QEMMU command-line "memory-backend-file.discard-data=yes": "The new option can be used to indicate that the file contents can be destroyed and don't need to be flushed to disk when QEMU exits or when the memory backend object is removed." [1] https://libvirt.org/formatdomain.html#elementsMemoryBacking [2] https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg04899.html
(In reply to Matthew Booth from comment #1) > Would Nova have to do anything to enable this? If so, can you say what? Kashyap summarized it in comment 2. Thanks!
As we don't support memoryBacking at all yet, this is best handled along with that feature. *** This bug has been marked as a duplicate of bug 1541570 ***
We do support memory backend in Nova to handle hugepages. discard will will be available for libvirt 4.4.0.
We need to understand whether discard attribute is only available for source file or if in future that will be enhanced for all kind of sources. If not we should probably put that BZ dependent of bug 1541570
The blueprint registered on lauchpad. https://blueprints.launchpad.net/nova/+spec/memory-backend-file-discard
From Mel in the upstream blueprint: As discussed in the #openstack-nova IRC channel today: because libvirt 4.4.0 is soon to be released (during the rocky cycle), we can just roll the discard change into the original blueprint libvirt-file-backed-memory once libvirt 4.4.0 is available. Just tag the patch with something like "Part of blueprint libvirt-file-backed-memory" when libvirt 4.4.0 is out. -- melwitt 20180530 *** This bug has been marked as a duplicate of bug 1541570 ***