Bug 1480668 - RFE: Enhance qemu to support freeing memory before exit when using memory-backend-file
Summary: RFE: Enhance qemu to support freeing memory before exit when using memory-bac...
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.5
Hardware: Unspecified
OS: Unspecified
low
medium
Target Milestone: rc
: ---
Assignee: Michal Privoznik
QA Contact: Luyao Huang
URL:
Whiteboard:
Keywords: FutureFeature, Upstream
Depends On:
Blocks: 1558125 1578831
TreeView+ depends on / blocked
 
Reported: 2017-08-11 16:40 UTC by Zack Cornelius
Modified: 2018-10-30 09:52 UTC (History)
21 users (show)

(edit)
Clone Of: 1460848
: 1578831 (view as bug list)
(edit)
Last Closed: 2018-10-30 09:49:58 UTC


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:3113 None None None 2018-10-30 09:52 UTC

Description Zack Cornelius 2017-08-11 16:40:08 UTC
This is a clone of Bug #1460848, to include in libvirt the ability to specify in Domain XML the new flag or option introduced in Bug #1460848

+++ This bug was initially created as a clone of Bug #1460848 +++

Description of problem:

This is a request to enhance qemu to support optionally freeing memory provided by memory-backend-file prior to exit.

Currently, when using memory via memory-backend-file, qemu does not free the memory at exit, leaving the guest's memory on disk after exit and forcing any dirty pages to be written to the backing store.

In some situations, it may be advantageous to clear the memory from the backing store before exit, and prevent the memory from being flushed to the backing store at exit time. 

I believe this can be implemented as an additional flag for memory-backend-file which calls madvise(MADV_REMOVE) on the memory prior to exiting.

--- Additional comment from Eduardo Habkost on 2017-06-13 14:16:07 EDT ---

(In reply to Zack Cornelius from comment #0)
> Description of problem:
> 
> This is a request to enhance qemu to support optionally freeing memory
> provided by memory-backend-file prior to exit.
> 
> Currently, when using memory via memory-backend-file, qemu does not free the
> memory at exit, leaving the guest's memory on disk after exit and forcing
> any dirty pages to be written to the backing store.
> 
> In some situations, it may be advantageous to clear the memory from the
> backing store before exit, and prevent the memory from being flushed to the
> backing store at exit time. 
> 
> I believe this can be implemented as an additional flag for
> memory-backend-file which calls madvise(MADV_REMOVE) on the memory prior to
> exiting.

We might have a mechanism that already has a similar effect in QEMU: the "share=on|off" option in memory-backend-file.  This enables the MAP_PRIVATE mmap() flag, and it is supposed ensure the kernel don't try to keep page contents even if QEMU is terminated before calling madvise(MADV_REMOVE).  I believe the "share" option is already enabled by default.

Would mmap(..., MAP_PRIVATE) have the the desired effect on your use case?

--- Additional comment from Zack Cornelius on 2017-06-13 14:58:52 EDT ---

In our use case, we need the memory to be backed by the file while the guest is running, but do not need or want the data to stay around after the guest has exited. 

Because of this, we'll need the MAP_SHARED flag to ensure the file is used as the backing for the memory.

--- Additional comment from Eduardo Habkost on 2017-06-14 16:59:21 EDT ---

Series submitted to qemu-devel:

Subject: [PATCH 0/5] hostmem-file: Add "persistent" option
Date: Wed, 14 Jun 2017 17:29:55 -0300
Message-Id: <20170614203000.19984-1-ehabkost@redhat.com>

--- Additional comment from Eduardo Habkost on 2017-07-13 11:27:35 EDT ---

Implementing this unfortunately would take more effort than expected: we don't have a working mechanism to ensure memory region data is freed before QEMU quits, so the machine state initialization/finalization code would need to be refactored first.

More info at the upstream discussion thread:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg462900.html

Comment 2 Michal Privoznik 2017-10-14 12:27:58 UTC
Nobody mentioned this in the linked discussion but we have to be cautious about qemu crashing too, i.e. we cannot rely on qemu calling some syscalls just before it exits, because if qemu crashes those syscall wouldn't be called and this feature would not work.

Comment 3 Paul Lancaster 2018-04-06 15:10:17 UTC
Can we please update the partner on the status and release version?  They are looking to have this land in RHEL 7.5, but the latest flag is set to RHEL 7.6.

Comment 6 Michal Privoznik 2018-05-14 07:48:39 UTC
I've just pushed patches upstream:

commit 2c4affd57e1746183ff410c023face80866bbe0f
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Thu Apr 12 17:16:40 2018 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Mon May 14 09:42:21 2018 +0200

    qemu: Implement memoryBacking/discard
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1480668
    
    QEMU has this new feature memory-backend-file.discard-data=yes
    which is a nifty optimization. Basically, when qemu is quitting
    or on memory hotplug it calls munmap() and close() on the file
    that is backing the memory. However, this does not mean kernel
    won't stop touching that part of memory. It still might. With
    this feature enabled we tell kernel: "we don't need this memory
    nor data stored in it". This makes kernel drop the memory
    immediately without trying to sync memory with the mapped file.
    
    Unfortunately, this cannot be turned on by default because we
    can't be sure when users really don't care about what happens to
    data after qemu dies. So it has to be opt-in. As usual, there are
    three places where one can configure memory attributes. This
    patch adds the feature to all of them.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
    Reviewed-by: Ján Tomko <jtomko@redhat.com>

commit 2300c92fe0d596258ed2f7da1b867d0ce2112f90
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Fri May 11 15:08:53 2018 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Mon May 14 09:42:20 2018 +0200

    conf: Introduce memoryBacking/discard
    
    QEMU has possibility to call madvise(.., MADV_REMOVE) in some
    cases. Expose this feature to users by new element/attribute
    discard.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
    Reviewed-by: Ján Tomko <jtomko@redhat.com>

commit 0329075733839a51ff9b61606dc720f726d03fb0
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Fri May 11 14:48:59 2018 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Mon May 14 09:42:20 2018 +0200

    conf: Move virDomainMemtune formatting into a separate function
    
    At the same time convert the code to use virXMLFormatElement.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
    Reviewed-by: Ján Tomko <jtomko@redhat.com>

commit 72c1770aa0545034d020dd564af12a9c8262d088
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Fri Apr 20 10:25:49 2018 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Mon May 14 09:42:20 2018 +0200

    qemu_capabilities: Introduce QEMU_CAPS_OBJECT_MEMORY_FILE_DISCARD
    
    This capability tracks if memory-backend-file has discard-data
    attribute or not.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
    Reviewed-by: Ján Tomko <jtomko@redhat.com>

commit 8a94501e8cb66ccd57babb128af9ed2715f2d412
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Fri Apr 20 10:31:54 2018 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Mon May 14 09:42:20 2018 +0200

    qemu_capabilities: Introduce QEMU_CAPS_QOM_LIST_PROPERTIES
    
    This capability tracks if qemu has "qom-list-properties" monitor
    command.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
    Reviewed-by: Ján Tomko <jtomko@redhat.com>

commit d81cf424cabe08d9af8d6628943dd6e05afbb911
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Thu Apr 12 15:04:07 2018 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Mon May 14 09:42:20 2018 +0200

    qemu_monitor: Introduce qemuMonitorGetObjectProps
    
    Now that we've gotten rid of misleading names we can introduce
    qemuMonitorGetObjectProps() function which queries -object
    properties. Again, some parts of code can be reused.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
    Reviewed-by: Ján Tomko <jtomko@redhat.com>

commit 036120209b60d6974cd5bb912ee5f890a873cb45
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Fri Apr 20 09:20:36 2018 +0200
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Mon May 14 09:42:20 2018 +0200

    qemuMonitorJSONGetDeviceProps: Separate props processing
    
    The code that processes list of device properties is going to be
    reused. Therefore put it into a separate function.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
    Reviewed-by: Ján Tomko <jtomko@redhat.com>

v4.3.0-169-g2c4affd57e

Comment 8 Jing Qi 2018-06-15 03:39:51 UTC
With libvirt-4.4.0-2.el7.x86_64 and qemu-kvm-rhev-2.12.0-3.el7.x86_64 version.

1. For "file" backing type and "shared" memory, set <discard /> for the backing file memory.
    <memoryBacking>
    ....
    <access mode='shared'/>
     <source type='file' />
     <discard />
    </memoryBacking>
2. Start the domain and check the qemu-kvm command : 
-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/23-avocado-vt-vm-numa3/ram-node0,discard-data=yes,share=yes,size=4280287232 

3. Remove  the <discard/> setting in <memoryBacking> and add below xml between <devices> and </devices> and restart the vm:
   <memory model='dimm' access='shared' discard='yes'>
      <target>
        <node>0</node>
        <size unit='G'>1</size>
      </target>
    </memory>
4. check the qemu-command and discard-data is "yes":

-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/25-avocado-vt-vm-numa3/ram-node0,discard-data=yes,share=yes,size=4280287232 -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-file,id=memdimm0,mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/25-avocado-vt-vm-numa3/dimm0,discard-data=yes,share=yes,size=1073741824 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0


5. Prepare memory1.xml as below:  
<memory model='dimm' memAccess='shared' discard='yes'>
<target>
<size unit='m'>500</size>
<node>0</node>
</target>
</memory>

6. With active domain vm1 and hot plug above memory with command:
# virsh attach-device vm1 memory1.xml
Device attached successfully 

In /var/lib/libvirt/qemu/ram/libvirt/qemu/23-avocado-vt-vm-numa3
directory, check the new memory file "dimmX" appeared.
7. # virsh detach-device vm1 memory1.xml
Device detached successfully 

In /var/lib/libvirt/qemu/ram/libvirt/qemu/23-avocado-vt-vm-numa3
directory, check the new memory file "dimmX" disappeared.

Comment 9 Jing Qi 2018-06-15 08:47:33 UTC
8. For a numa machine, add below xml and restart the vm:
<cell id='1' cpus='2-3' memory='2097152' unit='KiB' memAccess='shared' discard='yes'/>	

9. Check the qemu-cmd: 
-object memory-backend-file,id=memdimm0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/26-avocado-vt-vm-numa,discard-data=yes,share=yes,size=2147483648 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0

Comment 10 Jing Qi 2018-08-29 03:33:24 UTC
If there is only <discard/> configed, and no <source type='file'/> and no <access mode='***'/> configed in memorybacking part. Or "discard' is configed in <cell> or memory device and there is no <memoryBacking> configed.

<memoryBacking>
     <discard/>
  </memoryBacking>

Or
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='1000000' unit='KiB' discard='yes'/>
    </numa>
  </cpu>
  ...
   <memory model='dimm' discard='yes'>
      <target>
        <size unit='KiB'>524288</size>
        <node>0</node>
      </target>
      <address type='dimm' slot='0'/>
    </memory>

Q1: There is no discard_data='yes' in the qemu command line. Is this as expected?

/usr/libexec/qemu-kvm -name guest=avocado-vt-vm-numa,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-204-avocado-vt-vm-numa/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64,+kvm_pv_unhalt -bios /usr/share/seabios/bios.bin -m size=1048576k,slots=16,maxmem=2097152k -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0-3,mem=1024 -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 -uuid fef59087-9555-4b54-9198-6c644b7ce078 ...


Q2: 

Although we know 'discard' need to work with 'file' & 'shared' together in qemu  (Comment 14 in bug 1460848), is it ok to only pass these configed values to qemu for libvirt? 

With
<memoryBacking>
     <source type='file'>
  </memoryBacking>

Or
<memoryBacking>
     <access mode='private'>
  </memoryBacking>

Then

<cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='1000000' unit='KiB' discard='yes'/>
    </numa>
  </cpu>
 
 There is no share=yes and only discard-data=yes in Qemu command, is it as expect?

/usr/libexec/qemu-kvm -name guest=avocado-vt-vm-numa,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-205-avocado-vt-vm-numa/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64,+kvm_pv_unhalt -bios /usr/share/seabios/bios.bin -m size=1048576k,slots=16,maxmem=2097152k -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/205-avocado-vt-vm-numa/ram-node0,discard-data=yes,size=1073741824 -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 -object memory-backend-file,id=memdimm0,mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/205-avocado-vt-vm-numa/dimm0,discard-data=yes,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 -uuid fef59087-9555-4b54-9198-6c644b7ce078 -no-user-config ....

Comment 11 Michal Privoznik 2018-08-29 14:54:03 UTC
(In reply to Jing Qi from comment #10)
> If there is only <discard/> configed, and no <source type='file'/> and no
> <access mode='***'/> configed in memorybacking part. Or "discard' is
> configed in <cell> or memory device and there is no <memoryBacking> configed.
> 
> <memoryBacking>
>      <discard/>
>   </memoryBacking>
> 
> Or
>   <cpu>
>     <numa>
>       <cell id='0' cpus='0-1' memory='1000000' unit='KiB' discard='yes'/>
>     </numa>
>   </cpu>
>   ...
>    <memory model='dimm' discard='yes'>
>       <target>
>         <size unit='KiB'>524288</size>
>         <node>0</node>
>       </target>
>       <address type='dimm' slot='0'/>
>     </memory>
> 
> Q1: There is no discard_data='yes' in the qemu command line. Is this as
> expected?

Yes. The ability to use discard depends on chosen memory backend. The decision process there is so complicated and we get it wrong so often that erroring out in case we can't satisfy discard='yes' would only confuse people. Also, it's not harmful if we don't discard the data on domain exit.

> 
> Q2: 
> 
> Although we know 'discard' need to work with 'file' & 'shared' together in
> qemu  (Comment 14 in bug 1460848), is it ok to only pass these configed
> values to qemu for libvirt? 

Sure, discard has nothing to do with shared or private or source type. Discard is an optimization that for memory-backend-file qemu/kernel will not try to sync the mmapped file when quiting qemu.

Comment 13 errata-xmlrpc 2018-10-30 09:49:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113


Note You need to log in before you can comment on or make changes to this bug.