Bug 1480668
| Summary: | RFE: Enhance qemu to support freeing memory before exit when using memory-backend-file | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Zack Cornelius <zack.cornelius> | |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> | |
| Status: | CLOSED ERRATA | QA Contact: | Luyao Huang <lhuang> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | low | |||
| Version: | 7.5 | CC: | ailan, chayang, chhu, ehabkost, jdenemar, jinqi, jinzhao, jiyan, juzhang, kchamart, knoel, michen, mprivozn, mtessun, plancast, sgordon, virt-maint, xuzhang, yalzhang, yuhuang, zack.cornelius | |
| Target Milestone: | rc | Keywords: | FutureFeature, Upstream | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | libvirt-4.4.0-1.el7 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | 1460848 | |||
| : | 1578831 (view as bug list) | Environment: | ||
| Last Closed: | 2018-10-30 09:49:58 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1558125, 1578831 | |||
|
Description
Zack Cornelius
2017-08-11 16:40:08 UTC
Nobody mentioned this in the linked discussion but we have to be cautious about qemu crashing too, i.e. we cannot rely on qemu calling some syscalls just before it exits, because if qemu crashes those syscall wouldn't be called and this feature would not work. Can we please update the partner on the status and release version? They are looking to have this land in RHEL 7.5, but the latest flag is set to RHEL 7.6. I've just pushed patches upstream:
commit 2c4affd57e1746183ff410c023face80866bbe0f
Author: Michal Privoznik <mprivozn>
AuthorDate: Thu Apr 12 17:16:40 2018 +0200
Commit: Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:21 2018 +0200
qemu: Implement memoryBacking/discard
https://bugzilla.redhat.com/show_bug.cgi?id=1480668
QEMU has this new feature memory-backend-file.discard-data=yes
which is a nifty optimization. Basically, when qemu is quitting
or on memory hotplug it calls munmap() and close() on the file
that is backing the memory. However, this does not mean kernel
won't stop touching that part of memory. It still might. With
this feature enabled we tell kernel: "we don't need this memory
nor data stored in it". This makes kernel drop the memory
immediately without trying to sync memory with the mapped file.
Unfortunately, this cannot be turned on by default because we
can't be sure when users really don't care about what happens to
data after qemu dies. So it has to be opt-in. As usual, there are
three places where one can configure memory attributes. This
patch adds the feature to all of them.
Signed-off-by: Michal Privoznik <mprivozn>
Reviewed-by: Ján Tomko <jtomko>
commit 2300c92fe0d596258ed2f7da1b867d0ce2112f90
Author: Michal Privoznik <mprivozn>
AuthorDate: Fri May 11 15:08:53 2018 +0200
Commit: Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:20 2018 +0200
conf: Introduce memoryBacking/discard
QEMU has possibility to call madvise(.., MADV_REMOVE) in some
cases. Expose this feature to users by new element/attribute
discard.
Signed-off-by: Michal Privoznik <mprivozn>
Reviewed-by: Ján Tomko <jtomko>
commit 0329075733839a51ff9b61606dc720f726d03fb0
Author: Michal Privoznik <mprivozn>
AuthorDate: Fri May 11 14:48:59 2018 +0200
Commit: Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:20 2018 +0200
conf: Move virDomainMemtune formatting into a separate function
At the same time convert the code to use virXMLFormatElement.
Signed-off-by: Michal Privoznik <mprivozn>
Reviewed-by: Ján Tomko <jtomko>
commit 72c1770aa0545034d020dd564af12a9c8262d088
Author: Michal Privoznik <mprivozn>
AuthorDate: Fri Apr 20 10:25:49 2018 +0200
Commit: Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:20 2018 +0200
qemu_capabilities: Introduce QEMU_CAPS_OBJECT_MEMORY_FILE_DISCARD
This capability tracks if memory-backend-file has discard-data
attribute or not.
Signed-off-by: Michal Privoznik <mprivozn>
Reviewed-by: Ján Tomko <jtomko>
commit 8a94501e8cb66ccd57babb128af9ed2715f2d412
Author: Michal Privoznik <mprivozn>
AuthorDate: Fri Apr 20 10:31:54 2018 +0200
Commit: Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:20 2018 +0200
qemu_capabilities: Introduce QEMU_CAPS_QOM_LIST_PROPERTIES
This capability tracks if qemu has "qom-list-properties" monitor
command.
Signed-off-by: Michal Privoznik <mprivozn>
Reviewed-by: Ján Tomko <jtomko>
commit d81cf424cabe08d9af8d6628943dd6e05afbb911
Author: Michal Privoznik <mprivozn>
AuthorDate: Thu Apr 12 15:04:07 2018 +0200
Commit: Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:20 2018 +0200
qemu_monitor: Introduce qemuMonitorGetObjectProps
Now that we've gotten rid of misleading names we can introduce
qemuMonitorGetObjectProps() function which queries -object
properties. Again, some parts of code can be reused.
Signed-off-by: Michal Privoznik <mprivozn>
Reviewed-by: Ján Tomko <jtomko>
commit 036120209b60d6974cd5bb912ee5f890a873cb45
Author: Michal Privoznik <mprivozn>
AuthorDate: Fri Apr 20 09:20:36 2018 +0200
Commit: Michal Privoznik <mprivozn>
CommitDate: Mon May 14 09:42:20 2018 +0200
qemuMonitorJSONGetDeviceProps: Separate props processing
The code that processes list of device properties is going to be
reused. Therefore put it into a separate function.
Signed-off-by: Michal Privoznik <mprivozn>
Reviewed-by: Ján Tomko <jtomko>
v4.3.0-169-g2c4affd57e
With libvirt-4.4.0-2.el7.x86_64 and qemu-kvm-rhev-2.12.0-3.el7.x86_64 version.
1. For "file" backing type and "shared" memory, set <discard /> for the backing file memory.
<memoryBacking>
....
<access mode='shared'/>
<source type='file' />
<discard />
</memoryBacking>
2. Start the domain and check the qemu-kvm command :
-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/23-avocado-vt-vm-numa3/ram-node0,discard-data=yes,share=yes,size=4280287232
3. Remove the <discard/> setting in <memoryBacking> and add below xml between <devices> and </devices> and restart the vm:
<memory model='dimm' access='shared' discard='yes'>
<target>
<node>0</node>
<size unit='G'>1</size>
</target>
</memory>
4. check the qemu-command and discard-data is "yes":
-object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/25-avocado-vt-vm-numa3/ram-node0,discard-data=yes,share=yes,size=4280287232 -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-file,id=memdimm0,mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/25-avocado-vt-vm-numa3/dimm0,discard-data=yes,share=yes,size=1073741824 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0
5. Prepare memory1.xml as below:
<memory model='dimm' memAccess='shared' discard='yes'>
<target>
<size unit='m'>500</size>
<node>0</node>
</target>
</memory>
6. With active domain vm1 and hot plug above memory with command:
# virsh attach-device vm1 memory1.xml
Device attached successfully
In /var/lib/libvirt/qemu/ram/libvirt/qemu/23-avocado-vt-vm-numa3
directory, check the new memory file "dimmX" appeared.
7. # virsh detach-device vm1 memory1.xml
Device detached successfully
In /var/lib/libvirt/qemu/ram/libvirt/qemu/23-avocado-vt-vm-numa3
directory, check the new memory file "dimmX" disappeared.
8. For a numa machine, add below xml and restart the vm: <cell id='1' cpus='2-3' memory='2097152' unit='KiB' memAccess='shared' discard='yes'/> 9. Check the qemu-cmd: -object memory-backend-file,id=memdimm0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/26-avocado-vt-vm-numa,discard-data=yes,share=yes,size=2147483648 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 If there is only <discard/> configed, and no <source type='file'/> and no <access mode='***'/> configed in memorybacking part. Or "discard' is configed in <cell> or memory device and there is no <memoryBacking> configed.
<memoryBacking>
<discard/>
</memoryBacking>
Or
<cpu>
<numa>
<cell id='0' cpus='0-1' memory='1000000' unit='KiB' discard='yes'/>
</numa>
</cpu>
...
<memory model='dimm' discard='yes'>
<target>
<size unit='KiB'>524288</size>
<node>0</node>
</target>
<address type='dimm' slot='0'/>
</memory>
Q1: There is no discard_data='yes' in the qemu command line. Is this as expected?
/usr/libexec/qemu-kvm -name guest=avocado-vt-vm-numa,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-204-avocado-vt-vm-numa/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64,+kvm_pv_unhalt -bios /usr/share/seabios/bios.bin -m size=1048576k,slots=16,maxmem=2097152k -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0-3,mem=1024 -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 -uuid fef59087-9555-4b54-9198-6c644b7ce078 ...
Q2:
Although we know 'discard' need to work with 'file' & 'shared' together in qemu (Comment 14 in bug 1460848), is it ok to only pass these configed values to qemu for libvirt?
With
<memoryBacking>
<source type='file'>
</memoryBacking>
Or
<memoryBacking>
<access mode='private'>
</memoryBacking>
Then
<cpu>
<numa>
<cell id='0' cpus='0-1' memory='1000000' unit='KiB' discard='yes'/>
</numa>
</cpu>
There is no share=yes and only discard-data=yes in Qemu command, is it as expect?
/usr/libexec/qemu-kvm -name guest=avocado-vt-vm-numa,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-205-avocado-vt-vm-numa/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64,+kvm_pv_unhalt -bios /usr/share/seabios/bios.bin -m size=1048576k,slots=16,maxmem=2097152k -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object memory-backend-file,id=ram-node0,mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/205-avocado-vt-vm-numa/ram-node0,discard-data=yes,size=1073741824 -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 -object memory-backend-file,id=memdimm0,mem-path=/var/lib/libvirt/qemu/ram/libvirt/qemu/205-avocado-vt-vm-numa/dimm0,discard-data=yes,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 -uuid fef59087-9555-4b54-9198-6c644b7ce078 -no-user-config ....
(In reply to Jing Qi from comment #10) > If there is only <discard/> configed, and no <source type='file'/> and no > <access mode='***'/> configed in memorybacking part. Or "discard' is > configed in <cell> or memory device and there is no <memoryBacking> configed. > > <memoryBacking> > <discard/> > </memoryBacking> > > Or > <cpu> > <numa> > <cell id='0' cpus='0-1' memory='1000000' unit='KiB' discard='yes'/> > </numa> > </cpu> > ... > <memory model='dimm' discard='yes'> > <target> > <size unit='KiB'>524288</size> > <node>0</node> > </target> > <address type='dimm' slot='0'/> > </memory> > > Q1: There is no discard_data='yes' in the qemu command line. Is this as > expected? Yes. The ability to use discard depends on chosen memory backend. The decision process there is so complicated and we get it wrong so often that erroring out in case we can't satisfy discard='yes' would only confuse people. Also, it's not harmful if we don't discard the data on domain exit. > > Q2: > > Although we know 'discard' need to work with 'file' & 'shared' together in > qemu (Comment 14 in bug 1460848), is it ok to only pass these configed > values to qemu for libvirt? Sure, discard has nothing to do with shared or private or source type. Discard is an optimization that for memory-backend-file qemu/kernel will not try to sync the mmapped file when quiting qemu. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:3113 |