Bug 1460848 - RFE: Enhance qemu to support freeing memory before exit when using memory-backend-file
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Eduardo Habkost
QA Contact: Yumei Huang
URL:
Whiteboard:
Keywords: FutureFeature
Depends On:
Blocks: 1461214
 
Reported: 2017-06-13 00:23 UTC by Zack Cornelius
Modified: 2018-04-18 18:11 UTC (History)
Clone Of:
: 1480668 (view as bug list)
Last Closed: 2018-04-11 00:26:27 UTC




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:1104 None None None 2018-04-11 00:28 UTC

Description Zack Cornelius 2017-06-13 00:23:53 UTC
Description of problem:

This is a request to enhance qemu to support optionally freeing memory provided by memory-backend-file prior to exit.

Currently, when using memory via memory-backend-file, qemu does not free the memory at exit, leaving the guest's memory on disk after exit and forcing any dirty pages to be written to the backing store.

In some situations, it may be advantageous to clear the memory from the backing store before exit, and prevent the memory from being flushed to the backing store at exit time. 

I believe this can be implemented as an additional flag for memory-backend-file which calls madvise(MADV_REMOVE) on the memory prior to exiting.
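
As a rough illustration of the mechanism proposed above, the following Python sketch maps a file with MAP_SHARED, dirties it, and then calls madvise(MADV_REMOVE) on the mapping. It is not QEMU code: the names discard_file_backed_memory and demo are hypothetical, and it assumes Linux, Python 3.8 or later, and a filesystem that supports hole punching (for example tmpfs or ext4).

```python
import mmap
import os
import tempfile


def discard_file_backed_memory(path: str) -> bytes:
    """Map `path` shared, dirty it, drop the pages, return the file's first 6 bytes."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as mem:
            mem[:6] = b"guest!"            # stand-in for guest writes
            mem.madvise(mmap.MADV_REMOVE)  # free the whole range and its backing store
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read(6)


def demo() -> bytes:
    """Hypothetical driver: build a 1 MiB file of 0xff bytes and discard it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "backing.file")
        with open(path, "wb") as f:
            f.write(b"\xff" * (1 << 20))
        return discard_file_backed_memory(path)
```

On a filesystem that supports it, demo() should return six zero bytes, because the discarded range reads back as a hole. Elsewhere madvise() is expected to raise OSError.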

Comment 3 Eduardo Habkost 2017-06-13 18:16:07 UTC
(In reply to Zack Cornelius from comment #0)
> Description of problem:
> 
> This is a request to enhance qemu to support optionally freeing memory
> provided by memory-backend-file prior to exit.
> 
> Currently, when using memory via memory-backend-file, qemu does not free the
> memory at exit, leaving the guest's memory on disk after exit and forcing
> any dirty pages to be written to the backing store.
> 
> In some situations, it may be advantageous to clear the memory from the
> backing store before exit, and prevent the memory from being flushed to the
> backing store at exit time. 
> 
> I believe this can be implemented as an additional flag for
> memory-backend-file which calls madvise(MADV_REMOVE) on the memory prior to
> exiting.

We might already have a mechanism with a similar effect in QEMU: the "share=on|off" option of memory-backend-file.  With share=off, QEMU passes the MAP_PRIVATE flag to mmap(), which is supposed to ensure the kernel doesn't try to keep page contents even if QEMU is terminated before calling madvise(MADV_REMOVE).  I believe share=off is already the default.

Would mmap(..., MAP_PRIVATE) have the desired effect for your use case?

Comment 4 Zack Cornelius 2017-06-13 18:58:52 UTC
In our use case, we need the memory to be backed by the file while the guest is running, but do not need or want the data to stay around after the guest has exited. 

Because of this, we'll need the MAP_SHARED flag to ensure the file is used as the backing for the memory.
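
The difference between the two mapping types can be seen in a small Python sketch. It is only an illustration and not QEMU code: the helper names write_through_mapping and demo are hypothetical, and a Unix-like host is assumed. A write through a MAP_PRIVATE mapping stays in a private copy, while a write through a MAP_SHARED mapping reaches the file.

```python
import mmap
import os
import tempfile


def write_through_mapping(path: str, flags: int, data: bytes) -> bytes:
    """Write `data` at offset 0 through an mmap created with `flags`,
    then return what the file itself holds at that offset."""
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.fstat(fd).st_size
        with mmap.mmap(fd, size, flags=flags,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as mem:
            mem[:len(data)] = data
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read(len(data))


def demo():
    """Hypothetical driver: try a private mapping first, then a shared one."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "backing.file")
        with open(path, "wb") as f:
            f.write(b"\x00" * 4096)
        private = write_through_mapping(path, mmap.MAP_PRIVATE, b"guestmem")
        shared = write_through_mapping(path, mmap.MAP_SHARED, b"guestmem")
        return private, shared
```

Here demo() returns the file contents after each write: still zeros after the private mapping, and the written bytes after the shared one.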

Comment 5 Eduardo Habkost 2017-06-14 20:59:21 UTC
Series submitted to qemu-devel:

Subject: [PATCH 0/5] hostmem-file: Add "persistent" option
Date: Wed, 14 Jun 2017 17:29:55 -0300
Message-Id: <20170614203000.19984-1-ehabkost@redhat.com>

Comment 6 Eduardo Habkost 2017-07-13 15:27:35 UTC
Implementing this unfortunately would take more effort than expected: we don't have a working mechanism to ensure memory region data is freed before QEMU quits, so the machine state initialization/finalization code would need to be refactored first.

More info at the upstream discussion thread:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg462900.html

Comment 7 Eduardo Habkost 2017-08-11 19:22:44 UTC
Upstream discussion about unlink()+close() vs madvise() continued at:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg473276.html
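
For readers unfamiliar with the unlink()+close() alternative named above, here is a minimal Python sketch of the general idea. It does not reproduce the upstream discussion or any QEMU code, and the names unlink_then_use and demo are hypothetical. Once the file name is unlinked, the mapping keeps working until it is closed, and nothing remains at the path afterwards.

```python
import mmap
import os
import tempfile


def unlink_then_use(path: str, size: int = 4096):
    """Create and map a file, unlink its name, then keep using the mapping."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as mem:
            os.unlink(path)        # the name is gone; the inode lives on
            mem[:4] = b"data"      # the mapping is still usable
            readback = mem[:4]
            name_exists = os.path.exists(path)
    finally:
        os.close(fd)               # last reference dropped: storage is released
    return readback, name_exists


def demo():
    """Hypothetical driver using a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        return unlink_then_use(os.path.join(tmp, "backing.file"))
```

Unlike madvise(MADV_REMOVE), this approach removes the file itself rather than leaving a zero-filled file behind.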

Comment 8 Eduardo Habkost 2017-08-24 19:24:28 UTC
New version submitted upstream:

From: Eduardo Habkost <ehabkost@redhat.com>
To: qemu-devel@nongnu.org
Subject: [PATCH v2 0/3] hostmem-file: Add "discard-data" option
Date: Thu, 24 Aug 2017 16:23:12 -0300
Message-Id: <20170824192315.5897-1-ehabkost@redhat.com>

Comment 9 Eduardo Habkost 2017-09-28 20:56:39 UTC
Merged upstream:

commit 11ae6ed8affdd131e735bac39b21e7d3cde66f7b
Author: Eduardo Habkost <ehabkost@redhat.com>
Date:   Thu Aug 24 16:23:15 2017 -0300

    hostmem-file: Add "discard-data" option
    
    The new option can be used to indicate that the file contents can
    be destroyed and don't need to be flushed to disk when QEMU exits
    or when the memory backend object is removed.
    
    Internally, it will trigger a madvise(MADV_REMOVE) call when the
    memory backend is removed.
    
    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
    Message-Id: <20170824192315.5897-4-ehabkost@redhat.com>
    [ehabkost: fixup: improved documentation]
    Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
    Tested-by: Zack Cornelius <zack.cornelius@kove.net>
    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

Comment 11 Miroslav Rezanina 2017-10-20 09:32:28 UTC
Fix included in qemu-kvm-rhev-2.10.0-3.el7

Comment 13 Yumei Huang 2017-11-15 11:20:00 UTC
Hi Eduardo,
Could you please give some instructions for verifying this bz?

QE tried the following, but there seemed to be no difference whether the "discard-data" option is on or off.

Details:

1. Create a file in host as backend 
# echo "123456" > test2.file
# truncate -s 256M test2.file
# hexdump -C test2.file 
00000000  31 32 33 34 35 36 0a 00  00 00 00 00 00 00 00 00  |123456..........|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
10000000

2. Boot a guest

3. Hotplug memory backed by the file created in step 1, with discard-data=on/off
(qemu) object_add memory-backend-file,mem-path=test2.file,id=mem1,size=256M,discard-data=off/on
(qemu) device_add pc-dimm,memdev=mem1,id=dimm1

4. After a while, unplug the memory 
(qemu) device_del dimm1
(qemu) object_del mem1

5. Shutdown guest, and check the content of test2.file
# hexdump -C test2.file 
00000000  31 32 33 34 35 36 0a 00  00 00 00 00 00 00 00 00  |123456..........|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
10000000

It turns out that whether the "discard-data" option is on or off, the content of test2.file is the same.  Could you please check the steps above?

Thanks!
Yumei Huang

Comment 14 Eduardo Habkost 2017-11-15 16:39:32 UTC
Hi,

QEMU won't touch the backing file contents if using a private mapping, so discard-data has a visible effect only if using share=on too.

Example:

$ dd if=/dev/urandom of=/tmp/testfile bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB, 256 MiB) copied, 1,43591 s, 187 MB/s
$ hexdump -C /tmp/testfile -n 32
00000000  73 84 e6 e6 19 28 a4 79  a8 1d e6 24 ad fe 5a 2c  |s....(.y...$..Z,|
00000010  7d 92 cf 5f ea b4 3b b0  a3 fa 0d 13 c5 8a d4 09  |}.._..;.........|
00000020
$ ./x86_64-softmmu/qemu-system-x86_64 -object memory-backend-file,share=on,discard-data=on,mem-path=/tmp/testfile,id=mem0,size=256M -monitor stdio -display none
QEMU 2.10.91 monitor - type 'help' for more information
(qemu) quit
$ hexdump -C /tmp/testfile
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
10000000
$
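
The manual hexdump check above can also be scripted. The following Python sketch reports whether a file consists entirely of zero bytes, which is what the collapsed "*" line in the hexdump output indicates. The names is_all_zero and demo are hypothetical, and the sample data only mimics the files used in this report.

```python
import os
import tempfile


def is_all_zero(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if every byte of the file at `path` is 0x00."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            if chunk.count(0) != len(chunk):
                return False
    return True


def demo():
    """Hypothetical driver: one zero-filled file, one file with leading data."""
    with tempfile.TemporaryDirectory() as tmp:
        zeroed = os.path.join(tmp, "zeroed.file")
        kept = os.path.join(tmp, "kept.file")
        with open(zeroed, "wb") as f:
            f.write(b"\x00" * 8192)
        with open(kept, "wb") as f:
            f.write(b"123456\n" + b"\x00" * 8185)
        return is_all_zero(zeroed), is_all_zero(kept)
```

Running is_all_zero("/tmp/testfile") after QEMU quits would be expected to return True with share=on,discard-data=on, and False when the original data was kept.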

Comment 15 Yumei Huang 2017-11-20 07:53:43 UTC
Thanks Eduardo!

Verify:
qemu-kvm-rhev-2.10.0-6.el7
kernel-3.10.0-765.el7.x86_64

Steps:

1. Create testfile with random data,

# dd if=/dev/urandom of=/tmp/testfile bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 1.53172 s, 175 MB/s

# hexdump -C /tmp/testfile -n 32
00000000  75 65 0e 9b 58 6e 79 1e  83 50 49 9e 1f 63 89 b0  |ue..Xny..PI..c..|
00000010  bd 6e 0a 58 54 dc 3f 2c  6a 5c dd 45 bf dc 4b 17  |.n.XT.?,j\.E..K.|
00000020

2. Boot guest with "share=on,discard-data=off",

# /usr/libexec/qemu-kvm -object memory-backend-file,share=on,discard-data=off,mem-path=/tmp/testfile,id=mem0,size=256M -monitor stdio

3. Quit qemu and check content of testfile,

(qemu) quit

# hexdump -C /tmp/testfile -n 32
00000000  75 65 0e 9b 58 6e 79 1e  83 50 49 9e 1f 63 89 b0  |ue..Xny..PI..c..|
00000010  bd 6e 0a 58 54 dc 3f 2c  6a 5c dd 45 bf dc 4b 17  |.n.XT.?,j\.E..K.|
00000020

The content of the test file is the same as before.

Repeat steps 2 and 3 with "share=on,discard-data=on", then recheck the content of testfile:

# hexdump -C /tmp/testfile -n 32
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000020

The content of test file has changed as expected.

Comment 17 Yumei Huang 2017-11-20 07:59:21 UTC
Correction to the last step of comment 15:

Repeat steps 2 and 3 with "share=on,discard-data=on", then recheck the content of testfile:

# hexdump -C /tmp/testfile -n 32
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000020

Comment 18 Min Deng 2018-01-31 08:01:15 UTC
Verified on P8 and P9 hosts.
P8 build information
kernel-3.10.0-842.el7.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le

P9 build information
kernel-4.14.0-34.el7a.ppc64le
qemu-kvm-rhev-2.10.0-18.el7.ppc64le
For the steps, please refer to comment 15.

Actual results:
Boot guest with "share=on,discard-data=off": the content didn't change.

00000000  3c e7 fb 16 20 94 d0 b6  d6 34 dd b7 22 96 78 24  |<... ....4..".x$|
00000010  a7 1d db 8e 14 fa bd df  6a af c2 a5 62 31 9f c4  |........j...b1..|
00000020

Boot guest with "share=on,discard-data=on": the file changed as expected.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000020

Expected results:

Boot guest with "share=on,discard-data=off": the content does not change.

00000000  3c e7 fb 16 20 94 d0 b6  d6 34 dd b7 22 96 78 24  |<... ....4..".x$|
00000010  a7 1d db 8e 14 fa bd df  6a af c2 a5 62 31 9f c4  |........j...b1..|
00000020

Boot guest with "share=on,discard-data=on": the file content changes.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000020

The above scenario also passed on ppc. Thanks.

Comment 20 errata-xmlrpc 2018-04-11 00:26:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104

