Red Hat Bugzilla – Bug 1373604
Enhance live migration post-copy to support file-backed memory (e.g. 2M hugepages)
Last modified: 2017-08-01 23:29:59 EDT
Description of problem:
Live migration post-copy (as of upstream qemu 2.7) does not support file-backed memory, including memory backed by /dev/hugepages. In the NFV/DPDK use cases, 1G hugepages are generally required for DPDK applications (as documented in http://people.redhat.com/~pmatilai/dpdk-guide/setup/hugepages.html). This feature BZ is to enhance live migration post-copy to support 1G (and 2M) hugepages.

Version-Release number of selected component (if applicable):
7.4

How reproducible:
100%

Steps to Reproduce:
1. Configure 1G hugepages on the host
2. Create a guest VM backed by 1G hugepages
3. Run the guest VM and a memory stress app (such as google stress)
4. Live migrate the guest VM with post-copy

Actual results:
No post-copy live migration support

Expected results:
Successful live migration

Additional info:
N/A
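For clarity, "file-backed memory" here means guest RAM that qemu mmap()s from a file (for example a hugetlbfs file under /dev/hugepages via -mem-path) rather than anonymous memory. Below is a minimal sketch of such a mapping, not qemu code; the file name /dev/hugepages/guestmem and the 64 MiB size are assumptions, and the mount is assumed to use the host's default hugepage size.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Length must be a multiple of the hugetlbfs mount's page size;
     * 64 MiB works for a 2 MiB-page mount (assumed here). */
    const size_t len = 64UL * 1024 * 1024;

    int fd = open("/dev/hugepages/guestmem", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, len) < 0) { perror("ftruncate"); return 1; }

    /* The mapping is file-backed by hugetlbfs; mmap fails with ENOMEM
     * if the host does not have enough free hugepages reserved. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }

    ((char *)mem)[0] = 1;   /* touch the first huge page */

    munmap(mem, len);
    close(fd);
    unlink("/dev/hugepages/guestmem");
    return 0;
}

Faults on a mapping like this are served by hugetlbfs rather than by anonymous memory, which is exactly the case pre-2.9 post-copy could not handle.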
Hai: Do we know much about what lives in the big 1G huge pages? Are they mostly static or are they changing reasonably frequently?
First version of the qemu patches posted to qemu-devel; note this supports 2MB hugepages but not 1GB. 1GB support is not currently reasonable with the technique we're using in the kernel.
Fixed in 2.9. Merged as part of 251501a3714096f807778f6d3f03711dcdb9ce29:

postcopy: Add extra check for COPY function
postcopy: Add doc about hugepages and postcopy
postcopy: Check for userfault+hugepage feature
postcopy: Update userfaultfd.h header
postcopy: Allow hugepages
postcopy: Send whole huge pages
postcopy: Mask fault addresses to huge page boundary
postcopy: Load huge pages in one go
postcopy: Use temporary for placing zero huge pages
postcopy: Plumb pagesize down into place helpers
postcopy: Record largest page size
postcopy: enhance ram_block_discard_range for hugepages
exec: ram_block_discard_range
postcopy: Chunk discards for hugepages
postcopy: Transmit and compare individual page sizes
postcopy: Transmit ram size summary word
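As background on the "Check for userfault+hugepage feature" commit: post-copy relies on userfaultfd, and the kernel advertises hugetlbfs support through a feature bit in the UFFDIO_API handshake. The sketch below is illustrative only, not the actual qemu code; it assumes kernel headers new enough to define UFFD_FEATURE_MISSING_HUGETLBFS.

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    /* userfaultfd(2) typically has no glibc wrapper; call it directly. */
    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0) { perror("userfaultfd"); return 1; }

    /* Handshake: the kernel fills in the feature bits it supports. */
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    if (ioctl(uffd, UFFDIO_API, &api) < 0) { perror("UFFDIO_API"); return 1; }

    /* Missing-page faults on hugetlbfs mappings need this feature for
     * post-copy with hugepage-backed guest RAM to work. */
    if (api.features & UFFD_FEATURE_MISSING_HUGETLBFS)
        printf("userfaultfd supports hugetlbfs missing faults\n");
    else
        printf("no hugetlbfs support in userfaultfd\n");

    close(uffd);
    return 0;
}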
This bug is fixed for qemu-kvm-rhev-2.9.0-1.el7

Bug verify:

Version:
3.10.0-657.el7.x86_64
qemu-kvm-rhev-2.9.0-1.el7.x86_64
seabios-bin-1.10.2-2.el7.noarch

Steps:

1) Configure hugepages on the src host and dst host
# echo 2048 > /proc/sys/vm/nr_hugepages
# cat /proc/meminfo | grep -i hugepage
AnonHugePages:     14336 kB
HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

2) Boot a guest on the src host with qemu cli:
/usr/libexec/qemu-kvm \
    -name 'vm1' \
    -sandbox off \
    -machine pc-i440fx-rhel7.4.0 \
    -nodefaults \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04 \
    -device usb-ehci,id=usb1,bus=pci.0,addr=06 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=09 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=/root/rhel74-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,bootindex=0 \
    -device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0 \
    -netdev tap,id=idjlQN53,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -mem-path /dev/hugepages \
    -mem-prealloc \
    -m 4096 \
    -smp 4 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device usb-mouse,id=input1,bus=usb1.0,port=2 \
    -device usb-kbd,id=input2,bus=usb1.0,port=3 \
    -vnc :1 \
    -qmp tcp:0:8881,server,nowait \
    -vga std \
    -monitor stdio \
    -rtc base=localtime \
    -boot order=cdn,once=n,menu=on,strict=off \
    -enable-kvm \
    -watchdog i6300esb \
    -watchdog-action reset \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0

3) Launch a guest on the dst host in listening mode with "-incoming tcp:0:5801"

4) In the guest, run a program to generate dirty pages:
# cat test.c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* SIGALRM handler: stop the test once the alarm fires. */
void wakeup(int sig)
{
    exit(0);
}

int main(void)
{
    signal(SIGALRM, wakeup);
    alarm(120);                       /* run for 120 seconds */

    /* 40960 * 4096 bytes = 160 MiB of zeroed memory. */
    char *buf = calloc(40960, 4096);
    if (!buf)
        return 1;

    while (1) {
        int i;
        /* Write one byte every 1 KiB so every page of the buffer
         * keeps getting dirtied. */
        for (i = 0; i < 40960 * 4; i++)
            buf[i * 4096 / 4]++;
        printf(".");
        fflush(stdout);
    }
}
# gcc test.c -o test
# ./test

5) On the src host, enable postcopy and start the migration; once dirty pages are being generated, switch to postcopy mode:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.16.184.92:5801
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off
Migration status: active
dirty sync count: 3
dirty pages rate: 6889 pages
(qemu) migrate_start_postcopy

Actual result:
Postcopy migration completed and the vm works well.

Src host:
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off
Migration status: completed
dirty sync count: 6
postcopy request count: 268

Dst host:
(qemu) info status
VM status: running

So, this bug is fixed.
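Side note: the dirty-page generator above writes at 4 KiB granularity. Since post-copy sends and places whole huge pages when guest RAM is hugepage-backed (per the "Send whole huge pages" commit), a variant that strides at the huge-page size spreads its writes across every 2 MiB region of the buffer. This is a hypothetical variant, not part of the verification run; the 2 MiB stride and 160 MiB buffer size are assumptions.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define HUGE_SZ (2UL * 1024 * 1024)     /* assumed host hugepage size */
#define BUF_SZ  (160UL * 1024 * 1024)   /* same 160 MiB working set */

void wakeup(int sig)
{
    exit(0);
}

int main(void)
{
    signal(SIGALRM, wakeup);
    alarm(120);

    char *buf = calloc(1, BUF_SZ);
    if (!buf)
        return 1;

    /* Touch one byte in every 2 MiB region so each pass writes into
     * every huge page backing the buffer. */
    while (1) {
        size_t off;
        for (off = 0; off < BUF_SZ; off += HUGE_SZ)
            buf[off]++;
        printf(".");
        fflush(stdout);
    }
}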
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392