Bug 1373604

Summary: Enhance live migration post-copy to support file-backed memory (e.g. 2M hugepages)
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.4
Target Release: 7.4
Hardware: Unspecified
OS: Linux
Target Milestone: rc
Keywords: FutureFeature
Status: CLOSED ERRATA
Reporter: Hai Huang <hhuang>
Assignee: Dr. David Alan Gilbert <dgilbert>
QA Contact: xianwang <xianwang>
Severity: unspecified
Priority: unspecified
CC: amit.shah, chayang, dgilbert, hannsj_uhl, hhuang, huding, juzhang, michen, mrezanin, pezhang, quintela, qzhang, virt-maint, xfu, xianwang, xiywang
Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7
Doc Type: If docs needed, set a value
Clones: 1373606 (view as bug list)
Last Closed: 2017-08-01 23:34:44 UTC
Type: Bug
Bug Depends On: 1373606, 1430172, 1430174
Bug Blocks: 1359843, 1385707, 1411879

Description Hai Huang 2016-09-06 17:15:01 UTC
Description of problem:
Live migration post-copy (as of upstream qemu 2.7) does not support file-backed
memory, including memory backed by /dev/hugepages.
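
For context, "file-backed" here means guest RAM that QEMU mmap()s from a file (as with -mem-path on a hugetlbfs mount) rather than anonymous memory. A minimal sketch of that setup, assuming hugetlbfs is mounted at /dev/hugepages, enough huge pages are reserved, and "guest-ram" is a hypothetical backing file name:

/* Sketch: file-backed hugepage RAM, in the style of QEMU's -mem-path.
 * Assumes a hugetlbfs mount at /dev/hugepages with free huge pages;
 * "guest-ram" is a hypothetical file name for illustration. */
#include <stdio.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define RAM_SIZE (1UL << 30)    /* 1 GiB of guest RAM */

int main(void)
{
    int fd = open("/dev/hugepages/guest-ram", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, RAM_SIZE) < 0) { perror("ftruncate"); return 1; }

    /* The mapping is backed by the file, not anonymous memory; this is
     * exactly the case postcopy's userfault machinery must handle. */
    void *ram = mmap(NULL, RAM_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (ram == MAP_FAILED) { perror("mmap"); return 1; }

    memset(ram, 0, RAM_SIZE);   /* fault in every page, like -mem-prealloc */
    munmap(ram, RAM_SIZE);
    close(fd);
    unlink("/dev/hugepages/guest-ram");
    return 0;
}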

In the NFV/DPDK use cases, 1G hugepages are generally required for DPDK
applications (as documented in http://people.redhat.com/~pmatilai/dpdk-guide/setup/hugepages.html).

This feature BZ is to enhance live migration to support 1G (and 2M)
hugepages.


Version-Release number of selected component (if applicable):
7.4

How reproducible:
100%

Steps to Reproduce:
1. Configure 1G hugepages on host
2. Create a guest VM with 1G hugepages
3. Run guest VM and a memory stress app (such as Google's stressapptest)
4. Live migrate the guest VM with post-copy

Actual results:
No post-copy live migration support

Expected results:
Successful live migration

Additional info:
N/A

Comment 1 Dr. David Alan Gilbert 2016-09-07 09:25:07 UTC
Hai: Do we know much about what lives in the big 1G huge pages? Are they mostly static or are they changing reasonably frequently?

Comment 3 Dr. David Alan Gilbert 2017-01-06 18:43:40 UTC
First version of the QEMU series posted to qemu-devel; note this supports 2MB hugepages but not 1GB.

1GB support is not currently reasonable with the technique we're using in the kernel.
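
A back-of-the-envelope way to see the scale of the problem (my own estimate, assuming a 10 Gbit/s migration link and ignoring protocol overhead): postcopy must pull a whole huge page across the wire before the faulting vCPU can resume, so a missing 2MB page costs roughly 2 * 2^20 * 8 / 10^10 ≈ 1.7 ms of stall, while a missing 1GB page would cost 2^30 * 8 / 10^10 ≈ 0.86 s per fault.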

Comment 4 Dr. David Alan Gilbert 2017-03-02 19:40:42 UTC
Fixed in QEMU 2.9.
Merged as part of commit 251501a3714096f807778f6d3f03711dcdb9ce29; the series (a sketch of the kernel ABI these build on follows the list):

      postcopy: Add extra check for COPY function
      postcopy: Add doc about hugepages and postcopy
      postcopy: Check for userfault+hugepage feature
      postcopy: Update userfaultfd.h header
      postcopy: Allow hugepages
      postcopy: Send whole huge pages
      postcopy: Mask fault addresses to huge page boundary
      postcopy: Load huge pages in one go
      postcopy: Use temporary for placing zero huge pages
      postcopy: Plumb pagesize down into place helpers
      postcopy: Record largest page size
      postcopy: enhance ram_block_discard_range for hugepages
      exec: ram_block_discard_range
      postcopy: Chunk discards for hugepages
      postcopy: Transmit and compare individual page sizes
      postcopy: Transmit ram size summary word
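
For anyone wanting to see what "Check for userfault+hugepage feature" and "Send whole huge pages" correspond to at the kernel ABI level, below is a minimal standalone sketch (not QEMU code; an illustration only). It assumes 2MB huge pages, a kernel new enough to advertise UFFD_FEATURE_MISSING_HUGETLBFS (4.11+), and at least one free huge page:

/* Sketch (not QEMU code): the userfaultfd ABI the patches above use.
 * Assumes 2 MiB huge pages and hugetlbfs userfault kernel support. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/userfaultfd.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)   /* assumed 2 MiB huge pages */

int main(void)
{
    /* "Check for userfault+hugepage feature": negotiate the API and
     * verify the kernel advertises hugetlbfs missing-page faults. */
    int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (ufd < 0) { perror("userfaultfd"); return 1; }

    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    if (ioctl(ufd, UFFDIO_API, &api) < 0) { perror("UFFDIO_API"); return 1; }
    if (!(api.features & UFFD_FEATURE_MISSING_HUGETLBFS)) {
        fprintf(stderr, "no hugetlbfs userfault support\n");
        return 1;
    }

    /* Map one huge page (a stand-in for -mem-path guest RAM) and
     * register it for missing-page faults. */
    void *dst = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (dst == MAP_FAILED) { perror("mmap"); return 1; }

    struct uffdio_register reg = {
        .range = { .start = (unsigned long)dst, .len = HPAGE_SIZE },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    if (ioctl(ufd, UFFDIO_REGISTER, &reg) < 0) { perror("UFFDIO_REGISTER"); return 1; }

    /* "Send whole huge pages" / "Load huge pages in one go": the atomic
     * place operation must cover the full huge page; partial copies into
     * a hugetlb range are rejected by the kernel. */
    char *src = malloc(HPAGE_SIZE);
    memset(src, 0x5a, HPAGE_SIZE);
    struct uffdio_copy copy = {
        .dst = (unsigned long)dst,
        .src = (unsigned long)src,
        .len = HPAGE_SIZE,
        .mode = 0,
    };
    if (ioctl(ufd, UFFDIO_COPY, &copy) < 0) { perror("UFFDIO_COPY"); return 1; }

    printf("placed %lld bytes, dst[0]=0x%x\n",
           (long long)copy.copy, ((unsigned char *)dst)[0]);
    return 0;
}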

Comment 10 xianwang 2017-04-28 03:36:01 UTC
This bug is fixed in qemu-kvm-rhev-2.9.0-1.el7.
Bug verification:
Version:
3.10.0-657.el7.x86_64
qemu-kvm-rhev-2.9.0-1.el7.x86_64
seabios-bin-1.10.2-2.el7.noarch

Steps:
1) Configure hugepages on both the src and dst hosts (2048 x 2 MiB = 4 GiB, enough to back the 4096 MiB guest below)
# echo 2048 > /proc/sys/vm/nr_hugepages
# cat /proc/meminfo | grep -i hugepage
AnonHugePages:     14336 kB
HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
2) Boot a guest on the src host with this qemu cli:
/usr/libexec/qemu-kvm \
    -name 'vm1'  \
    -sandbox off  \
    -machine pc-i440fx-rhel7.4.0 \
    -nodefaults  \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04 \
    -device usb-ehci,id=usb1,bus=pci.0,addr=06 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=09 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=/root/rhel74-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,bootindex=0 \
    -device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0  \
    -netdev tap,id=idjlQN53,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -mem-path /dev/hugepages \
    -mem-prealloc \
    -m 4096 \
    -smp 4 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -device usb-mouse,id=input1,bus=usb1.0,port=2 \
    -device usb-kbd,id=input2,bus=usb1.0,port=3 \
    -vnc :1 \
    -qmp tcp:0:8881,server,nowait \
    -vga std \
    -monitor stdio \
    -rtc base=localtime  \
    -boot order=cdn,once=n,menu=on,strict=off  \
    -enable-kvm  \
    -watchdog i6300esb \
    -watchdog-action reset \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0 
3) Launch the same guest on the dst host in listening mode with "-incoming tcp:0:5801" appended
4) In the guest, build and run a program that continuously dirties memory:
# cat test.c 
#include <stdlib.h>
#include <stdio.h>
#include <signal.h>

/* SIGALRM handler: stop dirtying memory once the alarm fires */
void wakeup(int sig)
{
    exit(0);
}

int main(void)
{
    signal(SIGALRM, wakeup);
    alarm(120);                               /* run for 120 seconds */
    /* 40960 x 4096 bytes = 160 MiB of zeroed memory */
    char *buf = (char *) calloc(40960, 4096);
    while (1) {
        int i;
        /* Increment one byte every 1 KiB so every page of the
         * buffer keeps getting dirtied */
        for (i = 0; i < 40960 * 4; i++) {
            buf[i * 4096 / 4]++;
        }
        printf(".");
        fflush(stdout);                       /* show progress immediately */
    }
}
# gcc test.c -o test
# ./test
5) On the src host, enable the postcopy capability and start migration; once dirty pages are being generated, switch to postcopy mode:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.16.184.92:5801
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off 
Migration status: active 
dirty sync count: 3
dirty pages rate: 6889 pages
(qemu) migrate_start_postcopy 

Actual result:
Postcopy migration completed and the VM works well.
Src host:
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off 
Migration status: completed
dirty sync count: 6
postcopy request count: 268
Dst host:
(qemu) info status 
VM status: running

So, this bug is fixed.

Comment 12 errata-xmlrpc 2017-08-01 23:34:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
