Red Hat Bugzilla – Bug 1373604
Enhance live migration post-copy to support file-backed memory (e.g. 2M hugepages)
Last modified: 2017-08-01 23:29:59 EDT
Description of problem:
Live migration post-copy (as of upstream qemu 2.7) does not support file-backed memory, including memory backed by /dev/hugepages. In the NFV/DPDK use cases, 1G hugepages are generally required for DPDK applications (as documented in http://people.redhat.com/~pmatilai/dpdk-guide/setup/hugepages.html). This feature BZ is to enhance live migration post-copy to support 1G (and 2M) hugepages.

Version-Release number of selected component (if applicable):
7.4

How reproducible:
100%

Steps to Reproduce:
1. Configure 1G hugepages on the host
2. Create a guest VM backed by 1G hugepages
3. Run the guest VM and a memory stress app (such as google stress)
4. Live migrate the guest VM with post-copy

Actual results:
No post-copy live migration support

Expected results:
Successful live migration

Additional info:
N/A
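For clarity, "file-backed memory" here means guest RAM that qemu mmap()s from a file (for example a hugetlbfs file under /dev/hugepages via -mem-path) rather than anonymous memory. Below is a minimal sketch of such a mapping, not qemu code; the file name /dev/hugepages/guestmem and the 64 MiB size are assumptions, and the mount is assumed to use the host's default hugepage size.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Length must be a multiple of the hugetlbfs mount's page size;
     * 64 MiB works for a 2 MiB-page mount (assumed here). */
    const size_t len = 64UL * 1024 * 1024;

    int fd = open("/dev/hugepages/guestmem", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, len) < 0) { perror("ftruncate"); return 1; }

    /* The mapping is file-backed by hugetlbfs; mmap fails with ENOMEM
     * if the host does not have enough free hugepages reserved. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }

    ((char *)mem)[0] = 1;   /* touch the first huge page */

    munmap(mem, len);
    close(fd);
    unlink("/dev/hugepages/guestmem");
    return 0;
}

Faults on a mapping like this are served by hugetlbfs rather than by anonymous memory, which is exactly the case pre-2.9 post-copy could not handle.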
Hai: Do we know much about what lives in the big 1G huge pages? Are they mostly static or are they changing reasonably frequently?
First version of the qemu patches posted to qemu-devel; note this supports 2MB hugepages but not 1GB. 1GB support is not currently reasonable with the technique we're using in the kernel.
Fixed in 2.9. Merged as part of 251501a3714096f807778f6d3f03711dcdb9ce29:

postcopy: Add extra check for COPY function
postcopy: Add doc about hugepages and postcopy
postcopy: Check for userfault+hugepage feature
postcopy: Update userfaultfd.h header
postcopy: Allow hugepages
postcopy: Send whole huge pages
postcopy: Mask fault addresses to huge page boundary
postcopy: Load huge pages in one go
postcopy: Use temporary for placing zero huge pages
postcopy: Plumb pagesize down into place helpers
postcopy: Record largest page size
postcopy: enhance ram_block_discard_range for hugepages
exec: ram_block_discard_range
postcopy: Chunk discards for hugepages
postcopy: Transmit and compare individual page sizes
postcopy: Transmit ram size summary word
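As background on the "Check for userfault+hugepage feature" commit: post-copy relies on userfaultfd, and the kernel advertises hugetlbfs support through a feature bit in the UFFDIO_API handshake. The sketch below is illustrative only, not the actual qemu code; it assumes kernel headers new enough to define UFFD_FEATURE_MISSING_HUGETLBFS.

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    /* userfaultfd(2) typically has no glibc wrapper; call it directly. */
    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0) { perror("userfaultfd"); return 1; }

    /* Handshake: the kernel fills in the feature bits it supports. */
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    if (ioctl(uffd, UFFDIO_API, &api) < 0) { perror("UFFDIO_API"); return 1; }

    /* Missing-page faults on hugetlbfs mappings need this feature for
     * post-copy with hugepage-backed guest RAM to work. */
    if (api.features & UFFD_FEATURE_MISSING_HUGETLBFS)
        printf("userfaultfd supports hugetlbfs missing faults\n");
    else
        printf("no hugetlbfs support in userfaultfd\n");

    close(uffd);
    return 0;
}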
This bug is fixed for qemu-kvm-rhev-2.9.0-1.el7

Bug verify:

Version:
3.10.0-657.el7.x86_64
qemu-kvm-rhev-2.9.0-1.el7.x86_64
seabios-bin-1.10.2-2.el7.noarch

Steps:

1) Configure hugepages on the src host and dst host
# echo 2048 > /proc/sys/vm/nr_hugepages
# cat /proc/meminfo | grep -i hugepage
AnonHugePages:     14336 kB
HugePages_Total:    2048
HugePages_Free:     2048
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

2) Boot a guest on the src host with qemu cli:
/usr/libexec/qemu-kvm \
    -name 'vm1' \
    -sandbox off \
    -machine pc-i440fx-rhel7.4.0 \
    -nodefaults \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04 \
    -device usb-ehci,id=usb1,bus=pci.0,addr=06 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=09 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=/root/rhel74-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bus=pci.0,bootindex=0 \
    -device virtio-net-pci,mac=9a:4f:50:51:52:53,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0 \
    -netdev tap,id=idjlQN53,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
    -mem-path /dev/hugepages \
    -mem-prealloc \
    -m 4096 \
    -smp 4 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device usb-mouse,id=input1,bus=usb1.0,port=2 \
    -device usb-kbd,id=input2,bus=usb1.0,port=3 \
    -vnc :1 \
    -qmp tcp:0:8881,server,nowait \
    -vga std \
    -monitor stdio \
    -rtc base=localtime \
    -boot order=cdn,once=n,menu=on,strict=off \
    -enable-kvm \
    -watchdog i6300esb \
    -watchdog-action reset \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0

3) Launch a guest on the dst host in listening mode with "-incoming tcp:0:5801"

4) In the guest, run a program to generate dirty pages:
# cat test.c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* SIGALRM handler: stop the test once the alarm fires. */
void wakeup(int sig)
{
    exit(0);
}

int main(void)
{
    signal(SIGALRM, wakeup);
    alarm(120);                       /* run for 120 seconds */

    /* 40960 * 4096 bytes = 160 MiB of zeroed memory. */
    char *buf = calloc(40960, 4096);
    if (!buf)
        return 1;

    while (1) {
        int i;
        /* Write one byte every 1 KiB so every page of the buffer
         * keeps getting dirtied. */
        for (i = 0; i < 40960 * 4; i++)
            buf[i * 4096 / 4]++;
        printf(".");
        fflush(stdout);
    }
}
# gcc test.c -o test
# ./test

5) On the src host, enable postcopy and start the migration; once dirty pages are being generated, switch to postcopy mode:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.16.184.92:5801
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off
Migration status: active
dirty sync count: 3
dirty pages rate: 6889 pages
(qemu) migrate_start_postcopy

Actual result:
Postcopy migration completed and the vm works well.

Src host:
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off
Migration status: completed
dirty sync count: 6
postcopy request count: 268

Dst host:
(qemu) info status
VM status: running

So, this bug is fixed.
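Side note: the dirty-page generator above writes at 4 KiB granularity. Since post-copy sends and places whole huge pages when guest RAM is hugepage-backed (per the "Send whole huge pages" commit), a variant that strides at the huge-page size spreads its writes across every 2 MiB region of the buffer. This is a hypothetical variant, not part of the verification run; the 2 MiB stride and 160 MiB buffer size are assumptions.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define HUGE_SZ (2UL * 1024 * 1024)     /* assumed host hugepage size */
#define BUF_SZ  (160UL * 1024 * 1024)   /* same 160 MiB working set */

void wakeup(int sig)
{
    exit(0);
}

int main(void)
{
    signal(SIGALRM, wakeup);
    alarm(120);

    char *buf = calloc(1, BUF_SZ);
    if (!buf)
        return 1;

    /* Touch one byte in every 2 MiB region so each pass writes into
     * every huge page backing the buffer. */
    while (1) {
        size_t off;
        for (off = 0; off < BUF_SZ; off += HUGE_SZ)
            buf[off]++;
        printf(".");
        fflush(stdout);
    }
}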
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392