Bug 1266856

Summary: Migration from 7.0 to 7.2 failed with numa+hugepage settings.

Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Target Milestone: rc
Keywords: Upstream
Type: Bug
Reporter: Fangge Jin <fjin>
Assignee: Martin Kletzander <mkletzan>
QA Contact: Virtualization Bugs <virt-bugs>
CC: dyuan, fjin, huding, juzhang, lhuang, lmiksik, mzhan, rbalakri, zpeng
Doc Type: Bug Fix
Fixed In Version: libvirt-1.2.17-13.el7
Last Closed: 2015-11-19 06:55:31 UTC

Attachments: The guest XML

Description Fangge Jin 2015-09-28 08:38:28 UTC
Description of problem:
Prepare a guest XML with NUMA + hugepage settings:
......
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
......
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>SandyBridge</model>
    <numa>
      <cell cpus='0-1' memory='512000'/>
      <cell cpus='2-3' memory='512000'/>
    </numa>
  </cpu>
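
For reference, one way to add the settings above to an existing guest and confirm them (standard virsh commands; the domain name is taken from this bug):
# virsh edit rhel7d0
# virsh dumpxml rhel7d0 | grep -E 'hugepages|<cell'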

Migrate the guest from the 7.0 host to the 7.2 host:
# virsh migrate rhel7d0 qemu+ssh://10.66.4.141/system --live --verbose
error: operation failed: migration job: unexpectedly failed


Version-Release number of selected component (if applicable):
Source:
libvirt-1.1.1-29.el7_0.7.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.12.x86_64

Target:
libvirt-1.2.17-11.el7.x86_64
qemu-kvm-rhev-2.3.0-26.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
0. Prepare a source host (7.0) and a target host (7.2).

1. Prepare a guest with NUMA + hugepage settings.

2. Configure hugepages on both the source host and the target host:
# mount -t hugetlbfs hugetlbfs /dev/hugepages
# sysctl vm.nr_hugepages=600
# service libvirtd restart
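
As an optional sanity check (standard kernel interfaces), confirm that the hugepages were actually allocated on both hosts before starting the guest:
# grep Huge /proc/meminfo
# cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages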

3. Start the guest and migrate it from 7.0 to 7.2:
# virsh migrate rhel7d0 qemu+ssh://10.66.106.26/system --live --verbose
error: operation failed: migration job: unexpectedly failed

4. Check the guest log on the target host:
2015-09-28 08:28:50.343+0000: starting up libvirt version: 1.2.17, package: 11.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-09-25-04:15:16, x86-036.build.eng.bos.redhat.com), qemu version: 2.3.0 (qemu-kvm-rhev-2.3.0-25.el7)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name rhel7d0 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,mem-merge=off -m 1000 -realtime mlock=on -smp 4,sockets=4,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=524288000 -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=524288000 -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -uuid 4dea22b2-1d52-d8f3-2516-782e98ab3fa0 -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-rhel7d0/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,clock=vm,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=1 -boot order=cd,menu=on,reboot-timeout=0,strict=on -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x6 -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x8 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.1,addr=0x5 -drive file=/90121/fjin/rhel7-4.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=f65effa5-90a6-47f2-8487-a9f64c95d4f5,cache=none,discard=unmap,werror=stop,rerror=stop,aio=threads,bps=10000000,iops_rd=400000,iops_wr=100000 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fds=27:28:29:30:31,id=hostnet0,vhost=on,vhostfds=32:33:34:35:36 -device virtio-net-pci,tx=bh,ioeventfd=on,event_idx=off,mq=on,vectors=12,netdev=hostnet0,id=net0,mac=52:54:00:c6:3b:95,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channel/target/rhel7d2.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:[::]:49152 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -device pvpanic,ioport=1285 -msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
2015-09-28T08:28:51.424146Z qemu-kvm: Unknown ramblock "pc.ram", cannot accept migration
2015-09-28T08:28:51.424191Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2015-09-28T08:28:51.424274Z qemu-kvm: load of migration failed: Invalid argument
2015-09-28 08:28:51.475+0000: shutting down

5. The qemu command line on the source host:
# ps aux|grep qemu|grep huge
qemu      2322 29.6  2.0 1979472 162308 ?      SLl  16:35   0:14 /usr/libexec/qemu-kvm -name rhel7d0 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,mem-merge=off -m 1000 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu -realtime mlock=on -smp 4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=500 -numa node,nodeid=1,cpus=2-3,mem=500 -uuid 4dea22b2-1d52-d8f3-2516-782e98ab3fa0 -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7d0.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,clock=vm,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=1 -boot order=cd,menu=on,reboot-timeout=0,strict=on -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x6 -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x8 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.1,addr=0x5 -drive file=/90121/fjin/rhel7-4.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=f65effa5-90a6-47f2-8487-a9f64c95d4f5,cache=none,discard=unmap,werror=stop,rerror=stop,aio=threads,bps=10000000,iops_rd=400000,iops_wr=100000 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fds=23:24:25:26:27,id=hostnet0,vhost=on,vhostfds=28:29:30:31:32 -device virtio-net-pci,tx=bh,ioeventfd=on,event_idx=off,mq=on,vectors=12,netdev=hostnet0,id=net0,mac=52:54:00:c6:3b:95,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channel/target/rhel7d2.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -device pvpanic,ioport=1285


Actual results:
Migration from 7.0 to 7.2 failed.

Expected results:
Migration succeeds.

Additional info:
1. If the hugepage or NUMA setting is removed from the guest XML, migration succeeds.

Comment 2 Martin Kletzander 2015-09-30 04:37:23 UTC
There is no <numatune/> element in your XML, right?  Just so we're sure because you haven't posted the whole XML.  Thanks.

Comment 3 Fangge Jin 2015-09-30 08:17:47 UTC
Created attachment 1078560 [details]
The guest XML

Comment 4 Fangge Jin 2015-09-30 08:19:45 UTC
(In reply to Martin Kletzander from comment #2)
> There is no <numatune/> element in your XML, right?  Just so we're sure
> because you haven't posted the whole XML.  Thanks.

I have attached the full XML of the guest. There is no <numatune/> element.

Comment 5 Martin Kletzander 2015-10-01 12:24:03 UTC
Patches proposed upstream (the last one fixes the problem):

https://www.redhat.com/archives/libvir-list/2015-October/msg00010.html

Comment 6 Martin Kletzander 2015-10-06 13:14:09 UTC
Fixed upstream by commit v1.2.20-10-g41c2aa729f0a:

commit 41c2aa729f0af084ede95ee9a06219a2dd5fb5df
Author: Martin Kletzander <mkletzan>
Date:   Thu Oct 1 07:34:57 2015 +0200

    qemu: Use memory-backing-file only when needed
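
For context (summarizing the command lines captured earlier in this report): libvirt 1.2.17 on the 7.2 target backed each guest NUMA node with a memory-backend-file object, while libvirt 1.1.1 on the 7.0 source used the legacy hugepage layout:

7.2 target (comment 0, guest log):
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=524288000
-numa node,nodeid=0,cpus=0-1,memdev=ram-node0

7.0 source (comment 0, ps output):
-mem-prealloc -mem-path /dev/hugepages/libvirt/qemu
-numa node,nodeid=0,cpus=0-1,mem=500

The two layouts name the guest RAM blocks differently (the legacy layout has a single "pc.ram" block, while the memdev layout names blocks after the backend objects), which matches the 'Unknown ramblock "pc.ram"' error on the destination. With the fix, libvirt keeps the legacy -mem-path layout unless a per-node setting (e.g. <numatune> or a per-node hugepage <page> size) actually requires memory-backend objects, so a plain numa+hugepage guest remains migratable from older hosts.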

Comment 9 Fangge Jin 2015-10-09 07:19:14 UTC
Tested with build libvirt-1.2.17-13.el7.x86_64 in the following scenarios; each got the expected result:

1) 7.0 -> 7.2, with numa+hugepage, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1024000</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='512000'/>
      <cell cpus='2-3' memory='512000'/>
    </numa>
  </cpu>

2) 7.0 -> 7.2, with numa+hugepage+numatune, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1024000</currentMemory>
  <memoryBacking>
    <hugepages/>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='512000'/>
      <cell cpus='2-3' memory='512000'/>
    </numa>
  </cpu>

3) 7.2 -> 7.2, with numa+hugepage, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1024000</currentMemory>
  <memoryBacking>
    <hugepages/>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>

4) 7.2 -> 7.2, with numa+hugepage+numatune, without a specified page size, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1024000</currentMemory>
  <memoryBacking>
    <hugepages/>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>

5) 7.2 -> 7.2, with numa+hugepage+numatune, with a specified page size, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1024000</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='1'/>
    </hugepages>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>

Comment 10 Fangge Jin 2015-10-09 08:23:34 UTC
Regression tests for 6.7 -> 7.2:
6) 6.7 -> 7.2, with numa+numatune, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='512000'/>
      <cell cpus='2-3' memory='512000'/>
    </numa>
  </cpu>

7) 6.7 -> 7.2, with hugepage, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <memoryBacking>
    <hugepages/>
    <nosharepages/>
    <locked/>
  </memoryBacking>


8) 6.7 -> 7.2, with numa+numatune+hugepage, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <memoryBacking>
    <hugepages/>
    <nosharepages/>
    <locked/>
  </memoryBacking>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='512000'/>
      <cell cpus='2-3' memory='512000'/>
    </numa>
  </cpu>

Comment 11 Fangge Jin 2015-10-10 02:22:09 UTC
More regression tests for 7.0 -> 7.2:
9) 7.0 -> 7.2, with numa, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1024000</currentMemory>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='512000'/>
      <cell cpus='2-3' memory='512000'/>
    </numa>
  </cpu>


10) 7.0 -> 7.2, with numa+numatune, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1024000</currentMemory>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='512000'/>
      <cell cpus='2-3' memory='512000'/>
    </numa>
  </cpu>

More regression tests for 7.2 -> 7.2:
11) 7.2 -> 7.2, with numa, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1024000</currentMemory>
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>

12) 7.2 -> 7.2, with numa+numatune, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1024000</currentMemory>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>

More regression tests for 6.7 -> 7.2:
13) 6.7 -> 7.2, with numa, PASS (migration succeeded):
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1000000</currentMemory>
  <cpu>
    <numa>
      <cell cpus='0-1' memory='512000'/>
      <cell cpus='2-3' memory='512000'/>
    </numa>
  </cpu>

Comment 12 Fangge Jin 2015-10-10 02:25:19 UTC
The qemu command line for each numa/numatune/hugepage combination:

a)numa+hugepage:

-mem-path /dev/hugepages/libvirt/qemu
-numa node,nodeid=0,cpus=0-1,mem=500
-numa node,nodeid=1,cpus=2-3,mem=500

b)numa:
-numa node,nodeid=0,cpus=0-1,mem=500
-numa node,nodeid=1,cpus=2-3,mem=500

c)numa+numatune:
-numa node,nodeid=0,cpus=0-1,mem=500
-numa node,nodeid=1,cpus=2-3,mem=500

d)numa+numatune+hugepage (without a specified hugepage size):

-mem-path /dev/hugepages/libvirt/qemu
-numa node,nodeid=0,cpus=0-1,mem=500
-numa node,nodeid=1,cpus=2-3,mem=500

e)numa+numatune+hugepage (with a specified hugepage size, only for 7.2 -> 7.2):

-object memory-backend-ram,id=ram-node0,size=524288000,host-nodes=0,policy=bind
-numa node,nodeid=0,cpus=0-1,memdev=ram-node0
-object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=524288000,host-nodes=0,policy=bind
-numa node,nodeid=1,cpus=2-3,memdev=ram-node1
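
For reference, the command lines above can be captured the same way as in comment 0, e.g.:
# ps aux | grep qemu-kvm
or read back from the domain log (default libvirt log location assumed):
# less /var/log/libvirt/qemu/rhel7d0.log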

Comment 13 Fangge Jin 2015-10-10 02:26:19 UTC
Comments 9-12 verify this bug.

Comment 15 errata-xmlrpc 2015-11-19 06:55:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html