Bug 1266856
| Summary: | Migration from 7.0 to 7.2 failed with numa+hugepage settings. | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Fangge Jin <fjin> | ||||
| Component: | libvirt | Assignee: | Martin Kletzander <mkletzan> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 7.2 | CC: | dyuan, fjin, huding, juzhang, lhuang, lmiksik, mzhan, rbalakri, zpeng | ||||
| Target Milestone: | rc | Keywords: | Upstream | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | libvirt-1.2.17-13.el7 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2015-11-19 06:55:31 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
There is no <numatune/> element in your XML, right? Just so we're sure because you haven't posted the whole XML. Thanks.

Created attachment 1078560
The guest XML

(In reply to Martin Kletzander from comment #2)
> There is no <numatune/> element in your XML, right? Just so we're sure
> because you haven't posted the whole XML. Thanks.

I have attached the full XML of the guest. There is no <numatune/> element.

Patches proposed upstream (the last one fixes the problem):
https://www.redhat.com/archives/libvir-list/2015-October/msg00010.html

Fixed upstream by commit v1.2.20-10-g41c2aa729f0a:
commit 41c2aa729f0af084ede95ee9a06219a2dd5fb5df
Author: Martin Kletzander <mkletzan>
Date: Thu Oct 1 07:34:57 2015 +0200
qemu: Use memory-backing-file only when needed

Tested on build libvirt-1.2.17-13.el7.x86_64 with the following scenarios; each gave the expected result:
1) 7.0->7.2, with numa+hugepage, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1024000</currentMemory>
<memoryBacking>
<hugepages/>
</memoryBacking>
<cpu>
<numa>
<cell cpus='0-1' memory='512000'/>
<cell cpus='2-3' memory='512000'/>
</numa>
</cpu>
2) 7.0->7.2, with numa+hugepage+numatune, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1024000</currentMemory>
<memoryBacking>
<hugepages/>
<nosharepages/>
<locked/>
</memoryBacking>
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
<cpu>
<numa>
<cell cpus='0-1' memory='512000'/>
<cell cpus='2-3' memory='512000'/>
</numa>
</cpu>
3) 7.2->7.2, with numa+hugepage, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1024000</currentMemory>
<memoryBacking>
<hugepages/>
<nosharepages/>
<locked/>
</memoryBacking>
<cpu>
<numa>
<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
</cpu>
4) 7.2->7.2, with numa+hugepage+numatune, without specified pagesize, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1024000</currentMemory>
<memoryBacking>
<hugepages/>
<nosharepages/>
<locked/>
</memoryBacking>
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
<cpu>
<numa>
<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
</cpu>
5) 7.2->7.2, with numa+hugepage+numatune, with specified pagesize, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1024000</currentMemory>
<memoryBacking>
<hugepages>
<page size='2048' unit='KiB' nodeset='1'/>
</hugepages>
<nosharepages/>
<locked/>
</memoryBacking>
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
<cpu>
<numa>
<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
</cpu>
Regression test for 6.7->7.2:
6) 6.7->7.2, with numa+numatune, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1000000</currentMemory>
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
<cpu>
<numa>
<cell cpus='0-1' memory='512000'/>
<cell cpus='2-3' memory='512000'/>
</numa>
</cpu>
7) 6.7->7.2, with hugepage, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1000000</currentMemory>
<memoryBacking>
<hugepages/>
<nosharepages/>
<locked/>
</memoryBacking>
8) 6.7->7.2, with numa+numatune+hugepage, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1000000</currentMemory>
<memoryBacking>
<hugepages/>
<nosharepages/>
<locked/>
</memoryBacking>
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
<cpu>
<numa>
<cell cpus='0-1' memory='512000'/>
<cell cpus='2-3' memory='512000'/>
</numa>
</cpu>
More regression tests for 7.0->7.2:
9) 7.0->7.2, with numa, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1024000</currentMemory>
<cpu>
<numa>
<cell cpus='0-1' memory='512000'/>
<cell cpus='2-3' memory='512000'/>
</numa>
</cpu>
10) 7.0->7.2, with numa+numatune, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1024000</currentMemory>
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
<cpu>
<numa>
<cell cpus='0-1' memory='512000'/>
<cell cpus='2-3' memory='512000'/>
</numa>
</cpu>
More regression tests for 7.2->7.2:
11) 7.2->7.2, with numa, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1024000</currentMemory>
<cpu>
<numa>
<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
</cpu>
12) 7.2->7.2, with numa+numatune, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1024000</currentMemory>
<numatune>
<memory mode='strict' nodeset='0'/>
</numatune>
<cpu>
<numa>
<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
</numa>
</cpu>
More regression tests for 6.7->7.2:
13) 6.7->7.2, with numa, PASS (migration succeeded):
<memory unit='KiB'>1024000</memory>
<currentMemory unit='KiB'>1000000</currentMemory>
<cpu>
<numa>
<cell cpus='0-1' memory='512000'/>
<cell cpus='2-3' memory='512000'/>
</numa>
</cpu>

The qemu command line for each numa/numatune/hugepage combination:
a) numa+hugepage:
-mem-path /dev/hugepages/libvirt/qemu -numa node,nodeid=0,cpus=0-1,mem=500 -numa node,nodeid=1,cpus=2-3,mem=500
b) numa:
-numa node,nodeid=0,cpus=0-1,mem=500 -numa node,nodeid=1,cpus=2-3,mem=500
c) numa+numatune:
-numa node,nodeid=0,cpus=0-1,mem=500 -numa node,nodeid=1,cpus=2-3,mem=500
d) numa+numatune+hugepage (without specified hugepage size):
-mem-path /dev/hugepages/libvirt/qemu -numa node,nodeid=0,cpus=0-1,mem=500 -numa node,nodeid=1,cpus=2-3,mem=500
e) numa+numatune+hugepage (with specified hugepage size, only for 7.2->7.2):
-object memory-backend-ram,id=ram-node0,size=524288000,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=524288000,host-nodes=0,policy=bind -numa node,nodeid=1,cpus=2-3,memdev=ram-node1

Comments 9 to 12 can verify this bug.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2015-2202.html
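
The five combinations a) to e) in the verification comment can be summarized as a small lookup. The following is only a sketch of the behavior observed in this report, not libvirt's actual decision logic: the function name `expected_memory_style` and its return labels are hypothetical, and the rule (memory backend objects appear only when `<hugepages>` contains a `<page>` element) is inferred from those five cases alone.

```python
import xml.etree.ElementTree as ET


def expected_memory_style(domain_xml: str) -> str:
    """Guess which memory options the fixed libvirt build passes to qemu.

    Assumption: the rule is inferred only from combinations a) to e) listed
    in this bug report; it is not taken from the libvirt source code.
    Returns one of:
      "memdev"      -object memory-backend-* plus -numa ...,memdev=
      "mem-path"    -mem-path plus -numa ...,mem=
      "mem"         -numa ...,mem= only
      "not-covered" no guest NUMA cells, which the listing does not cover
    """
    root = ET.fromstring(domain_xml)
    has_numa = root.find("./cpu/numa/cell") is not None
    hugepages = root.find("./memoryBacking/hugepages")
    has_page_size = hugepages is not None and hugepages.find("page") is not None

    if not has_numa:
        return "not-covered"
    if has_page_size:
        return "memdev"      # combination e)
    if hugepages is not None:
        return "mem-path"    # combinations a) and d)
    return "mem"             # combinations b) and c)
```

The input is a `<domain>` element wrapping the fragments shown in the test scenarios; `<numatune>` is deliberately ignored because combinations b)/c) and a)/d) produced identical memory options.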

Description of problem:
Prepare a guest XML with numa+hugepage:
......
<memoryBacking>
<hugepages/>
</memoryBacking>
......
<cpu mode='custom' match='exact'>
<model fallback='allow'>SandyBridge</model>
<numa>
<cell cpus='0-1' memory='512000'/>
<cell cpus='2-3' memory='512000'/>
</numa>
</cpu>
Migrate the guest from 7.0 host to 7.2 host:
# virsh migrate rhel7d0 qemu+ssh://10.66.4.141/system --live --verbose
error: operation failed: migration job: unexpectedly failed

Version-Release number of selected component (if applicable):
Source:
libvirt-1.1.1-29.el7_0.7.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.12.x86_64
Target:
libvirt-1.2.17-11.el7.x86_64
qemu-kvm-rhev-2.3.0-26.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
0. Prepare a source host (7.0) and a target host (7.2)
1. Prepare a guest with numa+hugepage setting
2. Configure hugepages on both source host and target host:
# mount -t hugetlbfs hugetlbfs /dev/hugepages
# sysctl vm.nr_hugepages=600
# service libvirtd restart
3. Start the guest and do migration from 7.0 to 7.2:
# virsh migrate rhel7d0 qemu+ssh://10.66.106.26/system --live --verbose
error: operation failed: migration job: unexpectedly failed
4. Check the guest log on target host:
2015-09-28 08:28:50.343+0000: starting up libvirt version: 1.2.17, package: 11.el7 (Red Hat, Inc. 
<http://bugzilla.redhat.com/bugzilla>, 2015-09-25-04:15:16, x86-036.build.eng.bos.redhat.com), qemu version: 2.3.0 (qemu-kvm-rhev-2.3.0-25.el7) LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name rhel7d0 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,mem-merge=off -m 1000 -realtime mlock=on -smp 4,sockets=4,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=524288000 -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=524288000 -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -uuid 4dea22b2-1d52-d8f3-2516-782e98ab3fa0 -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-rhel7d0/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,clock=vm,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=1 -boot order=cd,menu=on,reboot-timeout=0,strict=on -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x6 -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x8 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.1,addr=0x5 -drive file=/90121/fjin/rhel7-4.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=f65effa5-90a6-47f2-8487-a9f64c95d4f5,cache=none,discard=unmap,werror=stop,rerror=stop,aio=threads,bps=10000000,iops_rd=400000,iops_wr=100000 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fds=27:28:29:30:31,id=hostnet0,vhost=on,vhostfds=32:33:34:35:36 -device virtio-net-pci,tx=bh,ioeventfd=on,event_idx=off,mq=on,vectors=12,netdev=hostnet0,id=net0,mac=52:54:00:c6:3b:95,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev 
spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channel/target/rhel7d2.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:[::]:49152 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -device pvpanic,ioport=1285 -msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
2015-09-28T08:28:51.424146Z qemu-kvm: Unknown ramblock "pc.ram", cannot accept migration
2015-09-28T08:28:51.424191Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2015-09-28T08:28:51.424274Z qemu-kvm: load of migration failed: Invalid argument
2015-09-28 08:28:51.475+0000: shutting down
5. qemu command line on source host:
# ps aux|grep qemu|grep huge
qemu 2322 29.6 2.0 1979472 162308 ? 
SLl 16:35 0:14 /usr/libexec/qemu-kvm -name rhel7d0 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,mem-merge=off -m 1000 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu -realtime mlock=on -smp 4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=500 -numa node,nodeid=1,cpus=2-3,mem=500 -uuid 4dea22b2-1d52-d8f3-2516-782e98ab3fa0 -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7d0.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,clock=vm,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=1 -boot order=cd,menu=on,reboot-timeout=0,strict=on -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x6 -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x8 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.1,addr=0x5 -drive file=/90121/fjin/rhel7-4.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=f65effa5-90a6-47f2-8487-a9f64c95d4f5,cache=none,discard=unmap,werror=stop,rerror=stop,aio=threads,bps=10000000,iops_rd=400000,iops_wr=100000 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fds=23:24:25:26:27,id=hostnet0,vhost=on,vhostfds=28:29:30:31:32 -device virtio-net-pci,tx=bh,ioeventfd=on,event_idx=off,mq=on,vectors=12,netdev=hostnet0,id=net0,mac=52:54:00:c6:3b:95,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channel/target/rhel7d2.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -spice 
port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -device pvpanic,ioport=1285

Actual results:
Migration from 7.0 to 7.2 failed.

Expected results:
Migration succeeds.

Additional info:
1. If the hugepage or numa setting is removed from the guest XML, migration succeeds.
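
The two command lines in the description differ in how guest NUMA memory is supplied: the 7.0 source used `-mem-path` with `-numa ...,mem=500`, while the 7.2 target was started with `memory-backend-file` objects and `-numa ...,memdev=...`. That difference appears to be why the target rejected the incoming "pc.ram" block. Below is a minimal sketch for spotting such a mismatch when comparing two qemu command lines; the helper names `numa_memory_styles` and `styles_match` are hypothetical and not part of libvirt or qemu.

```python
import shlex


def numa_memory_styles(cmdline: str) -> set:
    """Return which memory keys ('mem', 'memdev') the -numa options use."""
    args = shlex.split(cmdline)
    styles = set()
    # Walk over (option, value) pairs and inspect only the -numa values.
    for flag, value in zip(args, args[1:]):
        if flag != "-numa":
            continue
        for item in value.split(","):
            key = item.split("=", 1)[0]
            if key in ("mem", "memdev"):
                styles.add(key)
    return styles


def styles_match(source_cmdline: str, target_cmdline: str) -> bool:
    """True when both sides describe guest NUMA memory the same way."""
    return numa_memory_styles(source_cmdline) == numa_memory_styles(target_cmdline)


if __name__ == "__main__":
    # Shortened excerpts of the source and target command lines from this report.
    source = ("-m 1000 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu "
              "-numa node,nodeid=0,cpus=0-1,mem=500 "
              "-numa node,nodeid=1,cpus=2-3,mem=500")
    target = ("-m 1000 -object memory-backend-file,id=ram-node0,prealloc=yes,"
              "mem-path=/dev/hugepages/libvirt/qemu,size=524288000 "
              "-numa node,nodeid=0,cpus=0-1,memdev=ram-node0")
    print(styles_match(source, target))
```

The check only compares the `-numa` options, so it is a quick triage aid rather than a full compatibility test between two qemu invocations.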