Bug 1194982
Summary: numa-enabled domains cannot be migrated from RHEL hosts older than 7.1 to 7.1

Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.1
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Jan Kurik <jkurik>
Assignee: Michal Privoznik <mprivozn>
QA Contact: Virtualization Bugs <virt-bugs>
CC: bmcclain, dgilbert, dyuan, gklein, jdenemar, jherrman, jmiao, jsuchane, lhuang, michal.skrivanek, mprivozn, mzhan, pm-eus, rbalakri, rgolan, sherold, zhwang, zpeng
Target Milestone: rc
Target Release: ---
Keywords: Upstream, ZStream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-1.2.8-16.el7_1.1
Doc Type: Bug Fix
Doc Text:
A prior QEMU update introduced one-to-one Non-Uniform Memory Access (NUMA) memory pinning between guest NUMA nodes and host NUMA nodes, which also included a new way of specifying NUMA topology at QEMU startup. However, the libvirt library previously always used the newer NUMA specification, even when one-to-one NUMA pinning was not specified in the libvirt configuration XML file. This caused the guest to have an incompatible application binary interface (ABI), which in turn led to failed migration of NUMA-enabled domains from Red Hat Enterprise Linux 6 to Red Hat Enterprise Linux 7. With this update, libvirt uses the newer NUMA specification only when it is specified in the configuration, and the described domains migrate correctly.
Story Points: ---
Clone Of: 1191567
Cloned To: 1196644 (view as bug list)
Last Closed: 2015-03-05 14:09:59 UTC
Bug Depends On: 1191567
Bug Blocks: 1196644
Description
Jan Kurik
2015-02-21 15:33:56 UTC
Moving to POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2015-February/msg00481.html

Fixing the bug summary because this bug is a bit more general and affects all machine types. Migrating any domain with the following XML from RHEL 6 or 7.0 to 7.1 will fail:

    <numatune>
      <memory mode='strict' nodeset='0'/>
    </numatune>
    <cpu>
      <numa>
        <cell id='0' cpus='0-1' memory='2048000'/>
      </numa>
    </cpu>

For guest NUMA, if the domain XML contains neither hugepages nor memnode, libvirtd should not generate a qemu command line with '-object':

    # virsh dumpxml dummy
    <domain type='kvm' id='18'>
      <name>dummy</name>
      <uuid>1fd33000-27cb-425c-94f9-c7454914acdb</uuid>
      <memory unit='KiB'>2097152</memory>
      <currentMemory unit='KiB'>2097152</currentMemory>
      <vcpu placement='auto' current='1'>16</vcpu>
      <numatune>
        <memory mode='strict' nodeset='0-1'/>
      </numatune>
      <resource>
        <partition>/machine</partition>
      </resource>
      <os>
        <type arch='x86_64' machine='pc-i440fx-rhel7.1.0'>hvm</type>
        <boot dev='hd'/>
      </os>
      <cpu>
        <numa>
          <cell id='0' cpus='0-3' memory='524288'/>
          <cell id='1' cpus='4-7' memory='524288'/>
          <cell id='2' cpus='8-11' memory='524288'/>
          <cell id='3' cpus='12-15' memory='524288'/>
        </numa>
      </cpu>
      ...

With libvirt-1.2.8-16.el7, however, '-object memory-backend-ram' is still added to the qemu command line:

    # rpm -q libvirt
    libvirt-1.2.8-16.el7.x86_64
    # virsh start dummy
    # ps -ef | grep qemu
    qemu 26986 1 4 15:00 ? 00:00:00 /usr/libexec/qemu-kvm -name dummy -S -machine pc-i440fx-rhel7.1.0,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -object memory-backend-ram,size=512M,id=ram-node0,host-nodes=0-1,policy=bind -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 -object memory-backend-ram,size=512M,id=ram-node1,host-nodes=0-1,policy=bind -numa node,nodeid=1,cpus=4-7,memdev=ram-node1 -object memory-backend-ram,size=512M,id=ram-node2,host-nodes=0-1,policy=bind -numa node,nodeid=2,cpus=8-11,memdev=ram-node2 -object memory-backend-ram,size=512M,id=ram-node3,host-nodes=0-1,policy=bind -numa node,nodeid=3,cpus=12-15,memdev=ram-node3 -uuid 1fd33000-27cb-425c-94f9-c7454914acdb -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/dummy.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -no-acpi -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x2 -msg timestamp=on

But with libvirt-1.2.8-16.el7_1.1, it uses 'mem=XXX':

    # rpm -q libvirt
    libvirt-1.2.8-16.el7_1.1.x86_64
    # virsh start dummy
    # ps -ef | grep qemu
    qemu 27696 1 0 15:13 ? 00:00:01 /usr/libexec/qemu-kvm -name dummy -S -machine pc-i440fx-rhel7.1.0,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,cpus=4-7,mem=512 -numa node,nodeid=2,cpus=8-11,mem=512 -numa node,nodeid=3,cpus=12-15,mem=512 -uuid 1fd33000-27cb-425c-94f9-c7454914acdb -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/dummy.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -no-acpi -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x2 -msg timestamp=on

I can reproduce this issue with libvirt-1.2.8-16.el7.x86_64:

1. Prepare a running, healthy VM on a RHEL 6 host:

    # rpm -q libvirt
    libvirt-0.10.2-48.el6.x86_64
    # virsh list
     Id    Name    State
    ----------------------------------------------------
     6     r6      running

2. Make sure this VM has settings like this:

    # virsh dumpxml r6
    <numatune>
      <memory mode='strict' nodeset='0'/>
    </numatune>
    <os>
      <type arch='x86_64' machine='rhel6.5.0'>hvm</type>
      <boot dev='hd'/>
    </os>
    <cpu>
      <numa>
        <cell cpus='0-1' memory='1024000'/>
      </numa>
    </cpu>

3. Migrate to a RHEL 7.1 host (version: libvirt-1.2.8-16.el7.x86_64):

    # virsh migrate r6 --live qemu+ssh://10.66.6.19/system
    root@10.66.6.19's password:
    error: internal error: process exited while connecting to monitor: 2015-02-26T07:57:23.869659Z qemu-kvm: -numa memdev is not supported by machine rhel6.5.0

4. Migrating from a 7.0 host to a 7.1 host gives the same result.

Trying to verify this bug with libvirt-1.2.8-16.el7_1.1.x86_64 and qemu-kvm-rhev-2.1.2-23.el7_1.1.x86_64:

1. First rebuild the src rpm to make sure all the tests pass:

    # rpm -ivh libvirt-1.2.8-16.el7_1.1.src.rpm
    # rpmbuild -bb SPECS/libvirt.spec
    ...
    ============================================================================
    Testsuite summary for libvirt 1.2.8
    ============================================================================
    # TOTAL: 116
    # PASS:  116
    # SKIP:  0
    # XFAIL: 0
    # FAIL:  0
    # XPASS: 0
    # ERROR: 0
    ============================================================================
    ...

2. The VM has numatune settings and guest NUMA settings on RHEL 6.6:

    # virsh dumpxml r6
    <numatune>
      <memory mode='strict' nodeset='0'/>
    </numatune>
    ...
    <os>
      <type arch='x86_64' machine='rhel6.5.0'>hvm</type>
      <boot dev='hd'/>
    </os>
    ...
    <cpu>
      <numa>
        <cell cpus='0-1' memory='1024000'/>
      </numa>
    </cpu>

3. Migration succeeds. Migrating back from 7.1 to 6.6 sometimes fails; I don't know why.

On the 6.6 host:

    # virsh migrate r6 --live qemu+ssh://10.66.6.19/system
    root@10.66.6.19's password:

On the 7.1 host:

    # virsh migrate r6 --live qemu+ssh://10.66.100.118/system
    root@10.66.100.118's password:
    error: operation failed: migration job: unexpectedly failed
    # virsh migrate r6 --live qemu+ssh://10.66.100.118/system
    root@10.66.100.118's password:

4. Check the running VM's qemu command line on the 7.1 host; libvirt does not use memory-backend-file in this case:

    # ps aux | grep qemu
    ...-numa node,nodeid=0,cpus=0-1,mem=1000...

5. For hugepages (memnode is not supported on RHEL 7.0 and RHEL 6):

    # virsh dumpxml test3
    <memoryBacking>
      <hugepages>
        <page size='2048' unit='KiB' nodeset='0'/>
      </hugepages>
    </memoryBacking>
    ...
    <os>
      <type arch='x86_64' machine='rhel6.5.0'>hvm</type>
      <boot dev='hd'/>
    </os>
    ...
    <cpu>
      <numa>
        <cell id='0' cpus='0,2' memory='512000'/>
        <cell id='1' cpus='1,3' memory='512000'/>
      </numa>
    </cpu>

6.
The VM fails to start on the RHEL 7.1 host:

    # virsh start test3
    error: Failed to start domain test3
    error: internal error: early end of file from monitor: possible problem: 2015-02-26T08:34:53.624748Z qemu-kvm: -numa memdev is not supported by machine rhel6.5.0

7. Change the machine type to pc-i440fx-rhel7.0.0, then start it; libvirt uses memory-backend-file in this case:

    # virsh edit test3
    <os>
      <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
      <boot dev='hd'/>
    </os>
    # virsh start test3
    Domain test3 started
    # ps aux | grep test3
    ... -object memory-backend-file,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=500M,id=ram-node0,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0,cpus=2,memdev=ram-node0 -object memory-backend-ram,size=500M,id=ram-node1,host-nodes=0,policy=bind -numa node,nodeid=1,cpus=1,cpus=3,memdev=ram-node1 ...

8. Cross-migration test for a VM with hugepages settings on the RHEL 6.6 host:

    # virsh dumpxml r6
    <memoryBacking>
      <hugepages/>
    </memoryBacking>
    ...
    <os>
      <type arch='x86_64' machine='rhel6.5.0'>hvm</type>
      <boot dev='hd'/>
    </os>
    ...
    <cpu>
      <numa>
        <cell cpus='0-1' memory='1024000'/>
      </numa>
    </cpu>
    ...

9. Migration to the RHEL 7.1 host fails, but this does not seem to be a NUMA node settings issue:

    # virsh migrate r6 --live qemu+ssh://10.66.6.19/system
    root@10.66.6.19's password:
    error: internal error: Unable to find any usable hugetlbfs mount for 0 KiB

So here is the problem:

Hi Michal,

Would you please help check these two issues I hit while trying to verify this bug? Both of them may affect the test result:

1. When I tried to verify this issue with libvirt-1.2.8-16.el7_1.1.x86_64, I found that cross migration does not succeed every time (steps are in verify step 3 above), and the only thing I can find is a warning in libvirtd.log on the target host (OS is RHEL 6):

    2015-02-26 08:25:29.796+0000: 302: warning : qemuDomainObjEnterMonitorInternal:1062 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous

Will this issue affect verifying this bug?

2.
I tested cross-migration with hugepages and found that I cannot migrate a VM with hugepages settings from RHEL 6 to RHEL 7.1 or from RHEL 7.0 to RHEL 7.1, although it works well when migrating from RHEL 6 to RHEL 7.0. The reason seems to be that RHEL 7.1 libvirt forbids starting a VM with XML like this:

    ...
    <memoryBacking>
      <hugepages/>
    </memoryBacking>
    ...

Does this issue need a fix in rhel7.1.z, and will it affect verifying this bug? Thanks in advance for your answer!

r6 XML for issue 2:

    <domain type='kvm' id='5'>
      <name>r6</name>
      <uuid>63b566d4-40e9-4152-b784-f46cc953abb0</uuid>
      <memory unit='KiB'>1024000</memory>
      <currentMemory unit='KiB'>1024000</currentMemory>
      <memoryBacking>
        <hugepages/>
      </memoryBacking>
      <vcpu placement='static' cpuset='1-3' current='1'>4</vcpu>
      <numatune>
        <memory mode='strict' nodeset='0'/>
      </numatune>
      <os>
        <type arch='x86_64' machine='rhel6.5.0'>hvm</type>
        <boot dev='hd'/>
      </os>
      <features>
        <acpi/>
        <apic/>
        <pae/>
      </features>
      <cpu>
        <numa>
          <cell cpus='0-1' memory='1024000'/>
        </numa>
      </cpu>
      <clock offset='localtime'>
        <timer name='rtc' tickpolicy='catchup' track='guest'>
          <catchup threshold='123' slew='120' limit='10000'/>
        </timer>
        <timer name='pit' tickpolicy='delay'/>
      </clock>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>restart</on_crash>
      <pm>
        <suspend-to-mem enabled='yes'/>
        <suspend-to-disk enabled='yes'/>
      </pm>
      <devices>
        <emulator>/usr/libexec/qemu-kvm</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='raw' cache='none'/>
          <source file='/nfs/lhuang/test3.img'>
            <seclabel model='selinux' relabel='no'/>
          </source>
          <target dev='hda' bus='ide'/>
          <alias name='ide0-0-0'/>
          <address type='drive' controller='0' bus='0' target='0' unit='0'/>
        </disk>
        <controller type='scsi' index='0' model='virtio-scsi'>
          <alias name='scsi0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' function='0x0'/>
        </controller>
        <controller type='virtio-serial' index='0'>
          <alias name='virtio-serial0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
        </controller>
        <controller type='ide' index='0'>
          <alias name='ide0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
        </controller>
        <controller type='ccid' index='0'>
          <alias name='ccid0'/>
        </controller>
        <controller type='usb' index='0'>
          <alias name='usb0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
        </controller>
        <interface type='network'>
          <mac address='52:54:00:cc:a3:82'/>
          <source network='default'/>
          <target dev='vnet0'/>
          <model type='e1000'/>
          <filterref filter='clean-traffic'/>
          <link state='up'/>
          <alias name='net0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x0e' function='0x0' multifunction='on'/>
        </interface>
        <interface type='bridge'>
          <mac address='52:54:00:19:99:53'/>
          <source bridge='virbr0'/>
          <target dev='vnet1'/>
          <model type='rtl8139'/>
          <filterref filter='clean-traffic'/>
          <alias name='net1'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
        </interface>
        <smartcard mode='passthrough' type='spicevmc'>
          <alias name='smartcard0'/>
          <address type='ccid' controller='0' slot='0'/>
        </smartcard>
        <serial type='pty'>
          <source path='/dev/pts/1'/>
          <target port='0'/>
          <alias name='serial0'/>
        </serial>
        <serial type='pty'>
          <source path='/dev/pts/2'/>
          <target port='0'/>
          <alias name='serial1'/>
        </serial>
        <console type='pty' tty='/dev/pts/1'>
          <source path='/dev/pts/1'/>
          <target type='serial' port='0'/>
          <alias name='serial0'/>
        </console>
        <channel type='unix'>
          <source mode='bind' path='/var/lib/libvirt/qemu/r6.agent'/>
          <target type='virtio' name='org.qemu.guest_agent.0'/>
          <alias name='channel0'/>
          <address type='virtio-serial' controller='0' bus='0' port='1'/>
        </channel>
        <channel type='spicevmc'>
          <target type='virtio' name='com.redhat.spice.0'/>
          <alias name='channel1'/>
          <address type='virtio-serial' controller='0' bus='0' port='2'/>
        </channel>
        <input type='tablet' bus='usb'>
          <alias name='input0'/>
        </input>
        <input type='mouse' bus='ps2'/>
        <graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
          <listen type='address' address='127.0.0.1'/>
        </graphics>
        <graphics type='spice' port='5901' autoport='yes' listen='127.0.0.1'>
          <listen type='address' address='127.0.0.1'/>
        </graphics>
        <video>
          <model type='qxl' ram='65536' vram='65536' heads='1'/>
          <alias name='video0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
        </video>
        <watchdog model='i6300esb' action='poweroff'>
          <alias name='watchdog0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
        </watchdog>
        <memballoon model='virtio'>
          <alias name='balloon0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
        </memballoon>
        <rng model='virtio'>
          <backend model='random'>/dev/random</backend>
          <alias name='rng0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
        </rng>
        <panic>
        </panic>
      </devices>
      <seclabel type='dynamic' model='dac' relabel='yes'>
        <label>107:107</label>
        <imagelabel>107:107</imagelabel>
      </seclabel>
      <seclabel type='dynamic' model='selinux' relabel='yes'>
        <label>unconfined_u:system_r:svirt_t:s0:c1004,c1016</label>
        <imagelabel>unconfined_u:object_r:svirt_image_t:s0:c1004,c1016</imagelabel>
      </seclabel>
    </domain>

Created attachment 995514 [details]
libvirtd.log
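For reference, the incompatibility discussed in this report comes down to which of the two qemu NUMA forms the command line uses: the legacy `-numa node,...,mem=N` form, or the newer `-object memory-backend-ram,... -numa node,...,memdev=...` form that changes the guest ABI. The following is a minimal illustrative sketch, not part of the original report; the `numa_style` helper name is invented, and it simply pattern-matches a command line such as the ones captured with `ps` above:

```shell
# Classify a qemu command line by the NUMA memory specification it uses.
# Illustrative helper only -- this is not libvirt code.
numa_style() {
    cmdline="$1"
    if printf '%s' "$cmdline" | grep -q 'memdev=ram-node'; then
        # New style: per-node memory backend objects (changes guest ABI).
        echo "memory-backend (memdev)"
    elif printf '%s' "$cmdline" | grep -q 'numa node.*mem='; then
        # Legacy style: plain mem= sizes (compatible with rhel6.x machine types).
        echo "legacy mem="
    else
        echo "no guest NUMA"
    fi
}

# Example inputs, trimmed from the command lines shown in this report:
numa_style "-numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,cpus=4-7,mem=512"
# -> legacy mem=
numa_style "-object memory-backend-ram,size=512M,id=ram-node0 -numa node,nodeid=0,cpus=0-3,memdev=ram-node0"
# -> memory-backend (memdev)
```

With the fixed libvirt (1.2.8-16.el7_1.1), a plain numatune domain should classify as "legacy mem="; with the broken build (1.2.8-16.el7), it classifies as "memory-backend (memdev)" and the rhel6.5.0 machine type refuses to start.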
(In reply to Luyao Huang from comment #9)
> Hi Michal,
>
> would you please help to check out these two issue when i try to verify this
> bug, maybe both of them will affect the test result:
>
> 1. when i try to verify this issue with libvirt-1.2.8-16.el7_1.1.x86_64, i
> found cross migrate cannot success every times, steps was in verify step 3
> and i can only find a warnning in libvirtd.log in target host(os is rhel6)
>
> 2015-02-26 08:25:29.796+0000: 302: warning :
> qemuDomainObjEnterMonitorInternal:1062 : This thread seems to be the async
> job owner; entering monitor without asking for a nested job is dangerous

Despite what the message says, it's harmless.

> Is this issue will affect this bug verify ?

That's okay and probably a qemu bug. If migration can finish successfully sometimes and sometimes not, it's likely to be a qemu bug anyway. So this is okay on the libvirt side.

> 2. i test cross migrate with hugepages, however i found i cannot migrate
> a vm have hugepages settings from rhel6 to rhel7.1 or from rhel7.0 to
> rhel7.1, but it will work well if i do migrate from rhel6 to rhel7.0, the
> reason seems to be rhel7.1 libvirt forbid vm start with xml like this:
> ...
> <memoryBacking>
>   <hugepages/>
> </memoryBacking>
> ...
>
> Is this issue need fix in rhel7.1.z? and is this issue will affect this bug
> verify ?

This is not okay, but not much related to this bug. So I suggest cloning this bug to cover this second part and letting the original through.

Luyao:
For case (1), where it sometimes works and sometimes doesn't, please open a qemu bug with the details and also include the log file for a failing migration from /etc/libvirt/qemu/guestname.xml from both source and destination. Please cc me on the bug.

(In reply to Michal Privoznik from comment #12)
> That's okay and probably a qemu bug. If migration can finish successfully
> sometimes and sometimes not, it's likely to be a qemu bug anyway. So this is
> okay on libvirt side.
> [...]
> This is not okay, but not much related to this bug. So I suggest cloning
> this bug to cover this second part and let this original through.

Okay, it seems these two issues are not related to this bug. I will verify this bug and open/clone a new bug for the new issue. Thanks a lot for your reply.

(In reply to Dr. David Alan Gilbert from comment #13)
> Luyao:
> For case (1) where it sometimes works and sometimes doesn't, please open a
> qemu bug with the details and also include the log file for a failing
> migration from /etc/libvirt/qemu/guestname.xml from both source/dest.
> Please cc me on the bug.

Hi David,

Okay, I have opened qemu bug 1196692 for this issue and found some useful logs in the source host's /var/log/libvirt/qemu/r6.log; I attached them to the new bug, along with the VM XML and libvirtd.log.

Luyao

Luyao: Thanks; however, migrating from 7.x to 6.x is NOT supported, so it is not expected to work (even for rhel6 machine types) - only 6.x->7.x and 7.x->7.x are supported. If you have any cases where it fails in those directions, please open another qemu bug.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0625.html
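As a closing footnote, the behavior restored by libvirt-1.2.8-16.el7_1.1 can be summarized as a simple decision rule described in this report: generate the new '-object memory-backend-*' qemu arguments only when the domain XML actually requests hugepages or per-node memnode pinning. The sketch below is illustrative only; the `needs_memory_backend` helper is invented for this note, and real libvirt parses the XML properly rather than pattern-matching it:

```shell
# Decide, per the rule described in this report, whether a domain XML
# fragment would require '-object memory-backend-*' on the qemu command line.
# Illustrative pattern match only -- not how libvirt actually parses XML.
needs_memory_backend() {
    xml="$1"
    if printf '%s' "$xml" | grep -Eq '<hugepages|<memnode'; then
        echo "yes"   # hugepages or memnode requested: memory backends needed
    else
        echo "no"    # plain numatune/guest NUMA: keep legacy mem= for ABI compat
    fi
}

# The r6 domain above (hugepages) vs. the dummy domain (plain numatune):
needs_memory_backend "<memoryBacking><hugepages/></memoryBacking>"
# -> yes
needs_memory_backend "<numatune><memory mode='strict' nodeset='0'/></numatune>"
# -> no
```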