Description of problem:
Start ovs+dpdk and boot a guest with vhostuser. Then create a large amount of dirty pages in the guest (about half of the memory size); migration fails.

Version-Release number of selected component (if applicable):
3.10.0-514.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7.x86_64
libvirt-2.0.0-10.el7.x86_64
(dpdk and ovs are upstream)
dpdk-16.07.zip
openvswitch-2.6.0.tar.gz

How reproducible:
100%

Steps to Reproduce:
1. Start ovs-dpdk on host1 and host2; for the port setup please refer to Reference [1].
2. Boot a guest with 16G memory and vhostuser on host1; for the full xml please refer to Reference [2].
3. Mount tmpfs in the guest
# mount -t tmpfs none /home/mnt
4. Create 7G of dirty pages in memory with 'dd'
# dd if=/dev/urandom of=/home/mnt/data bs=1M count=7168
5. Migrate the guest from host1 to host2
# virsh migrate --timeout 120 --live $guest_name qemu+ssh://$des_ip/system
6. Migration fails.
Migration: [ 79 %]error: operation failed: migration job: unexpectedly failed

Actual results:
(1) virsh reports an error
# virsh migrate --verbose --live rhel7.3-2q-nonrt qemu+ssh://10.73.73.65/system
Migration: [ 79 %]error: operation failed: migration job: unexpectedly failed

(2) VM log info
# cat /var/log/libvirt/qemu/rhel7.3-2q-nonrt.log
2016-11-08 06:27:47.298+0000: initiating migration
2016-11-08T06:28:29.995427Z qemu-kvm: socket_writev_buffer: Got err=104 for (131328/18446744073709551615)

(3) QEMU migration info before and after the failure.
# virsh qemu-monitor-command rhel7.3-2q-nonrt --hmp "info migrate"
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: on postcopy-ram: off
Migration status: active
total time: 42733 milliseconds
expected downtime: 300 milliseconds
setup: 59 milliseconds
transferred ram: 4813361 kbytes
throughput: 942.51 mbps
remaining ram: 3352588 kbytes
total ram: 16778504 kbytes
duplicate: 13374205 pages
skipped: 0 pages
normal: 7254327 pages
normal bytes: 29017308 kbytes
dirty sync count: 1

# virsh qemu-monitor-command rhel7.3-2q-nonrt --hmp "info migrate"
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: on postcopy-ram: off
Migration status: failed
total time: 0 milliseconds

Expected results:
Migration should work well.

Additional info:
1. Without step 3 and step 4, migration works.
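(Side note for anyone hitting the same wall: on this libvirt/QEMU combination the usual knobs for a migration that cannot keep up with the guest's dirty-page rate are the migration bandwidth, the allowed downtime, and the auto-converge capability. A rough sketch with virsh follows; the domain name and values are only examples, not anything used in this report:

# virsh migrate-setspeed rhel7.3-2q-nonrt 1000        <- bandwidth cap in MiB/s
# virsh migrate-setmaxdowntime rhel7.3-2q-nonrt 500   <- allowed downtime in ms
# virsh domjobinfo rhel7.3-2q-nonrt                   <- watch progress/remaining RAM during migration
# virsh migrate --live --auto-converge rhel7.3-2q-nonrt qemu+ssh://$des_ip/system

Whether any of these would have helped in this particular failure is a separate question; they are listed only for completeness.)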
Reference

[1]
# ovs-vsctl show
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuser
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
    Bridge "ovsbr1"
        Port "vhost-user2"
            Interface "vhost-user2"
                type: dpdkvhostuser
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal

[2]
<domain type='kvm'>
  <name>rhel7.3-2q-nonrt</name>
  <uuid>ff8b540c-9ef3-4441-9b0d-da7c97543f21</uuid>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>6</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='38'/>
    <vcpupin vcpu='1' cpuset='36'/>
    <vcpupin vcpu='2' cpuset='34'/>
    <vcpupin vcpu='3' cpuset='32'/>
    <vcpupin vcpu='4' cpuset='30'/>
    <vcpupin vcpu='5' cpuset='28'/>
    <emulatorpin cpuset='1,3,5,7'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0-1'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.3.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pmu state='off'/>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough'>
    <feature policy='require' name='tsc-deadline'/>
    <numa>
      <cell id='0' cpus='0-5' memory='16777216' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/mnt/rhel7.3-nonrt.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='none'/>
    <interface type='bridge'>
      <mac address='18:76:da:e6:01:01'/>
      <source bridge='switch'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:e6:01:02'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user1' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:e6:01:03'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user2' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' function='0x0'/>
    </memballoon>
  </devices>
</domain>
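(Side note on the XML in [2]: vhost-user requires the guest memory to be backed by shared hugepages, which is what the <memoryBacking><hugepages/> section and memAccess='shared' on the NUMA cell provide. Before migrating, it is worth confirming the destination host has enough free hugepages preallocated; a generic check, not specific to this setup:

# grep -i huge /proc/meminfo
# ls /dev/hugepages/libvirt/qemu
)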
Do you have the /var/log/libvirt/qemu/rhel7.3-2q-nonrt.log from the destination side, and also the libvirt logs from both sides? The error from the source says that either the network connection failed or the destination died; probably the destination died, so we need to see the logs there.
Created attachment 1219171
libvirt log from destination host
1. qemu logs from destination host
# cat /var/log/libvirt/qemu/rhel7.3-2q-nonrt.log
2016-11-10 05:15:20.913+0000: starting up libvirt version: 2.0.0, package: 10.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-09-21-10:15:26, x86-038.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-28.el7), hostname: dell-per430-11.lab.eng.pek2.redhat.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=rhel7.3-2q-nonrt,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-rhel7.3-2q-nonrt/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,vmport=off -cpu host,+vme,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+vmx,+smx,+est,+tm2,+xtpr,+pdcm,+dca,+osxsave,+f16c,+rdrand,+arat,+tsc_adjust,+xsaveopt,+pdpe1gb,+abm,pmu=off -m 16384 -realtime mlock=off -smp 6,sockets=6,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,share=yes,size=17179869184,host-nodes=0-1,policy=bind -numa node,nodeid=0,cpus=0-5,memdev=ram-node0 -uuid ff8b540c-9ef3-4441-9b0d-da7c97543f21 -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-rhel7.3-2q-nonrt/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/mnt/rhel7.3-nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=18:76:da:e6:01:01,bus=pci.0,addr=0x3 -chardev socket,id=charnet1,path=/usr/local/var/run/openvswitch/vhost-user1 -netdev type=vhost-user,id=hostnet1,chardev=charnet1,queues=2 -device virtio-net-pci,mq=on,vectors=6,netdev=hostnet1,id=net1,mac=18:66:da:e6:01:02,bus=pci.0,addr=0x2 -chardev socket,id=charnet2,path=/usr/local/var/run/openvswitch/vhost-user2 -netdev type=vhost-user,id=hostnet2,chardev=charnet2,queues=2 -device virtio-net-pci,mq=on,vectors=6,netdev=hostnet2,id=net2,mac=18:66:da:e6:01:03,bus=pci.0,addr=0x6 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-1-rhel7.3-2q-nonrt/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xc -msg timestamp=on
Domain id=1 is tainted: high-privileges
Domain id=1 is tainted: host-cpu
2016-11-10 05:15:20.915+0000: 5294: debug : virFileClose:102 : Closed fd 27
2016-11-10 05:15:20.915+0000: 5294: debug : virFileClose:102 : Closed fd 33
2016-11-10 05:15:20.917+0000: 5295: debug : virExec:692 : Run hook 0x7f36d07f7c40 0x7f37056bb460
2016-11-10 05:15:20.917+0000: 5295: debug : qemuProcessHook:2648 : Obtaining domain lock
2016-11-10 05:15:20.917+0000: 5295: debug : virSecuritySELinuxSetSocketLabel:2267 : Setting VM rhel7.3-2q-nonrt socket context system_u:system_r:svirt_t:s0:c762,c865
2016-11-10 05:15:20.918+0000: 5294: debug : virFileClose:102 : Closed fd 3
2016-11-10 05:15:20.918+0000: 5295: debug : virDomainLockProcessStart:179 : plugin=0x7f36b011d390 dom=0x7f36b0281990 paused=1 fd=0x7f37056baf90
2016-11-10 05:15:20.918+0000: 5295: debug : virDomainLockManagerNew:134 : plugin=0x7f36b011d390 dom=0x7f36b0281990 withResources=1
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerPluginGetDriver:281 : plugin=0x7f36b011d390
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerNew:305 : driver=0x7f371537ba40 type=0 nparams=5 params=0x7f37056bae40 flags=1
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerLogParams:98 : key=uuid type=uuid value=ff8b540c-9ef3-4441-9b0d-da7c97543f21
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerLogParams:91 : key=name type=string value=rhel7.3-2q-nonrt
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerLogParams:79 : key=id type=uint value=1
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerLogParams:79 : key=pid type=uint value=5295
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerLogParams:94 : key=uri type=cstring value=qemu:///system
2016-11-10 05:15:20.918+0000: 5295: debug : virDomainLockManagerNew:146 : Adding leases
2016-11-10 05:15:20.918+0000: 5295: debug : virDomainLockManagerNew:151 : Adding disks
2016-11-10 05:15:20.918+0000: 5295: debug : virDomainLockManagerAddImage:90 : Add disk /mnt/rhel7.3-nonrt.qcow2
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerAddResource:332 : lock=0x7f36f8015c00 type=0 name=/mnt/rhel7.3-nonrt.qcow2 nparams=0 params=(nil) flags=0
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerAcquire:350 : lock=0x7f36f8015c00 state='<null>' flags=3 action=0 fd=0x7f37056baf90
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerFree:387 : lock=0x7f36f8015c00
2016-11-10 05:15:20.918+0000: 5295: debug : virCgroupNew:1148 : pid=-1 path= parent=(nil) controllers=-1 group=0x7f37056baf50
2016-11-10 05:15:20.918+0000: 5295: debug : virCgroupDetect:644 : group=0x7f36f8017810 controllers=-1 path= parent=(nil)
2016-11-10 05:15:20.918+0000: 5295: debug : virCgroupDetect:690 : Auto-detecting controllers
2016-11-10 05:15:20.918+0000: 5295: debug : virCgroupDetect:695 : Controller 'cpu' present=yes
2016-11-10 05:15:20.918+0000: 5295: debug : virCgroupDetect:695 : Controller 'cpuacct' present=yes
2016-11-10 05:15:20.918+0000: 5295: debug : virCgroupDetect:695 : Controller 'cpuset' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'memory' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'devices' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'freezer' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'blkio' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'net_cls' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'perf_event' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'name=systemd' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetectPlacement:549 : Detecting placement for pid -1 path
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 0:cpu at /sys/fs/cgroup/cpu,cpuacct in /system.slice for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 1:cpuacct at /sys/fs/cgroup/cpu,cpuacct in /system.slice for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 2:cpuset at /sys/fs/cgroup/cpuset in / for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 3:memory at /sys/fs/cgroup/memory in /system.slice for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 4:devices at /sys/fs/cgroup/devices in /system.slice/libvirtd.service for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 5:freezer at /sys/fs/cgroup/freezer in / for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 6:blkio at /sys/fs/cgroup/blkio in /system.slice for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 7:net_cls at /sys/fs/cgroup/net_cls,net_prio in / for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 8:perf_event at /sys/fs/cgroup/perf_event in / for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 9:name=systemd at /sys/fs/cgroup/systemd in /system.slice/libvirtd.service for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : qemuProcessHook:2675 : Relying on CGroups for memory binding
2016-11-10 05:15:20.919+0000: 5295: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7f36b02146c0
2016-11-10 05:15:20.919+0000: 5295: debug : qemuProcessHook:2689 : Hook complete ret=0
2016-11-10 05:15:20.919+0000: 5295: debug : virExec:694 : Done hook 0
2016-11-10 05:15:20.919+0000: 5295: debug : virExec:701 : Setting child security label to system_u:system_r:svirt_t:s0:c762,c865
2016-11-10 05:15:20.919+0000: 5295: debug : virExec:731 : Setting child uid:gid to 0:107 with caps 0
2016-11-10 05:15:20.919+0000: 5295: debug : virCommandHandshakeChild:433 : Notifying parent for handshake start on 30
2016-11-10 05:15:20.919+0000: 5295: debug : virCommandHandshakeChild:441 : Waiting on parent for handshake complete on 31
2016-11-10 05:15:20.958+0000: 5295: debug : virFileClose:102 : Closed fd 30
2016-11-10 05:15:20.958+0000: 5295: debug : virFileClose:102 : Closed fd 31
2016-11-10 05:15:20.958+0000: 5295: debug : virCommandHandshakeChild:461 : Handshake with parent is done
char device redirected to /dev/pts/1 (label charserial0)
warning: host doesn't support requested feature: CPUID.01H:EDX.ds [bit 21]
warning: host doesn't support requested feature: CPUID.01H:EDX.acpi [bit 22]
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
warning: host doesn't support requested feature: CPUID.01H:EDX.tm [bit 29]
warning: host doesn't support requested feature: CPUID.01H:EDX.pbe [bit 31]
warning: host doesn't support requested feature: CPUID.01H:ECX.dtes64 [bit 2]
warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.01H:ECX.ds_cpl [bit 4]
warning: host doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
warning: host doesn't support requested feature: CPUID.01H:ECX.smx [bit 6]
warning: host doesn't support requested feature: CPUID.01H:ECX.est [bit 7]
warning: host doesn't support requested feature: CPUID.01H:ECX.tm2 [bit 8]
warning: host doesn't support requested feature: CPUID.01H:ECX.xtpr [bit 14]
warning: host doesn't support requested feature: CPUID.01H:ECX.pdcm [bit 15]
warning: host doesn't support requested feature: CPUID.01H:ECX.dca [bit 18]
warning: host doesn't support requested feature: CPUID.01H:ECX.osxsave [bit 27]
warning: host doesn't support requested feature: CPUID.01H:EDX.ds [bit 21]
warning: host doesn't support requested feature: CPUID.01H:EDX.acpi [bit 22]
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
warning: host doesn't support requested feature: CPUID.01H:EDX.tm [bit 29]
warning: host doesn't support requested feature: CPUID.01H:EDX.pbe [bit 31]
warning: host doesn't support requested feature: CPUID.01H:ECX.dtes64 [bit 2]
warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.01H:ECX.ds_cpl [bit 4]
warning: host doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
warning: host doesn't support requested feature: CPUID.01H:ECX.smx [bit 6]
warning: host doesn't support requested feature: CPUID.01H:ECX.est [bit 7]
warning: host doesn't support requested feature: CPUID.01H:ECX.tm2 [bit 8]
warning: host doesn't support requested feature: CPUID.01H:ECX.xtpr [bit 14]
warning: host doesn't support requested feature: CPUID.01H:ECX.pdcm [bit 15]
warning: host doesn't support requested feature: CPUID.01H:ECX.dca [bit 18]
warning: host doesn't support requested feature: CPUID.01H:ECX.osxsave [bit 27]
warning: host doesn't support requested feature: CPUID.01H:EDX.ds [bit 21]
warning: host doesn't support requested feature: CPUID.01H:EDX.acpi [bit 22]
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
warning: host doesn't support requested feature: CPUID.01H:EDX.tm [bit 29]
warning: host doesn't support requested feature: CPUID.01H:EDX.pbe [bit 31]
warning: host doesn't support requested feature: CPUID.01H:ECX.dtes64 [bit 2]
warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.01H:ECX.ds_cpl [bit 4]
warning: host doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
warning: host doesn't support requested feature: CPUID.01H:ECX.smx [bit 6]
warning: host doesn't support requested feature: CPUID.01H:ECX.est [bit 7]
warning: host doesn't support requested feature: CPUID.01H:ECX.tm2 [bit 8]
warning: host doesn't support requested feature: CPUID.01H:ECX.xtpr [bit 14]
warning: host doesn't support requested feature: CPUID.01H:ECX.pdcm [bit 15]
warning: host doesn't support requested feature: CPUID.01H:ECX.dca [bit 18]
warning: host doesn't support requested feature: CPUID.01H:ECX.osxsave [bit 27]
warning: host doesn't support requested feature: CPUID.01H:EDX.ds [bit 21]
warning: host doesn't support requested feature: CPUID.01H:EDX.acpi [bit 22]
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
warning: host doesn't support requested feature: CPUID.01H:EDX.tm [bit 29]
warning: host doesn't support requested feature: CPUID.01H:EDX.pbe [bit 31]
warning: host doesn't support requested feature: CPUID.01H:ECX.dtes64 [bit 2]
warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.01H:ECX.ds_cpl [bit 4]
warning: host doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
warning: host doesn't support requested feature: CPUID.01H:ECX.smx [bit 6]
warning: host doesn't support requested feature: CPUID.01H:ECX.est [bit 7]
warning: host doesn't support requested feature: CPUID.01H:ECX.tm2 [bit 8]
warning: host doesn't support requested feature: CPUID.01H:ECX.xtpr [bit 14]
warning: host doesn't support requested feature: CPUID.01H:ECX.pdcm [bit 15]
warning: host doesn't support requested feature: CPUID.01H:ECX.dca [bit 18]
warning: host doesn't support requested feature: CPUID.01H:ECX.osxsave [bit 27]
warning: host doesn't support requested feature: CPUID.01H:EDX.ds [bit 21]
warning: host doesn't support requested feature: CPUID.01H:EDX.acpi [bit 22]
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
warning: host doesn't support requested feature: CPUID.01H:EDX.tm [bit 29]
warning: host doesn't support requested feature: CPUID.01H:EDX.pbe [bit 31]
warning: host doesn't support requested feature: CPUID.01H:ECX.dtes64 [bit 2]
warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.01H:ECX.ds_cpl [bit 4]
warning: host doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
warning: host doesn't support requested feature: CPUID.01H:ECX.smx [bit 6]
warning: host doesn't support requested feature: CPUID.01H:ECX.est [bit 7]
warning: host doesn't support requested feature: CPUID.01H:ECX.tm2 [bit 8]
warning: host doesn't support requested feature: CPUID.01H:ECX.xtpr [bit 14]
warning: host doesn't support requested feature: CPUID.01H:ECX.pdcm [bit 15]
warning: host doesn't support requested feature: CPUID.01H:ECX.dca [bit 18]
warning: host doesn't support requested feature: CPUID.01H:ECX.osxsave [bit 27]
warning: host doesn't support requested feature: CPUID.01H:EDX.ds [bit 21]
warning: host doesn't support requested feature: CPUID.01H:EDX.acpi [bit 22]
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
warning: host doesn't support requested feature: CPUID.01H:EDX.tm [bit 29]
warning: host doesn't support requested feature: CPUID.01H:EDX.pbe [bit 31]
warning: host doesn't support requested feature: CPUID.01H:ECX.dtes64 [bit 2]
warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.01H:ECX.ds_cpl [bit 4]
warning: host doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
warning: host doesn't support requested feature: CPUID.01H:ECX.smx [bit 6]
warning: host doesn't support requested feature: CPUID.01H:ECX.est [bit 7]
warning: host doesn't support requested feature: CPUID.01H:ECX.tm2 [bit 8]
warning: host doesn't support requested feature: CPUID.01H:ECX.xtpr [bit 14]
warning: host doesn't support requested feature: CPUID.01H:ECX.pdcm [bit 15]
warning: host doesn't support requested feature: CPUID.01H:ECX.dca [bit 18]
warning: host doesn't support requested feature: CPUID.01H:ECX.osxsave [bit 27]
2016-11-10T05:16:06.879084Z qemu-kvm: terminating on signal 15 from pid 5004
2016-11-10 05:16:07.079+0000: shutting down

2. qemu logs from source host
# cat /var/log/libvirt/qemu/rhel7.3-2q-nonrt.log
2016-11-10 05:15:24.178+0000: initiating migration
2016-11-10T05:16:06.923389Z qemu-kvm: socket_writev_buffer: Got err=104 for (131328/18446744073709551615)
Created attachment 1219252
libvirt log from source host

I enabled libvirtd debug logging, so the log file is big. The only error shown in the log is:

2016-11-10 09:07:00.043+0000: 22938: error : qemuMigrationCheckJobStatus:2657 : operation failed: migration job: unexpectedly failed
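(For reference, debug logging of this kind is normally enabled in /etc/libvirt/libvirtd.conf, roughly as below; the exact level/output here are only an example, not necessarily what was used for this attachment:

log_level = 1
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"

# systemctl restart libvirtd
)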
Destination:
2016-11-10T05:16:06.879084Z qemu-kvm: terminating on signal 15 from pid 5004

Source:
2016-11-10T05:16:06.923389Z qemu-kvm: socket_writev_buffer: Got err=104 for (131328/18446744073709551615)

OK, so that says the destination QEMU is being killed by something; and in the libvirt log I see:

2016-11-10 05:16:06.877+0000: 5004: error : virKeepAliveTimerInternal:143 : internal error: connection closed due to keepalive timeout

So this looks like libvirt has decided to kill it; we've had similar timeouts before and we've never understood why.
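(For what it's worth, the keepalive behaviour that fires here is configurable in /etc/libvirt/libvirtd.conf on both hosts, so as a diagnostic experiment one could relax it, e.g.:

keepalive_interval = 5        <- seconds between probes; default 5, -1 disables keepalive
keepalive_count = 20          <- default is 5, i.e. roughly 25s of silence closes the connection

followed by a libvirtd restart on both sides. These values are only an illustration; raising them would at best hide whatever is stalling the connection, not fix it.)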
The test results show that this issue is caused by network bandwidth: re-testing with the same memory pressure in the same environment, migration works well with 10G network bandwidth but hits this issue with 1G network bandwidth.

David, based on this, should we close this bug as 'NOTABUG'? Thanks.

Best Regards,
Pei
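(A rough sanity check with the numbers from the "info migrate" output above, just an estimate: the reported throughput of ~942 mbps is essentially 1G line rate, i.e. about 115 MB/s. The 7 GiB written from /dev/urandom is incompressible, so a single pass over just that data needs roughly 7168 / 115 ≈ 62 seconds, and any pages re-dirtied during that pass have to be sent again; on a 10G link the same pass takes only a few seconds. So at 1G the migration, and the heavily loaded connection, simply runs much longer, which matches the failure only showing up at 1G.)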
Also tested with qemu directly; migration works well with a 1G network.
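(For context, a bare-QEMU migration of that sort is typically driven from the HMP monitor, with no libvirt keepalive in the path at all; the host/port below are placeholders:

destination: start qemu with -incoming tcp:0:4444
source monitor:
(qemu) migrate_set_speed 1G
(qemu) migrate -d tcp:<dest_ip>:4444
(qemu) info migrate
)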
(In reply to Dr. David Alan Gilbert from comment #6)
> Destination:
> 2016-11-10T05:16:06.879084Z qemu-kvm: terminating on signal 15 from pid 5004
> 
> Source:
> 2016-11-10T05:16:06.923389Z qemu-kvm: socket_writev_buffer: Got err=104 for
> (131328/18446744073709551615)
> 
> OK, so that says the destination QEMU is being killed by something; and in
> the libvirt log I see:
> 
> 2016-11-10 05:16:06.877+0000: 5004: error : virKeepAliveTimerInternal:143 :
> internal error: connection closed due to keepalive timeout
> 
> So this looks like libvirt has decided to kill it; we've had similar
> timeouts before and we've never understood why.

Yes, I remember dzheng opened a bug for these problems:
https://bugzilla.redhat.com/show_bug.cgi?id=1367620
Yes, I don't think we should close this until we understand it.
We hit this same issue again in bug 1614182 and were finally able to reproduce it reliably enough to investigate what's going on here. The keepalive packets sent by the source libvirtd are stuck in the kernel somewhere between ssh and the network interface on the source. The kernel eventually sends them to the network, but by then it may already be too late.

I'm closing this as a duplicate of 1614182, where we have a lot more information for further investigation.

*** This bug has been marked as a duplicate of bug 1614182 ***
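(If anyone wants to confirm the "stuck in the kernel" observation on their own reproducer, one generic way is to watch the send queue of the libvirt/ssh control connection on the source while the migration saturates the link; this is only a sketch, not the procedure used in bug 1614182:

# ss -tnoie '( sport = :22 or dport = :22 )'

A Send-Q on that connection that does not drain while the migration stream is active would be consistent with the keepalive traffic being queued in the host behind the bulk migration data.)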