Bug 1392756 - Migration fails with large dirty pages in guest over ovs+dpdk
Summary: Migration fails with large dirty pages in guest over ovs+dpdk
Keywords:
Status: CLOSED DUPLICATE of bug 1614182
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Jing Qi
URL:
Whiteboard:
Depends On:
Blocks: 1401400
 
Reported: 2016-11-08 07:35 UTC by Pei Zhang
Modified: 2018-09-04 11:06 UTC
CC List: 18 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-04 11:06:08 UTC
Target Upstream Version:


Attachments (Terms of Use)
libvirt log from destination host (1.74 MB, text/plain)
2016-11-10 05:26 UTC, Pei Zhang
no flags
libvirt log from source host (6.59 MB, text/plain)
2016-11-10 09:20 UTC, Pei Zhang
no flags


Links
Red Hat Bugzilla 1367620 (unspecified, CLOSED): storage migration fails due to keepalive timeout (last updated 2021-06-10 11:27:45 UTC)

Internal Links: 1367620

Description Pei Zhang 2016-11-08 07:35:31 UTC
Description of problem:
Start ovs+dpdk and boot a guest with vhostuser interfaces. Then create large dirty pages in the guest (about half of the memory size); migration fails.


Version-Release number of selected component (if applicable):
3.10.0-514.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7.x86_64
libvirt-2.0.0-10.el7.x86_64
(dpdk and ovs are upstream versions)
dpdk-16.07.zip
openvswitch-2.6.0.tar.gz


How reproducible:
100%

Steps to Reproduce:
1. Start ovs-dpdk on host1 and host2; for the port setup, see Reference [1].

2. Boot a guest with 16G memory and vhostuser interfaces on host1; for the full XML, see Reference [2].

3. Mount tmpfs in the guest
# mount -t tmpfs none /home/mnt

4. Create 7G of dirty pages in memory with 'dd' (steps 3-5 are consolidated in a sketch after step 6)
# dd if=/dev/urandom of=/home/mnt/data bs=1M count=7168

5. Migrate the guest from host1 to host2
# virsh migrate --timeout 120 --live $guest_name qemu+ssh://$des_ip/system

6. Migration fails.
Migration: [ 79 %]error: operation failed: migration job: unexpectedly failed
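
A minimal consolidation of steps 3-5 (the first two commands run inside the guest, the last on the source host; $guest_name and $des_ip are placeholders as in step 5). Note that /dev/urandom output is effectively incompressible, so zero-page detection cannot skip these pages and all of them must actually be transferred:

(inside the guest)
# mount -t tmpfs none /home/mnt
# dd if=/dev/urandom of=/home/mnt/data bs=1M count=7168

(on the source host)
# virsh migrate --timeout 120 --live $guest_name qemu+ssh://$des_ip/system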


Actual results:
(1) virsh reports an error
# virsh migrate  --verbose --live rhel7.3-2q-nonrt qemu+ssh://10.73.73.65/system
Migration: [ 79 %]error: operation failed: migration job: unexpectedly failed

(2) VM log info
# cat /var/log/libvirt/qemu/rhel7.3-2q-nonrt.log
2016-11-08 06:27:47.298+0000: initiating migration
2016-11-08T06:28:29.995427Z qemu-kvm: socket_writev_buffer: Got err=104 for (131328/18446744073709551615)

(3) QEMU migration info before and after the failure.
# virsh qemu-monitor-command rhel7.3-2q-nonrt --hmp "info migrate"
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: on postcopy-ram: off 
Migration status: active
total time: 42733 milliseconds
expected downtime: 300 milliseconds
setup: 59 milliseconds
transferred ram: 4813361 kbytes
throughput: 942.51 mbps
remaining ram: 3352588 kbytes
total ram: 16778504 kbytes
duplicate: 13374205 pages
skipped: 0 pages
normal: 7254327 pages
normal bytes: 29017308 kbytes
dirty sync count: 1

# virsh qemu-monitor-command rhel7.3-2q-nonrt --hmp "info migrate"
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: on postcopy-ram: off 
Migration status: failed
total time: 0 milliseconds
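
For scale: err=104 in the VM log above is ECONNRESET (connection reset by peer). At the reported throughput, the remaining RAM in the first snapshot would need roughly 3352588 KB x 1024 x 8 bits / 942.51e6 bits/s, i.e. about 29 more seconds of transfer even with no further dirtying, yet the second snapshot jumps straight to "failed" with total time reset to 0, which points at the connection dying rather than the copy merely failing to converge.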


Expected results:
Migration should complete successfully.


Additional info:
1. Without steps 3 and 4, migration works.


References
[1]# ovs-vsctl show
Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuser
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
    Bridge "ovsbr1"
        Port "vhost-user2"
            Interface "vhost-user2"
                type: dpdkvhostuser
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal

[2]
<domain type='kvm'>
  <name>rhel7.3-2q-nonrt</name>
  <uuid>ff8b540c-9ef3-4441-9b0d-da7c97543f21</uuid>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>6</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='38'/>
    <vcpupin vcpu='1' cpuset='36'/>
    <vcpupin vcpu='2' cpuset='34'/>
    <vcpupin vcpu='3' cpuset='32'/>
    <vcpupin vcpu='4' cpuset='30'/>
    <vcpupin vcpu='5' cpuset='28'/>
    <emulatorpin cpuset='1,3,5,7'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0-1'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.3.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pmu state='off'/>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough'>
    <feature policy='require' name='tsc-deadline'/>
    <numa>
      <cell id='0' cpus='0-5' memory='16777216' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/mnt/rhel7.3-nonrt.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='none'/>
    <interface type='bridge'>
      <mac address='18:76:da:e6:01:01'/>
      <source bridge='switch'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:e6:01:02'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user1' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:e6:01:03'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user2' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Comment 2 Dr. David Alan Gilbert 2016-11-09 09:07:16 UTC
Do you have the /var/log/libvirt/qemu/rhel7.3-2q-nonrt.log from the destination side, and also the libvirt logs from both sides?

The error from the source says that either the network connection failed or the destination died; probably the destination died, so we need to see the logs there.

Comment 3 Pei Zhang 2016-11-10 05:26:27 UTC
Created attachment 1219171 [details]
libvirt log from destination host

Comment 4 Pei Zhang 2016-11-10 07:17:22 UTC
1. qemu logs from destination host
# cat /var/log/libvirt/qemu/rhel7.3-2q-nonrt.log

2016-11-10 05:15:20.913+0000: starting up libvirt version: 2.0.0, package: 10.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-09-21-10:15:26, x86-038.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-28.el7), hostname: dell-per430-11.lab.eng.pek2.redhat.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=rhel7.3-2q-nonrt,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-rhel7.3-2q-nonrt/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,vmport=off -cpu host,+vme,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+vmx,+smx,+est,+tm2,+xtpr,+pdcm,+dca,+osxsave,+f16c,+rdrand,+arat,+tsc_adjust,+xsaveopt,+pdpe1gb,+abm,pmu=off -m 16384 -realtime mlock=off -smp 6,sockets=6,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,share=yes,size=17179869184,host-nodes=0-1,policy=bind -numa node,nodeid=0,cpus=0-5,memdev=ram-node0 -uuid ff8b540c-9ef3-4441-9b0d-da7c97543f21 -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-rhel7.3-2q-nonrt/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/mnt/rhel7.3-nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=18:76:da:e6:01:01,bus=pci.0,addr=0x3 -chardev socket,id=charnet1,path=/usr/local/var/run/openvswitch/vhost-user1 -netdev type=vhost-user,id=hostnet1,chardev=charnet1,queues=2 -device virtio-net-pci,mq=on,vectors=6,netdev=hostnet1,id=net1,mac=18:66:da:e6:01:02,bus=pci.0,addr=0x2 -chardev socket,id=charnet2,path=/usr/local/var/run/openvswitch/vhost-user2 -netdev type=vhost-user,id=hostnet2,chardev=charnet2,queues=2 -device virtio-net-pci,mq=on,vectors=6,netdev=hostnet2,id=net2,mac=18:66:da:e6:01:03,bus=pci.0,addr=0x6 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-1-rhel7.3-2q-nonrt/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xc -msg timestamp=on
Domain id=1 is tainted: high-privileges
Domain id=1 is tainted: host-cpu
2016-11-10 05:15:20.915+0000: 5294: debug : virFileClose:102 : Closed fd 27
2016-11-10 05:15:20.915+0000: 5294: debug : virFileClose:102 : Closed fd 33
2016-11-10 05:15:20.917+0000: 5295: debug : virExec:692 : Run hook 0x7f36d07f7c40 0x7f37056bb460
2016-11-10 05:15:20.917+0000: 5295: debug : qemuProcessHook:2648 : Obtaining domain lock
2016-11-10 05:15:20.917+0000: 5295: debug : virSecuritySELinuxSetSocketLabel:2267 : Setting VM rhel7.3-2q-nonrt socket context system_u:system_r:svirt_t:s0:c762,c865
2016-11-10 05:15:20.918+0000: 5294: debug : virFileClose:102 : Closed fd 3
2016-11-10 05:15:20.918+0000: 5295: debug : virDomainLockProcessStart:179 : plugin=0x7f36b011d390 dom=0x7f36b0281990 paused=1 fd=0x7f37056baf90
2016-11-10 05:15:20.918+0000: 5295: debug : virDomainLockManagerNew:134 : plugin=0x7f36b011d390 dom=0x7f36b0281990 withResources=1
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerPluginGetDriver:281 : plugin=0x7f36b011d390
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerNew:305 : driver=0x7f371537ba40 type=0 nparams=5 params=0x7f37056bae40 flags=1
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerLogParams:98 :   key=uuid type=uuid value=ff8b540c-9ef3-4441-9b0d-da7c97543f21
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerLogParams:91 :   key=name type=string value=rhel7.3-2q-nonrt
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerLogParams:79 :   key=id type=uint value=1
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerLogParams:79 :   key=pid type=uint value=5295
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerLogParams:94 :   key=uri type=cstring value=qemu:///system
2016-11-10 05:15:20.918+0000: 5295: debug : virDomainLockManagerNew:146 : Adding leases
2016-11-10 05:15:20.918+0000: 5295: debug : virDomainLockManagerNew:151 : Adding disks
2016-11-10 05:15:20.918+0000: 5295: debug : virDomainLockManagerAddImage:90 : Add disk /mnt/rhel7.3-nonrt.qcow2
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerAddResource:332 : lock=0x7f36f8015c00 type=0 name=/mnt/rhel7.3-nonrt.qcow2 nparams=0 params=(nil) flags=0
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerAcquire:350 : lock=0x7f36f8015c00 state='<null>' flags=3 action=0 fd=0x7f37056baf90
2016-11-10 05:15:20.918+0000: 5295: debug : virLockManagerFree:387 : lock=0x7f36f8015c00
2016-11-10 05:15:20.918+0000: 5295: debug : virCgroupNew:1148 : pid=-1 path= parent=(nil) controllers=-1 group=0x7f37056baf50
2016-11-10 05:15:20.918+0000: 5295: debug : virCgroupDetect:644 : group=0x7f36f8017810 controllers=-1 path= parent=(nil)
2016-11-10 05:15:20.918+0000: 5295: debug : virCgroupDetect:690 : Auto-detecting controllers
2016-11-10 05:15:20.918+0000: 5295: debug : virCgroupDetect:695 : Controller 'cpu' present=yes
2016-11-10 05:15:20.918+0000: 5295: debug : virCgroupDetect:695 : Controller 'cpuacct' present=yes
2016-11-10 05:15:20.918+0000: 5295: debug : virCgroupDetect:695 : Controller 'cpuset' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'memory' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'devices' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'freezer' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'blkio' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'net_cls' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'perf_event' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:695 : Controller 'name=systemd' present=yes
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetectPlacement:549 : Detecting placement for pid -1 path 
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 0:cpu at /sys/fs/cgroup/cpu,cpuacct in /system.slice for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 1:cpuacct at /sys/fs/cgroup/cpu,cpuacct in /system.slice for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 2:cpuset at /sys/fs/cgroup/cpuset in / for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 3:memory at /sys/fs/cgroup/memory in /system.slice for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 4:devices at /sys/fs/cgroup/devices in /system.slice/libvirtd.service for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 5:freezer at /sys/fs/cgroup/freezer in / for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 6:blkio at /sys/fs/cgroup/blkio in /system.slice for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 7:net_cls at /sys/fs/cgroup/net_cls,net_prio in / for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 8:perf_event at /sys/fs/cgroup/perf_event in / for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : virCgroupDetect:737 : Detected mount/mapping 9:name=systemd at /sys/fs/cgroup/systemd in /system.slice/libvirtd.service for pid 18446744073709551615
2016-11-10 05:15:20.919+0000: 5295: debug : qemuProcessHook:2675 : Relying on CGroups for memory binding
2016-11-10 05:15:20.919+0000: 5295: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7f36b02146c0
2016-11-10 05:15:20.919+0000: 5295: debug : qemuProcessHook:2689 : Hook complete ret=0
2016-11-10 05:15:20.919+0000: 5295: debug : virExec:694 : Done hook 0
2016-11-10 05:15:20.919+0000: 5295: debug : virExec:701 : Setting child security label to system_u:system_r:svirt_t:s0:c762,c865
2016-11-10 05:15:20.919+0000: 5295: debug : virExec:731 : Setting child uid:gid to 0:107 with caps 0
2016-11-10 05:15:20.919+0000: 5295: debug : virCommandHandshakeChild:433 : Notifying parent for handshake start on 30
2016-11-10 05:15:20.919+0000: 5295: debug : virCommandHandshakeChild:441 : Waiting on parent for handshake complete on 31
2016-11-10 05:15:20.958+0000: 5295: debug : virFileClose:102 : Closed fd 30
2016-11-10 05:15:20.958+0000: 5295: debug : virFileClose:102 : Closed fd 31
2016-11-10 05:15:20.958+0000: 5295: debug : virCommandHandshakeChild:461 : Handshake with parent is done
char device redirected to /dev/pts/1 (label charserial0)
warning: host doesn't support requested feature: CPUID.01H:EDX.ds [bit 21]
warning: host doesn't support requested feature: CPUID.01H:EDX.acpi [bit 22]
warning: host doesn't support requested feature: CPUID.01H:EDX.ht [bit 28]
warning: host doesn't support requested feature: CPUID.01H:EDX.tm [bit 29]
warning: host doesn't support requested feature: CPUID.01H:EDX.pbe [bit 31]
warning: host doesn't support requested feature: CPUID.01H:ECX.dtes64 [bit 2]
warning: host doesn't support requested feature: CPUID.01H:ECX.monitor [bit 3]
warning: host doesn't support requested feature: CPUID.01H:ECX.ds_cpl [bit 4]
warning: host doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
warning: host doesn't support requested feature: CPUID.01H:ECX.smx [bit 6]
warning: host doesn't support requested feature: CPUID.01H:ECX.est [bit 7]
warning: host doesn't support requested feature: CPUID.01H:ECX.tm2 [bit 8]
warning: host doesn't support requested feature: CPUID.01H:ECX.xtpr [bit 14]
warning: host doesn't support requested feature: CPUID.01H:ECX.pdcm [bit 15]
warning: host doesn't support requested feature: CPUID.01H:ECX.dca [bit 18]
warning: host doesn't support requested feature: CPUID.01H:ECX.osxsave [bit 27]
[the block of 16 "host doesn't support requested feature" warnings above repeats verbatim five more times in the log]
2016-11-10T05:16:06.879084Z qemu-kvm: terminating on signal 15 from pid 5004
2016-11-10 05:16:07.079+0000: shutting down


2. qemu logs from source host
# cat /var/log/libvirt/qemu/rhel7.3-2q-nonrt.log
2016-11-10 05:15:24.178+0000: initiating migration
2016-11-10T05:16:06.923389Z qemu-kvm: socket_writev_buffer: Got err=104 for (131328/18446744073709551615)

Comment 5 Pei Zhang 2016-11-10 09:20:01 UTC
Created attachment 1219252 [details]
libvirt log from source host

I enabled the 'debug' log level for libvirtd, so its log file is big.

The only error shown in the log is:
2016-11-10 09:07:00.043+0000: 22938: error : qemuMigrationCheckJobStatus:2657 : operation failed: migration job: unexpectedly failed
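
(For reference, libvirtd debug logging of this kind is typically enabled in /etc/libvirt/libvirtd.conf and picked up after a daemon restart; a sketch with illustrative values, since the exact settings used here were not recorded:)

# grep '^log' /etc/libvirt/libvirtd.conf
log_level = 1
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"
# systemctl restart libvirtd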

Comment 6 Dr. David Alan Gilbert 2016-11-10 09:28:37 UTC
Destination:
2016-11-10T05:16:06.879084Z qemu-kvm: terminating on signal 15 from pid 5004

Source:
2016-11-10T05:16:06.923389Z qemu-kvm: socket_writev_buffer: Got err=104 for (131328/18446744073709551615)

OK, so that says the destination QEMU is being killed by something; and in the libvirt log I see:

2016-11-10 05:16:06.877+0000: 5004: error : virKeepAliveTimerInternal:143 : internal error: connection closed due to keepalive timeout

So this looks like libvirt has decided to kill it; we've had similar timeouts before and we've never understood why.
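
For anyone hitting the same timeout, the keepalive tolerance is tunable (a possible workaround, not a fix): the daemon-side defaults live in /etc/libvirt/libvirtd.conf, and virsh takes -k/--keepalive-interval and -K/--keepalive-count global options, with -k 0 disabling the client-side keepalive entirely. Values below are illustrative:

# grep keepalive /etc/libvirt/libvirtd.conf
keepalive_interval = 5
keepalive_count = 5
# virsh -k 0 migrate --verbose --live rhel7.3-2q-nonrt qemu+ssh://10.73.73.65/system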

Comment 7 Pei Zhang 2016-11-23 08:42:39 UTC
The testing results indicate that this issue is caused by limited network bandwidth.

Re-testing with the same memory pressure in the same environment, migration works well with 10G network bandwidth but hits this issue with 1G network bandwidth.

David, based on this, should we close this bug as 'NOTABUG'? Thanks.


Best Regards,
Pei
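
A mitigation consistent with this finding, though untested here: cap the migration bandwidth below the 1G link capacity so the control connection is not starved while RAM is being copied, e.g. with migrate-setspeed (value in MiB/s, illustrative):

# virsh migrate-setspeed rhel7.3-2q-nonrt 100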

Comment 8 Pei Zhang 2016-11-24 05:18:54 UTC
Also tested at the QEMU level directly; migration works well over the 1G network.

Comment 9 Luyao Huang 2016-11-24 06:10:07 UTC
(In reply to Dr. David Alan Gilbert from comment #6)
> Destination:
> 2016-11-10T05:16:06.879084Z qemu-kvm: terminating on signal 15 from pid 5004
> 
> Source:
> 2016-11-10T05:16:06.923389Z qemu-kvm: socket_writev_buffer: Got err=104 for
> (131328/18446744073709551615)
> 
> OK, so that says the destination QEMU is being killed by something; and in
> the libvirt log I see:
> 
> 2016-11-10 05:16:06.877+0000: 5004: error : virKeepAliveTimerInternal:143 :
> internal error: connection closed due to keepalive timeout
> 
> So this looks like libvirt has decided to kill it; we've had similar
> timeouts before and we've never understood why.

Yes, I remember dzheng opened a bug for these problems:

https://bugzilla.redhat.com/show_bug.cgi?id=1367620

Comment 10 Dr. David Alan Gilbert 2016-11-24 09:08:35 UTC
Yes, I don't think we should close this until we understand it.

Comment 11 Jiri Denemark 2018-09-04 11:06:08 UTC
We hit this same issue again in bug 1614182 and were finally able to reproduce it
reliably enough to investigate what's going on here. The keepalive packets sent by
the source libvirtd get stuck in the kernel somewhere between ssh and the network
interface on the source. The kernel eventually sends them to the network, but by
then it may already be too late.

I'm closing this as a duplicate of 1614182, where we have a lot more
information for further investigation.
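
One way to observe this on the source (a diagnostic sketch, not part of the original investigation): watch the send queue of the ssh connection carrying the migration; a Send-Q that stays large for longer than the keepalive timeout matches the "stuck in the kernel" description:

# ss -tnpo state established '( dport = :ssh or sport = :ssh )'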

*** This bug has been marked as a duplicate of bug 1614182 ***

