Description of problem:
Start ovs-dpdk on 2 hosts and boot a guest with vhostuser interfaces on one host, then do ping-pong migration several times (there seems to be no clear pattern: sometimes 1 or 2 iterations, sometimes 20). Eventually the guest becomes unresponsive.

Version-Release number of selected component (if applicable):
3.10.0-514.rt56.420.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7.x86_64
libvirt-2.0.0-10.el7.x86_64
tuned-2.7.1-3.el7.noarch
tuned-profiles-realtime-2.7.1-3.el7.noarch
tuned-profiles-nfv-2.7.1-3.el7.noarch
dpdk-16.07.zip
openvswitch-2.6.0.tar.gz

How reproducible:
Always.

Steps to Reproduce:
1. Start ovs-dpdk on host1 and host2; see [1] for the port configuration.
2. Boot the guest with vhostuser interfaces on host1; see [2] for the full guest XML.
3. Migrate the guest from host1 to host2:
   # virsh migrate --live $guest_name qemu+ssh://$des_ip/system
4. Repeat the migration back and forth (ping-pong) several times; the guest becomes unresponsive. A scripted sketch of this loop is included after the references below. Checking the guest status, it is 'running' on the source host and 'paused' on the destination host.

Actual results:
The guest becomes unresponsive after several (1 or 2, or 20) ping-pong migrations.

Expected results:
The guest should keep working well.

Additional info:
1. RHEL7.3 (with kernel 3.10.0-514.el7.x86_64) does not seem to hit this issue; after 50 ping-pong migrations the guest kept working well.
2. In my testing, the source and destination hosts are the same machine type:
   Dell Inc. PowerEdge R430
   Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
3. References:

[1]
# ovs-vsctl show
ed08ca6d-ddc4-4e5b-9056-4710811e6aae
    Bridge "ovsbr0"
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuser
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
    Bridge "ovsbr1"
        Port "vhost-user2"
            Interface "vhost-user2"
                type: dpdkvhostuser
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal

[2] Full guest XML:
<domain type='kvm'>
  <name>rhel7.3-2q-rt</name>
  <uuid>ff8b540c-9ef3-4441-9b0d-da7c97543f22</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='39'/>
    <vcpupin vcpu='1' cpuset='38'/>
    <vcpupin vcpu='2' cpuset='36'/>
    <vcpupin vcpu='3' cpuset='34'/>
    <vcpupin vcpu='4' cpuset='32'/>
    <vcpupin vcpu='5' cpuset='30'/>
    <vcpupin vcpu='6' cpuset='28'/>
    <vcpupin vcpu='7' cpuset='37'/>
    <emulatorpin cpuset='1,3,5,7'/>
    <vcpusched vcpus='0-7' scheduler='fifo' priority='1'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0-1'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.3.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pmu state='off'/>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough'>
    <feature policy='require' name='tsc-deadline'/>
    <numa>
      <cell id='0' cpus='0-7' memory='8388608' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/mnt/rhel7.3-rt.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='none'/>
    <interface type='bridge'>
      <mac address='18:77:da:e6:02:01'/>
      <source bridge='switch'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:e6:02:02'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user1' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:e6:02:03'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user2' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' function='0x0'/>
    </memballoon>
  </devices>
</domain>
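A minimal sketch of the ping-pong loop used in steps 3-4, assuming it is driven from host1; the guest name, host addresses, and iteration count are placeholders and need to match the actual environment:

    #!/bin/bash
    # Ping-pong live migration between host1 and host2 until the guest misbehaves.
    GUEST=rhel7.3-2q-rt
    SRC_IP=192.0.2.1        # host1 address (placeholder)
    DES_IP=192.0.2.2        # host2 address (placeholder)

    for i in $(seq 1 50); do
        echo "=== round $i: host1 -> host2 ==="
        virsh migrate --live "$GUEST" "qemu+ssh://$DES_IP/system" || exit 1

        echo "=== round $i: host2 -> host1 ==="
        ssh "root@$DES_IP" virsh migrate --live "$GUEST" "qemu+ssh://$SRC_IP/system" || exit 1

        # After each round trip, check that the guest is still responsive
        # (virsh domstate, ping into the guest, or a console login).
        virsh domstate "$GUEST"
    done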
Hello Pei Zhang,

I tried to reproduce this issue with the exact environment you mentioned. The two kernel versions on which I tried live migration are:
1] kernel-rt-3.10.0-306.0.1.rt56.179.el7.x86_64
2] kernel-rt-kvm-3.10.0-514.7.1.rt56.430.el7.x86_64

For most of the trials, live migration with DPDK succeeded and I did not face any issue. In only a couple of instances (out of approximately 50) I encountered 'hung task' timeouts similar to comment 3.

A] First occurrence:

[root@xxxx ~]# cat /proc/24502/task/24502/status
Name:   qemu-kvm    ==> emulator thread which goes into an uninterruptible state
State:  D (disk sleep)
Tgid:   24502
Ngid:   0
Pid:    24502
PPid:   1
TracerPid:  0
Uid:    107 107 107 107
Gid:    107 107 107 107
FDSize: 64
Groups: 11 36 107
VmPeak: 4548168 kB
VmSize: 3681140 kB
VmLck:   535404 kB
VmPin:        0 kB
VmHWM:   406248 kB
VmRSS:   318440 kB
VmData:  267976 kB
VmStk:      136 kB
VmExe:     6352 kB
VmLib:    40132 kB
VmPTE:     1096 kB
VmSwap:       0 kB
Threads:    1
SigQ:   1/256768
SigPnd: 0000000000000100
ShdPnd: 0000000000000100
SigBlk: 0000000010002240
SigIgn: 0000000000001000
SigCgt: 0000000180004243
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Seccomp:    0
Cpus_allowed:   0000,00000005
Cpus_allowed_list:  0,2
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
Mems_allowed_list:  1
voluntary_ctxt_switches:    52769
nonvoluntary_ctxt_switches: 2428

B] Second occurrence: 'dpdk_nic_bind.p'

[  557.067767] ixgbe 0000:41:00.0: complete
[ 1230.128120] INFO: task dpdk_nic_bind.p:38951 blocked for more than 600 seconds.
[ 1230.128120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1230.128123] dpdk_nic_bind.p D ffff880731ff50a0     0 38951      1 0x00000084
[ 1230.128126]  ffff880026e13b28 0000000000000002 ffff880026e13fd8 ffff880026e13fd8
[ 1230.128127]  ffff880026e13fd8 ffff880026e13fd8 ffffffff819be460 ffff880731ff50a0
[ 1230.128128]  7fffffffffffffff ffff880026e13c68 ffff880731ff50a0 00000000ffffffff
[ 1230.128129] Call Trace:
[ 1230.128134]  [<ffffffff81685b30>] schedule+0x30/0xa0
[ 1230.128136]  [<ffffffff81683c01>] schedule_timeout+0x2e1/0x380
[ 1230.128139]  [<ffffffff810bbf5c>] ? try_to_wake_up+0x6c/0x540
[ 1230.128140]  [<ffffffff81684a04>] wait_for_completion+0xc4/0x100
[ 1230.128143]  [<ffffffff8109bb51>] flush_work+0xf1/0x170
[ 1230.128145]  [<ffffffff8109a4f0>] ? flush_workqueue_prep_pwqs+0x1d0/0x1d0
[ 1230.128146]  [<ffffffff8109d8a5>] work_on_cpu+0x75/0x90
[ 1230.128148]  [<ffffffff81099ea0>] ? find_worker_executing_work+0x90/0x90
[ 1230.128151]  [<ffffffff81353dd0>] ? pci_device_shutdown+0x70/0x70
[ 1230.128152]  [<ffffffff81355362>] pci_device_probe+0x142/0x150
[ 1230.128156]  [<ffffffff8141cb05>] driver_probe_device+0x145/0x3c0
[ 1230.128157]  [<ffffffff8141ce53>] __driver_attach+0x93/0xa0
[ 1230.128159]  [<ffffffff8141cdc0>] ? __device_attach+0x40/0x40
[ 1230.128160]  [<ffffffff8141a693>] bus_for_each_dev+0x73/0xc0
[ 1230.128162]  [<ffffffff8141c44e>] driver_attach+0x1e/0x20
[ 1230.128163]  [<ffffffff81354eaf>] pci_add_dynid+0xaf/0xd0
[ 1230.128164]  [<ffffffff813554d7>] store_new_id+0x167/0x1b0
[ 1230.128167]  [<ffffffff8141a384>] drv_attr_store+0x24/0x40
[ 1230.128170]  [<ffffffff812732f9>] sysfs_write_file+0xc9/0x140
[ 1230.128172]  [<ffffffff811f286d>] vfs_write+0xbd/0x1e0
[ 1230.128174]  [<ffffffff81116404>] ? __audit_syscall_entry+0xb4/0x110
[ 1230.128176]  [<ffffffff811f338f>] SyS_write+0x7f/0xe0
[ 1230.128178]  [<ffffffff8168fc94>] tracesys+0xdd/0xe2

Both occurrences happened at different timestamps. In the second occurrence we were only binding the NIC to the user-space driver, so both events could share a common host-side cause that might not be related to live migration. I am continuing to debug to find out exactly what is happening.

I have a few questions for you:
* Does the issue occur frequently on your setup?
* Do you have any other observations that would help narrow down the issue?
* Can we set up the host to crash the next time the issue occurs, so that we can capture a host memory dump?

Best regards,
Pankaj
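Regarding the last question above: one possible way to capture a host memory dump, assuming kdump is already configured on the host with crashkernel memory reserved, is to let the kernel panic on the hung-task timeout so kdump writes a vmcore. A minimal sketch:

    # Verify kdump is ready to capture a vmcore (crashkernel= must be on the kernel cmdline).
    systemctl status kdump
    grep crashkernel /proc/cmdline

    # Panic (and so trigger kdump) the next time a task is reported as hung.
    echo 1 > /proc/sys/kernel/hung_task_panic

    # Optionally also panic on soft lockups, in case the stall shows up that way instead.
    echo 1 > /proc/sys/kernel/softlockup_panic

    # On RHEL the resulting vmcore is written under /var/crash/ by default.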
Hello Pei Zhang,

I was able to reproduce this issue and capture a guest crash. In the guest logs I see the stack trace below, which points to a patch we backported in RHEL-RT; it looks like this patch was not committed for some reason. I am starting a conversation on 'kvm-rt' about the patch.

http://post-office.corp.redhat.com/archives/kvm-rt/2016-March/msg00008.html

[ 2189.765245] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W      ------------   3.10.0-537.rt56.444.el7.x86_64 #1
[ 2189.765245] Hardware name: Red Hat KVM, BIOS seabios-1.7.5-11.el7 04/01/2014
[ 2189.765247]  ffff8800b87affd8 1eebb7a41eebb7a1 ffff8800b87afc58 ffffffff8167f44c
[ 2189.765248]  ffff8800b87afc68 ffffffff81679ebd ffff8800b87afcc8 ffffffff81683868
[ 2189.765249]  ffff8800b87affd8 ffff8800b87affd8 ffff8800b87affd8 ffff8800b87affd8
[ 2189.765250] Call Trace:
[ 2189.765251]  [<ffffffff8167f44c>] dump_stack+0x19/0x1b
[ 2189.765252]  [<ffffffff81679ebd>] __schedule_bug+0x62/0x70
[ 2189.765254]  [<ffffffff81683868>] __schedule+0x698/0x7e0
[ 2189.765256]  [<ffffffff816839e0>] schedule+0x30/0xa0
[ 2189.765257]  [<ffffffff8168486d>] rt_spin_lock_slowlock+0xdd/0x240
[ 2189.765258]  [<ffffffff81684ff5>] rt_spin_lock+0x25/0x30
[ 2189.765260]  [<ffffffff81054275>] apf_task_wake_all+0x25/0x80
[ 2189.765262]  [<ffffffff81054546>] kvm_async_pf_task_wake+0x116/0x130
[ 2189.765263]  [<ffffffff81688e98>] do_async_page_fault+0x68/0xf0
[ 2189.765265]  [<ffffffff81685da8>] async_page_fault+0x28/0x30
[ 2189.765266]  [<ffffffff81055226>] ? native_safe_halt+0x6/0x10
[ 2189.765267]  [<ffffffff810262fd>] default_idle+0x2d/0x130
[ 2189.765268]  [<ffffffff81026f4e>] arch_cpu_idle+0x2e/0x40
[ 2189.765269]  [<ffffffff810dd13f>] cpu_startup_entry+0x2af/0x340
[ 2189.765271]  [<ffffffff81042858>] start_secondary+0x1b8/0x230
[ 2189.765272] bad: scheduling from the idle thread!

Thanks,
Pankaj
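For reference, one way such a guest memory dump can be captured from the host once the guest hangs, so the guest-side stack traces can be inspected with the crash utility; this is a sketch assuming the guest is still defined on the host and the output path is arbitrary:

    # Capture a memory-only dump of the guest in ELF format, readable by 'crash'.
    virsh dump rhel7.3-2q-rt /var/tmp/rhel7.3-2q-rt.vmcore --memory-only --format elf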