Bug 1390910 - Guest becomes unresponsive when doing ping-pong migration with ovs-dpdk
Summary: Guest becomes unresponsive when doing ping-pong migration with ovs-dpdk
Keywords:
Status: CLOSED DUPLICATE of bug 1416403
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: pagupta
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-02 08:55 UTC by Pei Zhang
Modified: 2017-02-06 06:30 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-06 06:30:58 UTC
Target Upstream Version:



Description Pei Zhang 2016-11-02 08:55:27 UTC
Description of problem:
Start ovs-dpdk on 2 hosts and boot a guest with vhostuser interfaces on one host, then do ping-pong migration several times (there seems to be no regularity: 1, 2, or sometimes 20 times); the guest becomes unresponsive.


Version-Release number of selected component (if applicable):
3.10.0-514.rt56.420.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7.x86_64
libvirt-2.0.0-10.el7.x86_64
tuned-2.7.1-3.el7.noarch
tuned-profiles-realtime-2.7.1-3.el7.noarch
tuned-profiles-nfv-2.7.1-3.el7.noarch
dpdk-16.07.zip
openvswitch-2.6.0.tar.gz


How reproducible:
Always.


Steps to Reproduce:
1. Start ovs-dpdk on host1 and host2; for the port configuration, refer to [1]

2. Boot a guest with vhostuser interfaces on host1; for the full XML, refer to [2]

3. Migrate the guest from host1 to host2
# virsh migrate --live $guest_name qemu+ssh://$des_ip/system

4. Repeat the migration back and forth (ping-pong) several times; the guest becomes unresponsive (see the loop sketch below).
Checking the guest status, it is 'running' on the src host and 'paused' on the des host.
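
For reference, a minimal loop that drives the ping-pong migration from host1 could look like the sketch below ($guest_name, $host1_ip and $host2_ip are placeholders; passwordless ssh between the hosts is assumed):

for i in $(seq 1 50); do
    # host1 -> host2
    virsh migrate --live $guest_name qemu+ssh://$host2_ip/system || break
    # host2 -> host1
    virsh -c qemu+ssh://$host2_ip/system migrate --live $guest_name qemu+ssh://$host1_ip/system || break
    # confirm the guest state after each round trip
    virsh list --all
done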


Actual results:
The guest becomes unresponsive after several (1, 2, or sometimes 20) rounds of ping-pong migration.


Expected results:
Guest should keep working well.

Additional info:
1. RHEL7.3 (with kernel 3.10.0-514.el7.x86_64) does not seem to hit this issue:
50 rounds of ping-pong migration were done and the guest kept working well.

2. In my testing, the src host and des host are the same machine type.
Dell Inc. PowerEdge R430
Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz

3. reference
[1]
# ovs-vsctl show
ed08ca6d-ddc4-4e5b-9056-4710811e6aae
    Bridge "ovsbr0"
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuser
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
    Bridge "ovsbr1"
        Port "vhost-user2"
            Interface "vhost-user2"
                type: dpdkvhostuser
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
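
For completeness, bridges and ports like the ones above are typically created with commands along these lines (a sketch for OVS 2.6 built with DPDK; PMD mask, socket memory and other options are omitted):

# ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
# ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk
# ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
# ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
# ovs-vsctl add-port ovsbr1 dpdk1 -- set Interface dpdk1 type=dpdk
# ovs-vsctl add-port ovsbr1 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser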

[2] full guest xml file
<domain type='kvm'>
  <name>rhel7.3-2q-rt</name>
  <uuid>ff8b540c-9ef3-4441-9b0d-da7c97543f22</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='39'/>
    <vcpupin vcpu='1' cpuset='38'/>
    <vcpupin vcpu='2' cpuset='36'/>
    <vcpupin vcpu='3' cpuset='34'/>
    <vcpupin vcpu='4' cpuset='32'/>
    <vcpupin vcpu='5' cpuset='30'/>
    <vcpupin vcpu='6' cpuset='28'/>
    <vcpupin vcpu='7' cpuset='37'/>
    <emulatorpin cpuset='1,3,5,7'/>
    <vcpusched vcpus='0-7' scheduler='fifo' priority='1'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0-1'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.3.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pmu state='off'/>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough'>
    <feature policy='require' name='tsc-deadline'/>
    <numa>
      <cell id='0' cpus='0-7' memory='8388608' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/mnt/rhel7.3-rt.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='none'/>
    <interface type='bridge'>
      <mac address='18:77:da:e6:02:01'/>
      <source bridge='switch'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:e6:02:02'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user1' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:e6:02:03'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user2' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Comment 4 pagupta 2017-01-02 13:07:26 UTC
Hello Pei Zhang,

I tried to reproduce this issue with the exact environment you mentioned.
The two kernel-rt versions on which I tried live migration are:

1] kernel-rt-3.10.0-306.0.1.rt56.179.el7.x86_64
2] kernel-rt-kvm-3.10.0-514.7.1.rt56.430.el7.x86_64

For most of the trials, live migration with DPDK succeeded and I did not face any issue. In only a couple of instances (out of approximately 50) I encountered 'hung task' timeouts similar to comment 3. 

A] First occurrence

[root@xxxx ~]# cat /proc/24502/task/24502/status 
Name:	qemu-kvm ==> emulator thread that goes into an uninterruptible state
State:	D (disk sleep)
Tgid:	24502
Ngid:	0
Pid:	24502
PPid:	1
TracerPid:	0
Uid:	107	107	107	107
Gid:	107	107	107	107
FDSize:	64
Groups:	11 36 107 
VmPeak:	 4548168 kB
VmSize:	 3681140 kB
VmLck:	  535404 kB
VmPin:	       0 kB
VmHWM:	  406248 kB
VmRSS:	  318440 kB
VmData:	  267976 kB
VmStk:	     136 kB
VmExe:	    6352 kB
VmLib:	   40132 kB
VmPTE:	    1096 kB
VmSwap:	       0 kB
Threads:	1
SigQ:	1/256768
SigPnd:	0000000000000100
ShdPnd:	0000000000000100
SigBlk:	0000000010002240
SigIgn:	0000000000001000
SigCgt:	0000000180004243
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	0000001fffffffff
Seccomp:	0
Cpus_allowed:	0000,00000005
Cpus_allowed_list:	0,2
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
Mems_allowed_list:	1
voluntary_ctxt_switches:	52769
nonvoluntary_ctxt_switches:	2428
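
To see where such a D-state thread is stuck in the kernel next time, the following could be captured on the host (a sketch; 24502 is the PID from the status output above):

# cat /proc/24502/stack            <== kernel stack of the blocked emulator thread
# echo w > /proc/sysrq-trigger     <== dump all uninterruptible tasks to dmesg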

B] Second occurrence: 'dpdk_nic_bind.p'

[  557.067767] ixgbe 0000:41:00.0: complete
[ 1230.128120] INFO: task dpdk_nic_bind.p:38951 blocked for more than 600 seconds.
[ 1230.128120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1230.128123] dpdk_nic_bind.p D ffff880731ff50a0     0 38951      1 0x00000084
[ 1230.128126]  ffff880026e13b28 0000000000000002 ffff880026e13fd8 ffff880026e13fd8
[ 1230.128127]  ffff880026e13fd8 ffff880026e13fd8 ffffffff819be460 ffff880731ff50a0
[ 1230.128128]  7fffffffffffffff ffff880026e13c68 ffff880731ff50a0 00000000ffffffff
[ 1230.128129] Call Trace:
[ 1230.128134]  [<ffffffff81685b30>] schedule+0x30/0xa0
[ 1230.128136]  [<ffffffff81683c01>] schedule_timeout+0x2e1/0x380
[ 1230.128139]  [<ffffffff810bbf5c>] ? try_to_wake_up+0x6c/0x540
[ 1230.128140]  [<ffffffff81684a04>] wait_for_completion+0xc4/0x100
[ 1230.128143]  [<ffffffff8109bb51>] flush_work+0xf1/0x170
[ 1230.128145]  [<ffffffff8109a4f0>] ? flush_workqueue_prep_pwqs+0x1d0/0x1d0
[ 1230.128146]  [<ffffffff8109d8a5>] work_on_cpu+0x75/0x90
[ 1230.128148]  [<ffffffff81099ea0>] ? find_worker_executing_work+0x90/0x90
[ 1230.128151]  [<ffffffff81353dd0>] ? pci_device_shutdown+0x70/0x70
[ 1230.128152]  [<ffffffff81355362>] pci_device_probe+0x142/0x150
[ 1230.128156]  [<ffffffff8141cb05>] driver_probe_device+0x145/0x3c0
[ 1230.128157]  [<ffffffff8141ce53>] __driver_attach+0x93/0xa0
[ 1230.128159]  [<ffffffff8141cdc0>] ? __device_attach+0x40/0x40
[ 1230.128160]  [<ffffffff8141a693>] bus_for_each_dev+0x73/0xc0
[ 1230.128162]  [<ffffffff8141c44e>] driver_attach+0x1e/0x20
[ 1230.128163]  [<ffffffff81354eaf>] pci_add_dynid+0xaf/0xd0
[ 1230.128164]  [<ffffffff813554d7>] store_new_id+0x167/0x1b0
[ 1230.128167]  [<ffffffff8141a384>] drv_attr_store+0x24/0x40
[ 1230.128170]  [<ffffffff812732f9>] sysfs_write_file+0xc9/0x140
[ 1230.128172]  [<ffffffff811f286d>] vfs_write+0xbd/0x1e0
[ 1230.128174]  [<ffffffff81116404>] ? __audit_syscall_entry+0xb4/0x110
[ 1230.128176]  [<ffffffff811f338f>] SyS_write+0x7f/0xe0
[ 1230.128178]  [<ffffffff8168fc94>] tracesys+0xdd/0xe2

Both occurrences happened at different time stamps. In the second occurrence we were just binding the NIC to the user-space driver.

Both events could have a common cause on the host side that might not be related to live migration. I am continuing my debugging to find out what exactly is happening.  

I have a few questions for you: 

* Does the issue occur frequently on your setup?
* Do you have any other observations that would help narrow down the issue?
* Can we set up the host to crash the next time the issue occurs, so that we can
capture a host memory dump? (One possible way is sketched below.)
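
For reference, one possible way to arrange that on the host, assuming kdump is already configured with a crashkernel= reservation (the vmcore would then land under /var/crash):

# systemctl enable --now kdump
# sysctl -w kernel.hung_task_panic=1     <== panic as soon as a hung task is detected
# sysctl -w kernel.softlockup_panic=1    <== optionally also panic on soft lockups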

Best regards,
Pankaj

Comment 6 pagupta 2017-01-25 12:15:36 UTC
Hello Pei Zhang,

I could reproduce this issue and capture a guest crash. In the guest logs I could see the stack trace below, which pointed me to a patch we backported in RHEL-RT. But it looks like this patch was not committed for some reason. I am starting a conversation on 'kvm-rt' about the patch. 

http://post-office.corp.redhat.com/archives/kvm-rt/2016-March/msg00008.html 

[ 2189.765245] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W      ------------   3.10.0-537.rt56.444.el7.x86_64 #1
[ 2189.765245] Hardware name: Red Hat KVM, BIOS seabios-1.7.5-11.el7 04/01/2014
[ 2189.765247]  ffff8800b87affd8 1eebb7a41eebb7a1 ffff8800b87afc58 ffffffff8167f44c
[ 2189.765248]  ffff8800b87afc68 ffffffff81679ebd ffff8800b87afcc8 ffffffff81683868
[ 2189.765249]  ffff8800b87affd8 ffff8800b87affd8 ffff8800b87affd8 ffff8800b87affd8
[ 2189.765250] Call Trace:
[ 2189.765251]  [<ffffffff8167f44c>] dump_stack+0x19/0x1b
[ 2189.765252]  [<ffffffff81679ebd>] __schedule_bug+0x62/0x70
[ 2189.765254]  [<ffffffff81683868>] __schedule+0x698/0x7e0
[ 2189.765256]  [<ffffffff816839e0>] schedule+0x30/0xa0
[ 2189.765257]  [<ffffffff8168486d>] rt_spin_lock_slowlock+0xdd/0x240
[ 2189.765258]  [<ffffffff81684ff5>] rt_spin_lock+0x25/0x30
[ 2189.765260]  [<ffffffff81054275>] apf_task_wake_all+0x25/0x80
[ 2189.765262]  [<ffffffff81054546>] kvm_async_pf_task_wake+0x116/0x130
[ 2189.765263]  [<ffffffff81688e98>] do_async_page_fault+0x68/0xf0
[ 2189.765265]  [<ffffffff81685da8>] async_page_fault+0x28/0x30
[ 2189.765266]  [<ffffffff81055226>] ? native_safe_halt+0x6/0x10
[ 2189.765267]  [<ffffffff810262fd>] default_idle+0x2d/0x130
[ 2189.765268]  [<ffffffff81026f4e>] arch_cpu_idle+0x2e/0x40
[ 2189.765269]  [<ffffffff810dd13f>] cpu_startup_entry+0x2af/0x340
[ 2189.765271]  [<ffffffff81042858>] start_secondary+0x1b8/0x230
[ 2189.765272] bad: scheduling from the idle thread!


Thanks,
Pankaj

