Bug 1390910 - Guest becomes unresponsive when doing ping-pong migration with ovs-dpdk
Summary: Guest becomes unresponsive when doing ping-pong migration with ovs-dpdk
Keywords:
Status: CLOSED DUPLICATE of bug 1416403
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: pagupta
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-02 08:55 UTC by Pei Zhang
Modified: 2017-02-06 06:30 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-06 06:30:58 UTC
Target Upstream Version:
Embargoed:



Description Pei Zhang 2016-11-02 08:55:27 UTC
Description of problem:
Start ovs-dpdk on two hosts and boot a guest with a vhostuser interface on one of them, then perform ping-pong migration several times (there seems to be no pattern; it can take 1, 2, or up to 20 rounds). The guest becomes unresponsive.


Version-Release number of selected component (if applicable):
3.10.0-514.rt56.420.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7.x86_64
libvirt-2.0.0-10.el7.x86_64
tuned-2.7.1-3.el7.noarch
tuned-profiles-realtime-2.7.1-3.el7.noarch
tuned-profiles-nfv-2.7.1-3.el7.noarch
dpdk-16.07.zip
openvswitch-2.6.0.tar.gz


How reproducible:
Always.


Steps to Reproduce:
1. Start ovs-dpdk on host1 and host2; for the port configuration, refer to [1].

2. Boot the guest with vhostuser interfaces on host1; for the full XML, refer to [2].

3. Migrate the guest from host1 to host2:
# virsh migrate --live $guest_name qemu+ssh://$des_ip/system

4. Repeat the migration back and forth (ping-pong) several times; the guest becomes unresponsive (a loop sketch follows below).
Checking the guest status shows it is 'running' on the src host and 'paused' on the des host.
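
For reference, the ping-pong migration in step 4 can be driven with a small loop like the following (a sketch; $guest_name, $host1_ip and $host2_ip are placeholders, and the iteration count is arbitrary):

    for i in $(seq 1 20); do
        # host1 -> host2
        virsh migrate --live $guest_name qemu+ssh://$host2_ip/system
        # host2 -> host1: connect to host2's libvirtd and migrate back
        virsh -c qemu+ssh://$host2_ip/system migrate --live $guest_name qemu+ssh://$host1_ip/system
        # the guest should still report 'running' on host1 after each round trip
        virsh domstate $guest_name
    done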


Actual results:
The guest becomes unresponsive after several (1, 2, or up to 20) rounds of ping-pong migration.


Expected results:
The guest should keep working normally.

Additional info:
1. RHEL 7.3 (with kernel 3.10.0-514.el7.x86_64) does not seem to hit this issue:
ping-pong migration was run 50 times and the guest kept working well.

2. In my testing, the src host and the des host are the same machine type.
Dell Inc. PowerEdge R430
Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz

3. reference
[1]
# ovs-vsctl show
ed08ca6d-ddc4-4e5b-9056-4710811e6aae
    Bridge "ovsbr0"
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuser
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
    Bridge "ovsbr1"
        Port "vhost-user2"
            Interface "vhost-user2"
                type: dpdkvhostuser
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
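
For context, a topology like [1] is typically created with commands along these lines (a sketch for OVS 2.6 with the DPDK datapath; binding the NICs to the user-space driver and the other DPDK initialization options are assumed to be done beforehand):

    ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
    ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk
    ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
    ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
    ovs-vsctl add-port ovsbr1 dpdk1 -- set Interface dpdk1 type=dpdk
    ovs-vsctl add-port ovsbr1 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser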

[2] full guest xml file
<domain type='kvm'>
  <name>rhel7.3-2q-rt</name>
  <uuid>ff8b540c-9ef3-4441-9b0d-da7c97543f22</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='39'/>
    <vcpupin vcpu='1' cpuset='38'/>
    <vcpupin vcpu='2' cpuset='36'/>
    <vcpupin vcpu='3' cpuset='34'/>
    <vcpupin vcpu='4' cpuset='32'/>
    <vcpupin vcpu='5' cpuset='30'/>
    <vcpupin vcpu='6' cpuset='28'/>
    <vcpupin vcpu='7' cpuset='37'/>
    <emulatorpin cpuset='1,3,5,7'/>
    <vcpusched vcpus='0-7' scheduler='fifo' priority='1'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0-1'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.3.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pmu state='off'/>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough'>
    <feature policy='require' name='tsc-deadline'/>
    <numa>
      <cell id='0' cpus='0-7' memory='8388608' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/mnt/rhel7.3-rt.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='none'/>
    <interface type='bridge'>
      <mac address='18:77:da:e6:02:01'/>
      <source bridge='switch'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:e6:02:02'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user1' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:e6:02:03'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user2' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost' queues='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' function='0x0'/>
    </memballoon>
  </devices>
</domain>
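
(The guest can be defined and started from this XML with the usual virsh commands; the file path below is just a placeholder.)

# virsh define /tmp/rhel7.3-2q-rt.xml
# virsh start rhel7.3-2q-rt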

Comment 4 pagupta 2017-01-02 13:07:26 UTC
Hello Pei Zhang,

I tried to reproduce this issue with the exact environment you mentioned.
The two kernel versions on which I tried live migration are:

1] kernel-rt-3.10.0-306.0.1.rt56.179.el7.x86_64
2] kernel-rt-kvm-3.10.0-514.7.1.rt56.430.el7.x86_64

For most of the trials, live migration with DPDK succeeded and I did not face any issue. In only a couple of instances (out of roughly 50) I encountered 'hung task' time-outs similar to comment 3.

A] First occurrence

[root@xxxx ~]# cat /proc/24502/task/24502/status 
Name:	qemu-kvm    ==> emulator thread that goes into an uninterruptible state
State:	D (disk sleep)
Tgid:	24502
Ngid:	0
Pid:	24502
PPid:	1
TracerPid:	0
Uid:	107	107	107	107
Gid:	107	107	107	107
FDSize:	64
Groups:	11 36 107 
VmPeak:	 4548168 kB
VmSize:	 3681140 kB
VmLck:	  535404 kB
VmPin:	       0 kB
VmHWM:	  406248 kB
VmRSS:	  318440 kB
VmData:	  267976 kB
VmStk:	     136 kB
VmExe:	    6352 kB
VmLib:	   40132 kB
VmPTE:	    1096 kB
VmSwap:	       0 kB
Threads:	1
SigQ:	1/256768
SigPnd:	0000000000000100
ShdPnd:	0000000000000100
SigBlk:	0000000010002240
SigIgn:	0000000000001000
SigCgt:	0000000180004243
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	0000001fffffffff
Seccomp:	0
Cpus_allowed:	0000,00000005
Cpus_allowed_list:	0,2
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
Mems_allowed_list:	1
voluntary_ctxt_switches:	52769
nonvoluntary_ctxt_switches:	2428
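
For a thread stuck in uninterruptible sleep (D state) like this one, the kernel-side call stack can usually be captured without rebooting: /proc/<pid>/stack shows the stack of the stuck thread, and SysRq-w dumps all blocked tasks to dmesg (a sketch, run as root on the host):

# cat /proc/24502/task/24502/stack
# echo w > /proc/sysrq-trigger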

B] Second occurrence: 'dpdk_nic_bind.p'

[  557.067767] ixgbe 0000:41:00.0: complete
[ 1230.128120] INFO: task dpdk_nic_bind.p:38951 blocked for more than 600 seconds.
[ 1230.128120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1230.128123] dpdk_nic_bind.p D ffff880731ff50a0     0 38951      1 0x00000084
[ 1230.128126]  ffff880026e13b28 0000000000000002 ffff880026e13fd8 ffff880026e13fd8
[ 1230.128127]  ffff880026e13fd8 ffff880026e13fd8 ffffffff819be460 ffff880731ff50a0
[ 1230.128128]  7fffffffffffffff ffff880026e13c68 ffff880731ff50a0 00000000ffffffff
[ 1230.128129] Call Trace:
[ 1230.128134]  [<ffffffff81685b30>] schedule+0x30/0xa0
[ 1230.128136]  [<ffffffff81683c01>] schedule_timeout+0x2e1/0x380
[ 1230.128139]  [<ffffffff810bbf5c>] ? try_to_wake_up+0x6c/0x540
[ 1230.128140]  [<ffffffff81684a04>] wait_for_completion+0xc4/0x100
[ 1230.128143]  [<ffffffff8109bb51>] flush_work+0xf1/0x170
[ 1230.128145]  [<ffffffff8109a4f0>] ? flush_workqueue_prep_pwqs+0x1d0/0x1d0
[ 1230.128146]  [<ffffffff8109d8a5>] work_on_cpu+0x75/0x90
[ 1230.128148]  [<ffffffff81099ea0>] ? find_worker_executing_work+0x90/0x90
[ 1230.128151]  [<ffffffff81353dd0>] ? pci_device_shutdown+0x70/0x70
[ 1230.128152]  [<ffffffff81355362>] pci_device_probe+0x142/0x150
[ 1230.128156]  [<ffffffff8141cb05>] driver_probe_device+0x145/0x3c0
[ 1230.128157]  [<ffffffff8141ce53>] __driver_attach+0x93/0xa0
[ 1230.128159]  [<ffffffff8141cdc0>] ? __device_attach+0x40/0x40
[ 1230.128160]  [<ffffffff8141a693>] bus_for_each_dev+0x73/0xc0
[ 1230.128162]  [<ffffffff8141c44e>] driver_attach+0x1e/0x20
[ 1230.128163]  [<ffffffff81354eaf>] pci_add_dynid+0xaf/0xd0
[ 1230.128164]  [<ffffffff813554d7>] store_new_id+0x167/0x1b0
[ 1230.128167]  [<ffffffff8141a384>] drv_attr_store+0x24/0x40
[ 1230.128170]  [<ffffffff812732f9>] sysfs_write_file+0xc9/0x140
[ 1230.128172]  [<ffffffff811f286d>] vfs_write+0xbd/0x1e0
[ 1230.128174]  [<ffffffff81116404>] ? __audit_syscall_entry+0xb4/0x110
[ 1230.128176]  [<ffffffff811f338f>] SyS_write+0x7f/0xe0
[ 1230.128178]  [<ffffffff8168fc94>] tracesys+0xdd/0xe2

Both occurrences happened at different time stamps. In the second occurrence we were only binding the NIC to the user-space driver.

It is possible that both events have a common host-side cause that is not related to live migration itself. I am continuing my debugging to find out what exactly is happening.

I have a few questions for you:

* Does the issue occur frequently in your setup?
* Do you have any other observations that would help narrow down the issue?
* Can we set up the host to crash the next time the issue occurs, so that we can capture a host memory dump? (See the sketch below.)
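
A minimal way to arm that is to load a crash kernel via kdump and let the hung-task detector panic the host (a sketch; assumes a crashkernel= reservation is already configured):

# systemctl enable --now kdump
# sysctl -w kernel.hung_task_panic=1
# sysctl -w kernel.hung_task_timeout_secs=600

Alternatively, once the guest hangs, a dump can be forced by hand:

# echo c > /proc/sysrq-trigger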

Best regards,
Pankaj

Comment 6 pagupta 2017-01-25 12:15:36 UTC
Hello Pei Zhang,

I could reproduce this issue and capture a guest crash. In the guest logs I could see the stack trace below, which pointed me to a patch we backported in RHEL-RT. However, it looks like this patch was not committed for some reason. I am starting a conversation on 'kvm-rt' about the patch.

http://post-office.corp.redhat.com/archives/kvm-rt/2016-March/msg00008.html 

[ 2189.765245] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W      ------------   3.10.0-537.rt56.444.el7.x86_64 #1
[ 2189.765245] Hardware name: Red Hat KVM, BIOS seabios-1.7.5-11.el7 04/01/2014
[ 2189.765247]  ffff8800b87affd8 1eebb7a41eebb7a1 ffff8800b87afc58 ffffffff8167f44c
[ 2189.765248]  ffff8800b87afc68 ffffffff81679ebd ffff8800b87afcc8 ffffffff81683868
[ 2189.765249]  ffff8800b87affd8 ffff8800b87affd8 ffff8800b87affd8 ffff8800b87affd8
[ 2189.765250] Call Trace:
[ 2189.765251]  [<ffffffff8167f44c>] dump_stack+0x19/0x1b
[ 2189.765252]  [<ffffffff81679ebd>] __schedule_bug+0x62/0x70
[ 2189.765254]  [<ffffffff81683868>] __schedule+0x698/0x7e0
[ 2189.765256]  [<ffffffff816839e0>] schedule+0x30/0xa0
[ 2189.765257]  [<ffffffff8168486d>] rt_spin_lock_slowlock+0xdd/0x240
[ 2189.765258]  [<ffffffff81684ff5>] rt_spin_lock+0x25/0x30
[ 2189.765260]  [<ffffffff81054275>] apf_task_wake_all+0x25/0x80
[ 2189.765262]  [<ffffffff81054546>] kvm_async_pf_task_wake+0x116/0x130
[ 2189.765263]  [<ffffffff81688e98>] do_async_page_fault+0x68/0xf0
[ 2189.765265]  [<ffffffff81685da8>] async_page_fault+0x28/0x30
[ 2189.765266]  [<ffffffff81055226>] ? native_safe_halt+0x6/0x10
[ 2189.765267]  [<ffffffff810262fd>] default_idle+0x2d/0x130
[ 2189.765268]  [<ffffffff81026f4e>] arch_cpu_idle+0x2e/0x40
[ 2189.765269]  [<ffffffff810dd13f>] cpu_startup_entry+0x2af/0x340
[ 2189.765271]  [<ffffffff81042858>] start_secondary+0x1b8/0x230
[ 2189.765272] bad: scheduling from the idle thread!


Thanks,
Pankaj

