Bug 989854 - Dst guest call trace when running a long multi-queue pktgen test.
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: jason wang
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2013-07-30 01:08 EDT by Qian Guo
Modified: 2013-11-04 02:30 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-04 02:30:19 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
pktgen script in guestA (1.11 KB, application/x-shellscript)
2013-07-30 01:08 EDT, Qian Guo
dmesg of GuestB when just act as the pktgen dst (13.48 KB, text/plain)
2013-07-30 01:12 EDT, Qian Guo
dmesg of GuestB when pktgen test and change queues loop. (15.97 KB, text/plain)
2013-07-30 01:13 EDT, Qian Guo

Description Qian Guo 2013-07-30 01:08:59 EDT
Created attachment 780329 [details]
pktgen script in guestA

Description of problem:
GuestA and GuestB run on the same host, both with multi-queue enabled. When guestA runs the pktgen test against guestB as the destination, guestB hits a call trace in less than 10 minutes. The trace was captured from dmesg.

Version-Release number of selected component (if applicable):
Host & guest kernel version: kernel-3.10.0-2.el7.x86_64
qemu-kvm version: qemu-kvm-1.5.1-2.el7.x86_64
openvswitch-1.9.0-2.el6ost.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot two RHEL 7 guests, each with 4 vCPUs (-smp 4) and 4 queues. The command line for one of them (GuestB) looks like this:

# /usr/libexec/qemu-kvm -M pc -device pci-bridge,id=bridge1,chassis_nr=1 -cpu Penryn -m 6G -smp 4,sockets=1,cores=4,threads=1 -enable-kvm -nodefaults -nodefconfig -drive file=/home/rhel7_img/rhel7cp2.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,werror=stop,rerror=stop,aio=native -device virtio-scsi-pci,bus=bridge1,addr=0x1,id=virtio-disk0, -device scsi-hd,bus=virtio-disk0.0,drive=drive-virtio-disk0,id=scsi-hd1 -spice port=5931,disable-ticketing -global qxl-vga.vram_size=67108864 -vga qxl -monitor stdio -netdev tap,id=vnet0,vhost=on,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown,queues=4 -device virtio-net-pci,mq=on,vectors=9,bus=bridge1,addr=0x3,netdev=vnet0,mac=54:52:1a:4b:c2:11,id=vnic1 -boot menu=on -serial unix:/tmp/qiguo2,server,nowait

2. Inside both guests, make sure multi-queue is enabled:
# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX:		0
TX:		0
Other:		0
Combined:	4
Current hardware settings:
RX:		0
TX:		0
Other:		0
Combined:	4

3. On GuestA, run the pktgen script with guestB set as the destination (refer to the attachment for the pktgen script):

# sh -x pktgen.sh eth0 4
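
The attached pktgen.sh is not reproduced in this report. As a rough illustration only, the sketch below prints the writes that a multi-queue pktgen run typically makes under /proc/net/pktgen, with one kernel thread and one TX queue per device entry. The function name pktgen_plan, the packet count, the packet size and the destination IP are assumptions made for the example; the real attachment may be organized differently.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the attached pktgen.sh: print "<proc file> <command>"
# pairs for a multi-queue pktgen run instead of writing them directly.
# Usage: pktgen_plan DEV QUEUES DST_IP DST_MAC
pktgen_plan() {
    dev=$1; queues=$2; dst_ip=$3; dst_mac=$4
    q=0
    while [ "$q" -lt "$queues" ]; do
        thread="/proc/net/pktgen/kpktgend_$q"
        entry="/proc/net/pktgen/$dev@$q"
        # Attach one device entry to each pktgen kernel thread.
        echo "$thread rem_device_all"
        echo "$thread add_device $dev@$q"
        # Pin this entry to TX queue $q, then set the traffic parameters.
        echo "$entry queue_map_min $q"
        echo "$entry queue_map_max $q"
        echo "$entry count 10000000"   # assumed packet count
        echo "$entry pkt_size 60"      # assumed packet size
        echo "$entry dst $dst_ip"
        echo "$entry dst_mac $dst_mac"
        q=$((q + 1))
    done
    # Writing "start" to pgctrl launches all configured threads.
    echo "/proc/net/pktgen/pgctrl start"
}

# To apply the plan for real (as root, after "modprobe pktgen"), something like:
#   pktgen_plan eth0 4 192.168.0.2 54:52:1a:4b:c2:11 |
#       while read -r file cmd; do echo "$cmd" > "$file"; done
```

Printing the plan first makes it easy to review what would be written before loading the guest; the MAC in the commented usage line is GuestB's from the qemu-kvm command line above, while the IP address is a placeholder.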


Actual results:
During the test, guestB appears to be stuck. After a while (less than 10 minutes) I quit the test and could get the call trace messages from dmesg.

I will attach the dmesg output to this bug.


Expected results:
No call trace occurs.

Additional info:
I also got a call trace when testing this scenario:
GuestA and GuestB both repeatedly change the number of queues from inside the guest:

while true 
do 
for i in 1 2 1 3 1 4 
do 
ethtool -L eth0 combined $i
done
done

and in the meantime GuestA runs the pktgen test. I got a similar call trace, so I am attaching those dmesg messages, too.
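
For repeated runs, a bounded variant of the loop above may be easier to handle than `while true`. This is only a sketch: the function name cycle_queues and the ETHTOOL override variable are made up for the example, and the queue sequence is the one used in this report.

```shell
#!/bin/sh
# Sketch: bounded version of the queue-switching loop from this report.
# ETHTOOL is a hypothetical override so the loop can be dry-run with "echo".
ETHTOOL=${ETHTOOL:-ethtool}

# Usage: cycle_queues DEV ROUNDS
cycle_queues() {
    dev=$1; rounds=$2
    n=0
    while [ "$n" -lt "$rounds" ]; do
        for i in 1 2 1 3 1 4; do
            # Stop at the first failure so a broken device is noticed early.
            "$ETHTOOL" -L "$dev" combined "$i" || return 1
        done
        n=$((n + 1))
    done
}
```

For example, `cycle_queues eth0 100` runs the sequence 100 times, and setting `ETHTOOL=echo` beforehand prints the ethtool arguments without touching the device.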
Comment 1 Qian Guo 2013-07-30 01:12:43 EDT
Created attachment 780330 [details]
dmesg of GuestB when just act as the pktgen dst
Comment 2 Qian Guo 2013-07-30 01:13:38 EDT
Created attachment 780331 [details]
dmesg of GuestB when pktgen test and change queues loop.
Comment 4 jason wang 2013-07-30 02:38:07 EDT
Could you please disable nf_conntrack and retry?
Comment 5 Qian Guo 2013-07-30 05:41:55 EDT
(In reply to jason wang from comment #4)
> Could you please disable nf_conntrack and retry?

I tried with nf_conntrack disabled and did not hit this issue:

# cat /etc/modprobe.d/blacklist.conf 
# disable nf_conntrack
 blacklist nf_conntrack
 blacklist nf_conntrack_ipv6
 blacklist xt_conntrack
 blacklist nf_conntrack_ftp
 blacklist xt_state
 blacklist iptable_nat
 blacklist ipt_REDIRECT
 blacklist nf_nat
 blacklist nf_conntrack_ipv4


# lsmod |grep nf
(no list)

# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX:		0
TX:		0
Other:		0
Combined:	4
Current hardware settings:
RX:		0
TX:		0
Other:		0
Combined:	4


I then tested between these 2 guests 3 times (waiting for pktgen to finish each time, which takes a long while), and neither guest hit this issue.
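
The two checks above (no conntrack modules loaded, all 4 combined channels active) can be scripted so they are repeated before every run. This is a sketch under assumptions: the helper names are hypothetical, and the module pattern only covers the modules named in the blacklist above.

```shell
#!/bin/sh
# Print any conntrack/NAT-related modules found in lsmod-style input (stdin).
# The pattern mirrors the blacklist entries shown in this comment.
loaded_conntrack_modules() {
    awk '$1 ~ /conntrack|^nf_nat|^xt_state$|^iptable_nat$|^ipt_REDIRECT$/ { print $1 }'
}

# Print the current "Combined" channel count from `ethtool -l` output (stdin).
current_combined() {
    awk '/^Current hardware settings:/ { cur = 1 }
         cur && $1 == "Combined:"      { print $2; exit }'
}
```

Typical use would be `lsmod | loaded_conntrack_modules` (no output means none are loaded) and `ethtool -l eth0 | current_combined`.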
Comment 6 Qian Guo 2013-07-30 05:55:08 EDT
(In reply to Qian Guo from comment #2)
> Created attachment 780331 [details]
> dmesg of GuestB when pktgen test and change queues loop.

With nf_conntrack disabled, when testing this scenario the guest got the following messages:

[ 2268.041040] BUG: soft lockup - CPU#1 stuck for 23s! [ethtool:8329]
[ 2268.064040] BUG: soft lockup - CPU#3 stuck for 23s! [swapper/3:0]
[ 2296.041049] BUG: soft lockup - CPU#1 stuck for 22s! [ethtool:8329]
[ 2296.064033] BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
[ 2365.169015] INFO: rcu_sched self-detected stall on CPU[ 2365.170060] INFO: rcu_sched detected stalls on CPUs/tasks:

These repeat, and the network is not accessible. Even after I stop the pktgen stress it does not recover. This looks like bug 978153, so I won't file a new one.
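
To compare runs, the stall messages can be counted from a saved dmesg log. The helpers below are a sketch with hypothetical names; they only match the message formats quoted in this comment.

```shell
#!/bin/sh
# Count log lines (stdin) that report a soft lockup or an rcu_sched stall.
count_stalls() {
    grep -c -E 'BUG: soft lockup|rcu_sched (self-)?detected stall' || true
}

# Print "CPU#N <count>" for each CPU that reported a soft lockup (stdin).
lockups_per_cpu() {
    awk 'match($0, /BUG: soft lockup - CPU#[0-9]+/) {
             # Skip the 19-character "BUG: soft lockup - " prefix.
             cpu = substr($0, RSTART + 19, RLENGTH - 19)
             n[cpu]++
         }
         END { for (c in n) print c, n[c] }' | sort
}
```

Usage would be along the lines of `dmesg | count_stalls` and `dmesg | lockups_per_cpu` inside the affected guest.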
Comment 7 jason wang 2013-09-05 23:09:27 EDT
(In reply to Qian Guo from comment #6)
> (In reply to Qian Guo from comment #2)
> > Created attachment 780331 [details]
> > dmesg of GuestB when pktgen test and change queues loop.
> 
> When disable the nf_conntrack, and test this scenario, this guest got the 
> 
> [ 2268.041040] BUG: soft lockup - CPU#1 stuck for 23s! [ethtool:8329]
> [ 2268.064040] BUG: soft lockup - CPU#3 stuck for 23s! [swapper/3:0]
> [ 2296.041049] BUG: soft lockup - CPU#1 stuck for 22s! [ethtool:8329]
> [ 2296.064033] BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
> [ 2365.169015] INFO: rcu_sched self-detected stall on CPU[ 2365.170060]
> INFO: rcu_sched detected stalls on CPUs/tasks:
> 
> repeatedly and the network is not accessable, even I stop the pktgen stress,
> it can not resume, but it looks like bug 978153, so wont file a new one.

Since bz 978153 has been fixed, could you please retest (both w/ and w/o nf_conntrack)?

Thanks
Comment 8 Qian Guo 2013-09-05 23:16:00 EDT
(In reply to jason wang from comment #7)
> Since bz 978153 has been fixed, could you please retest (both w/ and w/o
> nf_conntrack).
> 
OK, I will retest this w/ the latest kernel and update here later.
> Thanks
Comment 9 Qian Guo 2013-09-09 22:16:53 EDT
Hi, Jason

I tested w/ the latest kernel build on the host and all guests:
# uname -r
3.10.0-18.el7.x86_64

and the latest qemu-kvm:
# rpm -q qemu-kvm
qemu-kvm-1.5.3-2.el7.x86_64

I tested pktgen from guestA to guestB with mq enabled in both guests, both w/ nf_conntrack enabled and w/ it disabled, and did not hit the call trace.


So the latest components have fixed this issue.
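
When re-verifying on another machine, it may help to check that the installed builds are at least the ones tested in this comment. The report does not say which build first contained the fix, so the sketch below only compares against the verified versions. The helper name version_ge is hypothetical, and it relies on GNU `sort -V` ordering, which only approximates RPM version comparison.

```shell
#!/bin/sh
# True if version string $1 sorts at or after $2 under GNU `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example checks against the builds verified in this comment:
#   version_ge "$(uname -r)" 3.10.0-18.el7.x86_64 || echo "kernel older than tested build"
#   version_ge "$(rpm -q qemu-kvm)" qemu-kvm-1.5.3-2.el7.x86_64 || echo "qemu-kvm older than tested build"
```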
Comment 10 jason wang 2013-11-04 02:30:19 EST
(In reply to Qian Guo from comment #9)
> Hi, Jason
> 
> I test w/ the latest kernel build in all guest/host:
> # uname -r
> 3.10.0-18.el7.x86_64
> 
> and the latest qemu-kvm:
> # rpm -q qemu-kvm
> qemu-kvm-1.5.3-2.el7.x86_64
> 
> Test pktgen from guestA to guestB, and both guests enabled mq, and both w/
> nf enabled/disabled, didn't hit call trace.
> 
> 
> So the latest components have fixed this issue.

Closing this bug according to this comment.