614358 – Kernel panics when using CBQ queueing method in Xen

Summary: Kernel panics when using CBQ queueing method in Xen

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel-xen
Sub Component:
Version:	5.5
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Xen Maintainance List
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	514490

Reported:	2010-07-14 09:05 UTC by David Fava
Modified:	2011-04-01 13:40 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-04-01 13:40:46 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description David Fava 2010-07-14 09:05:02 UTC

Description of problem:

The kernel panics at random, roughly every 4-5 days, when tc is used with the CBQ queueing discipline. The Xen node is not heavily loaded and has been extensively tested for hardware defects.

Version-Release number of selected component (if applicable):
kernel-xen.x86_64 2.6.18-194.8.1.el5
xen.x86_64 3.0.3-105.el5_5.3

How reproducible:
The problem appears to be random. There is no apparent correlation with the number of domUs running on the host.

Additional info:
The physical server is a Dell R410 with a Broadcom NetXtreme II 5716 network card.
The network drivers were upgraded to the latest version, with no effect.
Drivers:
driver: bnx2
version: 2.0.8e
firmware-version: 5.0.9 bc 5.0.6 NCSI 2.0.3
bus-info: 0000:01:00.0

Panic trace:
Unable to handle kernel paging request at ffff88002a7a6580 RIP:
 [<ffffffff883620b1>] :sch_cbq:cbq_dequeue+0x166/0x711
PGD 1062067 PUD 1063067 PMD 11b7067 PTE 0
Oops: 0000 [1] SMP
last sysfs file: /devices/xen-backend/vbd-1-51714/statistics/wr_req
CPU 0
Modules linked in: tun nls_utf8 cls_fw iptable_mangle xt_MARK sch_tbf sch_htb dell_rbu act_police sch_ingress cls_u32 sch$
Pid: 0, comm: swapper Tainted: G      2.6.18-194.8.1.el5xen #1
RIP: e030:[<ffffffff883620b1>]  [<ffffffff883620b1>] :sch_cbq:cbq_dequeue+0x166/0x711
RSP: e02b:ffffffff80684e20  EFLAGS: 00010212
RAX: 000000000000175d RBX: 0000000000000000 RCX: 0000000000000003
RDX: ffff88002a7a0800 RSI: ffffffffcc2d7fee RDI: 0000000000000e50
RBP: 0000000000000000 R08: ffff88001d94fc00 R09: 000000000000baea
R10: 0000000000000018 R11: ffff88001d94fc00 R12: ffff880008e198c0
R13: ffff880008e19800 R14: ffff880026f40a00 R15: 0000000106eef0b1
FS:  00002b78ee445d00(0000) GS:ffffffff805d2000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff80642000, task ffffffff804f4b00)
Stack:  00000aa8ffffffff  ffff880008e19800  ffff8800252f5db8  0000000000000000
 000000004c3d70db  00000000000034d6  0000000106eef0b1  ffff880026f40800
 0000000000000000  0000000000000000
Call Trace:
 <IRQ>  [<ffffffff8042ffeb>] __qdisc_run+0x76/0x1f9
 [<ffffffff80420fe9>] net_tx_action+0xc9/0xf1
 [<ffffffff80212cd3>] __do_softirq+0x8d/0x13b
 [<ffffffff80260da4>] call_softirq+0x1c/0x278
 [<ffffffff8026e0c1>] do_softirq+0x31/0x98
 [<ffffffff8026df4d>] do_IRQ+0xec/0xf5
 [<ffffffff803b3eca>] evtchn_do_upcall+0x13b/0x1fb
 [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
 <EOI>  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff8029a1db>] rcu_pending+0x26/0x50
 [<ffffffff8026f4eb>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca80>] xen_idle+0x38/0x4a
 [<ffffffff8024ad7b>] cpu_idle+0x97/0xba
 [<ffffffff8064cb0f>] start_kernel+0x21f/0x224
 [<ffffffff8064c1e5>] _sinittext+0x1e5/0x1eb


Code: 8b 44 82 0c 48 29 c6 48 8d 14 3e 48 85 d2 0f 8f 9a 00 00 00
RIP  [<ffffffff883620b1>] :sch_cbq:cbq_dequeue+0x166/0x711
 RSP <ffffffff80684e20>
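The RIP line and the Code: bytes can be cross-checked against the register dump. In this oops format, `:sch_cbq:cbq_dequeue+0x166/0x711` reads as module:function+offset/size. Assuming the 2.6.18 x86_64 behavior of printing the Code: bytes starting at RIP, the first four bytes `8b 44 82 0c` decode as `mov eax, [rdx+rax*4+0xc]`. The short Python sketch below (register values copied from the trace; the helper names are illustrative only) recomputes that operand's address. It lands exactly on the faulting address ffff88002a7a6580, so the fault was a 4-byte load from the base in RDX indexed by RAX = 0x175d. This identifies which load faulted, not why the base or index was bad.

```python
import re

# Values copied verbatim from the oops above.
RIP_SYMBOL = ":sch_cbq:cbq_dequeue+0x166/0x711"
RIP = 0xffffffff883620b1
FAULT_ADDR = 0xffff88002a7a6580  # "Unable to handle kernel paging request at ..."
RAX = 0x000000000000175d
RDX = 0xffff88002a7a0800


def parse_symbol(text):
    """Split ':module:function+0xoffset/0xsize' into (module, function, offset, size)."""
    m = re.fullmatch(
        r":(?P<mod>[^:]+):(?P<func>[^+]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)",
        text,
    )
    if m is None:
        raise ValueError(f"unrecognised symbol: {text!r}")
    return m["mod"], m["func"], int(m["off"], 16), int(m["size"], 16)


def effective_address(base, index, scale, disp):
    """Address used by an x86 [base + index*scale + disp] memory operand (64-bit wrap)."""
    return (base + index * scale + disp) & 0xFFFFFFFFFFFFFFFF


mod, func, off, size = parse_symbol(RIP_SYMBOL)
func_start = RIP - off  # RIP minus the offset gives the start of cbq_dequeue

# Code: bytes "8b 44 82 0c" -> mov eax, [rdx + rax*4 + 0xc]
addr = effective_address(RDX, RAX, 4, 0x0c)

print(f"{mod}:{func} starts at {func_start:#x}; fault is {off:#x} bytes into a {size:#x}-byte function")
print(f"[rdx + rax*4 + 0xc] = {addr:#x}; equals fault address: {addr == FAULT_ADDR}")
```

The comparison on the last line prints True for the values in this trace.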

Comment 1 Andrew Jones 2010-07-14 10:11:16 UTC

Adding Andy to CC in case there are known issues with this driver.

Comment 2 Paolo Bonzini 2010-12-07 18:39:01 UTC

Also reported on the CentOS bug tracker:

http://bugs.centos.org/view.php?id=4426

Comment 3 Paolo Bonzini 2011-04-01 13:40:46 UTC

There has been no activity from the reporter since the original report, not even to say whether they reproduced it again. This is likely some locking problem, but it is hard to say without a dump. Please reopen if you see it again. If somebody wants to try running CBQ for a while, perhaps we can reopen it ourselves, too.
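For anyone who wants to try running CBQ for a while, the following is a minimal sketch of a CBQ setup to soak-test. The reporter's actual tc configuration is not in this bug, so the device name, rates, weight and destination address are hypothetical placeholders. The module list in the panic shows cls_u32 loaded, so the sketch uses a u32 filter; the CBQ parameters follow the style of the Linux Advanced Routing & Traffic Control HOWTO examples and should be checked against tc-cbq(8) for the installed iproute2. By default the script only prints the commands; applying them requires root.

```python
import shlex
import subprocess
from typing import List


def cbq_commands(dev: str, link_rate: str, class_rate: str,
                 class_weight: str, dst_ip: str) -> List[List[str]]:
    """Return tc argv lists for a root CBQ qdisc, one bounded class and one u32 filter.

    All arguments are hypothetical placeholders, not the reporter's settings.
    """
    return [
        # Root CBQ qdisc: CBQ needs the physical link bandwidth and an average packet size.
        ["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:", "cbq",
         "bandwidth", link_rate, "avpkt", "1000"],
        # One child class shaped to class_rate; "bounded" stops it borrowing spare bandwidth.
        ["tc", "class", "add", "dev", dev, "parent", "1:", "classid", "1:1", "cbq",
         "bandwidth", link_rate, "rate", class_rate, "weight", class_weight,
         "allot", "1514", "prio", "5", "maxburst", "20", "avpkt", "1000", "bounded"],
        # Steer IPv4 traffic for one destination into class 1:1.
        ["tc", "filter", "add", "dev", dev, "parent", "1:", "protocol", "ip",
         "prio", "16", "u32", "match", "ip", "dst", dst_ip, "flowid", "1:1"],
    ]


def apply(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or run it (requires root) when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # eth0, the rates and 192.0.2.10 (a documentation address) are placeholders.
    apply(cbq_commands("eth0", "1000mbit", "10mbit", "1mbit", "192.0.2.10"))
```

With the default dry_run=True it only prints the three tc commands; passing dry_run=False applies them, after which traffic to the chosen address can be left running through the CBQ class.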

Note You need to log in before you can comment on or make changes to this bug.