Description of problem:
The kernel panics at random, roughly every 4-5 days, when tc is used with the CBQ method. The Xen node is not heavily loaded and has been extensively tested for hardware defects.

Version-Release number of selected component (if applicable):
kernel-xen.x86_64 2.6.18-194.8.1.el5
xen.x86_64 3.0.3-105.el5_5.3

How reproducible:
It seems to be a random problem. There appears to be no connection with the number of running domUs on the host.

Additional info:
The physical server is a Dell R410 with a Broadcom NetXtreme II 5716 network card. Upgrading the network driver to the latest version made no difference.

Drivers:
driver: bnx2
version: 2.0.8e
firmware-version: 5.0.9 bc 5.0.6 NCSI 2.0.3
bus-info: 0000:01:00.0

Panic trace:
Unable to handle kernel paging request at ffff88002a7a6580 RIP:
 [<ffffffff883620b1>] :sch_cbq:cbq_dequeue+0x166/0x711
PGD 1062067 PUD 1063067 PMD 11b7067 PTE 0
Oops: 0000 [1] SMP
last sysfs file: /devices/xen-backend/vbd-1-51714/statistics/wr_req
CPU 0
Modules linked in: tun nls_utf8 cls_fw iptable_mangle xt_MARK sch_tbf sch_htb dell_rbu act_police sch_ingress cls_u32 sch$
Pid: 0, comm: swapper Tainted: G 2.6.18-194.8.1.el5xen #1
RIP: e030:[<ffffffff883620b1>] [<ffffffff883620b1>] :sch_cbq:cbq_dequeue+0x166/0x711
RSP: e02b:ffffffff80684e20 EFLAGS: 00010212
RAX: 000000000000175d RBX: 0000000000000000 RCX: 0000000000000003
RDX: ffff88002a7a0800 RSI: ffffffffcc2d7fee RDI: 0000000000000e50
RBP: 0000000000000000 R08: ffff88001d94fc00 R09: 000000000000baea
R10: 0000000000000018 R11: ffff88001d94fc00 R12: ffff880008e198c0
R13: ffff880008e19800 R14: ffff880026f40a00 R15: 0000000106eef0b1
FS: 00002b78ee445d00(0000) GS:ffffffff805d2000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff80642000, task ffffffff804f4b00)
Stack: 00000aa8ffffffff ffff880008e19800 ffff8800252f5db8 0000000000000000
 000000004c3d70db 00000000000034d6 0000000106eef0b1 ffff880026f40800
 0000000000000000 0000000000000000
Call Trace:
 <IRQ>
 [<ffffffff8042ffeb>] __qdisc_run+0x76/0x1f9
 [<ffffffff80420fe9>] net_tx_action+0xc9/0xf1
 [<ffffffff80212cd3>] __do_softirq+0x8d/0x13b
 [<ffffffff80260da4>] call_softirq+0x1c/0x278
 [<ffffffff8026e0c1>] do_softirq+0x31/0x98
 [<ffffffff8026df4d>] do_IRQ+0xec/0xf5
 [<ffffffff803b3eca>] evtchn_do_upcall+0x13b/0x1fb
 [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
 <EOI>
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
 [<ffffffff8029a1db>] rcu_pending+0x26/0x50
 [<ffffffff8026f4eb>] raw_safe_halt+0x84/0xa8
 [<ffffffff8026ca80>] xen_idle+0x38/0x4a
 [<ffffffff8024ad7b>] cpu_idle+0x97/0xba
 [<ffffffff8064cb0f>] start_kernel+0x21f/0x224
 [<ffffffff8064c1e5>] _sinittext+0x1e5/0x1eb

Code: 8b 44 82 0c 48 29 c6 48 8d 14 3e 48 85 d2 0f 8f 9a 00 00 00
RIP [<ffffffff883620b1>] :sch_cbq:cbq_dequeue+0x166/0x711
 RSP <ffffffff80684e20>
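A note on reading the trace above, offered as a rough sanity check rather than a diagnosis. Assuming the "Code:" bytes start at the faulting RIP, the first four bytes (8b 44 82 0c) decode by hand to a 32-bit load from RDX + RAX*4 + 0xc. The small Python sketch below plugs in the register values from the oops to see whether that load would hit the reported fault address. The helper name effective_address is made up for illustration; only the register and address values come from the trace.

```python
# Sanity check: does the load decoded by hand from the first "Code:" bytes
# (8b 44 82 0c -> mov 0xc(%rdx,%rax,4),%eax) touch the faulting address?
# Assumption: the Code bytes begin at the faulting RIP.

MASK64 = (1 << 64) - 1  # addresses wrap at 64 bits


def effective_address(base, index, scale, disp):
    """x86 effective address: base + index*scale + displacement."""
    return (base + index * scale + disp) & MASK64


# Values copied from the oops.
RDX = 0xffff88002a7a0800         # base register
RAX = 0x000000000000175d         # index register
FAULT_ADDR = 0xffff88002a7a6580  # "Unable to handle kernel paging request at"

ea = effective_address(RDX, RAX, scale=4, disp=0x0c)
print(hex(ea), "matches fault address:", ea == FAULT_ADDR)
```

If the decoding assumption holds, the computed address equals the fault address, which would suggest that the index in RAX (0x175d) pointed well past the page that RDX refers to. Whether that index comes from a stale or corrupted structure cannot be told from the trace alone.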
Adding Andy to CC in case there are known issues with this driver.
This has also been reported in the CentOS bug tracker: http://bugs.centos.org/view.php?id=4426
No activity ever from the reporter, not even to say they re-reproduced it. Likely some locking problem, but hard to say without a dump. Please reopen if you see it again. If somebody wants to try running cbq for a while, perhaps we can reopen it ourselves, too.