Description of problem:

  SYSTEM MAP: System.map-2.6.9-34.0.2
DEBUG KERNEL: vmlinux-2.6.9-34.0.2.ELsmp (2.6.9-34.0.2.ELsmp)
    DUMPFILE: vmcore
        CPUS: 2
        DATE: Wed Apr 22 23:20:46 2009
      UPTIME: 585 days, 20:36:08
LOAD AVERAGE: 0.63, 0.42, 0.30
       TASKS: 331
    NODENAME: mc-ldp03
     RELEASE: 2.6.9-34.0.2.ELsmp
     VERSION: #1 SMP Fri Jun 30 10:33:58 EDT 2006
     MACHINE: i686 (3803 Mhz)
      MEMORY: 4.4 GB
       PANIC: "kernel BUG at drivers/block/cfq-iosched.c:630!"
         PID: 1616
     COMMAND: "kjournald"
        TASK: f76f83b0 [THREAD_INFO: c3720000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

------------[ cut here ]------------
kernel BUG at drivers/block/cfq-iosched.c:630!
invalid operand: 0000 [#1]
SMP
Modules linked in: iptable_filter ip_tables parport_pc parport st seos(U) eAC_mini(U) sg cpqci(U) netconsole netdump dm_mirror dm_mod uhci_hcd ehci_hcd hw_random e1000(U) tg3 bond1(U) bonding(U) floppy ext3 jbd cciss sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c022a96f>]    Tainted: P      VLI
EFLAGS: 00010046   (2.6.9-34.0.2.ELsmp)
EIP is at cfq_put_request+0x15/0x86
eax: f7d31028   ebx: c375ad6c   ecx: c3777f10   edx: f7d11c40
esi: f7d31028   edi: 00000001   ebp: 00000000   esp: c03eaf88
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 1616, threadinfo=c03ea000 task=f76f83b0)
Stack: c375ad6c f7d31028 c02219bc c0223b19 f7f5cc80 c375ad6c 00000000 c02248f6
       00000000 f7400000 00000000 f7df3000 f885569c 00000001 00000001 00000000
       00000082 f7dd4640 00000001 00000000 c3720ce8 c0107472 c3720ccc c03ea000
Call Trace:
 [<c02219bc>] elv_put_request+0x9/0xa
 [<c0223b19>] __blk_put_request+0x56/0x73
 [<c02248f6>] end_that_request_last+0xa7/0xbb
 [<f885569c>] do_cciss_intr+0x341/0x4b4 [cciss]
 [<c0107472>] handle_IRQ_event+0x25/0x4f
 [<c01079d2>] do_IRQ+0x11c/0x1ae
 =======================
 [<c02d304c>] common_interrupt+0x18/0x20
 [<c022007b>] show_pools+0x73/0xe2
 [<c0224174>] __make_request+0x452/0x46c
 [<c022431c>] generic_make_request+0x18e/0x19e
 [<c0120291>] autoremove_wake_function+0x0/0x2d
 [<c02243f6>] submit_bio+0xca/0xd2
 [<c015e7c9>] bio_alloc+0x100/0x168
 [<c015e180>] submit_bh+0x141/0x166
 [<f8863a62>] journal_commit_transaction+0x847/0xfc1 [jbd]
 [<c0120291>] autoremove_wake_function+0x0/0x2d
 [<c0120291>] autoremove_wake_function+0x0/0x2d
 [<f8865e8d>] kjournald+0xc7/0x219 [jbd]
 [<c0120291>] autoremove_wake_function+0x0/0x2d
 [<c0120291>] autoremove_wake_function+0x0/0x2d
 [<c011d549>] schedule_tail+0x31/0xa7
 [<f8865dc0>] commit_timeout+0x0/0x5 [jbd]
 [<f8865dc6>] kjournald+0x0/0x219 [jbd]
 [<c01041f5>] kernel_thread_helper+0x5/0xb
Code: 04 24 39 4c 86 18 b8 00 00 00 00 0f 4f e8 5e 89 e8 5b 5e 5f 5d c3 56 89 c6 53 89 d3 8b 4b 40 8b 50 4c 85 c9 74 2e 39 58 08 75 08 <0f> 0b 76 02 1c 7a 2f c0 8d 41 20 39 41 20 74 08 0f 0b 77 02 1c

The core can be found on core-i386.gsslab.rdu.redhat.com. Log in with your Kerberos name and password:

$ cd /cores/20090429212851/work
/cores/20090429212851/work$ ./crash

I think the bug was introduced by linux-2.6.9-cciss-update.patch, which made this change:

-#define CCISS_LOCK(i) (hba[i]->queue->queue_lock)
+#define CCISS_LOCK(i) (&hba[i]->lock)

I think the effect of this change is that do_cciss_intr() now acquires a private lock instead of the queue lock. If so, the completion path in the trace (end_that_request_last -> __blk_put_request -> elv_put_request -> cfq_put_request) runs from the interrupt handler without the queue lock held, while __make_request on the same CPU was interrupted in the middle of working on that queue.
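To make the suspected mismatch concrete, here is a minimal userspace sketch of the two macro definitions. It is only a model, not driver code: the spinlock_t stand-in, the trimmed-down structs, CCISS_LOCK_OLD/CCISS_LOCK_NEW, queue_lock_held() and complete_under() are all names made up for illustration. It also assumes that the queue's lock and hba[i]->lock are two distinct objects, which is the hypothesis above and has not been confirmed against the shipped driver. If the driver registers &hba[i]->lock as the queue lock, the two macros name the same lock and this model does not apply.

```c
/* Userspace model of the suspected lock mismatch. Not kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel spinlock: only records whether it is held. */
typedef struct { bool held; } spinlock_t;

/* Trimmed-down request queue: only the lock the elevator relies on. */
struct request_queue {
    spinlock_t *queue_lock;
};

/* Trimmed-down controller: a private lock plus its request queue. */
struct ctlr_info {
    spinlock_t lock;
    struct request_queue *queue;
};

/* ASSUMPTION: the block layer lock is a different object from hba[0]->lock. */
static spinlock_t blk_lock;
static struct request_queue q0 = { &blk_lock };
static struct ctlr_info c0 = { { false }, &q0 };
static struct ctlr_info *hba[1] = { &c0 };

/* The two definitions from the patch, renamed so both can coexist here. */
#define CCISS_LOCK_OLD(i) (hba[i]->queue->queue_lock)
#define CCISS_LOCK_NEW(i) (&hba[i]->lock)

static void spin_lock(spinlock_t *l)   { l->held = true; }
static void spin_unlock(spinlock_t *l) { l->held = false; }

/* Models what the I/O scheduler expects when a request is put back. */
static bool queue_lock_held(const struct request_queue *q)
{
    return q->queue_lock->held;
}

/* Models the interrupt handler completing a request under lock l. */
static bool complete_under(spinlock_t *l)
{
    bool safe;

    spin_lock(l);
    safe = queue_lock_held(hba[0]->queue);
    spin_unlock(l);
    return safe;
}

/* Under the assumption above, only the old macro protects the queue. */
static void check_lock_model(void)
{
    assert(CCISS_LOCK_OLD(0) != CCISS_LOCK_NEW(0));
    assert(complete_under(CCISS_LOCK_OLD(0)));
    assert(!complete_under(CCISS_LOCK_NEW(0)));
}
```

Calling check_lock_model() from a test main() exercises both cases. In this model the handler still takes a lock with the new macro, but it is not the lock the scheduler's bookkeeping depends on.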
I'll take this bug, if that's okay with you, Tomas. Cheers, Jeff
(In reply to comment #1)
> I'll take this bug, if that's okay with you, Tomas.
>
> Cheers,
> Jeff

OK, thanks,
Tomas
I've seen a sprinkling of these bugs across RHEL 4 and RHEL 5, and they all seem to involve cciss devices (and multiple different driver versions). I'm inclined to think that this is a firmware issue. If someone is able to reproduce the problem reliably, then we can work with HP to zero in on the problem. Does anyone have such an environment?