499782 – RHEL4 : HP-Japan : kernel BUG at drivers/block/cfq-iosched.c:630!

Summary: RHEL4 : HP-Japan : kernel BUG at drivers/block/cfq-iosched.c:630!

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	4.8
Assignee:	Jeff Moyer
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-05-08 06:33 UTC by Lachlan McIlroy
Modified:	2018-11-14 20:04 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-06-14 20:21:26 UTC
Target Upstream Version:
Embargoed:

Description Lachlan McIlroy 2009-05-08 06:33:14 UTC

Description of problem:

SYSTEM MAP: System.map-2.6.9-34.0.2
DEBUG KERNEL: vmlinux-2.6.9-34.0.2.ELsmp (2.6.9-34.0.2.ELsmp)
   DUMPFILE: vmcore
       CPUS: 2
       DATE: Wed Apr 22 23:20:46 2009
     UPTIME: 585 days, 20:36:08
LOAD AVERAGE: 0.63, 0.42, 0.30
      TASKS: 331
   NODENAME: mc-ldp03
    RELEASE: 2.6.9-34.0.2.ELsmp
    VERSION: #1 SMP Fri Jun 30 10:33:58 EDT 2006
    MACHINE: i686  (3803 Mhz)
     MEMORY: 4.4 GB
      PANIC: "kernel BUG at drivers/block/cfq-iosched.c:630!"
        PID: 1616
    COMMAND: "kjournald"
       TASK: f76f83b0  [THREAD_INFO: c3720000]
        CPU: 0
      STATE: TASK_RUNNING (PANIC)


------------[ cut here ]------------
kernel BUG at drivers/block/cfq-iosched.c:630!
invalid operand: 0000 [#1]
SMP
Modules linked in: iptable_filter ip_tables parport_pc parport st seos(U) eAC_mini(U) sg cpqci(U) netconsole netdump dm_mirror dm_mod uhci_hcd ehci_hcd hw_random e1000(U) tg3 bond1(U) bonding(U) floppy ext3 jbd cciss sd_mod scsi_mod
CPU:    0
EIP:    0060:[<c022a96f>]    Tainted: P      VLI
EFLAGS: 00010046   (2.6.9-34.0.2.ELsmp)
EIP is at cfq_put_request+0x15/0x86
eax: f7d31028   ebx: c375ad6c   ecx: c3777f10   edx: f7d11c40
esi: f7d31028   edi: 00000001   ebp: 00000000   esp: c03eaf88
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 1616, threadinfo=c03ea000 task=f76f83b0)
Stack: c375ad6c f7d31028 c02219bc c0223b19 f7f5cc80 c375ad6c 00000000 c02248f6
      00000000 f7400000 00000000 f7df3000 f885569c 00000001 00000001 00000000
      00000082 f7dd4640 00000001 00000000 c3720ce8 c0107472 c3720ccc c03ea000
Call Trace:
[<c02219bc>] elv_put_request+0x9/0xa
[<c0223b19>] __blk_put_request+0x56/0x73
[<c02248f6>] end_that_request_last+0xa7/0xbb
[<f885569c>] do_cciss_intr+0x341/0x4b4 [cciss]
[<c0107472>] handle_IRQ_event+0x25/0x4f
[<c01079d2>] do_IRQ+0x11c/0x1ae
=======================
[<c02d304c>] common_interrupt+0x18/0x20
[<c022007b>] show_pools+0x73/0xe2
[<c0224174>] __make_request+0x452/0x46c
[<c022431c>] generic_make_request+0x18e/0x19e
[<c0120291>] autoremove_wake_function+0x0/0x2d
[<c02243f6>] submit_bio+0xca/0xd2
[<c015e7c9>] bio_alloc+0x100/0x168
[<c015e180>] submit_bh+0x141/0x166
[<f8863a62>] journal_commit_transaction+0x847/0xfc1 [jbd]
[<c0120291>] autoremove_wake_function+0x0/0x2d
[<c0120291>] autoremove_wake_function+0x0/0x2d
[<f8865e8d>] kjournald+0xc7/0x219 [jbd]
[<c0120291>] autoremove_wake_function+0x0/0x2d
[<c0120291>] autoremove_wake_function+0x0/0x2d
[<c011d549>] schedule_tail+0x31/0xa7
[<f8865dc0>] commit_timeout+0x0/0x5 [jbd]
[<f8865dc6>] kjournald+0x0/0x219 [jbd]
[<c01041f5>] kernel_thread_helper+0x5/0xb
Code: 04 24 39 4c 86 18 b8 00 00 00 00 0f 4f e8 5e 89 e8 5b 5e 5f 5d c3 56 89 c6 53 89 d3 8b 4b 40 8b 50 4c 85 c9 74 2e 39 58 08 75 08 <0f> 0b 76 02 1c 7a 2f c0 8d 41 20 39 41 20 74 08 0f 0b 77 02 1c

core can be found on core-i386.gsslab.rdu.redhat.com
Login with kerberos name/password
$ cd /cores/20090429212851/work
/cores/20090429212851/work$ ./crash 

I think the bug was introduced by the patch linux-2.6.9-cciss-update.patch, which made this change:

-#define CCISS_LOCK(i)  (hba[i]->queue->queue_lock)
+#define CCISS_LOCK(i)  (&hba[i]->lock)

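I think this change causes do_cciss_intr() to acquire a driver-private lock instead of the queue lock. If so, the completion path in the trace above (end_that_request_last() down to cfq_put_request()) runs without the queue lock held.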
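
To make the suspected locking mismatch concrete, here is a minimal userspace sketch. It is not kernel code: toy_lock, toy_queue, toy_hba and every function below are hypothetical stand-ins, and it assumes (as the analysis above does, unconfirmed) that the completion path expects the queue lock to be held. The two selector functions mirror the old and new CCISS_LOCK() definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a spinlock: it only records whether it is held. */
struct toy_lock { bool held; };

/* Block layer's view: the queue points at the lock that protects it. */
struct toy_queue { struct toy_lock *queue_lock; };

/* Driver's view: a private lock plus a pointer to its request queue. */
struct toy_hba {
    struct toy_lock lock;       /* what the new CCISS_LOCK() selects */
    struct toy_queue *queue;    /* old CCISS_LOCK() selects queue->queue_lock */
};

typedef struct toy_lock *(*lock_picker)(struct toy_hba *);

static void toy_lock_acquire(struct toy_lock *l) { l->held = true; }
static void toy_lock_release(struct toy_lock *l) { l->held = false; }

/* Models: #define CCISS_LOCK(i)  (hba[i]->queue->queue_lock) */
static struct toy_lock *cciss_lock_old(struct toy_hba *h)
{
    return h->queue->queue_lock;
}

/* Models: #define CCISS_LOCK(i)  (&hba[i]->lock) */
static struct toy_lock *cciss_lock_new(struct toy_hba *h)
{
    return &h->lock;
}

/*
 * Models the interrupt handler completing a request: take whichever lock
 * the macro selects, then report whether the queue lock is actually held
 * at the point where the request would be handed back to the elevator.
 */
static bool completion_holds_queue_lock(struct toy_hba *h, lock_picker pick)
{
    struct toy_lock *l = pick(h);

    toy_lock_acquire(l);
    bool queue_lock_held = h->queue->queue_lock->held;
    toy_lock_release(l);

    return queue_lock_held;
}
```

For a toy_hba whose queue points at a separate toy_lock, completion_holds_queue_lock(&h, cciss_lock_old) returns true and completion_holds_queue_lock(&h, cciss_lock_new) returns false, which is the mismatch described above. The sketch only shows which lock is held; it does not reproduce the race itself.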

Comment 1 Jeff Moyer 2009-08-21 14:47:02 UTC

I'll take this bug, if that's okay with you, Tomas.

Cheers,
Jeff

Comment 2 Tomas Henzl 2009-08-24 10:19:24 UTC

(In reply to comment #1)
> I'll take this bug, if that's okay with you, Tomas.
> 
> Cheers,
> Jeff  
OK, thanks,
Tomas

Comment 3 Jeff Moyer 2010-10-14 16:56:12 UTC

I've seen a sprinkling of these bugs across RHEL 4 and RHEL 5, and they all seem to involve cciss devices (and multiple different driver versions).  I'm inclined to think that this is a firmware issue.  If someone is able to reproduce the problem reliably, then we can work with HP to zero in on the problem.  Does anyone have such an environment?
