Bug 786863 - kernel panic in cciss_softirq_done / _spin_lock_irqsave: "not syncing: Fatal exception in interrupt"
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Tomas Henzl
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-02-02 15:34 UTC by Marian Csontos
Modified: 2013-01-02 12:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-01-02 12:37:49 UTC
Target Upstream Version:


Attachments
serial console log (49.68 KB, text/plain)
2012-02-02 15:36 UTC, Marian Csontos
System ROM update (1017.64 KB, application/octet-stream)
2012-02-03 15:44 UTC, Mike Miller (OS Dev)

Description Marian Csontos 2012-02-02 15:34:02 UTC
Description of problem:
Booting the RHEL 5.8 installation image on storageqe-02.rhts.eng.bos.redhat.com ends in a kernel panic in the cciss module.

Version-Release number of selected component (if applicable):
kernel 2.6.18-304.el5 

Linux version 2.6.18-304.el5 (mockbuild.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)) #1 SMP Mon Jan 9 18:12:44 EST 2012

How reproducible:
???

Steps to Reproduce:
1. submit attached job
  
Actual results:
Kernel panic during boot

Expected results:
normal boot

Additional info:

Oops: 0002 [#1] 
SMP  
last sysfs file: /firmware/edd/int13_dev80/mbr_signature 
Modules linked in: xts xcbc wp512 twofish tgr192 tea sha512 sha256 serpent seqiv michael_mic md5 md4 khazad hmac gf128mul eseqiv ecb des crypto_null deflate zlib_deflate ctr cryptomgr crypto_hash chainiv ccm cbc cast6 cast5 blowfish authenc crypto_blkcipher anubis krng ansi_cprng rng aes_generic aead dm_crypt crypto_algapi dm_emc dm_round_robin dm_multipath scsi_dh dm_snapshot dm_mirror dm_zero lock_nolock gfs2 ext3 jbd ext4 crc16 jbd2 msdos dm_raid45 dm_message dm_mem_cache dm_region_hash dm_log dm_mod raid456 xor raid10 raid1 raid0 ata_piix libata cciss be2net 8021q bnx2 be2iscsi ehci_hcd uhci_hcd ib_ipoib ib_cm ib_sa ib_mad ib_core ipoib_helper iscsi_ibft iscsi_tcp libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi sr_mod sd_mod scsi_mod ide_cd cdrom ipv6 xfrm_nalgo crypto_api squashfs pcspkr edd loop nfs nfs_acl lockd sunrpc vfat fat cramfs 
CPU:    0 
EIP:    0060:[<c0624b83>]    Not tainted VLI 
EFLAGS: 00010006   (2.6.18-304.el5 #1)  
EIP is at _spin_lock_irqsave+0x3/0x27 
eax: 000008b4   ebx: 000008b4   ecx: 00000286   edx: 00000206 
esi: 00000000   edi: 00000000   ebp: dfc35078   esp: c0753f9c 
ds: 007b   es: 007b   ss: 0068 
Process swapper (pid: 0, ti=c0753000 task=c06933c0 task.ti=c070e000) 
Stack: f8b6815d 00000001 000008b4 f7800000 00000000 c070ef80 00000000 c0704b08  
       0000000f c31bce80 dfc35080 00000001 c0704b20 0000000a c04e59ba c0753fd8  
       c0753fd8 c070ef68 c042a91d 00000000 c070ef68 c070e000 00000046 00000063  
Call Trace: 
 [<f8b6815d>] cciss_softirq_done+0x245/0x35f [cciss] 
 [<c04e59ba>] blk_done_softirq+0x4d/0x58 
 [<c042a91d>] __do_softirq+0x87/0x114 
 [<c04073f9>] do_softirq+0x4e/0x92 
 [<c04507b8>] __do_IRQ+0x0/0x118 
 [<c04074f4>] do_IRQ+0xb7/0xc3 
 [<c040597a>] common_interrupt+0x1a/0x20 
 [<c04031f7>] mwait_idle_with_hints+0x4b/0x4f 
 [<c0403207>] mwait_idle+0xc/0x1b 
 [<c0403d14>] cpu_idle+0x9f/0xb9 
 [<c07139fc>] start_kernel+0x37b/0x383 
 ======================= 
Code: d0 c3 f0 81 00 00 00 00 01 8b 04 24 e9 99 5b e0 ff f0 ff 00 8b 04 24 e9 8e 5b e0 ff b2 01 86 10 8b 04 24 e9 82 5b e0 ff 9c 5a fa <f0> fe 08 79 1c f7 c2 00 02 00 00 74 0b fb f3 90 80 38 00 7e f9  
EIP: [<c0624b83>] _spin_lock_irqsave+0x3/0x27 SS:ESP 0068:c0753f9c 
 <0>Kernel panic - not syncing: Fatal exception in interrupt
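
For readers triaging the trace above, here is a minimal sketch of how the `Oops: 0002` value can be decoded. It assumes the standard x86 page-fault error-code layout (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The function name `decode_oops_error_code` is hypothetical and is not part of any kernel tool.

```python
def decode_oops_error_code(code_hex: str) -> dict:
    """Decode the low three bits of an x86 page-fault error code.

    Assumption: the value printed after "Oops:" is the hardware
    page-fault error code, written in hexadecimal.
    """
    code = int(code_hex, 16)
    return {
        # Bit 0: set = protection violation, clear = page not present
        "cause": "protection violation" if code & 0x1 else "page not present",
        # Bit 1: set = write access, clear = read access
        "access": "write" if code & 0x2 else "read",
        # Bit 2: set = fault raised in user mode, clear = kernel mode
        "mode": "user" if code & 0x4 else "kernel",
    }


if __name__ == "__main__":
    print(decode_oops_error_code("0002"))
```

Read this way, 0002 would be a kernel-mode write to a page that is not present. That appears consistent with eax holding 000008b4, a very low address, when _spin_lock_irqsave faulted, but this is an interpretation of the log and not a confirmed diagnosis.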

Comment 1 Marian Csontos 2012-02-02 15:36:28 UTC
Created attachment 559073 [details]
serial console log

Comment 2 Tomas Henzl 2012-02-02 15:56:51 UTC
Marian,
how reproducible is this? Does it happen on every boot?

Comment 3 Tomas Henzl 2012-02-02 15:58:31 UTC
(In reply to comment #2)
> Marian,
> how reproducible is this? Does it happen on every boot?
Could I use that machine?

Comment 4 Marian Csontos 2012-02-02 19:02:00 UTC
I have a few jobs queued on that machine; I will update with a better reproducibility estimate later.

Consider the machine in use now. I will ping you once it is free for your experiments.

Comment 5 Marian Csontos 2012-02-03 14:52:00 UTC
So far, reproducibility is 0 out of 4 jobs.

Comment 6 Tomas Henzl 2012-02-03 15:11:29 UTC
(In reply to comment #5)
> So far, reproducibility is 0 out of 4 jobs.

The problem is limited to this single machine, so there is not much we can do about it now; the likelihood that this is a hardware problem is high.

Mike,
the problem has been reported twice on this particular machine, but it could still be a hardware problem. Does the bug description look familiar to you?

Comment 7 Mike Miller (OS Dev) 2012-02-03 15:44:34 UTC
Created attachment 559314 [details]
System ROM update

We've seen a few failures during kdump testing on the DL380 G5. But I've never seen this during install. The system ROM is very outdated. I'm attaching the latest image available on hp.com.
NOTE: The update is listed as critical. Copy this file to the system and execute it. Do not interrupt the flashing process or it may trash the system.

Comment 8 Tomas Henzl 2012-05-02 13:29:11 UTC
Hi Marian,
have you had a chance to test with the new firmware?

Comment 9 Marian Csontos 2013-01-02 12:37:49 UTC
Hi Tomas, I have not retested: I was unable to reproduce the problem with the old firmware, so confidence in such a test would be low.

Closing as CANTFIX - Not Reproducible.

