Bug 112844

Summary: cciss 2.4.49 driver panic with HP Smart Array
Product: Red Hat Enterprise Linux 2.1
Reporter: Simon Brady <simon.brady>
Component: kernel
Assignee: Tom Coughlan <coughlan>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Version: 2.1
CC: cott, jason.dixon, katzj, mhernandez, mike.miller, notting, riel, steve, tao
Hardware: i386
OS: Linux
Fixed In Version: 2.1
Doc Type: Bug Fix
Last Closed: 2007-05-09 15:25:47 UTC

Attachments:
Description                            Flags
lspci -nvv output for failing SA-5i    none

Description Simon Brady 2004-01-03 21:25:53 UTC
Description of problem:
The cciss 2.4.49 driver included in the 2.4.9-e.34 kernel causes a
boot-time kernel panic with certain HP Smart Array firmware revs.
Console output during boot (first line is for context):

NET4: Unix domain sockets 1.0/SMP for Linux NET 4.0.
request_module[block-major-104]: Root fs not mounted
VFS: Cannot open root device "cciss/c0d0p1" or 68:01
Please append a correct "root=" boot option
Kernel panic: VFS: Unable to mount root fs on 68:01

This has occurred on a ProLiant DL380 G1 with a Smart Array 5304/256
(firmware 2.92) and a DL380 G2 with onboard Smart Array 5i (firmware
2.38). The problem is definitely the 2.4.49 cciss.o driver in the
initrd image, because replacing it with cciss_2445.o fixes things.

However, the original 2.4.49 does work after upgrading the 5304
firmware to 3.40 (I'm unable to upgrade the 5i at present). If this
firmware dependency is by design then it would be helpful to document
it in the RHSA-2003:408 announcement.

Version-Release number of selected component (if applicable):
kernel-2.4.9-e.34

How reproducible:
Always

Steps to Reproduce:
1. Upgrade to kernel-2.4.9-e.34 with one of the above controller revs
2. Reboot

Actual Results:  Kernel panic

Expected Results:  No kernel panic

Additional info:

Comment 1 Simon Brady 2004-01-03 21:27:48 UTC
Created attachment 96752 [details]
lspci -nvv output for failing SA-5i

Comment 2 Mike Miller (OS Dev) 2004-01-06 17:23:57 UTC
There is no intended dependency on the f/w version. I'll diff the 
drivers and check what has changed.
I have also requested that our vendor relations manager assign this 
to a test team for further investigation.

Comment 3 Mike Miller (OS Dev) 2004-01-06 22:06:07 UTC
I checked the differences between the drivers and there is no apparent
reason why 2.4.49 should fail. Please send some config info, such as
the number and type of controllers in the system. Are there any drives
on the embedded controller? Is the embedded controller running as dumb
SCSI or as cpqarray? Does the driver load during boot? Are there any
error messages when the driver loads?

I'm guessing this happens when rebooting after the install completes.
If so, rescue the system and check /etc/modules.conf for entries for
both versions of cciss. If they exist, delete the entry for the older
driver and build a new initrd to see if that helps.

This has been assigned to a test team for investigation.

Comment 4 Steve Randall 2004-01-08 09:04:12 UTC
We also have this problem. It's a Compaq 5304 controller. The normal
sym53c8xx driver does NOT load, inducing the Kernel panic. I've
switched back to 2.4.9-e.27 for now (which continues to work perfectly).

Comment 5 Steve Randall 2004-01-08 09:07:26 UTC
Sorry, I should have said: this is Red Hat AS 2.1. Let me know via
e-mail if you need any more information. The controller is running 2
arrays and is the sole RAID controller in the machine.

Comment 6 Simon Brady 2004-01-08 21:04:58 UTC
It turns out the firmware upgrade was a red herring. The problem is
that upgrading from kernel-2.4.9-e.24 (the 2.1 AS Update 2 default) to
2.4.9-e.34 fails to create an initrd image due to the alias line in
/etc/modules.conf referencing cciss_2427 (the -e.34 install even
prints the warning "No module cciss_2427 found for kernel 2.4.9-e.34"
from mkinitrd, but I'd somehow missed that). So when you boot with the
new kernel there isn't an initrd to load cciss.o from, hence the panic.

Commenting out the reference to cciss_2427 in /etc/modules.conf prior
to installing -e.34 (or -e.35, which behaves the same way) fixes the
problem. I think at some point during my troubleshooting I changed
this to reference cciss_2445 and reinstalled -e.34, which allowed it
to create an initrd - because I hadn't noticed its absence first time
round, the firmware upgrade seemed to be the only difference between a
system that booted and one that didn't.
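
The condition described above (an alias in /etc/modules.conf naming a
module that the new kernel does not ship) could be checked for before
rebooting. The following is a minimal sketch only, assuming 2.4-style
module files named <module>.o somewhere under the kernel's module
tree. The function name missing_hostadapter_modules is hypothetical
and is not part of mkinitrd or any Red Hat tool.

```shell
# Sketch: report scsi_hostadapter aliases whose module file is missing.
# Usage: missing_hostadapter_modules CONF MODDIR
#   CONF   - a modules.conf-style file (e.g. /etc/modules.conf)
#   MODDIR - module tree of the kernel about to be booted
#            (e.g. /lib/modules/2.4.9-e.34)
# The function name is hypothetical, chosen for illustration.
missing_hostadapter_modules() {
    conf=$1
    moddir=$2
    # In "alias scsi_hostadapterN <module>" the third field is the module.
    # Commented-out lines do not match because their first field is not "alias".
    awk '$1 == "alias" && $2 ~ /^scsi_hostadapter[0-9]*$/ { print $3 }' "$conf" |
    sort -u |
    while read -r mod; do
        # Assumption: the module is stored as <name>.o under MODDIR.
        if [ -z "$(find "$moddir" -name "$mod.o" -print 2>/dev/null)" ]; then
            echo "$mod"
        fi
    done
}
```

Run as missing_hostadapter_modules /etc/modules.conf
/lib/modules/2.4.9-e.34, this would be expected to print cciss_2427 on
a system in the state described above, and nothing once the alias has
been commented out.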

Sorry to have misled you, and a big thank-you to the person who
e-mailed me with this suggestion (you know who you are).


Comment 7 Jason Dixon 2004-01-20 20:14:54 UTC
I'm also experiencing the same problem.  We're using the 5300
controller in DL380 G1's.  I've flashed to 3.54, no good.  I've tried
to use cciss.o and cciss_2445.o on scsi_hostadapter2, both result in a
kernel panic.  Besides the fact that I can't fix it, e35 assigns
cciss_2427 to the controller, even though 2427 doesn't even exist in
that kernel.  What's up with that?

Comment 8 Jason Dixon 2004-01-20 20:20:27 UTC
To be precise, it's a 5302 controller.  And to clarify, this is using
the e35 kernel.

Comment 9 Tom Coughlan 2004-01-20 21:34:37 UTC
Is the kernel panic you see the same as the one described above?  

If so, then you need to re-make the initrd after you change
scsi_hostadapterX in modules.conf. If you already did that, or you are
seeing a different panic, then please provide more detailed information.

The way this was supposed to work is:

1) notice in the release notes that the cciss_2427 driver has been
removed.

2) edit modules.conf and change cciss_2427 to cciss.

3) then update the kernel.

We will update the release notes to make this requirement explicit. 
In the future this should not occur very often, because we will not be
using the driver_nnnn names as the default driver.
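
Step 2 above could be scripted rather than done by hand. This is a
hedged sketch, not a Red Hat-supplied procedure: the function name
rename_alias_module is hypothetical, and it writes the result to
standard output instead of editing the file in place so the change can
be reviewed first.

```shell
# Sketch: rewrite the module named by alias lines in a modules.conf-style file.
# Usage: rename_alias_module CONF OLD NEW
#   e.g. rename_alias_module /etc/modules.conf cciss_2427 cciss
# The function name is hypothetical, chosen for illustration.
rename_alias_module() {
    awk -v old="$2" -v new="$3" '
        # Only touch "alias <name> <module>" lines whose module is OLD.
        # Assigning to $3 makes awk rejoin that line with single spaces.
        $1 == "alias" && $3 == old { $3 = new }
        { print }
    ' "$1"
}
```

Lines that are not aliases for the old module, including comments, are
passed through unchanged. After installing the edited file, the initrd
still has to be rebuilt as described above.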

Comment 10 Jason Dixon 2004-01-20 23:13:25 UTC
Yes, the kernel panic was exactly the same.  As I mentioned, I've
tried it with both the cciss and cciss_2445 (which works with e34)
drivers.  Yes, I rebuilt the initrd each time.

What other "detailed information" would you like?

The exact steps I took in testing this:

1) Built server via kickstart to e30.
2) Patched everything except kernel packages.
3) Downloaded e34 UP and SMP kernels.
4) Installed e34 UP and SMP kernels (-ivh).
5) Modified modules.conf, changing cciss_2427 to cciss.
6) Built initrd for e34 UP and SMP.
7) Rebooted into kernel panic.
8) Modified modules.conf, changing cciss to cciss_2445.
9) Rebuilt initrd for e34 UP and SMP.
10) Rebooted successfully.
11) Downloaded e35 UP and SMP kernels.
12) Installed e35 UP and SMP kernels (-ivh).
13) Built initrd for e35 UP and SMP.
14) Rebooted into kernel panic.
15) Modified modules.conf, changing cciss_2445 to cciss.
16) Rebuilt initrd for e35 UP and SMP.
17) Rebooted into kernel panic.

So, as you can see, I've tried all possible permutations between the
e34 and e35.  These are all on an SMP DL380-G2 (yeah, I know, why did
I build the UP?).

Comment 11 Simon Brady 2004-01-20 23:29:03 UTC
FWIW, I just commented out the cciss_2427 line altogether, then
reinstalled e.34 and the problem Went Away.

With regard to Tom's comments, what tripped me up was the jump from
(1) to (2). Because I hadn't explicitly enabled cciss_2427, I wrongly
assumed that its removal was irrelevant to my setup (the initial
update of a freshly installed system).

Not automatically adding cciss_nnnn to modules.conf will certainly
help, but I'd also suggest adding a check on the result of mkinitrd
and an explicit "Initial ramdisk not created, your system might not
boot" warning message. This would defend against any future problems
of this type by clearly flagging them as install failures rather than
boot failures.
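
The suggested check could look roughly like the sketch below. It is an
illustration of the idea only, not what the kernel package scripts
actually do: the wrapper name build_initrd_checked is hypothetical,
and the build command is passed in by the caller (for example
mkinitrd IMAGE KERNEL-VERSION).

```shell
# Sketch: run an initrd build command and warn loudly if no image results.
# Usage: build_initrd_checked IMAGE COMMAND [ARGS...]
#   e.g. build_initrd_checked /boot/initrd-2.4.9-e.34.img \
#            mkinitrd /boot/initrd-2.4.9-e.34.img 2.4.9-e.34
# The function name is hypothetical, chosen for illustration.
build_initrd_checked() {
    image=$1
    shift
    # Succeed only if the command exits 0 AND left a non-empty image behind.
    if "$@" && [ -s "$image" ]; then
        return 0
    fi
    echo "WARNING: Initial ramdisk not created, your system might not boot ($image)" >&2
    return 1
}
```

Checking for the image file as well as the exit status covers the case
where the build command prints a warning but still exits successfully.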


Comment 12 Jason Dixon 2004-01-22 11:10:04 UTC
Typo in my last post.  These are DL-380 *G1* servers, as mentioned in
my previous posts.

Comment 13 Jason Dixon 2004-01-22 19:45:03 UTC
Update.  I've also tried commenting out the cciss_2427 line as Simon
suggested (rather than changing it) on the e35 kernel.  Same kernel panic.

Comment 14 Jason Dixon 2004-01-30 02:55:32 UTC
After being contacted by an HP engineer, I rebuilt the initrd without
the cpqarray driver.  The e35 kernel now boots successfully, but with
no support for the integrated Smart Array controller (not acceptable).

Comment 15 Jason Dixon 2004-02-18 20:29:22 UTC
What is the status on this?  New builds on the e34-e37 kernels still
fail.  The only way to fix this is to boot into a working kernel (e30
or older), comment out any instances of cpqarray and cciss_24xx, and
run mkinitrd.  This is NOT acceptable for automated/kickstart build
environments!

Comment 16 Jason Dixon 2004-02-18 20:47:11 UTC
*** Correction ***

This does NOT work on e37 kernels.  Using the same 5300 controller, I
booted into the e30 kernel, edited modules.conf to remove cpqarray and
cciss_2427 support, built new images with mkinitrd, and it boots into
a kernel panic.

[snip]
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
request_module[block-major-104]: Root is not mounted
VFS: Cannot open root device "cciss/c0d0p3" or 68:03
Please append a correct "root=" boot option
Kernel panic: VFS: Unable to mount root fs on 68:03
[snip]

Comment 17 Jason Dixon 2004-02-19 16:46:46 UTC
Update:

The e35, e37, and e38 kernels work when all older modules are
commented out, the initrd is rebuilt, and the bootloader is edited to
load the initrd (*blush*).  This still doesn't fix the fact that the
modules.conf is broken post-errata with erroneous entries for
cciss_2427 or cciss_2445.
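
A boot entry that never loads the initrd produces the same panic as a
missing initrd, so it may be worth checking the bootloader
configuration as well. The sketch below assumes a GRUB-style grub.conf
with title, kernel, and initrd lines (the thread does not say which
bootloader this machine used), and the function name
stanzas_without_initrd is hypothetical.

```shell
# Sketch: list grub.conf boot entries that have a kernel line but no initrd line.
# Usage: stanzas_without_initrd GRUB_CONF
# Assumes GRUB-style "title", "kernel" and "initrd" directives.
# The function name is hypothetical, chosen for illustration.
stanzas_without_initrd() {
    awk '
        # Print the pending entry title if it had a kernel but no initrd.
        function report() {
            if (title != "" && has_kernel && !has_initrd) print title
        }
        $1 == "title" {
            report()
            title = $0
            sub(/^[ \t]*title[ \t]+/, "", title)
            has_kernel = 0
            has_initrd = 0
            next
        }
        $1 == "kernel" { has_kernel = 1 }
        $1 == "initrd" { has_initrd = 1 }
        END { report() }
    ' "$1"
}
```

Any title this prints is an entry that would boot its kernel without an
initial ramdisk, and so without cciss.o on a system whose root
filesystem is on a Smart Array.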

Comment 18 Jason Dixon 2004-02-19 18:21:59 UTC
The 6402 controller does not work with this driver.  Attempting an
install on the 6402 finishes successfully, but panics on reboot. 
Tried with both update2 and update3.  Both 2.4.45 and 2.4.49 end in
panic.  This is on a DL380 G3 with the following:

Integrated Smart Array 5i
Compaq Smart Array 6400 (v1.32)
HP BIOS P29 10/31/2003 

Thanks,
Jason

Comment 19 Simon Brady 2004-02-20 00:12:35 UTC
> This still doesn't fix the fact that the modules.conf is broken
> post-errata with erroneous entries for cciss_2427 or cciss_2445.

In fact there are two problems with the cciss_nnnn entries. First, as
noted they stop mkinitrd working during an upgrade. Second, even
without upgrading they can cause the incorrect driver to be loaded in
a system with multiple CCISS devices.

I encountered this on a DL380 G3 with its onboard SA-5i and an added
SA-5304. After installing 2.1 AS Update 2 (e.24 kernel),
/etc/modules.conf had

alias scsi_hostadapter cciss
alias scsi_hostadapter1 cciss_2427
alias scsi_hostadapter2 cciss_2427

scsi_hostadapter2 is outright bogus as discussed, but the entry for
scsi_hostadapter1 would have silently worked - with the old driver.

So please, Red Hat, deliver us from this cciss_nnnn madness: let the
admin decide if they want to load old drivers instead of doing it for
them.

Comment 20 Steve Randall 2004-02-20 08:49:45 UTC
Please be aware there is an additional problem with cciss (this is in
response to Jason Dixon's note). When multiple Smart Array controllers
are installed in a server, and the first one discovered by cciss is
NOT the boot controller, you get a kernel panic.

I've even tried mounting the partitions using ext2 labels - still no
joy. This is a real issue when you DON'T want to boot from the 5i but
still want to use it for other things.

The errors usually occur when the initrd tries to pivot_root into the
live environment. I found an entry for this bug on the HP site but
couldn't find an associated Red Hat Bugzilla entry.

Comment 21 Steve Randall 2004-02-20 10:35:50 UTC
Here's the URL for the "Multiple Smart Array" problem on the HP site:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=PSD_EU030710_CW02

Steve

Comment 23 Jason Dixon 2004-02-24 14:39:47 UTC
Update to my previous comment on the 6400.  This is an unrelated
issue... it appears the 6400 will not work when using grub.  Opening a
separate bug report...

Comment 25 Jason Dixon 2004-04-21 17:59:53 UTC
Ok, two months have passed since the last post.  What has Red Hat done
to advance this issue?

Comment 26 Cott Lang 2004-04-23 20:02:35 UTC
I don't know. I really enjoy being 6 patches behind on my RHES 2.1
systems because of this. I guess I will try e40 and see if perhaps
it's fixed.

Comment 27 Tom Coughlan 2004-05-04 15:16:37 UTC
There may be multiple issues intermixed in this report.  I'll address
the one that I think is at the center of this.  If there are other
issues, please clarify.

We have found that when AS 2.1 U2 is installed, kudzu puts two lines
in modules.conf:

alias scsi_hostadapter2 cciss
alias scsi_hostadapter3 cciss_2427

The second line is an unintended consequence of the fact that the
cciss_2427 driver calls MODULE_DEVICE_TABLE, and thereby gets listed
in the file /lib/modules/2.4.9-e.whatever/modules.pcimap. Under
certain circumstances, Kudzu refers to this file and erroneously adds
cciss_2427 to modules.conf.
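
To see whether a given kernel exposes any such versioned driver names
to kudzu, the PCI map can be scanned directly. This is a rough sketch
under two assumptions that should be verified against a real file:
that comment lines in modules.pcimap begin with '#', and that the
first whitespace-separated field of every other line is the module
name. The function name versioned_modules_in_pcimap is hypothetical.

```shell
# Sketch: list module names of the form <driver>_NNNN found in a modules.pcimap.
# Usage: versioned_modules_in_pcimap /lib/modules/<kernel-version>/modules.pcimap
# Assumptions: '#' starts a comment line; field 1 is the module name.
# The function name is hypothetical, chosen for illustration.
versioned_modules_in_pcimap() {
    # Match names ending in an underscore and exactly four digits,
    # e.g. cciss_2427, and print each such name once.
    awk '$1 !~ /^#/ && $1 ~ /_[0-9][0-9][0-9][0-9]$/ { print $1 }' "$1" | sort -u
}
```

The four-digit suffix test is a heuristic for the driver_nnnn naming
discussed in this report. An unrelated module whose name happens to end
the same way would also be listed.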

The cciss_2427.o file is removed in AS 2.1 U3 and later kernels, so an
upgrade will fail on any system that has cciss_2427 in modules.conf.

The solution to this problem is to remove the cciss_2427 line from
modules.conf before doing the upgrade.  If you are using kickstart,
you should be able to do this in a %pre section in the kickstart file.

In the future we will ensure that kudzu never puts module_nnnn files
in modules.conf.  If there are versions of kudzu in the field that do
use these module names, then we will preserve these files in the
kernel, so upgrades will continue to work.




Comment 28 Marco Hernandez 2004-11-18 03:57:22 UTC
I am having problems while installing
kernel-enterprise-2.4.9-e.49.i686.rpm on a DL 740 with AS 2.1:

[root@dc10bd rhn-packages]# rpm -ivh kernel-enterprise-2.4.9-e.49.i686.rpm
Preparing...                ########################################### [100%]
   1:kernel-enterprise      ########################################### [100%]
No module cciss_2427 found for kernel 

Any ideas?

Comment 29 Tom Coughlan 2004-11-18 11:42:30 UTC
Look at /etc/modules.conf.  If "cciss_2427" is mentioned in there,
replace it with "cciss". Then try the rpm -i again.

Comment 30 Mike Miller (OS Dev) 2004-11-18 22:57:21 UTC
Remove the entry for cciss_2427 from /etc/modules.conf.