Bug 112844
Summary: cciss 2.4.49 driver panic with HP Smart Array
Product: Red Hat Enterprise Linux 2.1
Component: kernel
Version: 2.1
Hardware: i386
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Reporter: Simon Brady <simon.brady>
Assignee: Tom Coughlan <coughlan>
CC: cott, jason.dixon, katzj, mhernandez, mike.miller, notting, riel, steve, tao
Fixed In Version: 2.1
Doc Type: Bug Fix
Last Closed: 2007-05-09 15:25:47 UTC
Description
Simon Brady
2004-01-03 21:25:53 UTC
Created attachment 96752: lspci -nvv output for failing SA-5i
There is no intended dependency on the f/w version. I'll diff the drivers and check what has changed. I have also requested that our vendor relations manager assign this to a test team for further investigation.

I checked the differences between the drivers and there is no apparent reason why 2.4.49 should fail. Please send some config info such as the number & type of controllers in the system. Are there any drives on the embedded controller? Is the embedded running as dumb SCSI or as cpqarray? Does the driver load during boot? Are there any error messages when the driver loads?

I'm guessing this happens when rebooting after the install completes. If so, rescue the system and check /etc/modules.conf for entries for both versions of cciss. If they exist, delete the entry for the older driver and build a new initrd to see if that helps.

This has been assigned to a test team for investigation.

We also have this problem. It's a Compaq 5304 controller. The normal sym53c8xx driver does NOT load, inducing the Kernel panic. I've switched back to 2.4.9-e.27 for now (which continues to work perfectly).

Sorry I should have said, this is RedHat AS2.1. Let me know via e-mail if you need any more information. The controller is running 2 arrays and is the sole RAID controller in the machine.

It turns out the firmware upgrade was a red herring. The problem is that upgrading from kernel-2.4.9-e.24 (the 2.1 AS Update 2 default) to 2.4.9-e.34 fails to create an initrd image due to the alias line in /etc/modules.conf referencing cciss_2427 (the -e.34 install even prints the warning "No module cciss_2427 found for kernel 2.4.9-e.34" from mkinitrd, but I'd somehow missed that). So when you boot with the new kernel there isn't an initrd to load cciss.o from, hence the panic. Commenting out the reference to cciss_2427 in /etc/modules.conf prior to installing -e.34 (or -e.35, which behaves the same way) fixes the problem.
I think at some point during my troubleshooting I changed this to reference cciss_2445 and reinstalled -e.34, which allowed it to create an initrd - because I hadn't noticed its absence first time round, the firmware upgrade seemed to be the only difference between a system that booted and one that didn't. Sorry to have misled you, and a big thank-you to the person who e-mailed me with this suggestion (you know who you are).

I'm also experiencing the same problem. We're using the 5300 controller in DL380 G1's. I've flashed to 3.54, no good. I've tried to use cciss.o and cciss_2445.o on scsi_hostadapter2, both result in a kernel panic. Besides the fact that I can't fix it, e35 assigns cciss_2427 to the controller, even though 2427 doesn't even exist in that kernel. What's up with that?

To be precise, it's a 5302 controller. And to clarify, this is using the e35 kernel.

Is the kernel panic you see the same as the one described above? If so, then you need to re-make the initrd after you change scsi_hostadapterX in modules.conf. If you already did that, or you are seeing a different panic, then please provide more detailed information.

The way this was supposed to work is:
1) notice in the release notes that the cciss_2427 driver has been removed.
2) edit modules.conf and change cciss_2427 to cciss.
3) then update the kernel.
We will update the release notes to make this requirement explicit. In the future this should not occur very often, because we will not be using the driver_nnnn names as the default driver.

Yes, the kernel panic was exactly the same. As I mentioned, I've tried it with both the cciss and cciss_2445 (which works with e34) drivers. Yes, I rebuilt the initrd each time. What other "detailed information" would you like? The exact steps I took in testing this:
1) Built server via kickstart to e30.
2) Patched everything except kernel packages.
3) Downloaded e34 UP and SMP kernels.
4) Installed e34 UP and SMP kernels (-ivh).
5) Modified modules.conf, changing cciss_2427 to cciss.
6) Built initrd for e34 UP and SMP.
7) Rebooted into kernel panic.
8) Modified modules.conf, changing cciss to cciss_2445.
9) Rebuilt initrd for e34 UP and SMP.
10) Rebooted successfully.
11) Downloaded e35 UP and SMP kernels.
12) Installed e35 UP and SMP kernels (-ivh).
13) Built initrd for e35 UP and SMP.
14) Rebooted into kernel panic.
15) Modified modules.conf, changing cciss_2445 to cciss.
16) Rebuilt initrd for e35 UP and SMP.
17) Rebooted into kernel panic.
So, as you can see, I've tried all possible permutations between the e34 and e35. These are all on an SMP DL380-G2 (yeah, I know, why did I build the UP?).

FWIW, I just commented out the cciss_2427 line altogether, then reinstalled e.34 and the problem Went Away. With regard to Tom's comments, what tripped me up was the jump from (1) to (2). Because I hadn't explicitly enabled cciss_2427, I wrongly assumed that its removal was irrelevant to my setup (the initial update of a freshly installed system). Not automatically adding cciss_nnnn to modules.conf will certainly help, but I'd also suggest adding a check on the result of mkinitrd and an explicit "Initial ramdisk not created, your system might not boot" warning message. This would defend against any future problems of this type by clearly flagging them as install failures rather than boot failures.

Typo in my last post. These are DL-380 *G1* servers, as mentioned in my previous posts.

Update. I've also tried commenting out the cciss_2427 line as Simon suggested (rather than changing it) on the e35 kernel. Same kernel panic.

After being contacted by an HP engineer, I rebuilt the initrd without the cpqarray driver. The e35 kernel now boots successfully, but with no support for the integrated Smart Array controller (not acceptable).

What is the status on this? New builds on the e34-e37 kernel still fail.
The only method for fixing this is to boot into a working kernel (e30 or older), comment out any instances of cpqarray and cciss_24xx, and mkinitrd. This is NOT acceptable for automated/kickstart build environments!

*** Correction *** This does NOT work on e37 kernels. Using the same 5300 controller, I booted into the e30 kernel, edited modules.conf to remove cpqarray and cciss_2427 support, built new images with mkinitrd, and it boots into a kernel panic.
[snip]
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
request_module[block-major-104]: Root is not mounted
VFS: Cannot open root device "cciss/c0d0p3" or 68:03
Please append a correct "root=" boot option
Kernel panic: VFS: Unable to mount root fs on 68:03
[snip]

Update: The e35, e37, and e38 kernels work when all older modules are commented out, the initrd is rebuilt, and the bootloader is edited to load the initrd (*blush*). This still doesn't fix the fact that the modules.conf is broken post-errata with erroneous entries for cciss_2427 or cciss_2445.

The 6402 controller does not work with this driver. Attempting an install on the 6402 finishes successfully, but panics on reboot. Tried with both update2 and update3. Both 2.4.45 and 2.4.49 end in panic. This is on a DL380 G3 with the following:
Integrated Smart Array 5i
Compaq Smart Array 6400 (v1.32)
HP BIOS P29 10/31/2003
Thanks,
Jason

> This still doesn't fix the fact that the modules.conf is broken
> post-errata with erroneous entries for cciss_2427 or cciss_2445.
In fact there are two problems with the cciss_nnnn entries. First, as
noted they stop mkinitrd working during an upgrade. Second, even
without upgrading they can cause the incorrect driver to be loaded in
a system with multiple CCISS devices.
I encountered this on a DL380 G3 with its onboard SA-5i and an added
SA-5304. After installing 2.1 AS Update 2 (e.24 kernel),
/etc/modules.conf had
alias scsi_hostadapter cciss
alias scsi_hostadapter1 cciss_2427
alias scsi_hostadapter2 cciss_2427
scsi_hostadapter2 is outright bogus as discussed, but the entry for
scsi_hostadapter1 would have silently worked - with the old driver.
So please, Red Hat, deliver us from this cciss_nnnn madness: let the
admin decide if they want to load old drivers instead of doing it for
them.
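
One way to spot such entries before an upgrade is sketched below. This is only an illustration, not anything shipped by Red Hat or HP: the function name stale_aliases is hypothetical, and it assumes 2.4-era module files named <module>.o (as with the cciss.o and cciss_2445.o mentioned above) somewhere under the kernel's module directory. It prints each uncommented scsi_hostadapter alias whose module is missing from that directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Red Hat tool): list scsi_hostadapter
# aliases in a modules.conf-style file whose module has no matching .o file
# under a kernel module tree.
# Usage: stale_aliases CONF MODULE_DIR
stale_aliases() {
    conf=$1
    moddir=$2
    # Print "alias-name module" for each uncommented
    # "alias scsi_hostadapter* <module>" line.
    awk '$1 == "alias" && $2 ~ /^scsi_hostadapter/ { print $2, $3 }' "$conf" |
    while read -r name module; do
        # Assumption: modules are plain <module>.o files under the tree.
        if [ -z "$(find "$moddir" -name "$module.o" -print 2>/dev/null)" ]; then
            echo "$name $module"
        fi
    done
}
```

For example, stale_aliases /etc/modules.conf /lib/modules/2.4.9-e.34 would be expected to list any alias that still points at cciss_2427 if that kernel no longer ships the module.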
Please be aware there is an additional problem with cciss (this is in response to Jason Dixon's note). When multiple Smart Array controllers are installed in a server, and the first one discovered by cciss is NOT the boot controller, you get a kernel panic. I've even tried mounting the partitions using EXT2 labels - still no joy. This is a real issue when you DON'T want to boot from the 5i but still want to use it for other things. The usual errors occur when the initrd tries to pivotroot into the live environment. I found an entry for this bug on the HP site but couldn't find an associated RedHat Bugzilla. Here's the URL for the "Multiple Smart Array" problem on the HP site:
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=PSD_EU030710_CW02
Steve

Update to my previous comment on the 6400. This is an unrelated issue... it appears the 6400 will not work when using grub. Opening a separate bug report...

Ok, two months have passed since the last post. What has Red Hat done to advance this issue? I don't know. I really enjoy being 6 patches behind on my RHES 2.1 systems because of this. I guess I will try e40 and see if perhaps it's fixed.

There may be multiple issues intermixed in this report. I'll address the one that I think is at the center of this. If there are other issues, please clarify. We have found that when AS 2.1 U2 is installed, kudzu puts two lines in modules.conf:
alias scsi_hostadapter2 cciss
alias scsi_hostadapter3 cciss_2427
The second line is an unintended consequence of the fact that the cciss_2427 driver calls MODULE_DEVICE_TABLE, and thereby gets listed in the file /lib/modules/2.4.9-e.whatever/modules.pcimap. Under certain circumstances, kudzu refers to this file and erroneously adds cciss_2427 to modules.conf. The cciss_2427.o file is removed in AS 2.1 U3 and later kernels, so an upgrade will fail on any system that has cciss_2427 in modules.conf.
The solution to this problem is to remove the cciss_2427 line from modules.conf before doing the upgrade. If you are using kickstart, you should be able to do this in a %pre section in the kickstart file. In the future we will ensure that kudzu never puts module_nnnn files in modules.conf. If there are versions of kudzu in the field that do use these module names, then we will preserve these files in the kernel, so upgrades will continue to work.

I am having problems while installing the kernel-enterprise-2.4.9-e.49.i686.rpm in a DL 740 with AS 2.1
[root@dc10bd rhn-packages]# rpm -ivh kernel-enterprise-2.4.9-e.49.i686.rpm
Preparing... ########################################### [100%]
1:kernel-enterprise ########################################### [100%]
No module cciss_2427 found for kernel
Any ideas?

Look at /etc/modules.conf. If "cciss_2427" is mentioned in there, replace it with "cciss". Then try the rpm -i again.

Remove the entry for cciss_2427 from /etc/modules.conf.
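
The workaround, together with the mkinitrd check suggested earlier in this report, could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not an official fix: disable_module_alias and check_initrd are hypothetical names, the first comments out (rather than deletes) matching alias lines and keeps a .bak copy, and the second only tests that an image file exists and is non-empty.

```shell
#!/bin/sh
# Hypothetical helper: comment out every alias line that names MODULE in a
# modules.conf-style file. A backup is kept next to the file as CONF.bak.
# Usage: disable_module_alias CONF MODULE
disable_module_alias() {
    conf=$1
    module=$2
    cp "$conf" "$conf.bak" || return 1
    # Exact match on the third field, so "cciss" is not touched when the
    # module to disable is "cciss_2427".
    awk -v m="$module" '
        $1 == "alias" && $3 == m { print "#" $0; next }
        { print }
    ' "$conf.bak" > "$conf"
}

# Hypothetical check: warn loudly if the expected initrd image is missing
# or empty, so the failure is flagged at install time, not at boot.
# Usage: check_initrd IMAGE
check_initrd() {
    if [ ! -s "$1" ]; then
        echo "WARNING: initial ramdisk $1 not created, your system might not boot" >&2
        return 1
    fi
}
```

A typical use would be disable_module_alias /etc/modules.conf cciss_2427 before installing the new kernel, then check_initrd on the new image afterwards. The image path (for example /boot/initrd-2.4.9-e.34.img) is an assumption about the usual naming and should be checked against the bootloader configuration.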