Bug 106448 - GRUB failing to install on recent HP DL servers with cciss RAID controllers
Summary: GRUB failing to install on recent HP DL servers with cciss RAID controllers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jason Baron
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2003-10-07 08:20 UTC by Nick Strugnell
Modified: 2013-03-06 05:56 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-04-22 01:02:57 UTC
Target Upstream Version:
Embargoed:


Attachments
Allows main to recognise /dev/cciss/c1dxxx (745 bytes, patch)
2004-02-02 06:29 UTC, John Newbigin


Links
System: Red Hat Product Errata
ID: RHSA-2004:105
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: kernel security update
Last Updated: 2004-04-21 04:00:00 UTC

Description Nick Strugnell 2003-10-07 08:20:16 UTC
Description of problem: 
 
GRUB does not install properly on recent HP DL servers (less than one year old).
I have noticed this on the DL380 G3 and the DL580.
 
 
Version-Release number of selected component (if applicable): 
GRUB as supplied on Q2 update disks. 
 
How reproducible: 
sometimes 
 
Steps to Reproduce: 
1. Install RHEL AS2.1 Q2 and select GRUB as the bootloader. 
     
Actual results: 
Console prints 'GRUB' and then stops. 
 
Expected results: 
Normal boot process. 
 
Additional info: 
LILO works fine.

Comment 1 Giuseppe Raimondi 2003-10-13 15:07:57 UTC
somehow this reminds me of bug#55484


Comment 2 Giuseppe Raimondi 2003-10-13 15:30:07 UTC
Also check bug#64428.
Apparently, if you run
grub-install (hd0)
bash tries to interpret the parenthesis.
If you put the argument in single quotes,
grub-install '(hd0)'
it will work (according to your device.map).
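
The quoting issue described here can be illustrated with a minimal Python sketch (Python is used only for illustration and is not part of the original report). It assumes nothing beyond the standard library: `shlex.quote` produces a form that a POSIX shell passes through as a single literal argument.

```python
import shlex

# GRUB device name as it would be passed to grub-install.
device = "(hd0)"

# Parentheses are shell metacharacters, so the argument has to be quoted.
quoted = shlex.quote(device)
command = "grub-install " + quoted
print(command)  # grub-install '(hd0)'

# Splitting the quoted command with POSIX shell rules
# gives back the original argument untouched.
argv = shlex.split(command)
```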

Comment 3 Jeremy Katz 2003-10-13 21:27:57 UTC
What are all of the storage controllers on the system?  Which one is the boot
controller according to the BIOS?

Comment 4 John Newbigin 2003-12-10 23:06:54 UTC
I don't know how closely this is related but we had problems
installing a G3 with the add-in 6402 cciss card.

During boot, something appears to fail when trying to convert the root
device name /dev/cciss/c1.... into the root device number.  Controller 0
works (c0....) but controller 1 fails (c1....).

If you use the hex device number root=690n then it works.

I don't know if this is grub or something used by the initrd.  Grub
was installed correctly, it just did not have a working config file.
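
For reference, the hexadecimal form of root= can be worked out from the controller, disk, and partition numbers. The sketch below is illustrative only. It assumes the usual Linux block device numbering for cciss (major 104 for controller 0 through 111 for controller 7, 16 minors per logical drive) and the 8-bit major, 8-bit minor packing of 2.4 kernels. The function name `cciss_root_arg` is hypothetical.

```python
CCISS_BASE_MAJOR = 104   # 0x68: assumed block major of cciss controller 0
MINORS_PER_DISK = 16     # minor 0 is the whole drive, 1-15 are partitions

def cciss_root_arg(controller, disk, partition):
    """Build a hex root= value for /dev/cciss/c<controller>d<disk>p<partition>."""
    if not (0 <= controller <= 7 and 0 <= disk <= 15 and 0 <= partition <= 15):
        raise ValueError("controller must be 0-7, disk and partition 0-15")
    major = CCISS_BASE_MAJOR + controller
    minor = disk * MINORS_PER_DISK + partition
    # Assumed 2.4-style packing: high byte is the major, low byte is the minor.
    return "root=%02x%02x" % (major, minor)

print(cciss_root_arg(1, 0, 1))  # root=6901
```

With controller 1 and disk 0, only the last hex digit changes with the partition number.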

Comment 5 John Newbigin 2003-12-11 00:46:25 UTC
After doing a bit more research, I think my problem might be a kernel
issue.  init/main.c has a list of device names and numbers.  There is
only support for controller 0.  This would not be hard to fix.  I can
supply a patch.

Comment 6 John Newbigin 2004-02-02 06:29:37 UTC
Created attachment 97407 [details]
Allows main to recognise /dev/cciss/c1dxxx

Tested and works for me.

Comment 7 John Newbigin 2004-02-02 06:32:49 UTC
attachment (id=97407) is for the kernel, not grub.  I don't think this
is a grub bug.

Comment 9 Jason Dixon 2004-02-24 14:43:41 UTC
I've also experienced this issue with a 6402 as the boot controller in
a DL380 G3.  Going to wait for an official bugfix from Red Hat, the
"fix" is not permissible in our automated build environment.

Comment 10 Jason Baron 2004-02-24 15:35:45 UTC
The suggested patch makes sense, but I was just wondering how many
mappings we want to add. Upstream 2.4 has entries for 8 cciss
controllers, although for no more than the first disk hanging off each
of those controllers. The upstream patch would look like:

--- linux/init/main.c.bak	Tue Feb 24 10:34:42 2004
+++ linux/init/main.c	Tue Feb 24 10:35:46 2004
@@ -331,6 +331,13 @@
 	{ "cciss/c0d13p",0x68D0 },
 	{ "cciss/c0d14p",0x68E0 },
 	{ "cciss/c0d15p",0x68F0 },
+	{ "cciss/c1d0p",0x6900 },
+	{ "cciss/c2d0p",0x6A00 },
+	{ "cciss/c3d0p",0x6B00 },
+	{ "cciss/c4d0p",0x6C00 },
+	{ "cciss/c5d0p",0x6D00 },
+	{ "cciss/c6d0p",0x6E00 },
+	{ "cciss/c7d0p",0x6F00 },
 	{ "ataraid/d0p",0x7200 },
 	{ "ataraid/d1p",0x7210 },
 	{ "ataraid/d2p",0x7220 },

I would prefer to use this upstream patch, which I think addresses the
case here (root=6900). I think '690n' is a typo?
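
To show what a prefix table like this can and cannot resolve, here is a simplified Python model. It is not the kernel's C code. It assumes the lookup works by matching a table prefix and adding the trailing partition number to the base value, and it returns `None` for an unknown name, which is a choice made for this sketch. The table holds only entries visible in the diff above.

```python
# Subset of the name table after the patch (values copied from the diff).
ROOT_DEV_NAMES = [
    ("cciss/c0d13p", 0x68D0),
    ("cciss/c0d14p", 0x68E0),
    ("cciss/c0d15p", 0x68F0),
    ("cciss/c1d0p", 0x6900),
    ("cciss/c2d0p", 0x6A00),
    ("cciss/c3d0p", 0x6B00),
    ("cciss/c4d0p", 0x6C00),
    ("cciss/c5d0p", 0x6D00),
    ("cciss/c6d0p", 0x6E00),
    ("cciss/c7d0p", 0x6F00),
]

def name_to_dev(name):
    """Simplified model: match a table prefix, then add the partition number."""
    if name.startswith("/dev/"):
        name = name[len("/dev/"):]
    for prefix, base in ROOT_DEV_NAMES:
        if name.startswith(prefix):
            rest = name[len(prefix):]
            if rest == "" or rest.isdigit():
                return base + int(rest or "0")
    return None  # no table entry covers this name

first_disk = name_to_dev("/dev/cciss/c1d0p1")    # resolves via the new c1d0p entry
second_disk = name_to_dev("/dev/cciss/c1d1p1")   # no c1d1p entry in the table
```

Under these assumptions the first logical drive on controllers 1 to 7 resolves, while any other drive on those controllers does not.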



Comment 11 John Newbigin 2004-02-25 23:48:59 UTC
Of course the real fix would be to extract both the controller number
and the disk number and work it out that way.  That would be more work
and probably not a good thing to add to a stable kernel.  (Of course
there are even better solutions, udev or making the cciss driver parse
the string but that is even more radical).

Given that the DL380G3 can be shipped with a second controller, that
should be supported out of the box with the same number of disks as a
single controller (16).  Does the 6404 appear as 2 extra controllers?
 If so then you should do the same for c2.

Beyond that, if you want to boot off a 4th controller then you
probably have quite a non-standard setup and editing the root= might
be an acceptable task.
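
The idea of extracting the controller and disk numbers from the name, rather than listing every combination, could look roughly like the Python sketch below. It is an illustration, not a proposed kernel change. It assumes the usual cciss numbering (major 104 plus the controller number, 16 minors per logical drive), and the names `CCISS_NAME` and `parse_cciss` are hypothetical.

```python
import re

# Matches names such as /dev/cciss/c1d0p1 or cciss/c0d13 (partition optional).
CCISS_NAME = re.compile(r"^(?:/dev/)?cciss/c(\d+)d(\d+)(?:p(\d+))?$")

def parse_cciss(name):
    """Return the 16-bit device number for a cciss name, or None if invalid."""
    match = CCISS_NAME.match(name)
    if match is None:
        return None
    controller = int(match.group(1))
    disk = int(match.group(2))
    partition = int(match.group(3) or 0)
    # Assumed limits: 8 controllers, 16 drives each, partitions 1-15.
    if controller > 7 or disk > 15 or partition > 15:
        return None
    major = 104 + controller
    minor = disk * 16 + partition
    return (major << 8) | minor

print(hex(parse_cciss("/dev/cciss/c1d0p1")))  # 0x6901
```

A parser like this covers every controller and drive without growing the table, at the cost of adding string handling to the boot path.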


Comment 12 Jason Baron 2004-02-27 22:51:31 UTC
The fix from comment #10 is included in the U4 erratum candidate. Moving to MODIFIED.

Comment 13 John Flanagan 2004-04-22 01:02:57 UTC
An erratum has been issued which should help the problem described in this bug report.
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-105.html


Comment 14 John Newbigin 2004-04-22 22:59:30 UTC
The patch from #10 only allows you to use the first disk on each
controller.

And '690n' is not a typo; it means you have to put your partition
number in as 'n' (which I thought would have been obvious).

Comment 15 Jason Dixon 2004-04-27 19:19:15 UTC
I'm not sure if it's still relevant to this bug or not, but I'll 
mention it here first.  The 6402 controller still does not work with 
U4 (e40) when booted via the pxeboot (initrd-everything.img) image...

PXELINUX 1.74 2002-06-01 Copyright (C) 1994-2002 H. Peter Anvin
UNDI data segment at:   00095540
UNDI data segment size: 4C50
UNDI code segment at:   0009A190
UNDI code segment size: 4804
PXE entry point found (we hope) at 9A19:00D6
My IP address seems to be 0AD1525F 10.209.82.95
ip=10.209.82.95:10.209.83.40:10.209.82.1:255.255.255.128
TFTP prefix:
Trying to load: pxelinux.cfg/0AD1525F
boot:
Loading rhas2.1p4/vmlinuz.................
Loading rhas2.1p4/initrd-everything.img..............................
Ready.
Failed to free base memory, sorry...

[Hung]

Please reopen.
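
For readers following the log: the file name PXELINUX tries under pxelinux.cfg/ is the client's IPv4 address written as eight uppercase hexadecimal digits. A minimal Python sketch of that mapping follows; the function name `pxelinux_hex_name` is hypothetical.

```python
import ipaddress

def pxelinux_hex_name(ip):
    """Return an IPv4 address as eight uppercase hex digits."""
    return "%08X" % int(ipaddress.IPv4Address(ip))

print(pxelinux_hex_name("10.209.82.95"))  # 0AD1525F
```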

Comment 16 Mike Miller (OS Dev) 2004-05-10 14:11:25 UTC
It appears at this point the driver has not loaded. Therefore it is 
wrong to assume this problem is related to the cciss controller. Have 
you tried this PXE install with other controllers?

Comment 17 Jason Dixon 2004-05-10 14:20:59 UTC
Yes, this PXE install works fine with the other controllers.  I don't
think it's specific to the driver code... I think Red Hat has fudged
the pxeboot images for Update 4 and inserted an older driver.  The CD
images work fine with the 6402.

Comment 18 Chris Van Hoof 2004-05-19 15:16:09 UTC
From what I can tell the modules are the exact same on the pxe
initrd-everything.img and the isolinux/initrd.img on U4, 2.4.50:

(root@unplugged)(1630/pts)(11:08am:05/19/04)-
(#:~vanhoof/CUST/digex)- strings U3/modules/pxe/2.4.9-e.34BOOT/cciss.o
| egrep 2\.4\.
kernel_version=2.4.9-e.34BOOT
description=Driver for HP SA5xxx SA6xxx Controllers version 2.4.49
<6>HP CISS Driver (v 2.4.49)

(root@unplugged)(1631/pts)(11:13am:05/19/04)-
(#:~vanhoof/CUST/digex)- strings U3/modules/cd/2.4.9-e.34BOOT/cciss.o
| egrep 2\.4\.
kernel_version=2.4.9-e.34BOOT
description=Driver for HP SA5xxx SA6xxx Controllers version 2.4.49
<6>HP CISS Driver (v 2.4.49)

(root@unplugged)(1632/pts)(11:13am:05/19/04)-
(#:~vanhoof/CUST/digex)- strings U4/modules/pxe/2.4.9-e.40BOOT/cciss.o
| egrep 2\.4\.
kernel_version=2.4.9-e.40BOOT
description=Driver for HP SA5xxx SA6xxx Controllers version 2.4.50
<6>HP CISS Driver (v 2.4.50)

(root@unplugged)(1633/pts)(11:13am:05/19/04)-
(#:~vanhoof/CUST/digex)- strings U4/modules/cd/2.4.9-e.40BOOT/cciss.o
| egrep 2\.4\.
kernel_version=2.4.9-e.40BOOT
description=Driver for HP SA5xxx SA6xxx Controllers version 2.4.50
<6>HP CISS Driver (v 2.4.50)

Comment 19 Chris Van Hoof 2004-05-19 15:19:55 UTC
(root@unplugged)(1636/pts)(11:18am:05/19/04)-
(#:~vanhoof/CUST/digex)- md5sum U4/modules/pxe/2.4.9-e.40BOOT/cciss.o
64db0d6c8fca37f8b641b3e63f5b1ba4  U4/modules/pxe/2.4.9-e.40BOOT/cciss.o

(root@unplugged)(1637/pts)(11:19am:05/19/04)-
(#:~vanhoof/CUST/digex)- md5sum U4/modules/cd/2.4.9-e.40BOOT/cciss.o
64db0d6c8fca37f8b641b3e63f5b1ba4  U4/modules/cd/2.4.9-e.40BOOT/cciss.o
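
The same comparison can be scripted. The Python sketch below is illustrative: it hashes two files with MD5 and reports whether their contents match. The function names are hypothetical, and the paths in the usage note are simply the ones from the transcript above.

```python
import hashlib

def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a, path_b):
    """True if both files have the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)
```

For example, `same_content("U4/modules/pxe/2.4.9-e.40BOOT/cciss.o", "U4/modules/cd/2.4.9-e.40BOOT/cciss.o")` would be expected to return True given the identical checksums shown above.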


Comment 20 Jason Dixon 2004-05-21 17:27:06 UTC
I think I forgot to mention it, but IIRC, this only occurs with
pxeboot and GRUB.  I believe I tried this with LILO and it worked ok.
 I'll try and retest this in the coming days and report back.

Comment 21 Jason Dixon 2004-05-21 20:39:32 UTC
Sorry, scratch that last comment.  I was thinking of something else. 
I can't even get it to boot pxe, how would grub be a problem, duh?  :-P

