Description of problem:
GRUB does not install properly on recent HP DL servers (servers < 1 year old). I have noticed this on DL380G3 and DL580.

Version-Release number of selected component (if applicable):
GRUB as supplied on Q2 update disks.

How reproducible:
Sometimes.

Steps to Reproduce:
1. Install RHEL AS2.1 Q2 and select GRUB as the bootloader.

Actual results:
Console prints 'GRUB' and then stops.

Expected results:
Normal boot process.

Additional info:
LILO works fine.
somehow this reminds me of bug#55484
And more: check bug#64428. Apparently, if you issue grub-install (hd0), bash tries to interpret the parenthesis. If you put the argument in single quotes, grub-install '(hd0)', it'll work (according to your device.map).
What are all of the storage controllers on the system? Which one is the boot controller according to the BIOS?
I don't know how closely this is related, but we had problems installing a G3 with the add-in 6402 cciss card. During boot, something appears to fail when converting the root device name /dev/cciss/c1.... into the root device number. Controller 0 works (c0....) but controller 1 fails (c1....). If you use the hex device number, root=690n, then it works. I don't know if this is grub or something used by the initrd. Grub was installed correctly; it just did not have a working config file.
After doing a bit more research, I think my problem might be a kernel issue. init/main.c has a list of device names and numbers. There is only support for controller 0. This would not be hard to fix. I can supply a patch.
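To illustrate why a missing table entry would break root=/dev/cciss/c1... while a hex number still works, here is a minimal userspace sketch of a prefix-table lookup. It assumes the 2.4 kernel resolves root= by matching a name prefix and adding the trailing decimal number, and otherwise reads the whole argument as hex; that behaviour is my assumption, not something verified against the RHEL 2.1 source. The names dev_names and lookup_root are hypothetical, and the table holds only two illustrative controller 0 rows.

```c
#include <stdlib.h>
#include <string.h>

/* One row of the name table: a device-name prefix and its base number
 * (major in the high byte, first minor in the low byte). */
struct dev_name {
    const char *name;
    unsigned int num;
};

/* Illustrative subset only: controller 0 rows, no c1 rows. */
static const struct dev_name dev_names[] = {
    { "cciss/c0d0p", 0x6800 },
    { "cciss/c0d1p", 0x6810 },
    { NULL, 0 }
};

/* Hypothetical helper: resolve a root= argument to a device number. */
unsigned int lookup_root(const char *arg)
{
    const struct dev_name *d;
    unsigned int base = 0;

    if (strncmp(arg, "/dev/", 5) == 0)
        arg += 5;

    for (d = dev_names; d->name; d++) {
        size_t len = strlen(d->name);
        if (strncmp(arg, d->name, len) == 0) {
            arg += len;          /* skip the matched prefix */
            base = d->num;
            break;
        }
    }

    /* Known prefix: the remainder is a decimal partition number.
     * Unknown name: the whole argument is parsed as a hex number,
     * which is why a raw hex root= value can still work. */
    return base + (unsigned int)strtoul(arg, NULL, base ? 10 : 16);
}
```

With this table, lookup_root("/dev/cciss/c0d0p2") gives 0x6802 and lookup_root("6901") gives 0x6901, but lookup_root("/dev/cciss/c1d0p1") matches no prefix and ends up with a meaningless number.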
Created attachment 97407: Allows main to recognise /dev/cciss/c1dxxx. Tested and works for me.
attachment (id=97407) is for the kernel, not grub. I don't think this is a grub bug.
I've also experienced this issue with a 6402 as the boot controller in a DL380 G3. I'm going to wait for an official bugfix from Red Hat; the "fix" is not permissible in our automated build environment.
The suggested patch makes sense, but I was just wondering how many mappings we want to add. Upstream 2.4 has entries for 8 cciss controllers, although for no more than the first disk hanging off each of those controllers. The upstream patch would look like:

--- linux/init/main.c.bak Tue Feb 24 10:34:42 2004
+++ linux/init/main.c Tue Feb 24 10:35:46 2004
@@ -331,6 +331,13 @@
 	{ "cciss/c0d13p",0x68D0 },
 	{ "cciss/c0d14p",0x68E0 },
 	{ "cciss/c0d15p",0x68F0 },
+	{ "cciss/c1d0p",0x6900 },
+	{ "cciss/c2d0p",0x6A00 },
+	{ "cciss/c3d0p",0x6B00 },
+	{ "cciss/c4d0p",0x6C00 },
+	{ "cciss/c5d0p",0x6D00 },
+	{ "cciss/c6d0p",0x6E00 },
+	{ "cciss/c7d0p",0x6F00 },
 	{ "ataraid/d0p",0x7200 },
 	{ "ataraid/d1p",0x7210 },
 	{ "ataraid/d2p",0x7220 },

I would prefer to use this upstream patch, which I think addresses the case here (root=6900). I think '690n' is a typo?
Of course the real fix would be to extract both the controller number and the disk number and work it out that way. That would be more work and probably not a good thing to add to a stable kernel. (Of course there are even better solutions, udev or making the cciss driver parse the string but that is even more radical). Given that the DL380G3 can be shipped with a second controller, that should be supported out of the box with the same number of disks as a single controller (16). Does the 6404 appear as 2 extra controllers? If so then you should do the same for c2. Beyond that, if you want to boot off a 4th controller then you probably have quite a non-standard setup and editing the root= might be an acceptable task.
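As a rough illustration of what "extract both the controller number and the disk number" could look like, here is a userspace sketch. It assumes the numbering implied by the table entries quoted in this report (major 0x68 plus the controller number, minor equal to disk * 16 plus partition, 8 controllers, 16 disks each). The function name cciss_name_to_dev and the limits are hypothetical, and this is not kernel code.

```c
#include <stdio.h>

/* Assumed layout, inferred from the table entries quoted in this report. */
#define CCISS_MAJOR0   0x68
#define CCISS_MAX_CTLR 8
#define CCISS_MAX_DISK 16
#define CCISS_MAX_PART 16

/* Hypothetical helper: returns the device number, or -1 if the name is
 * not of the form cciss/c<N>d<M> or cciss/c<N>d<M>p<P>. */
long cciss_name_to_dev(const char *name)
{
    unsigned int ctlr, disk, part = 0;
    int used = 0, n = 0;

    /* Controller and disk are mandatory. */
    if (sscanf(name, "cciss/c%ud%u%n", &ctlr, &disk, &used) != 2)
        return -1;

    /* An optional "p<P>" suffix selects a partition; nothing may follow. */
    if (name[used] != '\0') {
        if (sscanf(name + used, "p%u%n", &part, &n) != 1 ||
            name[used + n] != '\0')
            return -1;
    }

    if (ctlr >= CCISS_MAX_CTLR || disk >= CCISS_MAX_DISK ||
        part >= CCISS_MAX_PART)
        return -1;

    return ((long)(CCISS_MAJOR0 + ctlr) << 8) | (long)(disk * 16 + part);
}
```

Under those assumptions, cciss_name_to_dev("cciss/c1d0p1") yields 0x6901 without needing one table row per controller and disk.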
fix from comment #10 included in U4 erratum candidate. moving to modified.
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2004-105.html
The patch from #10 only allows you to use the first disk on each controller. And '690n' is not a typo; it means you have to put your partition number in as 'n' (which I thought would have been obvious).
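For reference, covering every disk on a second controller would take 16 table rows rather than one. The sketch below generates rows in the same format as the existing entries, assuming the same numbering as the c0 rows (major 0x68 plus the controller number in the high byte, disk number in the upper nibble of the low byte). The helper names are hypothetical and the output is only a starting point for a patch, not a tested one.

```c
#include <stdio.h>

#define CCISS_DISKS_PER_CTLR 16  /* assumed, matching c0d0..c0d15 */

/* Hypothetical helper: writes one table row for a controller/disk pair
 * into buf and returns the snprintf result. */
int format_cciss_entry(char *buf, size_t size,
                       unsigned int ctlr, unsigned int disk)
{
    unsigned int num = ((0x68 + ctlr) << 8) | (disk << 4);

    return snprintf(buf, size, "{ \"cciss/c%ud%up\",0x%04X },",
                    ctlr, disk, num);
}

/* Prints all rows for one controller, one per line, tab-indented. */
void print_cciss_entries(FILE *out, unsigned int ctlr)
{
    char buf[64];
    unsigned int disk;

    for (disk = 0; disk < CCISS_DISKS_PER_CTLR; disk++) {
        format_cciss_entry(buf, sizeof(buf), ctlr, disk);
        fprintf(out, "\t%s\n", buf);
    }
}
```

Calling print_cciss_entries(stdout, 1) would emit the rows from cciss/c1d0p through cciss/c1d15p.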
I'm not sure if it's still relevant to this bug or not, but I'll mention it here first. The 6402 controller still does not work with U4 (e40) when booted via the pxeboot (initrd-everything.img) image...

PXELINUX 1.74 2002-06-01 Copyright (C) 1994-2002 H. Peter Anvin
UNDI data segment at: 00095540
UNDI data segment size: 4C50
UNDI code segment at: 0009A190
UNDI code segment size: 4804
PXE entry point found (we hope) at 9A19:00D6
My IP address seems to be 0AD1525F 10.209.82.95
ip=10.209.82.95:10.209.83.40:10.209.82.1:255.255.255.128
TFTP prefix:
Trying to load: pxelinux.cfg/0AD1525F
boot:
Loading rhas2.1p4/vmlinuz.................
Loading rhas2.1p4/initrd-everything.img..............................
Ready.
Failed to free base memory, sorry...
[Hung]

Please reopen.
It appears at this point the driver has not loaded. Therefore it is wrong to assume this problem is related to the cciss controller. Have you tried this PXE install with other controllers?
Yes, this PXE install works fine with the other controllers. I don't think it's specific to the driver code... I think Red Hat has fudged the pxeboot images for Update 4 and inserted an older driver. The CD images work fine with the 6402.
From what I can tell the modules are the exact same on the pxe initrd-everything.img and the isolinux/initrd.img on U4, 2.4.50:

(root@unplugged)(1630/pts)(11:08am:05/19/04)-
(#:~vanhoof/CUST/digex)- strings U3/modules/pxe/2.4.9-e.34BOOT/cciss.o | egrep 2\.4\.
kernel_version=2.4.9-e.34BOOT
description=Driver for HP SA5xxx SA6xxx Controllers version 2.4.49
<6>HP CISS Driver (v 2.4.49)

(root@unplugged)(1631/pts)(11:13am:05/19/04)-
(#:~vanhoof/CUST/digex)- strings U3/modules/cd/2.4.9-e.34BOOT/cciss.o | egrep 2\.4\.
kernel_version=2.4.9-e.34BOOT
description=Driver for HP SA5xxx SA6xxx Controllers version 2.4.49
<6>HP CISS Driver (v 2.4.49)

(root@unplugged)(1632/pts)(11:13am:05/19/04)-
(#:~vanhoof/CUST/digex)- strings U4/modules/pxe/2.4.9-e.40BOOT/cciss.o | egrep 2\.4\.
kernel_version=2.4.9-e.40BOOT
description=Driver for HP SA5xxx SA6xxx Controllers version 2.4.50
<6>HP CISS Driver (v 2.4.50)

(root@unplugged)(1633/pts)(11:13am:05/19/04)-
(#:~vanhoof/CUST/digex)- strings U4/modules/cd/2.4.9-e.40BOOT/cciss.o | egrep 2\.4\.
kernel_version=2.4.9-e.40BOOT
description=Driver for HP SA5xxx SA6xxx Controllers version 2.4.50
<6>HP CISS Driver (v 2.4.50)
(root@unplugged)(1636/pts)(11:18am:05/19/04)-
(#:~vanhoof/CUST/digex)- md5sum U4/modules/pxe/2.4.9-e.40BOOT/cciss.o
64db0d6c8fca37f8b641b3e63f5b1ba4 U4/modules/pxe/2.4.9-e.40BOOT/cciss.o

(root@unplugged)(1637/pts)(11:19am:05/19/04)-
(#:~vanhoof/CUST/digex)- md5sum U4/modules/cd/2.4.9-e.40BOOT/cciss.o
64db0d6c8fca37f8b641b3e63f5b1ba4 U4/modules/cd/2.4.9-e.40BOOT/cciss.o
I think I forgot to mention it, but IIRC, this only occurs with pxeboot and GRUB. I believe I tried this with LILO and it worked ok. I'll try and retest this in the coming days and report back.
Sorry, scratch that last comment. I was thinking of something else. I can't even get it to boot pxe, how would grub be a problem, duh? :-P