Bug 149587
| Field | Value |
|---|---|
| Summary | Grub installation fails |
| Product | Red Hat Enterprise Linux 4 |
| Reporter | Mikael M. Hansen <mhansen> |
| Component | grub |
| Assignee | Peter Jones <pjones> |
| Status | CLOSED DUPLICATE |
| Severity | high |
| Priority | medium |
| Version | 4.0 |
| CC | james.brown, jmarquart, jsin, k.georgiou, marius.andreiana, nic, redhat, rmj, sfolkwil, s.j.thompson |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | i686 |
| OS | Linux |
| Doc Type | Bug Fix |
| Story Points | --- |
| Last Closed | 2008-07-28 15:04:39 UTC |
| Type | --- |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Category | --- |
| oVirt Team | --- |
| Cloudforms Team | --- |
Description
Mikael M. Hansen
2005-02-24 08:00:58 UTC
Well. Did another (kickstart) install and this one didn't fail. This was on an ATA disk.

What is the hardware on the system which is having this problem?

Well. It happened on two systems so far:

1. Fujitsu-Siemens P300, Ultra-DMA-100 disk. On this I did an upgrade from the CDs.
2. Dell PowerEdge 2650 with the PERC3/DI RAID controller. On this I did a kickstart-based install, and it failed to boot.

The relevant part of the kickstart installation is:

```
bootloader --location=mbr --md5pass=XXXXXXXXXX
zerombr yes
clearpart --all --initlabel
```

I've now installed (kickstart-based) on two machines like the first one without the problem appearing.

I've also seen this on an HP DL360 G3. I installed RHEL ES 4 on two identical boxes yesterday - one was fine, the other just instantly rebooted after showing "GRUB Loading stage2". Like Mikael above, I also booted off the CD and did a grub-install /dev/cciss/c0d0. The servers aren't networked yet to get to them, but another HP DL360 G3 reports:

```
# head -1 /proc/driver/cciss/cciss0
cciss0: HP Smart Array 5i Controller
```

I doubt it matters, but we're using RAID 1 (but Linux can't see that). No third-party drivers were used.

fwiw, there is a similar issue with grub and i2o_block devices. I added grub-install /dev/i2o/hda to the ks %post as a workaround.

I experienced this exact same problem with an IBM x345 with the latest firmware (using a ServeRAID card with a 3-drive RAID 5 configured - also latest firmware). The grub-install /dev/sda solution also worked for me.

I did an install on 12 identical cluster nodes. Three of them failed with this grub stage2 error. I did a full reinstall on them and it worked the second time. Seems totally random. I am going to add 'grub-install /dev/hda' to the %post section of my kickstart and see if that cures it.

Mikael, can you please attach the following files from the failing system to this bug: /proc/partitions, /boot/grub/grub.conf, /boot/grub/device.map. That should aid in determining what's going wrong.
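Several commenters above converge on the same workaround: rerun grub-install explicitly from kickstart %post. A minimal sketch of that idea, assuming the device paths are examples and `pick_boot_disk` is a hypothetical helper (not part of any of the original reports):

```shell
#!/bin/sh
# Hypothetical helper for a kickstart %post workaround: echo the first
# candidate boot device path that actually exists on this system.
pick_boot_disk() {
    for d in "$@"; do
        if [ -e "$d" ]; then
            echo "$d"
            return 0
        fi
    done
    return 1
}

# In a real %post section one would then run something like:
#   disk=$(pick_boot_disk /dev/cciss/c0d0 /dev/sda /dev/hda)
#   /sbin/grub-install "$disk"
```

The candidate list mirrors the devices reported in this thread (cciss, SATA, ATA); adjust it to the hardware actually being kickstarted.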
Created attachment 113310 [details]
device.map
Created attachment 113311 [details]
grub.conf
Created attachment 113312 [details]
/proc/partitions
Ok, so those all look reasonably correct. Does your raid controller's BIOS give you any indication as to the state of the mirror? There's nothing wrong with the config files, which makes me think the write is somehow failing. Given that you're on a RAID-1, the obvious question is if it's getting written to one drive and not the other, and then rebooting before the RAID is synced up. If that happens, the BIOS might be reading from the disk that doesn't have the data yet. So if the BIOS can tell you if the raid is synced up, that'd be good to check next time it fails. If it can't, you might try swapping the drives. Let me know if you get the chance to try either of these...

Well. I sure hope so. We've done no modifications to them. They are as they were left from the installation (and updates, naturally). I don't have a mirror. One of the systems that failed (also the one the attachments are from) is running on a RAID 5 across 4 76 GB disks. On the second system (a desktop PC) where I've seen it fail (that was before I added grub-install to the %post section of the kickstart install), it was just a plain raw ATA disk wiped completely during the install. So your RAID-1 suggestion seems not to be the problem. At least I cannot see where the problem should be. Also worth noting is that we have installed RHEL3 update 2 and later on roughly 20 Dell PE2650s with RAID 5 and never experienced the problem. So it was introduced between RHEL3 update 2 and RHEL4. Perhaps it should be noted that /boot is located on the / partition. But if that should be the problem, then why does it only fail on some and not all? I hope this gives a little more info to work with. Please do not hesitate to ask for more if needed.

I've had similar problems on systems with the following configuration:

- old dual-proc P3-600, Adaptec SCSI, 2x disk
- new VIA C3 EPIA motherboard, 2x IDE disk on motherboard IDE channels

In each case, I've configured the system for software RAID, mirrored across the two disks.
System looks fine when booted from the rescue CD but grub won't boot it correctly. I haven't been able to figure out the magic grub incantation to make it work.

Also, I had maybe the same problem on a Dell 750 server with hardware SATA RAID. I left the Dell Utility partition on the disk and installed EL4. The system would hang at the grub "loading stage2" message. In that case, deleting the Dell Utility partition and reinstalling fixed it.

We have seen this problem (repeatably) on Sun X2100 servers with dual disks. We use kickstart installs and md mirrors across two disks, and the bootloader is not installed on either disk (we've tried swapping them over to see if it boots from the other disk). Initially the systems come with Solaris 10 installed; we use clearpart to remove all partitions and then install over both disks. On the first boot you get a CRC error, as the Solaris bootloader is still on the disk. We've tried clearing the bootloader (dd /dev/null onto the start of the disk), but still grub does not get installed when you kickstart install RHEL4 with md partitions. During our testing we have seen a 'working' install, however this was following an install without md partitions onto just one disk - i.e. the bootloader seems to get installed correctly if you aren't using md partitions. I'm guessing that this is what is causing the effect of comment #1 - i.e. you get "GRUB Loading stage2" and then nothing else when upgrading from RHEL3 - the grub that is being called is left over from the previous install, as grub has not been properly installed during the RHEL4 install. Perhaps this is actually a problem with anaconda? i.e. how does grub know which device it should be installing on - if that is (incorrectly) passed from anaconda then it wouldn't work. Though I don't claim to know what interaction happens between anaconda and grub!
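The "dd /dev/null onto the start of the disk" step described above is the usual way to clear a leftover bootloader, though note that /dev/null yields no data when read; /dev/zero is the source that actually writes zeroes. A sketch of that step, assuming the target is a whole-disk device and the partition table should survive:

```shell
#!/bin/sh
# Sketch: zero only the MBR boot-code area (first 446 bytes), leaving
# the partition table (bytes 446-511) intact. The argument would
# normally be a whole-disk device such as /dev/sda; conv=notrunc keeps
# the rest of the target untouched when testing against a plain file.
wipe_bootcode() {
    dd if=/dev/zero of="$1" bs=446 count=1 conv=notrunc 2>/dev/null
}
```

Running it against a disk image first is a cheap way to confirm it leaves bytes 446-511 alone before pointing it at a real disk.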
For info, as a workaround we've added the following to the %post section of our kickstart install:

```shell
#
# Force grub onto MBR's
#
LogFile="/var/tmp/GrubFix.log"

# Determine root device
RootDev=`/bin/df / | /usr/bin/tail -1 | /bin/awk '{print $1}' | \
    /bin/sed -e 's/\/dev\///'`
RootType=`echo $RootDev | /bin/sed -e 's/^\([a-zA-Z]*\)[0-9]*$/\1/'`

# Is root device a raid partition
if [ "$RootType" = "md" ]; then
    RootDev=`/bin/grep "$RootDev" /proc/mdstat | \
        /bin/sed -e 's/^[^ ]* : [^ ]* [^ ]* //' | \
        /bin/sed -e 's/\[[^]]*\]//g'`
fi

hd=0
for dev in $RootDev; do
    /sbin/grub --batch <<EOT > $LogFile
device (hd$hd) /dev/$dev
root (hd$hd,0)
setup --stage2=/boot/grub/stage2 (hd$hd)
EOT
    let "hd = $hd + 1"
done
```

This also happens on FC6, Dell Latitude 620, 1 SATA drive. grub-install /dev/sda from rescue solves the problem. Any other information required to fix this bug? Thanks.

Same thing on a T60, 1 SATA drive, fresh stock FC6 install. What solved it was grub-install --recheck /dev/sda from a rescue session.

Well. I give up waiting for a solution. Adding grub-install to the %post section of kickstart always works. As far as I am concerned it can be closed.

This appears to be a dupe of 202101.

*** This bug has been marked as a duplicate of 202101 ***
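The md branch of the %post script above pulls the mirror's member partitions out of /proc/mdstat with a grep/sed pipeline. A small demo of that pipeline against a sample mdstat line (the sample text follows the usual /proc/mdstat format and is illustrative, not taken from the affected systems):

```shell
#!/bin/sh
# Demo of the mdstat parsing used in the workaround script: given
# mdstat-format text on stdin, print the member partitions of an array
# with the [N] role markers stripped.
members_of() {
    grep "$1" | \
        sed -e 's/^[^ ]* : [^ ]* [^ ]* //' | \
        sed -e 's/\[[^]]*\]//g'
}

sample='md0 : active raid1 sdb1[1] sda1[0]'
echo "$sample" | members_of md0   # -> sdb1 sda1
```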