Bug 149587 - Grub installation fails
Status: CLOSED DUPLICATE of bug 202101
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: grub
Version: 4.0
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Peter Jones
Reported: 2005-02-24 03:00 EST by Mikael M. Hansen
Modified: 2008-07-28 11:04 EDT

Doc Type: Bug Fix
Last Closed: 2008-07-28 11:04:39 EDT


Attachments
device.map (82 bytes, text/plain)
2005-04-18 03:30 EDT, Mikael M. Hansen
grub.conf (1.15 KB, text/plain)
2005-04-18 03:31 EDT, Mikael M. Hansen
/proc/partitions (297 bytes, text/plain)
2005-04-18 03:34 EDT, Mikael M. Hansen

Description Mikael M. Hansen 2005-02-24 03:00:58 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041107 Firefox/1.0

Description of problem:
During both an upgrade and a new (kickstart-based) install of RHEL4WS
I've come across the following message when the system is rebooted
for the final steps of the installation:

GRUB Loading stage2

and then nothing happens. 


Booting from CD in rescue mode and doing a: grub-install /dev/{s|h}da
solves the problem.

One other bug (149305) describes something similar, except that it
indicates the problem occurs only on non-IDE/ATA/SATA disks. I've come
across my problem on ATA disks as well.

Regarding reproducibility: I've installed RHEL4 on two machines so far
and the problem appeared on both. I'll install it on one more
machine today, and report back if the problem also appears during
that install.

Version-Release number of selected component (if applicable):
The one included in the ISO 

How reproducible:
Didn't try

Steps to Reproduce:
1. Install OS via kickstart / or upgrade RHEL3 fully updated
2. reboot as part of installation

Actual Results:  Systems cannot boot 

Expected Results:  System should have booted

Additional info:
Comment 1 Mikael M. Hansen 2005-02-24 06:25:21 EST
Well. Did another (kickstart) install and this one didn't fail. This
was on an ATA disk.
Comment 2 Suzanne Hillman 2005-02-24 14:37:37 EST
What is the hardware on the system which is having this problem?
Comment 3 Mikael M. Hansen 2005-02-25 04:10:37 EST
Well. It happened on two systems so far:

1. Fujitsu-Siemens P300. Ultra-DMA-100 disk. 

On this I did an upgrade from the CDs.

2. Dell PowerEdge 2650 with the PERC3/DI RAID controller 

On this I did a kickstart-based install, and it failed to boot. The
relevant part of the kickstart installation is:

bootloader --location=mbr --md5pass=XXXXXXXXXX
zerombr yes
clearpart --all --initlabel


I've now installed (kickstart-based) on two machines like the first
one without the problem appearing. 
Comment 4 Nic Doye 2005-02-25 11:32:43 EST
I've also seen this on an HP DL360 G3. I installed RHEL ES 4 on two
identical boxes yesterday - one was fine, the other just instantly
rebooted after showing

GRUB Loading stage2

Like Mikael above, I also booted off the CD and did a 

grub-install /dev/cciss/c0d0

The servers aren't networked yet, so I can't get to them, but another
HP DL360 G3 reports:

# head -1 /proc/driver/cciss/cciss0
cciss0: HP Smart Array 5i Controller

I doubt it matters, but we're using RAID 1 (but Linux can't see that).

No third-party drivers were used.

Comment 6 rob 2005-03-04 08:09:32 EST
FWIW, there is a similar issue with grub and i2o_block devices.  I
added grub-install /dev/i2o/hda to the ks %post as a workaround.
Comment 7 John Marquart 2005-03-17 17:11:44 EST
I experienced this exact same problem w/ IBM x345 w/ the latest firmware (using
ServeRAID card w/ 3-drive RAID5 configured - also latest firmware)

The grub-install /dev/sda solution also worked for me.

Comment 8 Paul Raines 2005-04-14 10:43:04 EDT
I did an install on 12 identical cluster nodes.  Three of them failed with this
grub stage2 error.  I did a full reinstall on them and it worked the second
time. Seems totally random.  I am going to add 'grub-install /dev/hda' to the
%post section on my kickstart and see if that cures it.
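
The %post workaround mentioned in this and earlier comments could be sketched roughly as below. This is an illustration under assumptions, not a tested kickstart: the function name reinstall_grub, the log path and the GRUB_INSTALL override are made up here, and the boot disk (/dev/hda, /dev/sda, /dev/cciss/c0d0, /dev/i2o/hda, ...) has to match the machine.

```shell
# Hypothetical helper for a kickstart %post script: rerun grub-install
# on the boot disk and keep its output for later inspection.
reinstall_grub() {
    disk="$1"                                   # e.g. /dev/hda
    log="${2:-/root/grub-reinstall.log}"        # assumed log location
    if [ ! -b "$disk" ]; then
        # Record why nothing was done instead of failing silently
        echo "skipping $disk: not a block device" >> "$log"
        return 1
    fi
    # GRUB_INSTALL is an assumed override; default is the stock binary
    "${GRUB_INSTALL:-/sbin/grub-install}" "$disk" >> "$log" 2>&1
}
```

Inside %post this would be called as, for example, reinstall_grub /dev/hda. Keeping the log makes it possible to see afterwards whether grub-install itself reported an error.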
Comment 9 Peter Jones 2005-04-14 11:22:07 EDT
Mikael, can you please attach the following files from the failing system to this bug:
/proc/partitions, /boot/grub/grub.conf, /boot/grub/device.map

That should aid in determining what's going wrong.
Comment 10 Mikael M. Hansen 2005-04-18 03:30:39 EDT
Created attachment 113310 [details]
device.map
Comment 11 Mikael M. Hansen 2005-04-18 03:31:39 EDT
Created attachment 113311 [details]
grub.conf
Comment 12 Mikael M. Hansen 2005-04-18 03:34:04 EDT
Created attachment 113312 [details]
/proc/partitions
Comment 13 Peter Jones 2005-04-19 12:00:52 EDT
Ok, so those all look reasonably correct.  Does your raid controller's bios give
you any indication as to the state of the mirror?

There's nothing wrong with the config files, which makes me think the write is
somehow failing.  Given that you're on a RAID-1, the obvious question is if it's
getting written to one drive and not the other, and then rebooting before the
RAID is synced up.  If that happens, the BIOS might be reading from the disk
that doesn't have the data yet.

So if the BIOS can tell you if the raid is synced up, that'd be good to check
next time it fails.  If it can't, you might try swapping the drives.

Let me know if you get the chance to try either of these...
Comment 14 Mikael M. Hansen 2005-04-20 03:08:35 EDT
Well. I sure hope so. We've made no modifications to them.  They are as they
were left by the installation (and updates, naturally). 

I don't have a mirror. One of the systems that failed (also the one the
attachments are from) is running on a RAID5 across 4 76 GB disks. The second
system I've seen it fail on (a desktop PC, before I added grub-install to
the post section of the kickstart install) had just a plain raw ATA disk,
wiped completely during the install. 

So your RAID-1 suggestion does not seem to be the problem. At least I cannot see
where the problem would be. Also worth noting is that we have installed RHEL3
upd2 and later on roughly 20 Dell PW2650 with RAID5 and never experienced the
problem. So it was introduced between RHEL3 upd2 and RHEL4.

Perhaps it should be noted that /boot is located on the / partition. But if that
should be the problem, then why does it only fail on some and not all?

I hope this gives a little more info to work with. Please do not hesitate to ask
for more if needed
Comment 16 Dan Pritts 2005-06-30 12:57:37 EDT
I've had similar problems on systems with the following configuration:

old dual-proc p3-600, adaptec scsi, 2x disk

new via C3 EPIA motherboard, 2x IDE disk on motherboard IDE channels.
 
In each case, I've configured the system for software RAID, mirrored across
the two disks.  The system looks fine when booted from the rescue CD, but grub
won't boot it correctly.  I haven't been able to figure out the magic grub
incantation to make it work.

Also, I may have had the same problem on a Dell 750 server with hardware SATA
RAID.  I left the Dell Utility partition on the disk and installed EL4.  The
system would hang at the grub "loading stage2" message.  In that case, deleting
the Dell utility partition and reinstalling fixed it.  
Comment 17 Simon Thompson 2006-01-09 05:16:41 EST
We have seen this problem (repeatably) on Sun X2100 servers with dual disks.

We use kickstart installs and md mirrors across two disks and the bootloader is
not installed on either disk (we've tried swapping them over to see if it boots
from the other disk).

Initially the systems come with Solaris 10 installed; we use clearpart to remove
all partitions and then install over both disks. On the first boot you get a CRC
error as the Solaris bootloader is still on the disk. We've tried clearing the
bootloader (dd /dev/null onto the start of the disk), but grub still does not
get installed when you kickstart install RHEL4 with md partitions.
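
A note on the dd step: reading from /dev/null returns end-of-file immediately, so a dd with if=/dev/null copies nothing, and /dev/zero is the usual source when the goal is to overwrite. A minimal sketch of zeroing only the 446-byte boot code area of an MBR, leaving the partition table that follows it alone, is shown here against a scratch image file rather than a real disk. The function name wipe_boot_code and the scratch image are made up for illustration, and whether this would have changed the outcome on the X2100s is not known.

```shell
# Hypothetical helper: zero the first 446 bytes (the MBR boot code area)
# of the disk or image given as $1. Bytes 446-511 hold the partition
# table and boot signature and are left untouched.
wipe_boot_code() {
    # conv=notrunc keeps the rest of a regular file intact
    dd if=/dev/zero of="$1" bs=446 count=1 conv=notrunc 2>/dev/null
}

# Try it on a scratch image, never directly on a disk you care about.
img=$(mktemp)
dd if=/dev/urandom of="$img" bs=512 count=4 2>/dev/null   # fake 2 KB "disk"
wipe_boot_code "$img"
```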

During our testing we have seen a 'working' install, however this was following
an install without md partitions onto just one disk - i.e. the bootloader seems
to get installed correctly if you aren't using md partitions. I'm guessing that
this is what is causing the effect of comment #1 - i.e. you get "GRUB Loading
stage2" and then nothing else when upgrading from RHEL3 - the grub that is being
called is left over from the previous install as grub has not been properly
installed during the RHEL4 install.

Perhaps this is actually a problem with anaconda? I.e. how does grub know which
device it should be installing on? If that is (incorrectly) passed from
anaconda then it wouldn't work. Though I don't claim to know what interaction
happens between anaconda and grub!

For info, as a work-around we've added the following to the %post section of our
kickstart install:

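#
# Force grub onto the MBRs
#

LogFile="/var/tmp/GrubFix.log"

# Determine root device (strip the leading /dev/, e.g. md0 or sda1)

RootDev=`/bin/df / | /usr/bin/tail -1 | /bin/awk '{print $1}' | \
         /bin/sed -e 's/\/dev\///'`

# Strip the trailing digits to get the device type (e.g. md or sda)

RootType=`echo "$RootDev" | /bin/sed -e 's/^\([a-zA-Z]*\)[0-9]*$/\1/'`

# Is root device a raid partition? If so, replace it with the list of
# member partitions from /proc/mdstat. The anchored pattern keeps md1
# from also matching md10.

if [ "$RootType" = "md" ]; then
   RootDev=`/bin/grep "^$RootDev " /proc/mdstat | \
         /bin/sed -e 's/^[^ ]* : [^ ]* [^ ]* //' | \
         /bin/sed -e 's/\[[^]]*\]//g'`
fi

# Run the grub shell once per device; append to the log so that it keeps
# the output of every pass, not just the last one

hd=0
for dev in $RootDev; do
  /sbin/grub --batch  <<EOT >> "$LogFile" 2>&1
root (hd$hd,0) /dev/$dev
setup --stage2=/boot/grub/stage2 (hd$hd)
EOT
  hd=$((hd + 1))
done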
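
For reference, the /proc/mdstat parsing used in the script above can be tried on its own. The following is a minimal sketch, not part of the original workaround: the function name md_members and the sample mdstat text are made up for illustration, and it assumes the usual "mdN : active raidL member[idx] ..." line layout.

```shell
# Hypothetical helper: print the member partitions of one md array,
# reading /proc/mdstat-style text from standard input.
md_members() {
    # $1 is the array name without /dev/, e.g. md0.
    # First sed keeps only the matching array line and drops everything
    # up to the member list; second sed strips the [idx] suffixes.
    sed -n "s/^$1 : [^ ]* [^ ]* //p" | sed -e 's/\[[^]]*\]//g'
}

# Made-up sample in the format of /proc/mdstat
sample='Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]'

printf '%s\n' "$sample" | md_members md0    # prints: sdb1 sda1
```

Note that the members come out in the order mdstat lists them, which in this sample is sdb1 before sda1, so numbering them hd0, hd1, ... in that order is itself an assumption about the BIOS disk order.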

Comment 18 Marius Andreiana 2006-11-02 08:16:13 EST
This also happens on FC6, Dell Latitude 620, 1 SATA drive.
grub-install /dev/sda from rescue solves the problem.

Any other information required to fix this bug?

Thanks
Comment 19 Ingo Molnar 2006-11-25 03:17:23 EST
Same thing on a T60, 1 SATA drive, fresh stock FC6 install. What solved it was:

   grub-install --recheck /dev/sda

from a rescue session.
Comment 20 Mikael M. Hansen 2006-11-25 04:44:02 EST
Well. I give up waiting for a solution. Adding grub-install to the post section
of the kickstart always works. As far as I am concerned it can be closed.
Comment 27 Peter Jones 2008-07-28 11:04:39 EDT
This appears to be a dupe of 202101.

*** This bug has been marked as a duplicate of 202101 ***
