Bug 112892

Summary: Booting by GRUB fails, possibly problem with ext2 filesystem access
Product: [Retired] Red Hat Linux
Reporter: David Tonhofer <bughunt>
Component: grub
Assignee: Peter Jones <pjones>
Status: CLOSED NOTABUG
QA Contact: Mike McLean <mikem>
Severity: medium
Priority: medium
Version: 9
CC: mattdm, notting
Hardware: athlon   
OS: Linux   
Last Closed: 2006-08-25 17:48:12 UTC

Description David Tonhofer 2004-01-05 15:49:23 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5)
Gecko/20031007

Description of problem:
Hardware setup:
---------------

* Machine running RH9.0. Standard PC, it's a Siemens-Fujitsu desktop
  machine, nothing fancy.
* Two identical IDE harddisks: hd0 is master on IDE1, hd1 is slave
  on IDE2. Harddisks are 76319MB (=74.53GB) Western Digital
  WDC WD800JB-00CRA1. (They run hot!)
* Once the kernel is running, the two disks form a Linux software
  mirror RAID (i.e. I use /dev/md devices)
* No multiboot, no nothing. Just GRUB with software RAID (N.B. I
  did not get LILO to work in that configuration, otherwise I would
  be running LILO)

Partition setup:
----------------

The first partition is mapped to /boot. An excerpt:

  /dev/hda1 + /dev/hdd1 == /dev/md0 mounted on /boot
                           (primary partitions)  511.844 MB
  /dev/hda2 + /dev/hdd2 == /dev/md1 mounted on /
                           (primary partitions) 2047.99 MB
  /dev/hda3 + /dev/hdd3 == /dev/md2 mounted on /var/db
                           (primary partitions) 6143.98 MB
  /dev/hda5 + /dev/hdd5 == /dev/md3 mounted on /usr
                                                5120.20 MB
  .....

The filesystem on all disks is ext3.
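
The GRUB names used further down, such as (hd0,0), can be cross-checked against the partitions above. GRUB legacy counts disks and partitions from zero, while Linux counts partitions from one. A minimal sketch of that mapping, assuming hd0 is /dev/hda and hd1 is /dev/hdd as in the hardware setup; the names DEVICE_MAP and grub_to_linux are made up for this illustration and are not part of GRUB:

```python
import re

# Assumed BIOS-disk to Linux-device mapping, taken from the hardware setup
# in this report (hd0 = master on IDE1, hd1 = slave on IDE2).
DEVICE_MAP = {"hd0": "/dev/hda", "hd1": "/dev/hdd"}


def grub_to_linux(grub_name, device_map=DEVICE_MAP):
    """Translate a GRUB legacy name such as '(hd0,0)' to a Linux device path."""
    match = re.fullmatch(r"\((hd\d+)(?:,(\d+))?\)", grub_name.strip())
    if match is None:
        raise ValueError(f"not a GRUB legacy device name: {grub_name!r}")
    disk, partition = match.groups()
    path = device_map[disk]  # raises KeyError if the disk is not in the map
    if partition is None:
        return path  # whole disk, e.g. '(hd0)'
    # GRUB legacy numbers partitions from 0, Linux from 1.
    return f"{path}{int(partition) + 1}"


print(grub_to_linux("(hd0,0)"))  # /dev/hda1
print(grub_to_linux("(hd1,0)"))  # /dev/hdd1
```

Under this mapping, (hd0,0) and (hd1,0) are the two halves of /dev/md0, i.e. the /boot mirror.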

Kernel version:
---------------

Recently the kernel was upgraded from 2.4.20-8 to 2.4.20-20.9
through up2date. No problem was encountered during the reboot at
that time. No further changes were made to the system prior to the
fatal reboot.

The anomaly:
------------

I just thought I would reboot the server. Big mistake! GRUB is
launched, but stops. The screen just says "GRUB". No GRUB
error messages are displayed. End of story.

Analysis:
---------

It looks like GRUB's stage1 can be loaded and executed off the disk
MBR. Loading stage 1.5 and/or stage 2 seems to fail. As the RAID is
not yet running at boot time, we are in the situation where this is
a 'standard boot' from (hd0), stage 1.5 is the 'e2fs_stage1_5',
and stage2 should go looking for grub.conf in (hd0,0)/grub
(i.e. /boot/grub). Which does not happen.
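
In GRUB legacy, the "GRUB" banner is printed by stage1 itself before it loads the next stage, so a screen showing only that word is consistent with stage1 running and the following load stalling. One way to check from a rescue environment that the MBR still holds a boot sector with that banner is to look at the first 512 bytes of the disk. A minimal sketch, assuming the sector bytes have already been read; the function name is hypothetical, and the sample sector is synthetic, with the banner placed at an arbitrary offset:

```python
def inspect_boot_sector(sector):
    """Report two basic facts about a 512-byte boot sector."""
    if len(sector) != 512:
        raise ValueError("a boot sector is exactly 512 bytes")
    return {
        # A bootable MBR ends with the signature bytes 0x55 0xAA.
        "boot_signature": sector[510:512] == b"\x55\xaa",
        # GRUB legacy's stage1 embeds the banner text it prints on screen.
        "mentions_grub": b"GRUB" in sector,
    }


# Synthetic sector for illustration only. On a real system the bytes would
# come from something like open("/dev/hda", "rb").read(512), run as root.
fake = bytearray(512)
fake[0x180:0x185] = b"GRUB "  # arbitrary offset, not the real stage1 layout
fake[510:512] = b"\x55\xaa"
print(inspect_boot_sector(bytes(fake)))
```

A positive result only shows that a GRUB boot sector is present. It says nothing about whether the sectors holding stage 1.5 or stage2 can be read.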

Further information:
--------------------

I thought something might have messed up stage 1 (i.e. the MBR) and
decided to reinstall from a GRUB boot floppy, created as described
in the GRUB homepage:

http://www.gnu.org/software/grub/manual/html_mono/grub.html#Installation

This gives me the GRUB commandline. As I wanted to reinstall GRUB,
I needed the file 'stage1'. Now things get weird: GRUB did not
'find' that file. 

We switch to the /boot partitions on both mirrored disks, to verify
they are there:

root (hd0,0)  -->  "ext2fs, type 0xfd"
root (hd1,0)  -->  "ext2fs, type 0xfd"
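
The "type 0xfd" in these replies is the MBR partition type "Linux raid autodetect", which matches the software RAID setup. The same byte can be read straight from the partition table in the first sector of the disk. A minimal sketch, assuming a classic MBR layout (four 16-byte entries starting at offset 446, type byte at offset 4 of each entry); the sample sector is synthetic, and the extended partition in slot 4 is an assumption suggested only by the presence of /dev/hda5:

```python
# A few well-known MBR partition type IDs.
PARTITION_TYPES = {
    0x00: "empty",
    0x05: "extended",
    0x82: "Linux swap",
    0x83: "Linux",
    0xFD: "Linux raid autodetect",
}


def partition_types(sector):
    """Return the type byte of each of the four primary MBR entries."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly 512 bytes")
    # Entry i starts at byte 446 + 16 * i; its type ID is the fifth byte.
    return [sector[446 + 16 * index + 4] for index in range(4)]


# Synthetic sector for illustration only, not read from the reported disks.
sector = bytearray(512)
for index, type_id in enumerate([0xFD, 0xFD, 0xFD, 0x05]):
    sector[446 + 16 * index + 4] = type_id

for type_id in partition_types(bytes(sector)):
    print(hex(type_id), PARTITION_TYPES.get(type_id, "unknown"))
```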

We want to find the file 'stage1':

find /grub/stage1  --> "(hd1,0)/grub/stage1"

This is bad: GRUB finds that file only on the second harddisk, not
on the first. The same happens with some other files e.g.
"vmlinuz" "kernel.h" ".module-info". Others can be found on both
harddisks, e.g. "os2_d.b" "chain.b" "boot.b". 

GRUB can also 'list' the directories correctly. I remove the second
harddisk, then re-enter the GRUB commandline, then:

find (hd0,0)/<TAB>

lists the file 'vmlinuz' for example, but 

find (hd0,0)/vmlinuz

will result in 'Error 15: File not found'.

To verify the files are actually there, I mount the disks using the
Linux rescue console (using the RedHat CD1), then look around.
The /boot partitions on both disks do not present any anomalies.
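
Looking around a mounted filesystem mostly exercises directory entries. A stricter check is to read every file from start to end, which surfaces I/O errors on the data blocks as well. A minimal sketch, assuming the partition is mounted somewhere readable; the function name is hypothetical and the mount point in the usage comment is a placeholder:

```python
import os


def unreadable_files(root):
    """Return (path, error) pairs for regular files that cannot be read in full."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are read.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    # Read in 1 MiB chunks until end of file.
                    while handle.read(1 << 20):
                        pass
            except OSError as error:
                failures.append((path, str(error)))
    return failures


# Example call (placeholder mount point, adjust to the rescue environment):
#   for path, error in unreadable_files("/mnt/boot"):
#       print(path, error)
```

A clean result here does not rule out the problem seen in GRUB, because the kernel reads the disk through its own driver while GRUB goes through the BIOS.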

Conclusion:
-----------

Does GRUB have trouble with the ext3 filesystem? The symptoms above
would be explained if it could not properly access it. The bad disk
will stay in storage for a while if someone wants to know more...

Solution:
---------

As GRUB refuses to boot off hd0, and we are in a RAID setup, I decide
to just plug in the second harddisk as the first. Booting then proceeds.
The machine is now running on a dead mirror.
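
Which arrays are running on a single member can be read from /proc/mdstat, where a healthy two-disk mirror shows [UU] and a missing member shows as an underscore. A minimal sketch that parses such text; the function name is hypothetical, and the sample below is invented for illustration, not captured from the machine in this report:

```python
import re


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status field looks like [UU] (healthy) or [U_] (one member gone).
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


# Invented sample in the usual /proc/mdstat layout.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 hda1[0]
      524096 blocks [2/1] [U_]
md1 : active raid1 hdd2[1] hda2[0]
      2097088 blocks [2/2] [UU]
unused devices: <none>
"""
print(degraded_arrays(SAMPLE))
```

On a live system the text would come from reading /proc/mdstat.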


Version-Release number of selected component (if applicable):
grub-0.93-4

How reproducible:
Always

Steps to Reproduce:
1. Try to boot from the original 'first' harddisk
    

Actual Results:  See description

Expected Results:  See description

Additional info:

See description

Comment 1 David Tonhofer 2005-11-16 17:00:23 UTC
As the reporter, I suggest closing this bug; it's too old now. Worse, the
problem *might* be related to the harddisk in use at the time, because it had a
*really* bad case of 'badblock' and was exchanged by the manufacturer.

Sorry for letting this stew.





Comment 2 Bill Nottingham 2006-08-05 03:34:55 UTC
Red Hat apologizes that these issues have not been resolved yet. We do want to
make sure that no important bugs slip through the cracks.

Red Hat Linux 7.3 and Red Hat Linux 9 are no longer supported by Red Hat, Inc.
They are maintained by the Fedora Legacy project (http://www.fedoralegacy.org/)
for security updates only. If this is a security issue, please reassign to the
'Fedora Legacy' product in bugzilla. Please note that Legacy security update
support for these products will stop on December 31st, 2006.

If this is not a security issue, please check if this issue is still present
in a current Fedora Core release. If so, please change the product and version
to match, and check the box indicating that the requested information has been
provided.

If you are currently still running Red Hat Linux 7.3 or 9, please note that
Fedora Legacy security update support for these products will stop on December
31st, 2006. You are strongly advised to upgrade to a current Fedora Core release
or Red Hat Enterprise Linux or comparable. Some information on which option may
be right for you is available at http://www.redhat.com/rhel/migrate/redhatlinux/.

Any bug still open against Red Hat Linux 7.3 or 9 at the end of 2006 will be
closed 'CANTFIX'. Again, if this bug still exists in a current release, or is a
security issue, please change the product as necessary. We thank you for your
help, and apologize again that we haven't handled these issues to this point.


Comment 3 David Tonhofer 2006-08-25 17:50:52 UTC
Closed as NOTABUG, see note of 2005-11-16 12:00 EST