From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007

Description of problem:

Hardware setup:
---------------

* Machine running RH9.0. Standard PC, it's a Siemens-Fujitsu desktop machine, nothing fancy.
* Two identical IDE harddisks: hd0 is master on IDE1, hd1 is slave on IDE2. Harddisks are 76319MB (=74.53GB) Western Digital WDC WD800 JB-00CRA1. (They run hot!)
* While the kernel is running, the two disks form a Linux software mirror RAID (i.e. I use /dev/md devices)
* No multiboot, no nothing. Just GRUB with software RAID (N.B. I did not get LILO to work in that configuration, otherwise I would be running LILO)

Partition setup:
----------------

The first partition is mapped to /boot. An excerpt:

/dev/hda1 + /dev/hdd1 == /dev/md0 mounted on /boot (primary partitions) 511.844 MB
/dev/hda2 + /dev/hdd2 == /dev/md1 mounted on / (primary partitions) 2047.99 MB
/dev/hda3 + /dev/hdd3 == /dev/md2 mounted on /var/db (primary partitions) 6143.98 MB
/dev/hda5 + /dev/hdd5 == /dev/md3 mounted on /usr 5120.20 MB
.....

The filesystem on all disks is ext3.

Kernel version:
---------------

Recently the kernel was upgraded from 2.4.20-8 to 2.4.20-20.9 through up2date. No problem was encountered during the reboot at that time. No further changes were made to the system prior to the fatal reboot.

The anomaly:
------------

I just thought I would reboot the server. Big mistake! GRUB is launched, but stops. The screen just says "GRUB". No GRUB error messages are displayed. End of story.

Analysis:
---------

It looks like GRUB's stage1 can be loaded and executed off the disk MBR. Loading stage 1.5 and/or stage 2 seems to fail. As the RAID is not yet running at boot time, this is a 'standard boot' from (hd0): stage 1.5 is the 'e2fs_stage1_5', and stage2 should go looking for grub.conf in (hd0,0)/grub (i.e. /boot/grub), which does not happen.

Further information:
--------------------

I thought something might have messed up stage 1 (i.e. the MBR) and decided to reinstall from a GRUB boot floppy, created as described on the GRUB homepage:

http://www.gnu.org/software/grub/manual/html_mono/grub.html#Installation

This gives me the GRUB commandline. As I wanted to reinstall GRUB, I needed the file 'stage1'. Now things get weird: GRUB did not 'find' that file.

We switch to the /boot partitions on both mirrored disks, to verify they are there:

root (hd0,0) --> "ext2fs, type 0xfd"
root (hd1,0) --> "ext2fs, type 0xfd"

We want to find the file 'stage1':

find /grub/stage1 --> "(hd1,0)/grub/stage1"

This is bad: GRUB finds that file only on the second harddisk, not on the first. The same happens with some other files, e.g. "vmlinuz", "kernel.h", ".module-info". Others can be found on both harddisks, e.g. "os2_d.b", "chain.b", "boot.b". GRUB can also 'list' the directories correctly.

I remove the second harddisk, then re-enter the GRUB commandline, then:

find (hd0,0)/<TAB>

lists the file 'vmlinuz' for example, but

find (hd0,0)/vmlinuz

will result in 'Error 15: File not found'.

To verify the files are actually there, I mount the disks using the Linux rescue console (using the RedHat CD1), then look around. The /boot partitions on both disks do not present any anomalies.

Conclusion:
-----------

Does GRUB have trouble with the ext3 filesystem? The symptoms above would be explained if it could not properly access it. The bad disk will stay in storage for a while if someone wants to know more...

Solution:
---------

As GRUB refuses to boot off hd0, and we are in a RAID setup, I decide to just plug in the second harddisk as the first. Booting then proceeds. The machine is now running on a dead mirror.

Version-Release number of selected component (if applicable):
grub-0.93-4

How reproducible:
Always

Steps to Reproduce:
1. Try to boot from the original 'first' harddisk

Actual Results: See description

Expected Results: See description

Additional info: See description
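
The report switches between Linux device names (/dev/hda1, /dev/hdd1) and GRUB names such as (hd0,0) and (hd1,0). As a reading aid, here is a minimal Python sketch of that translation. It assumes the BIOS drive order described above (hda is hd0, hdd is hd1). The names DEVICE_MAP and linux_to_grub are hypothetical and are not part of GRUB; on a real system the drive order would come from /boot/grub/device.map, which may differ.

```python
import re

# Assumption: BIOS drive order as described in this report
# (hda is the first BIOS disk, hdd the second). Hypothetical stand-in
# for the contents of /boot/grub/device.map.
DEVICE_MAP = {"/dev/hda": "hd0", "/dev/hdd": "hd1"}


def linux_to_grub(device, device_map=DEVICE_MAP):
    """Translate a Linux IDE device name into GRUB legacy notation.

    '/dev/hda'  -> '(hd0)'
    '/dev/hda1' -> '(hd0,0)'
    """
    match = re.fullmatch(r"(/dev/[a-z]+)(\d*)", device)
    if match is None:
        raise ValueError(f"unrecognised device name: {device!r}")
    disk, partition = match.groups()
    if disk not in device_map:
        raise KeyError(f"{disk} is not in the device map")
    if not partition:
        # Whole disk, e.g. the MBR target for stage1.
        return f"({device_map[disk]})"
    number = int(partition)
    if number < 1:
        raise ValueError(f"Linux partition numbers start at 1: {device!r}")
    # GRUB legacy counts partitions from 0, Linux counts from 1.
    return f"({device_map[disk]},{number - 1})"


if __name__ == "__main__":
    for dev in ("/dev/hda", "/dev/hda1", "/dev/hdd1", "/dev/hda5"):
        print(dev, "->", linux_to_grub(dev))
```

With this mapping, the two halves of /dev/md0 (/dev/hda1 and /dev/hdd1) correspond to the (hd0,0) and (hd1,0) partitions inspected from the GRUB commandline above.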
As the reporter, I suggest closing this bug; it's too old now. Worse, the problem *might* be related to the harddisk in use at the time, because it had a *really* bad case of bad blocks and was exchanged by the manufacturer. Sorry for letting this stew.
Red Hat apologizes that these issues have not been resolved yet. We do want to make sure that no important bugs slip through the cracks.

Red Hat Linux 7.3 and Red Hat Linux 9 are no longer supported by Red Hat, Inc. They are maintained by the Fedora Legacy project (http://www.fedoralegacy.org/) for security updates only. If this is a security issue, please reassign to the 'Fedora Legacy' product in bugzilla. Please note that Legacy security update support for these products will stop on December 31st, 2006.

If this is not a security issue, please check if this issue is still present in a current Fedora Core release. If so, please change the product and version to match, and check the box indicating that the requested information has been provided.

If you are currently still running Red Hat Linux 7.3 or 9, please note that Fedora Legacy security update support for these products will stop on December 31st, 2006. You are strongly advised to upgrade to a current Fedora Core release or Red Hat Enterprise Linux or comparable. Some information on which option may be right for you is available at http://www.redhat.com/rhel/migrate/redhatlinux/.

Any bug still open against Red Hat Linux 7.3 or 9 at the end of 2006 will be closed 'CANTFIX'. Again, if this bug still exists in a current release, or is a security issue, please change the product as necessary. We thank you for your help, and apologize again that we haven't handled these issues to this point.
Closed as NOTABUG, see note of 2005-11-16 12:00 EST