From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

Description of problem:
If you run "grub-install /dev/md0" in a software-RAID 1 configuration with the root (or /boot) partition mirrored between two physical disks, grub-install iterates through the disks in the array and attempts to install GRUB on each of them. After doing this, however, if you unplug the first disk and try to boot from the second disk, you get a "GRUB Hard Disk Error" message and the boot process halts. The same thing happens if you try putting the second disk in the first disk's place (i.e. as /dev/hda).

I believe I've figured out why this happens. Following is a portion of the output from "grub-install --debug /dev/md0":

grub> root (hd0,0)
grub> setup --stage2=/boot/grub/stage2 --prefix=/grub (hd0)
...
grub> root (hd0,0)
grub> setup --stage2=/boot/grub/stage2 --prefix=/grub (hd1)

Notice that when installing GRUB on the two physical drives, hd0 and hd1, it specifies a root of (hd0,0) for both. This is evidently why you can't boot from the second disk (hd1) with the first disk (hd0) unplugged: GRUB is looking for a partition on a disk that isn't present.

Attached is a patch I've devised to fix this problem. Instead of always specifying a partition on the first disk, it looks for a partition on the same disk that will be used in the "setup" command. With my patch applied, the output of "grub-install --debug /dev/md0" is as follows:

grub> root (hd0,0)
grub> setup --stage2=/boot/grub/stage2 --prefix=/grub (hd0)
...
grub> root (hd1,0)
grub> setup --stage2=/boot/grub/stage2 --prefix=/grub (hd1)

This fixes both the "unplug first disk" and "replace first disk with second disk" cases in my tests.

Version-Release number of selected component (if applicable):
grub-0.95-13

How reproducible:
Always

Steps to Reproduce:
1. Start with a root partition mirrored across two IDE disks (/dev/hda and /dev/hdb).
2. Run "grub-install /dev/md0".
3. Reboot. It comes up fine.
4. Shut down. Unplug the first disk, leaving the second disk where it is.
5. Power on.

Actual Results:
Error message: "GRUB Hard Disk Error"

Expected Results:
It should boot from the second disk without error.

Additional info:
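For reference, the per-disk behavior described above can be sketched as a loop over /boot/grub/device.map. This is a hypothetical illustration, not the attached patch itself, and it assumes /boot is the first partition (hdN,0) on every member disk:

```shell
#!/bin/sh
# Sketch (not the actual patch): install GRUB on every BIOS hard disk
# listed in device.map, pointing each install at a partition on the
# SAME disk, so each disk can boot on its own.

# Extract the N from a "(hdN)" BIOS drive name.
hd_index() {
    i=${1#\(hd}
    echo "${i%\)}"
}

install_on_all() {
    while read drive device; do
        case "$drive" in
            "(hd"*) ;;          # only BIOS hard disks
            *) continue ;;
        esac
        n=$(hd_index "$drive")
        grub --batch <<EOF
root (hd$n,0)
setup (hd$n)
quit
EOF
    done < /boot/grub/device.map
}
```

The key point is that `root` and `setup` always name the same hdN, unlike the unpatched grub-install, which hardcodes root (hd0,0).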
Created attachment 115503 [details] Proposed fix
By the way, this also fixes the "grub-install /dev/hdb" and "grub-install /dev/hdb1" cases: they now use (hd1,0) instead of (hd0,0) as well.
I can confirm the symptoms: root (hd0,0) is called for both disks, when the second should be root (hd1,0). I can also confirm that the patch works for me with a mirrored root on /dev/md1. Many thanks. I have been chasing the same issue and came to the same conclusion: https://sourceforge.net/tracker/index.php?func=detail&aid=1233029&group_id=96750&atid=615772
Created attachment 120181 [details] Patch, reformatted to apply in grub source RPM
Created attachment 120182 [details] Patch for grub-0.95-13 SPEC file
Hang on - I'm not positive that this is the fix. I'm still chasing this issue, and I have a feeling that it is related to performing a grub install while /boot is still resyncing.
Update: Installed using grub-0.95-13 (from FC4), two VMware disks hda and hdc, and waited until /boot was in sync:
- Ran "/sbin/grub-install /dev/md1"
- I can boot happily from both disks or either disk independently.

Now do the same with /boot out of sync:

mdadm --fail /dev/md1 /dev/hdc1
mdadm --remove /dev/md1 /dev/hdc1
mdadm --add /dev/md1 /dev/hdc1 ; /sbin/grub-install /dev/md1

- Wait for /boot to be in sync
- Shut down
- System will boot from the primary drive
- If the primary drive is removed, I get "GRUB " and nothing more
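The out-of-sync window in the second scenario could be avoided by waiting for the rebuild to finish before installing. A sketch, assuming the "resync"/"recovery" markers that /proc/mdstat prints during a rebuild:

```shell
#!/bin/sh
# Sketch: wait until no md resync/recovery is running before
# installing the bootloader.  Assumes /proc/mdstat reports a rebuild
# with the words "resync" or "recovery", e.g.:
#   [=>..................]  recovery = 12.6% (163840/1289216)

mdstat_busy() {
    # $1: contents of /proc/mdstat; true if a rebuild is in progress
    echo "$1" | grep -qE 'resync|recovery'
}

wait_for_sync() {
    while mdstat_busy "$(cat /proc/mdstat)"; do
        sleep 5
    done
}

# Usage sketch:
# wait_for_sync && /sbin/grub-install /dev/md1
```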
The following appears to work correctly. I'm going to work these changes into anaconda when I get a chance.

#!/bin/sh
#----------------------------------------------------------------------
# Copyright (C) 2005 Mitel Networks Corporation
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#----------------------------------------------------------------------

PATH=$PATH:/sbin
export PATH

HDZERO=$(grep hd0 /boot/grub/device.map | awk '{ print $2 }')

# We should be able to do grub-install --recheck /dev/md1, but not in
# this version of grub-install.
# We need to rebuild the device.map file since $NEW didn't exist
# at install time.
echo "Forcing grub to rescan devices"
grub-install --recheck $HDZERO

# grub-install /dev/$NEW
echo "Calling grub-install on $HDZERO"
grub --batch <<HERE
device (hd0) $HDZERO
root (hd0,0)
setup (hd0)
HERE

HDONE=$(grep hd1 /boot/grub/device.map | awk '{ print $2 }')
if [ -z "$HDONE" ]
then
    echo "Skipping grub-install on hd1"
    exit 0
fi

echo "Calling grub-install on $HDONE"
grub --batch <<HERE
device (hd0) $HDONE
root (hd0,0)
setup (hd0)
quit
HERE
exit 0
Hello, I just built grub-0.97-2 on a RH7.2-based system and tried it on a Fujitsu-Siemens TX150-S4 with 2 SCSI disks:

/dev/md0 : /dev/sda2 /dev/sdb2  RAID 1 => /boot
/dev/md1 : /dev/sda1 /dev/sdb1  RAID 1 => /

The same problem occurred, and the patch works here too. The issue of /boot still resyncing is perhaps not that important, since most of the time /boot is a separate partition of about 50 MB and is therefore quickly rebuilt. When /boot is merged with /, maybe grub-install could scan /proc/mdstat to make sure the two halves of the mirror are in sync before trying to install the bootloader? The same reasoning applies to anaconda: grub is launched at the end, and I think /boot is clean at that moment.
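A /proc/mdstat check along those lines could look roughly like this. The "[n/m]" and "[UU]" field layout is an assumption about the md driver's output format, so treat this as a sketch, not production code:

```shell
#!/bin/sh
# Sketch: warn if an md array looks unstable (a member missing or a
# rebuild running) before installing the bootloader.  Parses the
# "[n/m]" count and "[UU]" status fields from /proc/mdstat.

md_stable() {
    # $1: the /proc/mdstat stanza for one array
    # A rebuild in progress means not stable.
    echo "$1" | grep -qE 'resync|recovery' && return 1
    # "[UU]" means all members up; "[U_]" or "[_U]" means degraded.
    echo "$1" | grep -q '\[U*\]' || return 1
    return 0
}

# Usage sketch:
# if ! md_stable "$(grep -A2 '^md0' /proc/mdstat)"; then
#     echo "warning: the /boot mirror is degraded or resyncing" >&2
# fi
```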
Created attachment 124046 [details] issue a warning if the RAID-1 where /boot is located is not stable (one member missing or synchronizing)
Just cross-referencing this:
Bug #191449 NEW install grub incorrect when /boot is a RAID1 device
Bug #170575 NEW grub fails on one of two sata disks of raid1 set during i...
Bug #163460 NEW Installation failed on RAID setup (GRUB error 15 and fail...
>>Bug #160563 NEW "grub-install /dev/md0" does not install on second disk i...
Bug #114690 CLOSED/RAWHIDE grub-install won't install to RAID 1
This report targets the FC3 or FC4 products, which have now been EOL'd. Could you please check whether it still applies to a current Fedora release, and either update the target product or close it? Thanks.
(In reply to comment #12) > This report targets the FC3 or FC4 products, which have now been EOL'd. > > Could you please check that it still applies to a current Fedora release, and > either update the target product or close it ? I plan to check FC6 and CentOS 4.4 later this week.
The originally reported problem is reproducible on FC6 / grub-0.97-13.

1. Run "grub-install /dev/md0"
2. Shut down and physically unplug the first drive, leaving the second drive connected.
3. Try booting. It halts immediately with: "GRUB Hard Disk Error"

As before, my patch eliminates the error. Any chance of incorporating it into a future version?
Why should grub-install /dev/md0 work at all? Just imagine what happens if the boot data is scribbled over two disks.
(In reply to comment #15)
> Why should grub-install /dev/md0 work at all? Just imagine what happens if the
> boot data is scribbled over two disks.

This is exactly what should happen; I don't understand your objection. If I create a RAID-1 software mirror called md0 and ask grub to install the boot loader on this mirror, I would expect it to be written to both disks. This is what would happen if I created a hardware mirror, right? If I wanted to install the boot loader on only one of the disks making up the mirror, I could specify that specific disk instead of the mirror, but I don't know why you would want to do that and break the symmetry of the mirror. This bug report is about installing grub on a software RAID mirror, which doesn't work as expected.
I had this problem with SATA drives on FC6 x86_64 while testing my RAID-1 drives. Basically the same problem happened on two nearly identical machines. Here is a workaround that worked for me (partially from http://www.tldp.org/HOWTO/text/Software-RAID-HOWTO). Note that I didn't apply the patch mentioned above, so the patch might have worked too. These steps are done after installation, of course.

1) Connect both drives and verify that all drives are sync'd (i.e. "cat /proc/mdstat")
2) If not, add the missing drive (i.e. If /dev/sdb1 is missing, "mdadm /dev/md0 --add /dev/sda1")
3) Wait for drives to sync up.
4) grub <enter>
5) grub> device (hd0) /dev/sda <enter>
6) grub> root (hd0,0) <enter>
7) grub> setup (hd0) <enter>
8) grub> device (hd0) /dev/sdb <enter>
9) grub> root (hd0,0) <enter>
10) grub> setup (hd0) <enter>
11) grub> quit <enter>
(In reply to comment #17) Sigh... Correction to step 2: 2) If not, add the missing drive (i.e. If /dev/sdb1 is missing, "mdadm /dev/md0 --add /dev/sdb1")
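The interactive grub session in the workaround above can also be driven non-interactively. A sketch with the same commands, assuming the disk names /dev/sda and /dev/sdb from the steps above:

```shell
#!/bin/sh
# Sketch of the workaround as a non-interactive grub run: map each
# physical disk to (hd0) in turn and install GRUB on it, so either
# disk can boot on its own.  Assumes /boot is the first partition on
# both disks.

grub_commands() {
    # Emit the batch commands for installing on one physical disk.
    printf 'device (hd0) %s\nroot (hd0,0)\nsetup (hd0)\nquit\n' "$1"
}

# Usage sketch:
# for disk in /dev/sda /dev/sdb; do
#     grub_commands "$disk" | grub --batch
# done
```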
FWIW, I'm looking at a similar setup with CentOS 5, and can't figure out whether it'd work OK or not except by testing it. I commented out 'root (hd0,0)' from grub.conf, re-ran grub-install --recheck, and when running grub-install --debug /dev/md0, the critical lines from the debug output seem to be:

grub> root (hd0,0)
grub> setup --stage2=/boot/grub/stage2 --prefix=/boot/grub (hd0)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
 Running "install --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
grub> quit
grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup --stage2=/boot/grub/stage2 --prefix=/boot/grub (hd1)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd1)"...  15 sectors are embedded.
succeeded
 Running "install --stage2=/boot/grub/stage2 /boot/grub/stage1 d (hd1) (hd1)1+15 p (hd0,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
Done.
grub> quit

Looking at various threads in mailing lists and bug reports, there seem to be about 3-4 different ways to install grub in such a manner that it should work... not sure if this is one of them.
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6 thirty days from now, it will be closed as 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change the bug to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
to ensure this doesn't happen again. And if you'd like to join the bug triage team to help make things better, check out:
http://fedoraproject.org/wiki/BugZappers
Fedora 6 -> Fedora 8
*** Bug 451979 has been marked as a duplicate of this bug. ***
It would be nice if the following could be used:

device (hd0) /dev/md0
root (hd0,0)
setup (hd0)

However, when I try this on Fedora 9, I get the following error after the root command:

root (hd0,0)
Unknown partition table signature
Error 5: Partition table invalid or corrupt
After thinking about it some more, the above would only work simply if the whole disk drive were mirrored. If it is mirrored at the partition level, then grub has to know that the device is a RAID device and explicitly figure out where to write the MBRs.
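One way a tool could figure that out explicitly is to resolve the mirror's member partitions and install on each underlying whole disk. A rough sketch; the device-name handling and the way mdadm output is parsed here are assumptions, not a tested recipe:

```shell
#!/bin/sh
# Rough sketch: strip the trailing partition number from each mirror
# member to get the whole-disk device, which is where the MBR lives.
# The sed expression assumes hdXN/sdXN-style names; devices such as
# /dev/cciss/c0d0p1 would need different handling.

member_disk() {
    # /dev/sda1 -> /dev/sda
    echo "$1" | sed 's/[0-9]*$//'
}

# Usage sketch (parsing "mdadm --detail" this way is an assumption):
# for part in $(mdadm --detail /dev/md0 | awk '/active sync/ {print $NF}'); do
#     disk=$(member_disk "$part")
#     grub --batch <<EOF
# device (hd0) $disk
# root (hd0,0)
# setup (hd0)
# quit
# EOF
# done
```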
Created attachment 309880 [details]
Output from grub-install with debug=no changed to debug=yes

I took a look at what grub-install does on Fedora 9, and it sure looks like it is doing the right thing. GRUB appears to be installed on both hard drives ((hd0) and (hd1)), and in both cases it looks at (hd0,0) to finish booting. I think this is the right way to do it, under the assumption that the bad drive won't be detected (because it has been removed or failed completely). I don't want to pull drives right now to make sure that theory is correct.
Bruno, I've been seeing this problem as well. From the FC9 installer, I set up three MDs (/boot, swap, and /). It works great. I make sure sdb has an MBR like sda, and I also add a second kernel entry in grub.conf with fallback=1, so if hd0,0 (sda1) can't load, it goes to hd1,0 (sdb1) for the boot image. root=/dev/md2 (which is / in my case). Everything works great as long as both disks are present (I can boot from hd0,0 or hd1,0). But if I remove either sda or sdb, I get an "invalid argument" error from the boot-up fsck.ext3 on /dev/md0. It's as if the kernel doesn't know to start up the /dev/md devices before trying to fsck them. If this has been fixed, what is the workaround? Thanks for bringing this back up.
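For reference, a grub.conf along the lines described above might look roughly like this. The titles and kernel version are placeholders, not taken from this report:

```
# Sketch of a grub.conf with a fallback entry: if the default entry
# fails, GRUB retries with the entry rooted on the second disk.
default=0
fallback=1
timeout=5

title Fedora (boot from first disk)
        root (hd0,0)
        kernel /vmlinuz-2.6.25 ro root=/dev/md2
        initrd /initrd-2.6.25.img

title Fedora (boot from second disk)
        root (hd1,0)
        kernel /vmlinuz-2.6.25 ro root=/dev/md2
        initrd /initrd-2.6.25.img
```

Note that fallback only helps when the stage1/stage2 on the first disk loads but the kernel can't be read; it doesn't cover the case where the BIOS can't read the first disk at all.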
It might be interesting to see how this works if you use the mdadm from F9 testing. I see the last fix might possibly apply here:

* Thu Jun 26 2008 Doug Ledford <dledford> - 2.6.7-1
- Update to latest upstream version (should resolve #444237)
- Drop incremental patch as it's now part of upstream
- Clean up all the open() calls in the code (#437145)
- Fix the build process to actually generate mdassemble (#446988)
- Update the udev rules to get additional info about arrays being assembled from the /etc/mdadm.conf file (--scan option) (#447818)
- Update the udev rules to run degraded arrays (--run option) (#452459)
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue, and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug, and we are sorry it could not be fixed.