Bug 160563 - "grub-install /dev/md0" does not install on second disk in RAID 1 array correctly
Summary: "grub-install /dev/md0" does not install on second disk in RAID 1 array corre...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: grub
Version: 8
Hardware: i386
OS: Linux
medium
high
Target Milestone: ---
Assignee: Peter Jones
QA Contact:
URL:
Whiteboard: bzcl34nup
Duplicates: 451979
Depends On:
Blocks:
Reported: 2005-06-15 20:06 UTC by Jordan Russell
Modified: 2009-01-09 06:52 UTC
10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-09 06:52:49 UTC
Type: ---
Embargoed:


Attachments
Proposed fix (1.53 KB, patch)
2005-06-15 20:12 UTC, Jordan Russell
Patch, reformatted to apply in grub source RPM (1.69 KB, patch)
2005-10-20 05:44 UTC, Gordon Rowell
Patch for grub-0.95-13 SPEC file (1.31 KB, patch)
2005-10-20 05:50 UTC, Gordon Rowell
issue a warning if the RAID-1 where /boot is located is not stable (one member missing or synchronizing) (933 bytes, patch)
2006-02-02 13:04 UTC, loic
Output from grub-install with debug=no changed to debug=yes (26.45 KB, text/plain)
2008-06-19 21:06 UTC, Bruno Wolff III

Description Jordan Russell 2005-06-15 20:06:59 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

Description of problem:
If you run "grub-install /dev/md0" in a software-RAID 1 configuration with root (or /boot) partitions mirrored between two physical disks, grub-install will iterate through the disks in the array and attempt to install GRUB on each of them.

After doing this, however, if you unplug the first disk and try to boot from the second disk, you get a "GRUB Hard Disk Error" message and the boot process halts. The same thing happens if you put the second disk in the first disk's place (i.e. as /dev/hda).


I believe I've figured out why this happens. Following is a portion of the output from "grub-install --debug /dev/md0":

grub> root (hd0,0)
grub> setup  --stage2=/boot/grub/stage2 --prefix=/grub (hd0)
...
grub> root (hd0,0)
grub> setup  --stage2=/boot/grub/stage2 --prefix=/grub (hd1)

Notice that when installing GRUB on the two physical drives -- hd0 and hd1 -- it specifies a root of (hd0,0) for both. This is evidently why you can't boot from the second disk (hd1) with the first disk (hd0) unplugged -- it's looking for a partition on a disk that isn't present.


Attached is a patch I've devised to fix this problem. Instead of always specifying a partition on the first disk, it looks for a partition on the same disk that will be used in the "setup" command.

With my patch applied, the output of "grub-install --debug /dev/md0" is as follows:

grub> root (hd0,0)
grub> setup  --stage2=/boot/grub/stage2 --prefix=/grub (hd0)
...
grub> root (hd1,0)
grub> setup  --stage2=/boot/grub/stage2 --prefix=/grub (hd1)

This fixes both the "unplug first disk" and "replace first disk with second disk" cases in my tests.
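The idea behind the patch (choose the root partition on the same disk that the "setup" command targets) can be illustrated with a small shell function. This is a hypothetical sketch, not the attached patch: the name partition_to_grub_root is invented, it assumes a GRUB legacy device.map made of "(hdN) /dev/xxx" lines, and it only handles partition names such as /dev/hdb1 or /dev/sda2 (no "p" separator).

```shell
#!/bin/sh
# Hypothetical helper (not part of grub-install): translate a Linux partition
# device such as /dev/hdb1 into a GRUB legacy root spec such as (hd1,0),
# using the drive names listed in a device.map file.
# Assumes names like /dev/hdb1 or /dev/sda2; names with a "p" separator
# (e.g. /dev/cciss/c0d0p1) are not handled.
partition_to_grub_root() {
    part=$1
    map=${2:-/boot/grub/device.map}

    disk=$(printf '%s\n' "$part" | sed 's/[0-9]*$//')   # /dev/hdb1 -> /dev/hdb
    num=$(printf '%s\n' "$part" | sed 's/^.*[^0-9]//')  # /dev/hdb1 -> 1

    # Look up the GRUB drive for the disk, e.g. "(hd1)", without parentheses.
    drive=$(awk -v d="$disk" '$2 == d { gsub(/[()]/, "", $1); print $1; exit }' "$map")

    # Fail if the disk is not in device.map or the name has no partition number.
    [ -n "$drive" ] && [ -n "$num" ] || return 1

    # GRUB legacy numbers partitions from 0, Linux from 1.
    printf '(%s,%d)\n' "$drive" $((num - 1))
}
```

With a device.map containing "(hd1) /dev/hdb", running partition_to_grub_root /dev/hdb1 prints (hd1,0), which is the root used for (hd1) in the patched output above.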

Version-Release number of selected component (if applicable):
grub-0.95-13

How reproducible:
Always

Steps to Reproduce:
1. Start with a root partition mirrored across two IDE disks (/dev/hda and /dev/hdb).
2. Run "grub-install /dev/md0"
3. Reboot. It comes up fine.
4. Shut down. Unplug the first disk. Leave the second disk where it is.
5. Power on.

Actual Results:  Error message: "GRUB Hard Disk Error"

Expected Results:  It should boot from the second disk without error.

Additional info:

Comment 1 Jordan Russell 2005-06-15 20:12:26 UTC
Created attachment 115503
Proposed fix

Comment 2 Jordan Russell 2005-06-15 20:15:27 UTC
By the way, this also fixes the "grub-install /dev/hdb" and "grub-install
/dev/hdb1" cases: they now use (hd1,0) instead of (hd0,0) as well.

Comment 3 Gordon Rowell 2005-10-20 05:21:10 UTC
I can confirm the symptoms - root (hd0,0) called for both disks, when the second
should be root (hd1,0).

I can also confirm that the patch works for me with a mirrored root on /dev/md1.

Many thanks.

I have been chasing the same issue, and came to the same conclusion:

https://sourceforge.net/tracker/index.php?func=detail&aid=1233029&group_id=96750&atid=615772


Comment 4 Gordon Rowell 2005-10-20 05:44:58 UTC
Created attachment 120181
Patch, reformatted to apply in grub source RPM

Comment 5 Gordon Rowell 2005-10-20 05:50:07 UTC
Created attachment 120182
Patch for grub-0.95-13 SPEC file

Comment 6 Gordon Rowell 2005-10-21 04:30:26 UTC
Hang on - I'm not positive that this is the fix. I'm still chasing this issue,
and I have a feeling that it is related to performing a grub install while
/boot is still resyncing.

Comment 7 Gordon Rowell 2005-10-21 07:49:23 UTC
Update:

Installed using grub-0.95-13 (from FC4), two VMware disks hda and hdc, and
waited until /boot was in sync:

- Ran "/sbin/grub-install /dev/md1"
- I can boot happily from both disks or either disk independently.

Now do the same with /boot out of sync:

mdadm --fail /dev/md1 /dev/hdc1
mdadm --remove /dev/md1 /dev/hdc1

mdadm --add /dev/md1 /dev/hdc1 ; /sbin/grub-install /dev/md1

- wait for /boot to be in sync
- shut down
- System will boot from primary drive
- If primary drive removed, I get "GRUB " and nothing more


Comment 8 Gordon Rowell 2005-10-26 00:59:20 UTC
The following appears to work correctly. I'm going to work these changes into
anaconda when I get a chance.


#!/bin/sh
#----------------------------------------------------------------------
# Copyright (C) 2005 Mitel Networks Corporation
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307  USA
#----------------------------------------------------------------------
PATH=$PATH:/sbin
export PATH

# Match the "(hd0)" entry exactly, so that e.g. "(hd10)" is not picked up.
HDZERO=$(awk '$1 == "(hd0)" { print $2 }' /boot/grub/device.map)

# We should be able to do grub-install --recheck /dev/md1, but not in
# this version of grub-install
# We need to rebuild the device.map file since $NEW didn't exist
# at install time.
echo "Forcing grub to rescan devices"

grub-install --recheck "$HDZERO"
# grub-install /dev/$NEW

echo "Calling grub-install on $HDZERO"

# Map each disk to (hd0) in turn, so the boot code written to it refers to
# (hd0,0), which is what that disk becomes when the BIOS boots it first.
grub --batch <<HERE
device (hd0) $HDZERO
root (hd0,0)
setup (hd0)
quit
HERE

HDONE=$(awk '$1 == "(hd1)" { print $2 }' /boot/grub/device.map)

if [ -z "$HDONE" ]
then
    echo "Skipping grub-install on hd1"
    exit 0
fi

echo "Calling grub-install on $HDONE"

grub --batch <<HERE
device (hd0) $HDONE
root (hd0,0)
setup (hd0)
quit
HERE

exit 0

Comment 9 loic 2006-01-27 13:07:25 UTC
Hello,
I just built grub-0.97-2 on a RH7.2 based system and tried it on a
Fujitsu-Siemens TX150-S4 with 2 SCSI disks.
/dev/md0 : /dev/sda2 /dev/sdb2 RAID 1 => /boot
/dev/md1 : /dev/sda1 /dev/sdb1 RAID 1 => /
The same problem occurred, and the patch works here too.

The issue of /boot still resyncing is perhaps not that important, since most of
the time /boot is a separate partition of approx. 50 MB and is therefore rebuilt
quickly.
When /boot is merged with /, perhaps grub-install could scan /proc/mdstat to
make sure that both halves of the mirror are in sync before trying to install the
bootloader?
The same reasoning applies to anaconda: grub is run at the end, and I think that
/boot is clean at that point.
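A check along the lines suggested above could look roughly like the following. This is a hypothetical sketch, not the patch in attachment 124046 (whose contents are not shown here); the function name md_in_sync is invented, and it assumes the usual /proc/mdstat layout, in which a degraded RAID-1 shows an underscore in its [UU] status field and a rebuild shows a "resync" or "recovery" line.

```shell
#!/bin/sh
# Hypothetical check: succeed only if md array $1 appears in /proc/mdstat
# (or in the file given as $2) with no missing member ("_" in the [UU]
# field) and no resync/recovery line in progress.
md_in_sync() {
    name=$1
    mdstat=${2:-/proc/mdstat}
    awk -v md="$name" '
        $1 == md && $2 == ":" { inside = 1; found = 1; next }
        inside && (NF == 0 || $0 ~ /^[^ \t]/) { inside = 0 }  # end of this array
        inside && /\[[U_]+\]/ && /_/ { bad = 1 }              # e.g. [U_]
        inside && /resync|recovery/ { bad = 1 }               # rebuild running
        END { if (found && !bad) exit 0; exit 1 }
    ' "$mdstat"
}
```

A wrapper could then refuse to continue, e.g. md_in_sync md1 || { echo "md1 is degraded or resyncing" >&2; exit 1; }, before calling grub-install.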

Comment 10 loic 2006-02-02 13:04:02 UTC
Created attachment 124046
issue a warning if the RAID-1 where /boot is located is not stable (one member missing or synchronizing)

Comment 11 David Tonhofer 2006-08-24 20:41:32 UTC
Just cross-referencing this:

Bug #191449 NEW install grub incorrect when /boot is a RAID1 device
Bug #170575 NEW grub fails on one of two sata disks of raid1 set during i...
Bug #163460 NEW Installation failed on RAID setup (GRUB error 15 and fail...
>>Bug #160563 NEW "grub-install /dev/md0" does not install on second disk i...
Bug #114690 CLOSED/RAWHIDE grub-install won't install to RAID 1



Comment 12 Christian Iseli 2007-01-22 10:28:43 UTC
This report targets the FC3 or FC4 products, which have now been EOL'd.

Could you please check that it still applies to a current Fedora release, and
either update the target product or close it?

Thanks.

Comment 13 Gordon Rowell 2007-01-22 20:06:39 UTC
(In reply to comment #12)
> This report targets the FC3 or FC4 products, which have now been EOL'd.
> 
> Could you please check that it still applies to a current Fedora release, and
> either update the target product or close it?

I plan to check FC6 and CentOS 4.4 later this week.

Comment 14 Jordan Russell 2007-01-24 21:01:53 UTC
The originally reported problem is reproducible on FC6 / grub-0.97-13.

1. Run "grub-install /dev/md0"
2. Shut down and physically unplug the first drive, leaving the second drive
connected.
3. Try booting. It halts immediately with: "GRUB Hard Disk Error"

As before, my patch eliminates the error.

Any chance of incorporating it into a future version?

Comment 15 Jan Engelhardt 2007-01-30 11:23:34 UTC
Why should grub-install /dev/md0 work at all? Just imagine what happens if the
boot data is scribbled over two disks.

Comment 16 Jason Smith 2007-01-30 12:14:05 UTC
(In reply to comment #15)
> Why should grub-install /dev/md0 work at all? Just imagine what happens if the
> boot data is scribbled over two disks.

This is exactly what should happen; I don't understand your objection.  If I
create a raid1 software mirror called md0, and ask grub to install the boot
loader on this mirror, I would expect it to be written to both disks.  This is
what would happen if I created a hardware mirror, right?  If I wanted to install
the boot loader on only one of the disks making up the mirror then I can specify
that specific disk instead of the mirror, but I don't know why you would want to
do this and break the symmetry of the mirror.  This bug report is about
installing grub on a software raid mirror, which doesn't work as expected.


Comment 17 street14@patrickind.com 2007-02-06 21:40:24 UTC
I had this problem with SATA drives on FC6 x86_64 while testing my RAID-1
drives.  Basically the same problem happened on two nearly identical machines. 
Here is a workaround that worked for me (partially from
http://www.tldp.org/HOWTO/text/Software-RAID-HOWTO).  Note that I didn't apply
the patch mentioned above, so the patch might have worked too.  These steps are
done after installation, of course.
1) Connect both drives and verify that all drives are sync'd (i.e. "cat
/proc/mdstat")
2) If not, add the missing drive (i.e. If /dev/sdb1 is missing, "mdadm /dev/md0
--add /dev/sda1")
3) Wait for drives to sync up.
4) grub <enter>
5) grub> device (hd0) /dev/sda <enter>
6) grub> root (hd0,0) <enter>
7) grub> setup (hd0) <enter>
8) grub> device (hd0) /dev/sdb <enter>
9) grub> root (hd0,0) <enter>
10) grub> setup (hd0) <enter>
11) grub> quit <enter>
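Steps 4-11 can also be fed to GRUB non-interactively, as the script in comment 8 does with grub --batch. The helper below is a hypothetical sketch that only prints the commands; its name and the partition-index argument are invented, and it assumes /boot sits on the same partition number (0 for the first partition) on every member disk.

```shell
#!/bin/sh
# Hypothetical helper: print GRUB legacy shell commands that install the boot
# loader on each listed disk.  Each disk is mapped to (hd0) in turn, so its
# boot code refers to (hd0,N); that matches the disk only when the BIOS
# presents it as the first drive (e.g. after the other disk is removed).
# $1 is the 0-based partition number of /boot on every disk.
grub_setup_commands() {
    part=$1
    shift
    for disk in "$@"; do
        printf 'device (hd0) %s\n' "$disk"
        printf 'root (hd0,%s)\n' "$part"
        printf 'setup (hd0)\n'
    done
    printf 'quit\n'
}
```

Usage, once the mirror is in sync: grub_setup_commands 0 /dev/sda /dev/sdb | grub --batch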



Comment 18 street14@patrickind.com 2007-02-06 21:44:53 UTC
(In reply to comment #17)
Sigh... Correction to step 2:

2) If not, add the missing drive (i.e. If /dev/sdb1 is missing, "mdadm /dev/md0
--add /dev/sdb1")


Comment 19 Pekka Savola 2007-04-23 06:35:26 UTC
FWIW, I'm looking at a similar setup with CentOS 5, and can't figure out whether
it'd work OK or not except by testing it.

I commented out 'root (hd0,0)' from grub.conf, re-ran grub-install --recheck, and
when running grub-install --debug /dev/md0, the critical lines from debug output
seem to be:

grub> root (hd0,0)
grub> setup  --stage2=/boot/grub/stage2 --prefix=/boot/grub (hd0)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
 Running "install --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0) (hd0)1+15 p
(hd0,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
grub> quit

grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup  --stage2=/boot/grub/stage2 --prefix=/boot/grub (hd1)
 Checking if "/boot/grub/stage1" exists... yes
 Checking if "/boot/grub/stage2" exists... yes
 Checking if "/boot/grub/e2fs_stage1_5" exists... yes
 Running "embed /boot/grub/e2fs_stage1_5 (hd1)"...  15 sectors are embedded.
succeeded
 Running "install --stage2=/boot/grub/stage2 /boot/grub/stage1 d (hd1) (hd1)1+15
p (hd0,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
Done.
grub> quit

Looking at various threads in mailing lists and bug reports, there seem to be
three or four different ways to install grub so that it should work;
I'm not sure whether this is one of them.
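When comparing debug output like the above (or the output in the original description), it can help to pair each setup target with the root that was set just before it. The awk sketch below does only that; the function name is hypothetical, and it assumes the "grub> root ..." / "grub> setup ... (hdN)" format shown in this report, with each command on one line.

```shell
#!/bin/sh
# Hypothetical helper: read grub-install --debug output (file $1 or stdin)
# and print one line per "setup" command, naming the root set before it.
root_per_setup() {
    awk '
        /^grub> root / { root = $3 }                      # e.g. (hd0,0)
        /^grub> setup / { print $NF, "uses root", root }  # $NF is e.g. (hd1)
    ' "${1:--}"
}
```

For the unpatched output in the original description this prints "(hd0) uses root (hd0,0)" followed by "(hd1) uses root (hd0,0)"; whether the second line is a problem is exactly what the later comments disagree about.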

Comment 20 Bug Zapper 2008-04-04 01:57:33 UTC
Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 21 Jordan Russell 2008-04-04 06:17:22 UTC
Fedora 6 -> Fedora 8

Comment 22 Bruno Wolff III 2008-06-18 14:46:28 UTC
*** Bug 451979 has been marked as a duplicate of this bug. ***

Comment 23 Bruno Wolff III 2008-06-18 15:16:32 UTC
It would be nice if the following could be used:
device (hd0) /dev/md0
root (hd0,0)
setup (hd0)

However when I try this on Fedora 9, I get the following error after the root
command:
root(hd0,0)
Unknown partition table signature

Error 5: Partition table invalid or corrupt

Comment 24 Bruno Wolff III 2008-06-18 16:06:08 UTC
After thinking about it some more, the above would only work simply if the
whole disk drive were mirrored. If it is mirrored at the partition level, then
grub has to know that the device is a RAID device and explicitly figure out
where to write the MBRs.
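Working out which disks sit behind a partition-level mirror might look like the sketch below. It is hypothetical (not how grub-install itself does it): it reads the member list from /proc/mdstat, skips members flagged (F), and assumes hdXN/sdXN-style partition names, whose disk is found by stripping the trailing digits.

```shell
#!/bin/sh
# Hypothetical helper: list the whole disks holding the members of md array
# $1, e.g. "md0 : active raid1 sdb1[1] sda1[0]" -> /dev/sda and /dev/sdb.
# Failed members, marked (F), are skipped.  Names with a "p" separator
# (e.g. nvme0n1p1) are not handled.
md_member_disks() {
    name=$1
    mdstat=${2:-/proc/mdstat}
    awk -v md="$name" '
        $1 == md && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i !~ /\[[0-9]+\]/) continue  # not a member, e.g. "active"
                if ($i ~ /\(F\)/) continue        # failed member
                dev = $i
                sub(/\[.*$/, "", dev)             # sdb1[1] -> sdb1
                sub(/[0-9]+$/, "", dev)           # sdb1 -> sdb
                print "/dev/" dev
            }
        }' "$mdstat" | sort -u
}
```

Each disk listed could then get its own MBR, for example with the per-disk "device (hd0) ..." sequence from comment 17.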

Comment 25 Bruno Wolff III 2008-06-19 21:06:44 UTC
Created attachment 309880
Output from grub-install with debug=no changed to debug=yes

I took a look at what grub-install does on Fedora 9, and it certainly looks like it
is doing the right thing. GRUB appears to be installed on both hard drives
((hd0) and (hd1)), and in both cases it looks at (hd0,0) to finish booting. I think
this is the right way to do it, under the assumption that the bad drive won't be
detected (because it has been removed or has failed completely), so the remaining
drive becomes (hd0). I don't want to pull drives right now to confirm that theory.

Comment 26 Jean-Luc Cooke 2008-07-02 14:50:28 UTC
Bruno,

I'm seeing this problem as well.

From the FC9 installer, I set up 3 MDs (/boot, swap and /).  It works great.  I make
sure sdb has an MBR like sda.  I also add a 2nd kernel entry in grub.conf with
fallback=1, so that if hd0,0 (sda1) can't load, it goes to hd1,0 (sdb1) for the boot
image.  root=/dev/md2 (which is / in my case).

Everything works great as long as both disks are present.  (I can boot from
hd0,0 or hd1,0).

But if I remove either sda or sdb, I get an "invalid argument" error from fsck.ext3
on /dev/md0 during boot.  It's as if the kernel doesn't know to start the /dev/md
arrays before trying to fsck them.

If this has been fixed - what is the work around?  Thanks for bringing this back up.

Comment 27 Bruno Wolff III 2008-07-02 16:17:54 UTC
It might be interesting to see how this works if you use the mdadm from F9
testing. I think the latest changelog entry might apply here:
* Thu Jun 26 2008 Doug Ledford <dledford> - 2.6.7-1
- Update to latest upstream version (should resolve #444237)
- Drop incremental patch as it's now part of upstream
- Clean up all the open() calls in the code (#437145)
- Fix the build process to actually generate mdassemble (#446988)
- Update the udev rules to get additional info about arrays being assembled
  from the /etc/mdadm.conf file (--scan option) (#447818)
- Update the udev rules to run degraded arrays (--run option) (#452459)


Comment 28 Bug Zapper 2008-11-26 06:50:54 UTC
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 29 Bug Zapper 2009-01-09 06:52:49 UTC
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

