Bug 158426 - grub won't setup to raid 1 members
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: grub
Version: rawhide
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Peter Jones
Mike McLean
Blocks: FC5Blocker
Reported: 2005-05-22 01:55 EDT by Alexandre Oliva
Modified: 2007-11-30 17:11 EST (History)
Fixed In Version: grub-0.97-4
Doc Type: Bug Fix
Last Closed: 2006-03-13 09:16:40 EST

Attachments
Patch that fixes the regression introduced by grub-0.97-nxstack.patch (1.82 KB, patch)
2006-03-09 09:39 EST, Alexandre Oliva

Description Alexandre Oliva 2005-05-22 01:55:44 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050512 Fedora/1.0.4-2 Firefox/1.0.4

Description of problem:
Given a raid 1 /boot, grub refuses to install itself within the same session to more than one member of the same raid 1 device.  Consider:

md5 : active raid1 sdb5[1] sda5[2] hda5[0]

hd0 maps to hda, hd1 to sda and hd2 to sdb.

# grub
root (hd0,5)
setup (hd0,5)
setup (hd1,5)
setup (hd2,5)

If I run the four commands above in the same grub session, the last two fail with errors such as:

 Running "install /grub/stage1 d (hd1,5) /grub/stage2 p /grub/grub.conf "... failed

Error 16: Inconsistent filesystem structure

If, however, I quit grub after every successful setup command, then no such error arises.

I don't see any errors returned by the syscalls issued by grub, with strace.  Is grub actually performing some legitimate test, and the buffer-flushing ioctls it issues before exiting manage to remove whatever inconsistency it would find there otherwise, or is it a bug that it finds an inconsistency?

FWIW, the problem happens not only when grub is started within Linux, but also when the commands are issued from the boot loader environment, so I suspect the latter.  Same problem if I use install directly, instead of setup.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Create two partitions on two disks, at exactly the same positions
2. Create a raid 1 device out of them
3. Attempt to install grub on both of them, while the raid device is running

Actual Results:  Inconsistent filesystem detected for any device other than the first to have setup run on it.

Expected Results:  It works with FC3's grub.

Additional info:
Comment 2 Alexandre Oliva 2006-01-13 10:43:03 EST
This still happens in today's rawhide, and actually causes grub to fail to
install on all but the first disk containing RAID 1 members of /boot.
Comment 3 Alexandre Oliva 2006-02-16 14:20:41 EST
The "Inconsistent filesystem structure" error still occurs in today's rawhide,
and it still causes grub to fail to install correctly on RAIDed /boots.
Comment 4 Peter Jones 2006-03-02 11:58:34 EST
This appears to be a user error -- you need to "sync" in between the setup
calls, which effectively means running grub 3 times.  Also, you need to specify
"root" for the same drive as "setup" if you want this to work correctly.  The
respective commands to run in grub are:

root (hd0,5)
setup (hd0,5)
quit

root (hd1,5)
setup (hd1,5)
quit

root (hd2,5)
setup (hd2,5)
quit
Comment 5 Alexandre Oliva 2006-03-02 12:41:28 EST
Make it anaconda/grubby/booty/whatever error, then, cause I see this on one of
the VTs during installation.

Now, how do you sync when you're at the grub *boot* prompt (i.e., no underlying
OS)?  The same problem is present there.  Are you *sure* it's not just a
regression/bug?  It worked with earlier versions of grub, and it still does if I
use them with current kernels.
Comment 6 Peter Jones 2006-03-02 13:11:44 EST
Huh, you're installing the bootloader at the boot prompt?  How are you doing
that, and what do you expect it to accomplish?

Comment 7 Alexandre Oliva 2006-03-02 13:19:29 EST
I *could* install the bootloader at the boot prompt.  I often do that to
overcome the anaconda misfeature that stops me from using a raid 1 /boot without
messing with the MBR of the corresponding disks.  So I boot up with the
corrupted MBR, reinstall grub on all of the raid replicas, chainload the image
that should have been left alone in the MBR and then re-install grub on the MBR.

But that's beside the point, isn't it?  The problem is that either anaconda is
relying on a grub feature that is broken, or anaconda needs fixing to not rely
on that no-longer-supported feature.  It currently fails to set up booting from
all but the first replica of a raid 1 because of the symptoms described in this
bug report.

Since this behavior used to be supported and no longer is, I believe it's a
regression, not a removed feature.  If it were a new (mis)feature, the error
message would probably make sense, but it doesn't.
Comment 8 Alexandre Oliva 2006-03-09 08:07:07 EST
Are you really saying that our installer's failure to install the boot loader
correctly when /boot is in RAID 1, as in comment #5, is not a bug?  I suppose
you closed this by mistake.
Comment 9 Alexandre Oliva 2006-03-09 08:55:52 EST
One more point showing it's more likely to be a bug in grub's filesystem
handling (as mentioned in the grub documentation related to this error) than a
confirmation of your theory that it has to do with syncing: if I repeat the same
setup command for the *same* partition, it also fails:

root (hd0,1)
setup (hd0,1)
[ok]
setup (hd0,1)
[inconsistent filesystem structure]

Even if instead of setup I use just the install command twice, it fails:

root (hd0,1)
install /grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf
[ok]
install /grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf
[inconsistent filesystem structure]

Could you perhaps explain how this could possibly not be a bug?  I'm looking at
the code right now, trying to find my way around it, and it fails at this point
(watching errnum):


Old value = ERR_NONE
New value = ERR_UNALIGNED
install_blocklist_helper (sector=343059, offset=0, length=512)
    at builtins.c:1972
1972          return;
(gdb) where
#0  install_blocklist_helper (sector=343059, offset=0, length=512)
    at builtins.c:1972
#1  0x0804db66 in rawread (drive=128, sector=343059, byte_offset=0,
    byte_len=512, buf=0xf7c93c00 "�p\202") at disk_io.c:268
#2  0x0804dcf2 in devread (sector=86019, byte_offset=0, byte_len=512,
    buf=0xf7c93c00 "�p\202") at disk_io.c:327
#3  0x08050bef in ext2fs_read (buf=0xf7c93c00 "�p\202", len=101884)
    at fsys_ext2fs.c:440
#4  0x080501bf in grub_read (buf=0xf7c93c00 "�p\202", len=101884)
    at disk_io.c:1739
#5  0x0805fee0 in install_func (
    arg=0xf7ba9c68 "/grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf",
    flags=1) at builtins.c:2235
#6  0x08063dfb in enter_cmdline (
    heap=0xf7ba9c60 "install /grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf", forever=1) at cmdline.c:177
#7  0x0805c703 in cmain () at stage2.c:1177
#8  0x0804d7ad in init_bios_info () at common.c:336
#9  0x0804a2e6 in doit.5509 () at asmstub.c:183
#10 0x0804a18e in grub_stage2 () at asmstub.c:266
#11 0x08049e39 in main (argc=2, argv=0xffffbd44) at main.c:264
(gdb) c
Continuing.
Hardware watchpoint 1: errnum

Old value = ERR_UNALIGNED
New value = ERR_FSYS_CORRUPT
ext2fs_block_map (logical_block=0) at fsys_ext2fs.c:329
329               return -1;

(gdb) where
#0  ext2fs_block_map (logical_block=0) at fsys_ext2fs.c:329
#1  0x08050b3b in ext2fs_read (buf=0xf7cc3a00 "��a", len=90108)
    at fsys_ext2fs.c:423
#2  0x080501bf in grub_read (buf=0xf7cc0c00 "�p\202", len=101884)
    at disk_io.c:1739
#3  0x0805fee0 in install_func (
    arg=0xf7bd6c68 "/grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf",
    flags=1) at builtins.c:2235
#4  0x08063dfb in enter_cmdline (
    heap=0xf7bd6c60 "install /grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf", forever=1) at cmdline.c:177
#5  0x0805c703 in cmain () at stage2.c:1177
#6  0x0804d7ad in init_bios_info () at common.c:336
#7  0x0804a2e6 in doit.5509 () at asmstub.c:183
#8  0x0804a18e in grub_stage2 () at asmstub.c:266
#9  0x08049e39 in main (argc=2, argv=0xffffccd4) at main.c:264

The UNALIGNED error does not occur the first time.  I suspect it causes the
FSYS_CORRUPT error.
Comment 10 Alexandre Oliva 2006-03-09 09:23:58 EST
The problem is caused by the static variable last_length in function
install_blocklist_helper.  If I reset it to its initial value, then install
succeeds multiple times.  I suspect it might make sense to reset
install_func_context as well.
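
The failure mode described here can be reproduced outside grub. The C sketch below is a simplified stand-in, not grub's code: the names blocklist_helper and simulate_install are hypothetical, only the alignment check is modeled, and last_length is kept at file scope so that it can be reset from outside the helper. It assumes the read hook sees the file one sector at a time, so a 101884-byte read ends with a 508-byte chunk (101884 = 198 * 512 + 508).

```c
#include <assert.h>

#define SECTOR_SIZE 512

enum err { ERR_NONE, ERR_UNALIGNED };

static enum err errnum = ERR_NONE;

/* Survives between invocations, like the static variable named above.
   (File scope here is a simplification so the sketch can reset it.) */
static int last_length = SECTOR_SIZE;

/* Hypothetical stand-in for the blocklist helper: only the alignment
   check is kept. A chunk is rejected if it does not start at offset 0
   or if the previous chunk was shorter than a full sector. */
static void blocklist_helper(int offset, int length)
{
    if (errnum != ERR_NONE)
        return;
    if (offset != 0 || last_length != SECTOR_SIZE) {
        errnum = ERR_UNALIGNED;
        return;
    }
    last_length = length;
}

/* Feed a file of len bytes through the helper, one sector at a time.
   If reset_first is nonzero, last_length is restored to its initial
   value before the run. Returns the resulting error code. */
static enum err simulate_install(int len, int reset_first)
{
    assert(len > 0);
    errnum = ERR_NONE;
    if (reset_first)
        last_length = SECTOR_SIZE;
    while (len > 0 && errnum == ERR_NONE) {
        int chunk = len < SECTOR_SIZE ? len : SECTOR_SIZE;
        blocklist_helper(0, chunk);
        len -= chunk;
    }
    return errnum;
}
```

In this sketch, calling simulate_install(101884, 0) twice returns ERR_NONE and then ERR_UNALIGNED, because the 508 left in last_length by the first run trips the check on the first chunk of the second run. Passing reset_first = 1 makes the run succeed again.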
Comment 11 Alexandre Oliva 2006-03-09 09:39:36 EST
Created attachment 125876 [details]
Patch that fixes the regression introduced by grub-0.97-nxstack.patch

The transformation from nested functions to non-nested functions did not
preserve the semantics for variable last_length, originally in install_func,
causing the value left over from a previous invocation to break subsequent
invocations.

This patch uses the same idiom used to handle the other variables formerly
accessed from nested functions for this one, such that grub works again.
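
As an illustration of the kind of idiom described here, and not the contents of the attached patch, the C sketch below moves the helper's state into a context structure that the entry point reinitializes on every invocation. That reinitialization plays the role of the local variable's initializer in the nested-function version. The names install_ctx, blocklist_helper and install_once are hypothetical, and only the alignment check is modeled.

```c
#include <assert.h>

#define SECTOR_SIZE 512

/* Hypothetical context holding state that the nested helper used to
   reach through the enclosing function's local variables. */
struct install_ctx {
    int last_length;  /* length of the previously seen chunk */
    int unaligned;    /* set once a misaligned chunk is seen */
};

static struct install_ctx ctx;

/* Non-nested helper: all of its state lives in ctx, none in statics
   of its own, so nothing leaks from one invocation to the next. */
static void blocklist_helper(int offset, int length)
{
    if (ctx.unaligned)
        return;
    if (offset != 0 || ctx.last_length != SECTOR_SIZE) {
        ctx.unaligned = 1;
        return;
    }
    ctx.last_length = length;
}

/* Entry point: reinitialize the context on every call, then feed the
   file through the helper one sector at a time. Returns 0 on success
   and -1 if a misaligned chunk was detected. */
static int install_once(int len)
{
    assert(len > 0);
    ctx.last_length = SECTOR_SIZE;
    ctx.unaligned = 0;
    while (len > 0 && !ctx.unaligned) {
        int chunk = len < SECTOR_SIZE ? len : SECTOR_SIZE;
        blocklist_helper(0, chunk);
        len -= chunk;
    }
    return ctx.unaligned ? -1 : 0;
}
```

With this arrangement, install_once(101884) returns 0 however many times it is called, even though each run leaves 508 in ctx.last_length.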
Comment 12 Alexandre Oliva 2006-03-09 12:01:25 EST
jkeating tells me a build with the fix is already in and will make FC5, yay!
Comment 13 Alexandre Oliva 2006-03-13 09:16:40 EST
Confirmed fixed, thanks.
