Bug 158426 - grub won't setup to raid 1 members
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: grub
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Peter Jones
QA Contact: Mike McLean
URL:
Whiteboard:
Depends On:
Blocks: FC5Blocker
Reported: 2005-05-22 05:55 UTC by Alexandre Oliva
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version: grub-0.97-4
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-13 14:16:40 UTC
Type: ---
Embargoed:


Attachments
Patch that fixes the regression introduced by grub-0.97-nxstack.patch (1.82 KB, patch)
2006-03-09 14:39 UTC, Alexandre Oliva

Description Alexandre Oliva 2005-05-22 05:55:44 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050512 Fedora/1.0.4-2 Firefox/1.0.4

Description of problem:
Given a raid 1 /boot, grub refuses to install itself within the same session to more than one member of the same raid 1 device.  Consider:

md5 : active raid1 sdb5[1] sda5[2] hda5[0]

hd0 maps to hda, hd1 to sda and hd2 to sdb.

# grub
root (hd0,5)
setup (hd0,5)
setup (hd1,5)
setup (hd2,5)

If I run the four commands above in the same grub session, the last two fail with errors such as:

 Running "install /grub/stage1 d (hd1,5) /grub/stage2 p /grub/grub.conf "... failed

Error 16: Inconsistent filesystem structure

If, however, I quit grub after every successful setup command, then no such error arises.

Under strace, none of the syscalls issued by grub returns an error.  Is grub actually performing a legitimate test, with the buffer-flushing ioctls it issues before exiting removing whatever inconsistency it would otherwise find, or is it a bug that it finds an inconsistency at all?

FWIW, the problem happens not only when grub is started within Linux, but also when the commands are issued from the boot loader environment, so I suspect the latter.  Same problem if I use install directly, instead of setup.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Create two partitions on two disks, at exactly the same positions
2. Create a raid 1 device out of them
3. Attempt to install grub on both of them, while the raid device is running

Actual Results:  Inconsistent filesystem detected for any device other than the first to have setup run on it.

Expected Results:  It works with FC3's grub.

Additional info:

Comment 2 Alexandre Oliva 2006-01-13 15:43:03 UTC
This still happens in today's rawhide, and actually causes grub to fail to
install on all but the first disk containing RAID 1 members of /boot.

Comment 3 Alexandre Oliva 2006-02-16 19:20:41 UTC
The "Inconsistent filesystem structure" error still occurs in today's rawhide,
and it still causes grub to fail to install correctly on RAIDed /boots.

Comment 4 Peter Jones 2006-03-02 16:58:34 UTC
This appears to be a user error -- you need to "sync" in between the setup
calls, which effectively means running grub 3 times.  Also, you need to specify
"root" for the same drive as "setup" if you want this to work correctly.  The
respective commands to run in grub are:

root (hd0,5)
setup (hd0,5)
quit

root (hd1,5)
setup (hd1,5)
quit

root (hd2,5)
setup (hd2,5)
quit

Comment 5 Alexandre Oliva 2006-03-02 17:41:28 UTC
Make it an anaconda/grubby/booty/whatever error, then, because I see this on one
of the VTs during installation.

Now how do you sync when you're on the grub *boot* prompt (i.e., no underlying
OS)?  The same problem is present there.  Are you *sure* it's not just a
regression/bug?  It worked with earlier versions of grub, and it still does if I
use them with current kernels.

Comment 6 Peter Jones 2006-03-02 18:11:44 UTC
Huh, you're installing the bootloader at the boot prompt?  How are you doing
that, and what do you expect it to accomplish?



Comment 7 Alexandre Oliva 2006-03-02 18:19:29 UTC
I *could* install the bootloader at the boot prompt.  I often do that to
overcome the anaconda misfeature that stops me from using a raid 1 /boot without
messing with the MBR of the corresponding disks.  So I boot up with the
corrupted MBR, reinstall grub on all of the raid replicas, chainload the image
that should have been left alone in the MBR and then re-install grub on the MBR.

But that's beside the point, isn't it?  The problem is that either anaconda is
relying on a grub feature that is broken, or anaconda needs fixing to not rely
on that no-longer-supported feature.  It currently fails to set up booting from
all but the first replica of a raid 1 because of the symptoms described in this
bug report.

Since this behavior used to be supported and no longer is, I believe it's a
regression, not a removed feature.  If it were a new (mis)feature, the error
message would probably make sense, but it doesn't.

Comment 8 Alexandre Oliva 2006-03-09 13:07:07 UTC
Are you really saying that our installer's failure to install the boot loader
correctly when /boot is in RAID 1, as in comment #5, is not a bug?  I suppose
you closed this by mistake.

Comment 9 Alexandre Oliva 2006-03-09 13:55:52 UTC
One more point showing it's more likely to be a bug in grub's filesystem
handling (as mentioned in the grub documentation related to this error) than a
confirmation of your theory that it has to do with syncing: if I repeat the same
setup command for the *same* partition, it also fails:

root (hd0,1)
setup (hd0,1)
[ok]
setup (hd0,1)
[inconsistent filesystem structure]

Even if instead of setup I use just the install command twice, it fails:

root (hd0,1)
install /grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf
[ok]
install /grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf
[inconsistent filesystem structure]

Could you perhaps explain how this could possibly not be a bug?  I'm looking at
the code right now, trying to find my way around it, and it fails at this point
(watching errnum):


Old value = ERR_NONE
New value = ERR_UNALIGNED
install_blocklist_helper (sector=343059, offset=0, length=512)
    at builtins.c:1972
1972          return;
(gdb) where
#0  install_blocklist_helper (sector=343059, offset=0, length=512)
    at builtins.c:1972
#1  0x0804db66 in rawread (drive=128, sector=343059, byte_offset=0,
    byte_len=512, buf=0xf7c93c00 "�p\202") at disk_io.c:268
#2  0x0804dcf2 in devread (sector=86019, byte_offset=0, byte_len=512,
    buf=0xf7c93c00 "�p\202") at disk_io.c:327
#3  0x08050bef in ext2fs_read (buf=0xf7c93c00 "�p\202", len=101884)
    at fsys_ext2fs.c:440
#4  0x080501bf in grub_read (buf=0xf7c93c00 "�p\202", len=101884)
    at disk_io.c:1739
#5  0x0805fee0 in install_func (
    arg=0xf7ba9c68 "/grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf",
    flags=1) at builtins.c:2235
#6  0x08063dfb in enter_cmdline (
    heap=0xf7ba9c60 "install /grub/stage1 (hd0,1) /grub/stage2 p
/grub/grub.conf", forever=1) at cmdline.c:177
#7  0x0805c703 in cmain () at stage2.c:1177
#8  0x0804d7ad in init_bios_info () at common.c:336
#9  0x0804a2e6 in doit.5509 () at asmstub.c:183
#10 0x0804a18e in grub_stage2 () at asmstub.c:266
#11 0x08049e39 in main (argc=2, argv=0xffffbd44) at main.c:264
(gdb) c
Continuing.
Hardware watchpoint 1: errnum

Old value = ERR_UNALIGNED
New value = ERR_FSYS_CORRUPT
ext2fs_block_map (logical_block=0) at fsys_ext2fs.c:329
329               return -1;

(gdb) where
#0  ext2fs_block_map (logical_block=0) at fsys_ext2fs.c:329
#1  0x08050b3b in ext2fs_read (buf=0xf7cc3a00 "��a", len=90108)
    at fsys_ext2fs.c:423
#2  0x080501bf in grub_read (buf=0xf7cc0c00 "�p\202", len=101884)
    at disk_io.c:1739
#3  0x0805fee0 in install_func (
    arg=0xf7bd6c68 "/grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf",
    flags=1) at builtins.c:2235
#4  0x08063dfb in enter_cmdline (
    heap=0xf7bd6c60 "install /grub/stage1 (hd0,1) /grub/stage2 p
/grub/grub.conf", forever=1) at cmdline.c:177
#5  0x0805c703 in cmain () at stage2.c:1177
#6  0x0804d7ad in init_bios_info () at common.c:336
#7  0x0804a2e6 in doit.5509 () at asmstub.c:183
#8  0x0804a18e in grub_stage2 () at asmstub.c:266
#9  0x08049e39 in main (argc=2, argv=0xffffccd4) at main.c:264

The UNALIGNED error does not occur the first time.  I suspect it causes the
FSYS_CORRUPT error.

Comment 10 Alexandre Oliva 2006-03-09 14:23:58 UTC
The problem is caused by the static variable last_length in function
install_blocklist_helper.  If I reset it to its initial value, then install
succeeds multiple times.  I suspect it might make sense to reset
install_func_context as well.
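
To illustrate the mechanism, here is a minimal C sketch, not grub's actual
code.  The names blocklist_helper and install_once are hypothetical stand-ins
for install_blocklist_helper and for one run of the install command, and the
helper is reduced to the alignment check alone.  The 101884-byte length is
the len argument of grub_read in the backtrace in comment 9; it is not a
multiple of 512, so the last block is 508 bytes long, and that is the value
left behind in the static variable.

```c
#include <assert.h>

#define SECTOR_SIZE 512

enum err { ERR_NONE, ERR_UNALIGNED };

static enum err errnum = ERR_NONE;

/* Hypothetical stand-in for install_blocklist_helper: called once per
   block that is read, with the number of bytes used in that block. */
static void blocklist_helper(int length)
{
    /* Static state survives from one install run to the next.
       This is the bug being illustrated. */
    static int last_length = SECTOR_SIZE;

    assert(length > 0 && length <= SECTOR_SIZE);

    if (errnum != ERR_NONE)
        return;

    /* Only the final block of a file may be shorter than a sector, so a
       short previous block means the data is not sector-aligned. */
    if (last_length != SECTOR_SIZE) {
        errnum = ERR_UNALIGNED;
        return;
    }
    last_length = length;
}

/* Hypothetical stand-in for one "install" run that reads a file of
   file_len bytes one sector at a time. */
enum err install_once(int file_len)
{
    errnum = ERR_NONE;
    while (file_len > 0 && errnum == ERR_NONE) {
        int chunk = file_len < SECTOR_SIZE ? file_len : SECTOR_SIZE;
        blocklist_helper(chunk);
        file_len -= chunk;
    }
    return errnum;
}
```

Called twice in the same process with 101884, install_once returns ERR_NONE
the first time and ERR_UNALIGNED the second time, which is consistent with
the UNALIGNED error appearing only on the repeated install.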

Comment 11 Alexandre Oliva 2006-03-09 14:39:36 UTC
Created attachment 125876
Patch that fixes the regression introduced by grub-0.97-nxstack.patch

The transformation from nested functions to non-nested functions did not
preserve the semantics of variable last_length, originally in install_func,
causing the value left over from a previous invocation to break subsequent
invocations.

This patch handles that variable with the same idiom already used for the other
variables formerly accessed from nested functions, so that grub works again.
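
A rough sketch of one way such an idiom can look in C, under assumptions: the
struct install_ctx, install_ctx_init and install_once names below are
hypothetical and are not taken from the attached patch, which may differ in
detail.  The point shown is only that state formerly local to the enclosing
function lives in a context that is reinitialized at the start of every
invocation, instead of in a static variable inside the helper.

```c
#include <assert.h>

#define SECTOR_SIZE 512

enum err { ERR_NONE, ERR_UNALIGNED };

/* Hypothetical context: state that used to be locals of the enclosing
   function when the helper was a nested function. */
struct install_ctx {
    int last_length;    /* bytes used in the previously seen block */
    enum err errnum;
};

/* Reset every field at the start of each install run. */
static void install_ctx_init(struct install_ctx *ctx)
{
    ctx->last_length = SECTOR_SIZE;
    ctx->errnum = ERR_NONE;
}

/* Non-nested helper: all state comes in through ctx, none is static. */
static void blocklist_helper(struct install_ctx *ctx, int length)
{
    assert(length > 0 && length <= SECTOR_SIZE);

    if (ctx->errnum != ERR_NONE)
        return;

    /* Only the final block of a file may be shorter than a sector. */
    if (ctx->last_length != SECTOR_SIZE) {
        ctx->errnum = ERR_UNALIGNED;
        return;
    }
    ctx->last_length = length;
}

/* One simulated install run over a file of file_len bytes. */
enum err install_once(int file_len)
{
    struct install_ctx ctx;

    install_ctx_init(&ctx);
    while (file_len > 0 && ctx.errnum == ERR_NONE) {
        int chunk = file_len < SECTOR_SIZE ? file_len : SECTOR_SIZE;
        blocklist_helper(&ctx, chunk);
        file_len -= chunk;
    }
    return ctx.errnum;
}
```

With the state reset on entry, repeated calls such as install_once(101884)
return ERR_NONE every time, because nothing is left over from the previous
call.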

Comment 12 Alexandre Oliva 2006-03-09 17:01:25 UTC
jkeating tells me a build with the fix is already in and will make FC5, yay!

Comment 13 Alexandre Oliva 2006-03-13 14:16:40 UTC
Confirmed fixed, thanks.

