From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050512 Fedora/1.0.4-2 Firefox/1.0.4

Description of problem:
Given a raid 1 /boot, grub refuses to install itself within the same session to more than one member of the same raid 1 device. Consider:

    md5 : active raid1 sdb5[1] sda5[2] hda5[0]

hd0 maps to hda, hd1 to sda and hd2 to sdb.

    # grub
    root (hd0,5)
    setup (hd0,5)
    setup (hd1,5)
    setup (hd2,5)

If I run the four commands above in the same grub session, the last two fail with errors such as:

    Running "install /grub/stage1 d (hd1,5) /grub/stage2 p /grub/grub.conf "... failed

    Error 16: Inconsistent filesystem structure

If, however, I quit grub after every successful setup command, then no such error arises. I don't see any errors returned by the syscalls issued by grub, with strace.

Is grub actually performing some legitimate test, and the buffer-flushing ioctls it issues before exiting manage to remove whatever inconsistency it would find there otherwise, or is it a bug that it finds an inconsistency? FWIW, the problem happens not only when grub is started within Linux, but also when the commands are issued from the boot loader environment, so I suspect the latter.

Same problem if I use install directly, instead of setup.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Create two partitions in two disks, at the exact same positions
2. Create a raid 1 device out of them
3. Attempt to install grub on both of them, while the raid device is running

Actual Results:
Inconsistent filesystem detected for any device other than the first to have setup run on it.

Expected Results:
It works with FC3's grub.

Additional info:
This still happens in today's rawhide, and actually causes grub to fail to install on all but the first disk containing RAID 1 members of /boot.
The "Inconsistent filesystem structure" error still occurs in today's rawhide, and it still causes grub to fail to install correctly on RAIDed /boots.
This appears to be a user error -- you need to "sync" in between the setup calls, which effectively means running grub 3 times. Also, you need to specify "root" for the same drive as "setup" if you want this to work correctly. The respective commands to run in grub are:

    root (hd0,5)
    setup (hd0,5)
    quit

    root (hd1,5)
    setup (hd1,5)
    quit

    root (hd2,5)
    setup (hd2,5)
    quit
Make it an anaconda/grubby/booty/whatever error, then, because I see this on one of the VTs during installation. Now, how do you sync when you're at the grub *boot* prompt (i.e., no underlying OS)? The same problem is present there. Are you *sure* it's not just a regression/bug? It worked with earlier versions of grub, and it still does if I use them with current kernels.
Huh, you're installing the bootloader at the boot prompt? How are you doing that, and what do you expect it to accomplish?
I *could* install the bootloader at the boot prompt. I often do that to overcome the anaconda misfeature that stops me from using a raid 1 /boot without messing with the MBR of the corresponding disks. So I boot up with the corrupted MBR, reinstall grub on all of the raid replicas, chainload the image that should have been left alone in the MBR and then re-install grub on the MBR. But that's beside the point, isn't it? The problem is that either anaconda is relying on a grub feature that is broken, or anaconda needs fixing to not rely on that no-longer-supported feature. It currently fails to set up booting from all but the first replica of a raid 1 because of the symptoms described in this bug report. Since this behavior used to be supported and no longer is, I believe it's a regression, not a removed feature. If it were a new (mis)feature, the error message would probably make sense, but it doesn't.
Are you really saying that our installer's failure to install the boot loader correctly when /boot is in RAID 1, as in comment #5, is not a bug? I suppose you closed this by mistake.
One more point showing it's more likely to be a bug in grub's filesystem handling (as mentioned in the grub documentation related with this error) than a confirmation of your theory that it has to do with syncing: if I repeat the same setup command for the *same* partition, it also fails:

    root (hd0,1)
    setup (hd0,1)
    [ok]
    setup (hd0,1)
    [inconsistent filesystem structure]

Even if instead of setup I use just the install command twice, it fails:

    root (hd0,1)
    install /grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf
    [ok]
    install /grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf
    [inconsistent filesystem structure]

Could you perhaps explain how this could possibly not be a bug?

I'm looking at the code right now, trying to find my way around it, and it fails at this point (watching errnum):

    Old value = ERR_NONE
    New value = ERR_UNALIGNED
    install_blocklist_helper (sector=343059, offset=0, length=512) at builtins.c:1972
    1972 return;
    (gdb) where
    #0 install_blocklist_helper (sector=343059, offset=0, length=512) at builtins.c:1972
    #1 0x0804db66 in rawread (drive=128, sector=343059, byte_offset=0, byte_len=512, buf=0xf7c93c00 "�p\202") at disk_io.c:268
    #2 0x0804dcf2 in devread (sector=86019, byte_offset=0, byte_len=512, buf=0xf7c93c00 "�p\202") at disk_io.c:327
    #3 0x08050bef in ext2fs_read (buf=0xf7c93c00 "�p\202", len=101884) at fsys_ext2fs.c:440
    #4 0x080501bf in grub_read (buf=0xf7c93c00 "�p\202", len=101884) at disk_io.c:1739
    #5 0x0805fee0 in install_func (arg=0xf7ba9c68 "/grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf", flags=1) at builtins.c:2235
    #6 0x08063dfb in enter_cmdline (heap=0xf7ba9c60 "install /grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf", forever=1) at cmdline.c:177
    #7 0x0805c703 in cmain () at stage2.c:1177
    #8 0x0804d7ad in init_bios_info () at common.c:336
    #9 0x0804a2e6 in doit.5509 () at asmstub.c:183
    #10 0x0804a18e in grub_stage2 () at asmstub.c:266
    #11 0x08049e39 in main (argc=2, argv=0xffffbd44) at main.c:264
    (gdb) c
    Continuing.
    Hardware watchpoint 1: errnum

    Old value = ERR_UNALIGNED
    New value = ERR_FSYS_CORRUPT
    ext2fs_block_map (logical_block=0) at fsys_ext2fs.c:329
    329 return -1;
    (gdb) where
    #0 ext2fs_block_map (logical_block=0) at fsys_ext2fs.c:329
    #1 0x08050b3b in ext2fs_read (buf=0xf7cc3a00 "��a", len=90108) at fsys_ext2fs.c:423
    #2 0x080501bf in grub_read (buf=0xf7cc0c00 "�p\202", len=101884) at disk_io.c:1739
    #3 0x0805fee0 in install_func (arg=0xf7bd6c68 "/grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf", flags=1) at builtins.c:2235
    #4 0x08063dfb in enter_cmdline (heap=0xf7bd6c60 "install /grub/stage1 (hd0,1) /grub/stage2 p /grub/grub.conf", forever=1) at cmdline.c:177
    #5 0x0805c703 in cmain () at stage2.c:1177
    #6 0x0804d7ad in init_bios_info () at common.c:336
    #7 0x0804a2e6 in doit.5509 () at asmstub.c:183
    #8 0x0804a18e in grub_stage2 () at asmstub.c:266
    #9 0x08049e39 in main (argc=2, argv=0xffffccd4) at main.c:264

The UNALIGNED error does not occur the first time. I suspect it causes the FSYS_CORRUPT error.
The problem is caused by the static variable last_length in function install_blocklist_helper. If I reset it to its initial value, then install succeeds multiple times. I suspect it might make sense to reset install_func_context as well.
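To illustrate the failure mode, here is a minimal sketch in C, not grub's actual code. It assumes the helper rejects any chunk that follows a non-full-sector chunk, which is consistent with the ERR_UNALIGNED watchpoint hit above. The names blocklist_helper and fake_install and the two-chunk "file" are hypothetical stand-ins; only last_length, SECTOR_SIZE-style alignment and the error names come from this report.

```c
#define SECTOR_SIZE 512

enum { ERR_NONE = 0, ERR_UNALIGNED = 1 };

static int errnum = ERR_NONE;

/* Hypothetical stand-in for install_blocklist_helper. */
static void blocklist_helper(int sector, int offset, int length)
{
    /* Length of the previous chunk. Being static, it is initialised
       once and keeps its value across separate install runs. */
    static int last_length = SECTOR_SIZE;

    (void)sector; /* unused in this sketch */

    if (offset != 0 || last_length != SECTOR_SIZE) {
        errnum = ERR_UNALIGNED;
        return;
    }
    last_length = length;
}

/* Hypothetical stand-in for one install run: reads a file whose
   final chunk is shorter than a full sector. */
int fake_install(void)
{
    errnum = ERR_NONE;
    blocklist_helper(100, 0, SECTOR_SIZE); /* full sector */
    blocklist_helper(101, 0, 200);         /* partial tail */
    return errnum;
}
```

Calling fake_install() twice in one process shows the pattern from this report: the first call leaves last_length at 200, so the second call fails on its very first, perfectly aligned, sector.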
Created attachment 125876
Patch that fixes the regression introduced by grub-0.97-nxstack.patch

The transformation from nested functions to non-nested functions did not preserve the semantics for variable last_length, originally in install_func, causing the value left over from a previous invocation to break subsequent invocations. This patch uses the same idiom used to handle the other variables formerly accessed from nested functions for this one, such that grub works again.
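For readers without the attachment, the general shape of such a fix can be sketched as follows. This is an assumption-laden illustration, not the patch itself: the struct install_ctx, its single field, and the function names are hypothetical. The point is only that state formerly local to install_func moves into a shared context that is re-initialised at the start of every invocation.

```c
#define SECTOR_SIZE 512

enum { ERR_NONE = 0, ERR_UNALIGNED = 1 };

/* Hypothetical context holding what used to be install_func locals. */
struct install_ctx {
    int last_length; /* length of the previous chunk read */
};

static struct install_ctx ctx;
static int errnum = ERR_NONE;

/* Hypothetical stand-in for the non-nested helper. */
static void blocklist_helper(int sector, int offset, int length)
{
    (void)sector; /* unused in this sketch */

    if (offset != 0 || ctx.last_length != SECTOR_SIZE) {
        errnum = ERR_UNALIGNED;
        return;
    }
    ctx.last_length = length;
}

/* Hypothetical stand-in for one install run. */
int fake_install_fixed(void)
{
    errnum = ERR_NONE;
    /* Reset per invocation, as a local with an initialiser would be. */
    ctx.last_length = SECTOR_SIZE;

    blocklist_helper(100, 0, SECTOR_SIZE); /* full sector */
    blocklist_helper(101, 0, 200);         /* partial tail */
    return errnum;
}
```

With the reset in place, repeated calls to fake_install_fixed() all succeed, which mirrors running setup several times in one grub session.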
jkeating tells me a build with the fix is already in and will make FC5, yay!
Confirmed fixed, thanks.