Bug 718722 - Mismatched or corrupt version of stage1/stage2 [NEEDINFO]
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: grub
Version: 18
Assigned To: Peter Jones
QA Contact: Fedora Extras Quality Assurance
Whiteboard: RejectedBlocker

Reported: 2011-07-04 08:25 EDT by Bruno Wolff III
Modified: 2014-02-05 06:49 EST
Doc Type: Bug Fix
Last Closed: 2014-02-05 06:49:19 EST
Flags: udovdh: needinfo? (pjones)

Description Bruno Wolff III 2011-07-04 08:25:54 EDT
Description of problem:
When using grub-0.97-75.fc16.i686 and grub-0.97-74.fc16.i686 I get the following error message:
 Running "install /grub/stage1 (hd0) (hd0)1+31 p (hd0,0)/grub/stage2 /grub/grub.conf"... failed

Error 62: Mismatched or corrupt version of stage1/stage2


Version-Release number of selected component (if applicable):


How reproducible:
Seems to be 100%

Steps to Reproduce:
1. grub-install
2. grub
3. root (hd0,0)
4. setup (hd0)
  
Actual results:
[root@bruno bruno]# grub-install /dev/md11
Installation finished. No error reported.
This is the contents of the device map /boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(fd0)	/dev/fd0
(hd0)	/dev/sda
(hd1)	/dev/sdb
[root@bruno bruno]# grub
Probing devices to guess BIOS drives. This may take a long time.


    GNU GRUB  version 0.97-74.fc16  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  31 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+31 p (hd0,0)/grub/stage2 /grub/grub.conf"... failed

Error 62: Mismatched or corrupt version of stage1/stage2
grub> quit


Expected results:
[root@bruno bruno]# grub-install /dev/md11
Installation finished. No error reported.
This is the contents of the device map /boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(fd0)	/dev/fd0
(hd0)	/dev/sda
(hd1)	/dev/sdb
[root@bruno bruno]# grub
Probing devices to guess BIOS drives. This may take a long time.


    GNU GRUB  version 0.97-71.fc15  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  26 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+26 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> setup (hd1)  
setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"...  26 sectors are embedded.
succeeded
 Running "install /grub/stage1 d (hd1) (hd1)1+26 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> quit


Additional info:
grub-0.97-71.fc15.i686 appears to work correctly.
/boot is mounted on /dev/md11 which is a raid1 version 1.0 md array.
Comment 1 Bruno Wolff III 2011-07-04 08:51:35 EDT
I forgot to note that /boot is an ext4 file system.
Comment 2 Joshua Covington 2011-07-06 16:32:01 EDT
I experience the same problem. The only version that works for me is the one before the fedora mass-rebuild (0.97-71).

Trying to narrow this down, I found that even the working version 0.97-71 doesn't work if I recompile the package with gcc >= 4.6. Comparing the two, I get

 Running "embed /boot/grub/e2fs_stage1_5 (hd0)"...  26 sectors are embedded. (with the downloaded package)
 Running "embed /boot/grub/e2fs_stage1_5 (hd0)"...  31 sectors are embedded. (with the recompiled package)

Maybe it's a problem with the compiler itself or just an application bug. See also http://lists.fedoraproject.org/pipermail/devel/2011-July/153799.html
Comment 3 Joshua Covington 2011-07-06 17:58:10 EDT
I just recompiled v0.97-75 under fc14, which has gcc-4.5.1. It works perfectly without any flaws.

During the installation I got
 Running "embed /boot/grub/e2fs_stage1_5 (hd0)"...  26 sectors are embedded.
(with the recompiled package)

and not the 31 sectors as when the package is compiled with gcc >= 4.6.0.

The only difference that I saw is that the compiler sets the host as x86_64-unknown and the koji builds set it as x86_64-redhat. As a result the recompiled package installs in /usr/share/grub/x86_64-unknown instead of /usr/share/grub/x86_64-redhat as is the case with the koji builds.
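
The 26 versus 31 sector figures can be cross-checked against the size of the stage1_5 file itself. The following is a minimal sketch, assuming that `embed` reports the file size rounded up to whole 512-byte sectors (an assumption about GRUB legacy's behaviour, not something stated in this report). Both helper names are hypothetical.

```shell
# Hypothetical helper: convert a byte count to 512-byte sectors, rounding up.
# Assumption: this mirrors how "embed" arrives at its sector count.
sectors_for_size() {
    echo $(( ($1 + 511) / 512 ))
}

# Hypothetical helper: print the sector count for a stage1_5 file.
# Returns non-zero if the file cannot be read.
stage1_5_sectors() {
    size=$(wc -c < "$1") || return 1
    sectors_for_size "$size"
}

# Example use (path taken from the transcripts above):
#   stage1_5_sectors /boot/grub/e2fs_stage1_5
```

Running this against the stage1_5 files from a gcc 4.5 build and a gcc 4.6 build would show whether the file itself grew, or whether only the count reported by `embed` differs.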
Comment 4 Bruno Wolff III 2011-07-07 11:36:24 EDT
I ran some scratch builds of the f16 grub against rawhide, f15 and f14 and when I get a chance I'm going to look through the build logs to see if I can find anything interesting.

I'm also marking this as an F16 alpha blocker, since I expect this bug is going to break installs, which is covered by the following criterion:

In most cases (see Blocker_Bug_FAQ), a system installed according to any of the above criteria (or the appropriate Beta or Final criteria, when applying this criterion to those releases) must boot to the 'firstboot' utility on the first boot after installation, without unintended user intervention. This includes correctly accessing any encrypted partitions when the correct passphrase is supplied. The firstboot utility must be able to create a working user account
Comment 5 Bruno Wolff III 2011-07-07 20:57:08 EDT
This does appear to be triggered by a change outside of grub, as I also rebuilt 0.97-71 on a local machine and see the same issue.
When I tried this, I also got an error running grub-install that might help point to where the problem is:
[root@bruno bruno]# grub-install /dev/md11
The file /boot/grub/stage1 not read correctly.
Comment 6 Bruno Wolff III 2011-07-07 21:27:46 EDT
I tried rebuilding the master (f16) branch with -O0 instead of -Os.
I still got the "The file /boot/grub/stage1 not read correctly." message. The setup command in grub appeared to work. The system failed to boot with grub error 16.
Comment 7 Joshua Covington 2011-07-08 01:31:35 EDT
I just changed the CFLAGS from -Os to -O0 and -O1 so that it reads CFLAGS="-O1 -g -fno-strict-aliasing -Wall -Werror -Wno-shadow -Wno-unused"

The package recompiles and installs fine (grub> setup (hd0) also works), but for some reason the system doesn't boot with it.
Comment 8 Joshua Covington 2011-07-08 12:15:07 EDT
I compared the optimization levels -Os and -O1, and it turned out that the errors should come from -finline-functions and -finline-small-functions. Building with -O1 works fine.

However when I set CFLAGS="-Os -g -fno-inline-functions -fno-inline-small-functions -fno-strict-aliasing -Wall -Werror -Wno-shadow -Wno-unused" I still get the mismatch error.
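
One way to repeat this comparison is to ask gcc which optimizer flags each level turns on, using `gcc -Q --help=optimizers`. The sketch below assumes the output marks each active flag with `[enabled]` in the last column. The helper names are hypothetical, and the result depends on the gcc version installed.

```shell
# Hypothetical helper: read `gcc -Q --help=optimizers` output on stdin and
# print, sorted, the flags whose last column is "[enabled]".
enabled_flags() {
    awk '$NF == "[enabled]" { print $1 }' | sort
}

# Hypothetical helper: print flags enabled at level $2 but not at level $1.
flags_added() {
    a=$(mktemp) && b=$(mktemp) || return 1
    gcc -Q --help=optimizers "$1" | enabled_flags > "$a"
    gcc -Q --help=optimizers "$2" | enabled_flags > "$b"
    comm -13 "$a" "$b"   # lines that appear only in the second list
    rm -f "$a" "$b"
}

# Example use:
#   flags_added -O1 -Os
```

Each flag in that list can then be negated one at a time in CFLAGS to see which of them changes the outcome.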
Comment 9 Joshua Covington 2011-07-13 09:47:57 EDT
Any progress on this bug?
Comment 10 Adam Williamson 2011-07-15 13:28:14 EDT
Discussed at the 2011-07-15 blocker review meeting. We agreed that we cannot fully evaluate this until we know whether it will impact a fresh install of Fedora 16; we should have information on that by the middle of next week, and be able to review this issue at next week's review meeting.
Comment 11 Joshua Covington 2011-07-15 18:35:00 EDT
I'll put it this way:

By default a user is asked to install grub in the mbr. This is the step where grub runs the commands in comment #1 which result in the error message. At the end the end-user cannot boot the system and there's no chance he can know what to do.

This is definitely a blocker for me.
Comment 12 ap1821 2011-07-18 12:07:13 EDT
I had the same issue after I reinstalled GRUB on my Fedora 15. I installed an older grub package from fc14, then backed up /boot to /boot_cp and deleted /boot.
Then launching grub-install installed the older grub and after copying .img and the other file from /boot_cp to /boot I was able to boot my Fedora (using second boot option, but it worked that way)
Comment 13 James Laska 2011-07-19 14:48:33 EDT
In talking with pjones regarding this issue, it sounds like this will only hit users who upgrade to a grub built by the newer compiler.  The version of grub currently in rawhide (grub-1:0.97-75.fc16.x86_64) was built using the newer compiler.  Pjones intends to roll back the version of grub in rawhide to an older version built with the older compiler to resolve this issue for now.

Until we can confirm that *fresh* rawhide/f16 installs hit this problem, it's hard to consider it an Alpha blocker.  However, I expect once we have fresh install feedback on rawhide/f16, this very well may be a blocker.  

Until then, pjones is aware of the issue, and may have resolved the problem by then.
Comment 14 James Laska 2011-07-19 16:08:15 EDT
I've finally managed to install a f16/rawhide system.  My virtual guest fails to boot after install.  I am accessing the system via serial console at the moment, so I'm not able to confirm that my failure case matches the boot error observed in comment#0.  However, it seems likely that these are the same problems.  

Starting to feel like this impacts Alpha criteria ...
Comment 15 Tim Flink 2011-07-22 17:47:31 EDT
Discussed at the 2011-07-22 blocker bug review meeting. Since this only affects upgrades, it does not hit any of the alpha release criteria. However, it does hit the following beta release criterion and was accepted as a Fedora 16 beta blocker.

The installer must be able to successfully complete an upgrade installation from a clean, fully updated default installation (from any official install medium) of the previous stable Fedora release, either via preupgrade or by booting to the installer manually. The upgraded system must meet all release criteria.
Comment 16 Joshua Covington 2011-07-23 02:42:24 EDT
As pointed out in comment #14 by James Laska (and others), a clean install (no update) cannot boot because of the mismatched versions of stage1/stage2. I think this hits the following Fedora 16 Alpha Release Criteria:

14. In most cases (see Blocker_Bug_FAQ), a system installed according to any of the above criteria (or the appropriate Beta or Final criteria, when applying this criterion to those releases) must boot to the 'firstboot' utility on the first boot after installation, without unintended user intervention. This includes correctly accessing any encrypted partitions when the correct passphrase is supplied. The firstboot utility must be able to create a working user account

15. Following on from the previous criterion, after firstboot is completed and on subsequent boots, a system installed according to any of the above criteria (or the appropriate Beta or Final criteria, when applying this criterion to those releases) must boot to a working graphical environment without unintended user intervention. This includes correctly accessing any encrypted partitions when the correct passphrase is supplied 

I'm not sure that the system can "boot to the 'firstboot' utility on the first boot after installation, without unintended user intervention" as stated in rule 14. I still think this is an Alpha blocker.
Comment 17 Bruno Wolff III 2011-07-23 08:17:43 EDT
Currently it is expected that grub2 will be used for fresh installs by the time of the alpha release, so grub1 problems will not be an alpha blocker. If the grub2 feature (http://fedoraproject.org/wiki/Features/Grub2) isn't ready for testing on time, then that may change things. (The current grub2 feature page isn't very encouraging.)
Comment 18 Joshua Covington 2011-08-18 14:59:54 EDT
Any progress here?
Comment 19 Adam Williamson 2011-09-09 14:57:38 EDT
only confusion. =)

f16 fresh installs use grub2, and no-one has reported encountering this with a fresh install. f15 -> f16 upgrades are currently somewhat broken, though I intend to test a yum-based upgrade this afternoon. dgilmore claims to have hit this bug while building an EC2 image.

So, adding a needinfo flag and some CCs.

Note that if this was hidden by rolling back the grub build in F16 it may have re-emerged, as grub has recently been updated to 0.97-76 and then 0.97-77.
Comment 20 Bruno Wolff III 2011-09-09 15:08:14 EDT
I ran across this running grub-install after changing my partition layout on a mirrored system. A quick test is just to run grub-install and see if it breaks things. That will probably be a lot quicker than trying an upgrade. However, the upgrade test is important for determining blocker status.
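
Since grub-install printed "No error reported" in the original report while the manual `setup` step failed, a quick test probably needs to look at the `setup` transcript rather than at grub-install's summary. A minimal sketch, assuming the transcript has been captured as text in the same format as the sessions above; the helper name is hypothetical.

```shell
# Hypothetical helper: exit 0 if a GRUB transcript on stdin contains an
# "Error <number>" line or a step that ended in "... failed".
grub_log_has_error() {
    grep -E -q '^Error [0-9]+|\.\.\. *failed$'
}

# Example use, with a transcript saved to a file:
#   grub_log_has_error < setup.log && echo "setup failed"
```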
Comment 21 Dennis Gilmore 2011-09-09 17:03:32 EDT
EC2 requires grub1; it doesn't support grub2, though they use pv-grub, so apparently they only really need a grub config. But the installation of grub during image creation fails, so the image, if bootable in EC2, is not bootable outside of a pv-grub based environment.
Comment 22 Joshua Covington 2011-09-10 05:14:15 EDT
I've been testing with the grub versions since version 0.97-71. All of them show this bug. I even asked in one of the mailing lists and got this: http://lists.fedoraproject.org/pipermail/devel/2011-July/153799.html

I hope someone finally figures it out.
Comment 23 Adam Williamson 2011-09-14 18:13:16 EDT
joshua: what's really at issue is not whether this bug exists, but whether it's actually a major problem for fedora 16.
Comment 24 Joshua Covington 2011-09-14 19:56:09 EDT
It depends on how you define "major". 

If you do an update from f15 the system will _not_ boot and you won't see any suspicious message during the update. This has been proven by some posters here. 

It's up to you to decide for yourself how "major" a problem this is! I hope that those affected (and the number will be "quite" high) will be able to find this bug report and figure out that grub is the culprit. And hopefully they will be able to downgrade grub to a working version without the need to reinstall everything.
Comment 25 Adam Williamson 2011-09-14 20:09:16 EDT
"If you do an update from f15 the system will _not_ boot and you won't see any suspicious message during the update. This has been proven by some posters here."

That's not quite the whole story. At present, f15->f16 upgrades do fail; that is being tracked and fixed in https://bugzilla.redhat.com/show_bug.cgi?id=735730 . The intended default behaviour on f15 -> f16 upgrades is that anaconda writes you a new grub2 bootloader, which works. The two bugs preventing this in TC2 are a) the default choice is 'skip bootloader configuration', which does what it says on the tin - nothing at all and b) the choice which should be default, 'install new bootloader configuration', had a bug and didn't work.

For TC3/RC1, both of these will be fixed, and hence f15->f16 upgrades will work, regardless of this bug. I've tested that already.
Comment 26 Adam Williamson 2011-09-15 00:37:47 EDT
I'm proposing we reverse this to a RejectedBlocker, as there is no clear impact on any of the Beta criteria after two rounds of TC testing; this bug does not cause any criteria violations in the Beta. I re-vote -1 blocker. Any other votes?
Comment 27 Tim Flink 2011-09-15 00:52:21 EDT
I'm also going to re-vote -1 here because the issue should be fixed by other bugs and the impact to EC2 image generation isn't clear.
Comment 28 Adam Williamson 2011-09-15 03:07:43 EDT
demoting from blocker status with mine and tflink's votes, for now; if this proves to cause any problems in RC1, we can re-vote it again.
Comment 29 Ryan Hill 2011-11-19 00:57:28 EST
Just thought I'd mention that we're also seeing this in Gentoo so it isn't a Fedora-specific issue.  I found that adding -fno-reorder-functions made the "mismatched or corrupt" install error go away and the system boot successfully, but some users have reported that they still can't boot.  I understand that this flag isn't the cause of this bug and just manages to paper over whatever the real problem is. -fno-inline-small-functions / -fno-inline-functions also worked but not consistently with all -O levels.

One other note: copying over "stage2" from grub built with 4.5 gives you a working system.

https://bugs.gentoo.org/360513
Comment 30 Ryan Hill 2011-12-07 23:23:20 EST
The Ubuntu guys seem to have figured it out: 
https://bugs.launchpad.net/ubuntu/+source/grub/+bug/837815
Comment 31 udo 2012-10-18 09:18:01 EDT
grub-0.97-71.fc15.x86_64.rpm works fine.
the f17 versions give error 6 as described above.
f16 grub gives error 16.
Comment 32 Fedora End Of Life 2013-04-03 13:02:37 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
Comment 33 udo 2013-05-12 02:45:49 EDT
Any updates, progress?
The only version that works is grub-0.97-71.fc15.x86_64.
That is right. I could not find a working version of grub from Fedora 16 or 17.
That means that this problem has been there for a while.
Please fix, as grub2 is a whole different beast and does not do what grub does.
Comment 34 Bruno Wolff III 2013-05-12 11:43:38 EDT
Grub has been dropped from the distribution so I don't think this is going to get fixed.
Comment 35 udo 2013-05-12 11:55:35 EDT
This bug will be another token of the lack of care for quality of the Fedora project.
Why abandon something that works for something else that is a different beast until the beast can do all tricks the old tool can?
Why not properly maintain the old tool? (The lack of working versions for recent Fedora releases shows this.)
(This is just one of the many shortcomings in Fedora.)
Comment 36 Bruno Wolff III 2013-05-12 12:04:56 EDT
I don't think anyone has been maintaining grub 1 and that is part of the problem. Upstream is working on grub 2 now. grub 1 doesn't work on some newer file systems and was going to need a lot of work to do that. In the process it could run into the issue that grub 2 did and become too large and end up with some of the same downsides that grub 2 does.

If you really want to run grub 1, you can keep the old version that still works installed. I still do that on a couple of my systems.
Comment 37 Adam Williamson 2013-05-12 12:20:30 EDT
udo: as Bruno said, grub-legacy has been unmaintained upstream for years. Fedora was one of the last hold-outs still using it, and it required RH to pay for someone to work full time just to maintain it. We switched to grub2 to be consistent with other distros and in line with what was maintained upstream.

If you don't like grub2, you may be interested in https://fedoraproject.org/wiki/Features/SyslinuxOption .

The grub package is dead as of F19. Setting back to F18 as the last release for which grub exists.
Comment 38 Fedora End Of Life 2013-12-21 03:28:19 EST
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 39 Fedora End Of Life 2014-02-05 06:49:23 EST
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
