Bug 743273

Summary: grub2 fails to install on IMSM raid device
Product: [Fedora] Fedora
Component: grub2
Version: 16
Hardware: x86_64
OS: Unspecified
Status: CLOSED WONTFIX
Severity: high
Priority: unspecified
Reporter: Jes Sorensen <Jes.Sorensen>
Assignee: Peter Jones <pjones>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agajan, awilliam, dennis, dledford, jensting, mads, mattyclarkson, pjones, vserbine, xaphir
Doc Type: Bug Fix
Last Closed: 2013-02-13 21:09:52 UTC

Attachments:
device.map file, with the md device added
grub.cfg

Description Jes Sorensen 2011-10-04 13:03:05 UTC
Description of problem:
grub2 fails to install on IMSM raid device, making it impossible to complete
an install or upgrade on a system which uses only IMSM (BIOS) raid for its
storage.

This particular system was installed as F15 and then upgraded to F16 beta
using yum, then manually upgraded from grub1 to grub2.

I can install grub on /dev/sdc without problems (non raid device), but
installing it to /dev/md126 fails, despite it being a bootable device.

Version-Release number of selected component (if applicable):


How reproducible:
Every time

Steps to Reproduce:
1.
2.
3.
  
Actual results:
[root@mahomaho ~]# grub2-install /dev/md126
/sbin/grub2-setup: error: no such disk.


Expected results:


Additional info:
[root@mahomaho ~]# mdadm --misc --detail /dev/md126
/dev/md126:
      Container : /dev/md127, member 0
     Raid Level : raid1
     Array Size : 488383488 (465.76 GiB 500.10 GB)
  Used Dev Size : 488383620 (465.76 GiB 500.10 GB)
   Raid Devices : 2
  Total Devices : 2

          State : clean, resyncing
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

 Rebuild Status : 30% complete


           UUID : 76574ed6:ba038538:d18f42a3:3cde95d1
    Number   Major   Minor   RaidDevice State
       1       8        0        0      active sync   /dev/sda
       0       8       16        1      active sync   /dev/sdb

Comment 1 Jes Sorensen 2011-10-04 13:03:33 UTC
Forgot to include the grub version number:

grub2-1.99-6.fc16.x86_64

Comment 3 Adam Williamson 2011-10-04 20:49:52 UTC
I did complete a successful install from F16 Beta DVD media to an Intel BIOS RAID-1 array while testing https://bugzilla.redhat.com/show_bug.cgi?id=742226 - the RAID-1 array was the only install target, so it got the bootloader on there somehow, though I'm not sure precisely what command the installer ran.

It would help if you could test this from the installer rather than manually.

Comment 4 Jes Sorensen 2011-10-05 05:44:17 UTC
Trust me, I have *tried* doing this from the installer, about 20 times,
but it isn't possible due to BZ#742888

I did manage somehow to get the installer to run it at some point as an
upgrade I think, but even the grub2 install failed.

Jes

Comment 5 David Lehman 2011-10-07 18:12:05 UTC
What if you add md126 to /boot/grub2/device.map (assuming it's not already there) and then try again?
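
A minimal sketch of what adding the array to device.map could look like. The helper name add_device_map_entry and the (hd0)/(hd1) drive labels are assumptions for illustration; the reporter's actual device.map is in the attachment and is not reproduced here. The demo writes to a scratch file, so nothing under /boot is touched.

```shell
#!/bin/sh
# Sketch: append a "(drive) device" line to a GRUB device.map unless the
# device is already listed. Helper name and drive labels are hypothetical.
add_device_map_entry() {
    map_file=$1   # e.g. /boot/grub2/device.map
    drive=$2      # GRUB drive name, e.g. "(hd0)"
    device=$3     # OS device node, e.g. /dev/md126

    # Skip the append when some line already maps to this device.
    if [ -f "$map_file" ] && grep -q "[[:space:]]$device\$" "$map_file"; then
        echo "$device already present in $map_file"
        return 0
    fi
    printf '%s\t%s\n' "$drive" "$device" >> "$map_file"
    echo "added $drive -> $device"
}

# Demo against a scratch file rather than the real device.map.
demo_map=$(mktemp)
printf '(hd0)\t/dev/sdc\n' > "$demo_map"
add_device_map_entry "$demo_map" "(hd1)" /dev/md126
cat "$demo_map"
```

After editing the real file, the grub2-install /dev/md126 command from the description would be the thing to retry.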

Comment 6 Adam Williamson 2011-10-07 18:16:49 UTC
jes: so, you can't reproduce 742888 any more, and the other bug you hit there (731356) has a known workaround (delete the LANG= parameter), so can you try again with a clean install and report the result?

Comment 7 Adam Williamson 2011-10-07 18:17:19 UTC
or specify the device to install to in the format used in device.map (on my system it seems to use /dev/disk/by-id names)?
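
To find the /dev/disk/by-id name that corresponds to a device node, something like the following sketch may help. by_id_names is a hypothetical helper, and whether the array has a by-id entry at all depends on the system's udev rules.

```shell
#!/bin/sh
# Sketch: print every symlink in a directory that resolves to a given
# device node. by_id_names is a hypothetical helper name.
by_id_names() {
    target=$(readlink -f "$1")        # canonical path of the device
    dir=${2:-/dev/disk/by-id}         # directory of persistent names
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            echo "$link"
        fi
    done
}

# Example: by_id_names /dev/md126
```

Any name it prints could then be passed to grub2-install in place of /dev/md126.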

Comment 8 Adam Williamson 2011-10-07 18:19:16 UTC
Discussed at 2011-10-07 blocker review meeting. Agreed that it's unclear what's wrong here (if anything) and we need more information to evaluate the blocker status of this bug. Jes, please test more and provide more data (at least grub config), and pjones when you can, let us know what's going wrong. thanks!

Comment 9 Adam Williamson 2011-10-14 17:50:13 UTC
*** Bug 744054 has been marked as a duplicate of this bug. ***

Comment 10 Adam Williamson 2011-10-14 17:51:22 UTC
I hit a similar error message when testing on my laptop so it does seem like there's a real issue here, though it would be good to know jes can still reproduce and test, as I can't (I had to have my laptop working so I converted it to soft RAID with a separate /boot partition). jes, can you confirm you're still able to test this?

Discussed at 2011-10-14 blocker review meeting. Agreed to punt on this again as we really need pjones to take a look at what's going wrong here.

Comment 11 Jes Sorensen 2011-10-17 08:03:54 UTC
*** Bug 746460 has been marked as a duplicate of this bug. ***

Comment 12 xaphir 2011-10-17 15:04:52 UTC
Jes: 746460 was strictly software raid; there was no bios raid involved.

Comment 13 xaphir 2011-10-17 18:45:51 UTC
Jes:  I was able to fix the problem by downloading a system rescue cd (http://www.sysresccd.org) and doing a chroot on the raid array.  Then I had to run "rmmod floppy" to get past an fd0 error that grub2-install will generate, which will happen if the bios floppy controller is enabled where there is no floppy.  Disabling floppy controller in the bios fixed that.  (grub2-install will fail to recover from the fd0 error if it occurs.)  Then you run

grub2-install --recheck /dev/sda

from the chroot terminal.  After that, the array should boot.  The original uuid parameters in /boot/grub2/grub.cfg and in /etc were all there as the F16 installer left them.
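
The sequence described above can be sketched roughly as follows. Only "rmmod floppy" and the grub2-install line come from the comment; the mount point /mnt/sysroot, the root device /dev/md0 and the bind mounts are assumptions, and a separate /boot would need mounting too. The script prints the commands instead of running them unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the rescue sequence, printed rather than executed unless
# RUN=1 is set. Mount point, root device and bind mounts are assumptions.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi
}

rescue_reinstall() {
    root_dev=$1            # root filesystem on the array (hypothetical)
    boot_disk=$2           # disk that should receive the boot loader
    mnt=${3:-/mnt/sysroot}

    run mount "$root_dev" "$mnt"
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$mnt/$fs"
    done
    run rmmod floppy       # only relevant when a phantom fd0 is probed
    run chroot "$mnt" grub2-install --recheck "$boot_disk"
}

# Dry run with placeholder devices.
rescue_reinstall /dev/md0 /dev/sda
```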

Comment 14 Adam Williamson 2011-10-17 20:50:04 UTC
I'm pretty sure that didn't help my laptop case, but again, I can't test that one any more. :/

Comment 15 Jes Sorensen 2011-10-18 07:25:30 UTC
Sorry for the late reply, yes I can still reproduce this problem.

Unlike xaphir's case, I don't have a fake floppy controller in this
system, so that didn't make a difference.

However, adding it to /boot/grub2/device.map makes the problem go away.

It looks like grub2 has issues handling devices that are missing from
device.map.

I will upload my grub.cfg and my device.map files in a moment. Note these
are the updated files, and also note in this test case I was trying to put
grub onto a different partition than the one specified in grub.cfg for
testing purposes.

Cheers,
Jes

Comment 16 Jes Sorensen 2011-10-18 07:27:44 UTC
Created attachment 528736 [details]
device.map file, with the md device added

Comment 17 Jes Sorensen 2011-10-18 07:29:26 UTC
Created attachment 528737 [details]
grub.cfg

Note the grub.cfg specifies a different install device than the one causing
the failure. I was unable to install grub onto the raid device from anaconda
so I installed on a different partition and reproduced the error manually
on the command line instead.

Comment 18 Adam Williamson 2011-10-21 23:49:24 UTC
neither pjones nor I can reproduce with a simple supported test case:

1) install f15 to Intel BIOS RAID
2) upgrade from an F16 DVD

for both of us this works: the upgraded system has a working (bootable) grub2 config. The case where I hit a similar error, I did a yum upgrade. It's possible there's still a bug lurking here but we may need more detail to figure out exactly what it is. As things stand I don't think we can say for sure there's a blocker here.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 19 Jes Sorensen 2011-10-24 15:08:17 UTC
Ok, I ran some more testing - here is the data.

My system is set up as follows:

4 x 500GB SATA drives
/dev/sd[ab] are assembled as a raid1 (/dev/md126)
/dev/sd[cd] are left as standalone drives

Fedora-16-Beta-TC1 iso put onto a USB stick using livecd-iso-to-disk

During install, I create two regular partitions on /dev/md126 for boot and /
Anaconda is not allowing me to install anaconda onto /dev/md126, but only
offers me to put it onto the /boot partition (/dev/md126p1).

Everything installs fine, no errors. Post boot I use the BIOS boot menu
to ask for boot from the raid device, rather than one of the standalone
disks.

At this point it just hangs - I get a flashing cursor and nothing..... :(

Jes

Comment 20 Adam Williamson 2011-10-24 15:26:04 UTC
"Anaconda is not allowing me to install anaconda onto /dev/md126, but only
offers me to put it onto the /boot partition (/dev/md126p1)."

This is a separate bug - https://bugzilla.redhat.com/show_bug.cgi?id=744088

"Post boot I use the BIOS boot menu
to ask for boot from the raid device, rather than one of the standalone
disks.

At this point it just hangs - I get a flashing cursor and nothing..... :("

Well, yes. There's no bootloader on the MBR of the RAID device. Of course it ain't going to work.

744088 should be 'fixed' in Final TC2 by popping up another disk selection screen earlier in installation that lets you pick the bootloader target disk. Can you please try with Final TC2 and let us know how it goes?




Comment 21 Adam Williamson 2011-10-24 16:39:16 UTC
Discussed at 2011-10-24 QA meeting, functioning as a blocker review meeting. This is reading more and more like a niche case and/or pilot error, but we're punting on it again just until this afternoon, when pjones hopes to have more data.




Comment 22 Adam Williamson 2011-10-25 00:05:50 UTC
Jes, can you please provide the feedback requested asap? we're short on time for RC...

Comment 23 Jes Sorensen 2011-10-25 07:24:44 UTC
I am happy to provide feedback, but you haven't told me what you want me
to provide.

I am pretty sure this is not pilot error, but it might be masked by
744088 at this point. I cannot confirm that before I can get access
to an iso test image.

Comment 24 Adam Williamson 2011-10-25 15:44:00 UTC
See comment #20. As I understand your description of your previous test, you installed the bootloader to the first partition on the disk, not to the MBR, so the disk was left with no MBR bootloader; naturally you can't boot from it.

What I'd like you to do is do a test which doesn't hit that bug so we can tell whether anything is actually broken.
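
One rough way to tell whether a disk carries any MBR boot code at all is sketched below. It assumes the classic MBR layout (446 bytes of boot code, a 64-byte partition table, then the 0x55 0xAA signature); mbr_status is a hypothetical helper, and the check cannot tell GRUB apart from any other boot code.

```shell
#!/bin/sh
# Sketch: report whether the first sector of a disk (or image file) looks
# like it carries boot code. mbr_status is a hypothetical helper name.
mbr_status() {
    disk=$1
    # Bytes 510-511 hold the boot signature.
    sig=$(dd if="$disk" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$sig" != "55aa" ]; then
        echo "no boot signature"
        return 1
    fi
    # Count the non-zero bytes in the 446-byte boot code area.
    nonzero=$(dd if="$disk" bs=446 count=1 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')
    if [ "$nonzero" -eq 0 ]; then
        echo "signature present, boot code empty"
        return 1
    fi
    echo "boot code present"
}

# Example (needs read access to the device): mbr_status /dev/md126
```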




Comment 25 Adam Williamson 2011-10-25 15:44:46 UTC
I already said that 744088 is fixed in TC2, so just test with TC2. http://dl.fedoraproject.org/pub/alt/stage/16.TC2/

Comment 26 Jes Sorensen 2011-10-25 16:05:11 UTC
It works if I install to a regular disk partition, but I haven't been able
to install to the MBR as explained above.

This is the first I have heard of TC2, so I will pull that now and report
back as soon as I get it down.

Comment 27 Jes Sorensen 2011-10-25 19:49:04 UTC
Ok I retested using TC2. The results are mixed, but less bad than before:

1) If I boot the installer and let Anaconda do auto-partitioning onto the
raid device, grub installs and boots ok.

2) If I do custom partitioning, it fails with grub saying the size of the
ELF header of stage1 is wrong, or similar. The custom partitioning
included creating the magic boot partition that isn't referenced anywhere,
but which adamw told me about on irc (I never needed this when I installed
anything else on this system, so at least a slightly more informative
help/error message would be good).

I didn't save the logs from the failed boot, but I can try to reproduce it
later and save them.

This might be good enough for release, even if it isn't ideal.

Jes

Comment 28 Jes Sorensen 2011-10-26 12:46:47 UTC
I tried running a few more tests on this, and I cannot reproduce the issue
with grub2 getting into a weird state post install, at least for now.

It may have to do with using pre-created partitions at the initial install.
However I wiped the partition table since then, so reproducing the exact
same scenario will be hard.

It seems to work for most cases with the latest fixes in place, so I recommend
we don't keep this as a blocker for F16.

Worth noting that TC2 pretty much hangs solid every time, upon reboot once the
install has finished.

Jes

Comment 29 Adam Williamson 2011-10-26 20:20:35 UTC
Reporter has requested this be un-proposed, so un-proposing.

Comment 30 Aram Agajanian 2011-10-29 16:13:26 UTC
I am using IMSM on the boot drive.  My typical partitioning is as follows:

partition boot
partition boot2

logical volume lv0/root
logical volume lv0/root2
logical volume lv0/home
logical volume lv0/opt

Then, I alternate boot and root partitions with each new installation.  So I might use boot and root for F15 and LVs boot2 and root2 for F16.  This way, I don't have to restore my home directory from backup.

Should I expect problems if I install F16 and keep this custom partitioning?  Will it work if I install grub2 to the boot partition instead of the MBR?

By the way, I don't really understand statement 2) in Comment #27.

Comment 31 Adam Williamson 2011-10-29 17:18:30 UTC
from what we know right now I'd expect it to most likely work, but we really don't have a huge amount of data. the only problem you might hit would be https://bugzilla.redhat.com/show_bug.cgi?id=737508 ; you might want to check the alignment of the first boot partition, I guess. But even if you hit that it should be workaround-able. But really, we only have maybe 5-6 different reports from IMSM so far.




Comment 32 Mads Kiilerich 2012-04-16 21:21:11 UTC
Was this really fixed for f16? Or is it still a problem in f17?

Comment 33 Matt Clarkson 2012-06-07 13:01:29 UTC
I just tried a clean install of F17 using the Anaconda GUI install and it fails to install the bootloader on my Intel RAID Mirror 1.

What information can I provide to help fix this bug?

Comment 34 Matt Clarkson 2012-06-07 13:04:53 UTC
I used custom partitioning and have Windows 7 already installed.

Comment 35 Matt Clarkson 2012-06-07 13:42:20 UTC
I tried installing GRUB2 via the LiveCD:

[root@localhost liveuser]# mdadm --misc --detail /dev/md126
/dev/md126:
      Container : /dev/md127, member 0
     Raid Level : raid1
     Array Size : 488383488 (465.76 GiB 500.10 GB)
  Used Dev Size : 488383620 (465.76 GiB 500.10 GB)
   Raid Devices : 2
  Total Devices : 2

          State : clean, resyncing 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

  Resync Status : 22% complete


           UUID : a55f0c47:51511d71:fbb9100b:b6f83315
    Number   Major   Minor   RaidDevice State
       1       8        0        0      active sync   /dev/sda
       0       8       16        1      active sync   /dev/sdb

[root@localhost liveuser]# grub2-install /dev/md126
/usr/share/grub/grub-mkconfig_lib: line 53:  1790 Segmentation fault      (core dumped) "${grub_probe}" -t fs "$path" > /dev/null 2>&1
Path `/boot/grub2' is not readable by GRUB on boot. Installation is impossible. Aborting.

Comment 36 Mads Kiilerich 2012-06-07 13:58:14 UTC
That looks like a different problem than what is tracked here. It is probably a bug that has been fixed in http://koji.fedoraproject.org/koji/buildinfo?buildID=322368 - please give that a try and file a new issue if you see the same problem with that version.

Comment 37 Matt Clarkson 2012-06-07 14:15:25 UTC
Thanks, Mads.

Comment 38 Matt Clarkson 2012-06-07 14:43:20 UTC
I updated - currently have grub2-2.0.0.beta4 and I get the following error:

[root@localhost /]# grub2-install /dev/md126
/usr/sbin/grub2-bios-setup: warning: disk isn't LDM.
/usr/sbin/grub2-bios-setup: warning: Embedding is not possible.  GRUB can only be installed in this setup by using blocklists.  However, blocklists are UNRELIABLE and their use is discouraged..
/usr/sbin/grub2-bios-setup: error: will not proceed with blocklists.

I used the following guide www.webtechquery.com/index.php/2010/04/install-grub2-from-live-cd//usr to try to install grub2

Will beta6 be in one of the nine fedora repos and fix the above problem?  Or is this a user (my) error?

Thanks,

Matt

Comment 39 Mads Kiilerich 2012-06-07 14:53:34 UTC
(In reply to comment #38)

Please clarify: Is that with beta4 or beta6?

You say Intel raid - it looks more like software raid to me.

I am not up to date with raid, but AFAIK you shouldn't install to the raid device but to each of the disks. A raid that spans whole disks might however not leave any room for installing a boot loader and is thus not a good idea for a bootable disk. Please research this elsewhere - I am probably wrong.

Yes, this looks like a user error - or at least an error unrelated to the issue reported here.
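
The "room for a boot loader" question can be checked with a little arithmetic, sketched below. On MBR-partitioned disks GRUB embeds its core image in the gap between the MBR and the first partition. The sketch assumes 512-byte sectors, and the sysfs path assumes the array is md126 with a first partition md126p1; embed_gap_bytes is a hypothetical helper. This only tests the lack-of-room hypothesis and says nothing about the LDM warning.

```shell
#!/bin/sh
# Sketch: size of the gap between the MBR and the first partition.
# Assumes 512-byte sectors; embed_gap_bytes is a hypothetical helper name.
embed_gap_bytes() {
    start_sector=$1   # start of the first partition, in sectors
    echo $(( (start_sector - 1) * 512 ))
}

# On a live system the start sector can be read from sysfs (assumed path).
start_file=/sys/block/md126/md126p1/start
if [ -r "$start_file" ]; then
    echo "gap: $(embed_gap_bytes "$(cat "$start_file")") bytes"
fi
```

For example, a first partition starting at sector 63 leaves 31744 bytes, and one starting at sector 2048 leaves 1048064 bytes.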

Comment 40 Matt Clarkson 2012-06-07 15:17:17 UTC
That was with beta4.  I updated grub2 using rawhide and tried beta5 as well - same error.

It's Intel RAID - maybe it is getting picked up wrongly.  P55 chipset.

Thanks for the info and feedback, it's been helpful.  I'll see what I can do from here on out.

Comment 41 Adam Williamson 2012-06-07 21:29:15 UTC
Mads: Intel firmware RAID uses mdraid.

Comment 42 Vladimir Serbinenko 2012-06-14 15:07:12 UTC
(In reply to comment #38)
> I updated - currently have grub2-2.0.0.beta4 and I get the following error:
> 
> [root@localhost /]# grub2-install /dev/md126
> /usr/sbin/grub2-bios-setup: warning: disk isn't LDM.

This is bad. It's basically an assert failure. For some reason GRUB thinks that you use LDM and then it sees that it's not the case. Can you try

grub2-install --debug /dev/md126

Comment 43 Peter Jones 2012-08-08 18:50:28 UTC
*** Bug 832872 has been marked as a duplicate of this bug. ***

Comment 44 Fedora End Of Life 2013-01-16 16:57:11 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 45 Fedora End Of Life 2013-02-13 21:09:57 UTC
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.