Bug 595951 - May have failed writing grub to /boot/grub
Summary: May have failed writing grub to /boot/grub
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: anaconda
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Anaconda Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-05-26 01:04 UTC by Bob Gustafson
Modified: 2011-06-27 16:40 UTC (History)

Fixed In Version: anaconda-15.21-1
Clone Of:
Environment:
Last Closed: 2011-06-27 16:40:16 UTC
Type: ---
Embargoed:


Attachments
anaconda log file from Fedora 13 install (1.34 MB, text/plain)
2010-05-26 01:04 UTC, Bob Gustafson
Install.log from failed Fedora 13 install (52.82 KB, text/plain)
2010-05-26 01:06 UTC, Bob Gustafson
Install.log.syslog from failed Fedora 13 install (10.23 KB, text/plain)
2010-05-26 01:07 UTC, Bob Gustafson
Make sure a device is selected (1.45 KB, patch)
2011-03-05 00:52 UTC, Brian Lane

Description Bob Gustafson 2010-05-26 01:04:20 UTC
Created attachment 416592 [details]
anaconda log file from Fedora 13 install

Description of problem:
I just attempted to install Fedora 13 and anaconda quit after doing almost everything. I am attaching log files generated during the install.

I was given the opportunity to save the debug dump which I did (to local..), but I cannot find this file. Does it have a name? What directory would it be saved to?

I am currently in rescue mode and it appears as though grub is not written to /boot/grub. There are some files, but when I try to manually write the boot block [grub> root (hd0,0) grub> setup (hd0)], grub cannot find the stage files.

Version-Release number of selected component (if applicable):

  Fedora 13, Anaconda 13.42 from F13 install DVD

How reproducible:

I haven't tried to reinstall because I probably would encounter the same failure.

Steps to Reproduce:
1. Try to install Fedora 13
2.
3.
  
Actual results:

 As mentioned above. Dead in the water now.

Expected results:

  An installed Fedora 13
Additional info:

I am running a RAID1 setup which was customized within Anaconda. It has two disks with two partitions each: the first is a 500MB /boot, the second holds an LVM group containing swap and /. Both are software RAID, as /dev/md0 and /dev/md1.

Comment 1 Bob Gustafson 2010-05-26 01:06:07 UTC
Created attachment 416593 [details]
Install.log from failed Fedora 13 install

Comment 2 Bob Gustafson 2010-05-26 01:07:32 UTC
Created attachment 416594 [details]
Install.log.syslog from failed Fedora 13 install

Comment 3 Bob Gustafson 2010-05-26 04:47:14 UTC
Ho, I got it to boot.

Guessing that the (only) problem was the lack of grub on /boot/grub and in the MBR, I did:

# grub-install /dev/sda

# grub-install /dev/sdb

# grub

grub> root (hd0,0)
grub> setup (hd0)

grub> root (hd1,0)
grub> setup (hd1)

This is to ensure that both disks of the /dev/md0 pair have grub and the proper contents of their respective MBRs.

A way to test this is to dump the first sector of each disk and compare the two to see whether the contents look reasonable:

dd if=/dev/sda bs=512 count=1 | od -c

dd if=/dev/sdb bs=512 count=1 | od -c

My disks start out 353  H 220 020 216 ....
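
The same check can be scripted. This is a minimal sketch in Python, not a definitive tool; the function names are hypothetical. It assumes a classic MBR layout, where the sector ends with the 0x55 0xAA boot signature at offset 510 and the first 440 bytes hold boot code. Only that boot code area is compared, because the disk signature and partition table that follow can legitimately differ between disks. Even the boot code may differ slightly if grub recorded a different drive or sector address on each disk, so a mismatch is a reason to look closer, not proof of a fault.

```python
SECTOR_SIZE = 512
BOOT_CODE_SIZE = 440  # bytes 0..439 hold boot code in a classic MBR


def has_boot_signature(sector: bytes) -> bool:
    """Return True if the sector is 512 bytes and ends with 0x55 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"


def same_boot_code(sector_a: bytes, sector_b: bytes) -> bool:
    """Compare only the boot code area of two first sectors."""
    return sector_a[:BOOT_CODE_SIZE] == sector_b[:BOOT_CODE_SIZE]


def read_first_sector(path: str) -> bytes:
    """Read the first 512 bytes of a disk, e.g. /dev/sda (needs root)."""
    with open(path, "rb") as disk:
        return disk.read(SECTOR_SIZE)
```

On a live system this would be used as same_boot_code(read_first_sector("/dev/sda"), read_first_sector("/dev/sdb")), after checking has_boot_signature() on each sector.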

Booting up this machine goes well although there is a pause for selinux to relabel the 500GB disks. I notice that the default is now permissive. Is this to smooth over problems? Can I change it to enforcing?

Anyway, I am now logged into my (new) machine. Thanks much.

Comment 4 Bob Gustafson 2010-05-26 05:01:47 UTC
Using the nice Disk Utility, I see that my two RAID1 partitions have different metadata. Is this odd?

/dev/md0 which is used to host /boot is Raid Metadata 1.0 and
/dev/md1 which is used to host lvm is Raid Metadata 1.1

FWIW

Comment 5 Chris Lumens 2010-05-26 15:07:25 UTC
anaconda 13.42 exception report
Traceback (most recent call first):
  File "/usr/lib/anaconda/booty/util.py", line 5, in getDiskPart
    path = storage.devicetree.getDeviceByName(dev).path[5:]
  File "/usr/lib/anaconda/booty/x86.py", line 450, in grubbyPartitionName
    (name, partNum) = getDiskPart(dev, self.storage)
  File "/usr/lib/anaconda/booty/x86.py", line 365, in writeGrubConf
    f.write('\trootnoverify %s\n' % self.grubbyPartitionName(device))
  File "/usr/lib/anaconda/booty/x86.py", line 219, in writeGrub
    chainList, grubTarget, grubPath, cfPath)
  File "/usr/lib/anaconda/booty/x86.py", line 510, in write
    not self.useGrubVal)
  File "/usr/lib/anaconda/bootloader.py", line 217, in writeBootloader
    kernelList, otherList, defaultDev)
  File "/usr/lib/anaconda/dispatch.py", line 205, in moveStep
    rc = stepFunc(self.anaconda)
  File "/usr/lib/anaconda/dispatch.py", line 126, in gotoNext
    self.moveStep()
  File "/usr/lib/anaconda/gui.py", line 1313, in nextClicked
    self.anaconda.dispatch.gotoNext()
  File "/usr/lib/anaconda/iw/progress_gui.py", line 79, in renderCallback
    self.intf.icw.nextClicked()
  File "/usr/lib/anaconda/gui.py", line 1334, in handleRenderCallback
    self.currentWindow.renderCallback()
AttributeError: 'NoneType' object has no attribute 'path'

Comment 7 Alejandro Segovia 2010-06-29 05:03:04 UTC
I think this bug report has important debug info for tickets: 450143 and 608785.

My take is that in all cases getDiskPart is unable to locate a device called whatever the "dev" variable contains:

File "/usr/lib/anaconda/booty/util.py", line 5, in getDiskPart
    path = storage.devicetree.getDeviceByName(dev).path[5:]
...
AttributeError: 'NoneType' object has no attribute 'path'
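
To illustrate the failure mode, here is a small standalone sketch in Python. It is not anaconda code: FakeDevice, FakeDeviceTree and device_path_suffix are hypothetical stand-ins, and it assumes that getDeviceByName returns None for a name it does not know, which is what the traceback suggests. The guarded helper shows one way such a lookup could fail with a clearer message instead of an AttributeError.

```python
class FakeDevice:
    """Hypothetical stand-in for a storage device object."""

    def __init__(self, name):
        self.name = name
        self.path = "/dev/" + name


class FakeDeviceTree:
    """Hypothetical stand-in for storage.devicetree."""

    def __init__(self, names):
        self._devices = {name: FakeDevice(name) for name in names}

    def getDeviceByName(self, name):
        # Assumption: an unknown name (including None) yields None.
        return self._devices.get(name)


def device_path_suffix(devicetree, dev):
    """Return the device path without the leading '/dev/', or raise clearly."""
    device = devicetree.getDeviceByName(dev)
    if device is None:
        raise ValueError("no device named %r in the device tree" % (dev,))
    return device.path[5:]  # strip the 5-character "/dev/" prefix
```

With this stand-in, calling getDeviceByName(None).path directly raises the same AttributeError as in the report, while device_path_suffix(tree, None) raises a ValueError that names the missing device.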

Do any Anaconda developers have an idea why "dev" may end up with a wrong device identifier? Where could I find the source code for Anaconda?

Alejandro.-

Comment 8 Hans de Goede 2010-06-29 15:24:30 UTC
Hi,

Thanks for the bug report.

I've spent some time investigating this and have managed to reproduce it. The problem is that on the bootloader configuration screen you added an entry, but because there are no normal partitions on your system (RAID devices are not considered normal in this case), the combobox for selecting the device to chain to was empty. That left an entry in our bootloader configuration table (called images internally) whose device is set to None, which produces this backtrace.

So that is how / why this happened. Below is a note to self on how to fix this:

1) We should not allow the add button to work (maybe grey it out) when there are no devices to populate the device combobox. 

2) As we allow putting /boot on a raid mirror we should also allow chaining to a raid partition
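
The two points above can be sketched roughly as follows. This is an illustrative Python sketch only, not the actual anaconda patch: the function names and the (name, kind) tuples used to describe devices are assumptions made for the example.

```python
def chainable_devices(devices, allow_raid=False):
    """Return names of devices that may be offered in the chain-to combobox.

    devices is a list of (name, kind) tuples, where kind is a hypothetical
    label such as "partition" or "raid".
    """
    allowed = {"partition"}
    if allow_raid:
        allowed.add("raid")  # point 2: also offer RAID devices
    return [name for name, kind in devices if kind in allowed]


def can_add_entry(candidates):
    """Point 1: the Add button is only usable when there is a device to pick."""
    return len(candidates) > 0


def validate_entry(label, device):
    """Refuse an entry without a device before it reaches the images table."""
    if device is None:
        raise ValueError("boot entry %r has no device selected" % (label,))
    return (label, device)
```

For a system with only /dev/md0 and /dev/md1, chainable_devices() returns an empty list by default, so can_add_entry() is False and no entry with a None device can be created.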

I'll try to write a patch for this tomorrow (iow in plenty of time for Fedora 14).

Regards,

Hans

Comment 9 Bob Gustafson 2010-06-29 15:49:48 UTC
After doing a couple more Fedora 13 installs (all with RAID1), I discovered another wrinkle with installing and using grub installed in the MBR.

I am using software RAID1 and I think that during the bios boot sequence, the bios doesn't know about RAID. Thus it boots from the MBR on either /dev/sda or /dev/sdb instead of /dev/md0.

The bios (ASUS P5K..) has a configuration for *which* hard disk is configured into the boot selection sequence.

If you have a disk failure, the bad disk may be the one configured to boot !!

When I reconfigured the bios to avoid the bad disk - it seemed like the configuration change did not work. In this case, opening the box and switching the SATA cables plugged into the motherboard did the trick.

------

After installing a new operating system on RAID1 disks, it is a good idea to make sure that grub is installed on both disks (i.e., /dev/sda & /dev/sdb). That way, when you do have a disk failure, there won't be so much fumbling around on the next reboot.

In most cases, booting with a rescue CD/DVD will allow you to stamp grub onto the MBR of the surviving disk and partition any blank replacement disk.

Comment 10 Hans de Goede 2010-06-30 11:28:12 UTC
Hi,

(In reply to comment #9)
> After doing a couple more Fedora 13 installs (all with RAID1), I discovered
> another wrinkle with installing and using grub installed in the MBR.
> 
> I am using software RAID1 and I think that during the bios boot sequence, the
> bios doesn't know about RAID. Thus it boots from the MBR on either /dev/sda or
> /dev/sdb instead of /dev/md0.
> 
> The bios (ASUS P5K..) has a configuration for *which* hard disk is configured
> into the boot selection sequence.
> 
> If you have a disk failure, the bad disk may be the one configured to boot !!
> 

Right, if the disk dies in such a way that the BIOS autodetection still sees the disk, but other than that it does not work properly, then your machine won't boot. If you want a solution for this, you will need a machine with either hardware or firmware RAID.

> When I reconfigured the bios to avoid the bad disk - it seemed like the
> configuration change did not work. In this case, opening the box and switching
> the SATA cables plugged into the motherboard did the trick.

This is probably BIOS dependent. Normally, when you change the BIOS drive order, the second / spare disk becomes BIOS device 80, iow the first disk, because it is the disk the system is booting from. This is what our RAID mirror grub installation code assumes: that the second drive will become the first one when the first one fails, iow the GRUB in the boot sector of that disk tries to load the rest of itself using BIOS device 80. If for some reason the BIOS keeps seeing the second disk as device 81 when the first one is gone, there is nothing we can do, especially since if the first disk *really* dies or gets unplugged, the second disk will be the only disk and thus device 80.
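
The numbering assumption can be modelled with a short Python sketch. It is a simplification for illustration, not a description of any particular BIOS: bios_numbers is a hypothetical helper that hands out device numbers starting at 0x80 to the disks the BIOS still detects, in boot order.

```python
FIRST_BIOS_DISK = 0x80  # BIOS number of the first hard disk


def bios_numbers(boot_order, detected):
    """Map each detected disk to a BIOS device number, in boot order.

    boot_order: disk names in the order configured in the BIOS.
    detected:   set of disk names the BIOS still sees at power-on.
    """
    present = [disk for disk in boot_order if disk in detected]
    return {disk: FIRST_BIOS_DISK + i for i, disk in enumerate(present)}
```

In this model, when sda is gone or unplugged, sdb is the only detected disk and becomes device 0x80, which is the case the installation code relies on. When sda is broken but still detected, sdb stays at 0x81 unless the boot order is changed or the cables are swapped.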

As you've proven with swapping the cables, this actually works as advertised.

> After installing a new operating system on RAID1 disks, it is a good idea to
> make sure that grub is installed on both disks (i.e., /dev/sda & /dev/sdb).
> That way, when you do have a disk failure, there won't be so much fumbling
> around on the next reboot.
> 

Right, this is exactly what we do.

Comment 11 Brian Lane 2011-03-05 00:52:07 UTC
Created attachment 482403 [details]
Make sure a device is selected

This patch makes sure a device is selected when adding entries for other operating systems.

Comment 12 Bug Zapper 2011-06-02 13:29:04 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 13 Bob Gustafson 2011-06-02 15:19:01 UTC
I am running Fedora 15 now.

If I remember correctly, the Anaconda option for F15 was to write the MBR on one or the other of the two disks in the /boot RAID array, or the /dev/md0 RAID array itself.

I chose to write the MBR to /dev/sda. One of these days I will do /dev/sdb.

I could have chosen to write to the /dev/md0 and in theory, both /dev/sda and /dev/sdb would have had their MBR blocks written. I am not so sure this would have happened, so I chose to do the /dev/sda.

It would be good to have check boxes rather than radio buttons, so that BOTH disks in the RAID array could be written individually.

Comment 14 Bug Zapper 2011-06-27 16:40:16 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

