Bug 533621

Summary: Can't Boot After F12 b2 DVD Upgrade on system with RAID1 /boot
Product: [Fedora] Fedora Reporter: Justin Newman <eqisow>
Component: anaconda Assignee: Radek Vykydal <rvykydal>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: low    
Version: 12 CC: awilliam, ddumas, eqisow, jlaska, lili, vanmeeuwen+fedora
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: anaconda-13.9-1 Doc Type: Bug Fix
Last Closed: 2010-02-23 19:51:35 UTC Type: ---
Attachments:
Description                     Flags
requested logs and configs      none
rpmsave and backup devicemap    none

Description Justin Newman 2009-11-07 22:37:08 UTC
Description of problem:
After F12 DVD upgrade, cannot boot. Grub error 17 on one disk, and blinking cursor on the other.

How reproducible:
Upgrade from F11 using optical installer with a raid1 /boot

Actual results:
Can't boot.

Expected results:
First boot successful.

Additional info:
/boot is on /dev/md0 (RAID1).

I was able to correct the issue by remapping the device drives and reinstalling grub from a live CD. During the install I selected "Update GRUB," which was the recommended option.

I was using the beta2 disc and haven't tested with the latest nightly.

Seems to possibly be a carry-over from this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=505966

Comment 1 Adam Williamson 2009-11-07 23:30:10 UTC
there is no 'beta 2', can you be more specific what you tested with?

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 2 Justin Newman 2009-11-07 23:48:51 UTC
Sorry, my mistake. I don't know how I got the two stuck in my head; was just the beta.

http://mirrors.kernel.org/fedora/releases/test/12-Beta/Fedora/x86_64/iso/Fedora-12-Beta-x86_64-netinst.iso

Comment 3 Adam Williamson 2009-11-08 07:25:29 UTC
Liam, can you please check whether you can reproduce this, during your installer testing? Thanks.

Comment 4 Adam Williamson 2009-11-08 20:35:08 UTC
looks like this may be the same as 533533.

Comment 5 Adam Williamson 2009-11-09 16:33:10 UTC
more questions: is this hardware, BIOS or software RAID? what type exactly?

Comment 6 Radek Vykydal 2009-11-09 17:04:50 UTC
Justin, could you describe how you remapped the devices and where grub was installed in F11 - boot partition or mbr (F11 anaconda would install it in mbr for both boot partition and mbr choices, but you may have fixed it manually)?

These log files from the upgraded machine can be helpful:
/boot/grub/grub.conf
/boot/grub/device.map
/root/anaconda-ks.cfg
/var/log/program.log
/var/log/storage.log
/var/log/anaconda.log
/var/log/anaconda.syslog
/etc/sysconfig/grub

A more detailed description of your configuration could also be helpful. Isn't it the same as that of bug #533545?

Comment 7 Justin Newman 2009-11-09 18:30:52 UTC
Created attachment 368266 [details]
requested logs and configs

To answer the first question, this is software raid with mdadm.

Sure, to fix it afterwards I booted into a live disk, assembled the /boot raid and mounted it under /boot in the live system (so I didn't have to bother assembling my root fs). Then I ran 'grub-install --recheck /dev/sdc'. Without the recheck parameter I received the error "does not have any corresponding BIOS drive." I thought that was interesting, because in the case of bug #533545 the --recheck parameter was not necessary to fix it. (Short answer: it was installed on the MBR of sdc.)

Yes, the system configuration is the same as bug #533545.

Requested files are attached.

Comment 8 Radek Vykydal 2009-11-10 13:09:59 UTC
Thanks for the logs,
I can read this from them:

F11 was installed with this driveorder specified: sdc,sdd,sda,sdb,sde,sdf. Grub was installed into the MBR of sdc, which was mapped to (hd0). Anaconda generated a /boot/grub/device.map containing this mapping. The present /boot/grub/device.map is different:

(fd0)   /dev/fd0
(hd0)   /dev/sda
(hd1)   /dev/sdb
(hd2)   /dev/sdc
(hd3)   /dev/sdd

It maps /dev/sdc to (hd2), and was probably created as a result of "grub-install --recheck /dev/sdc". The need to use the --recheck option suggests that the former device.map was not valid.
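
To make the (hd0) versus (hd2) difference concrete, here is a small Python sketch. It assumes, as a simplification, that a drive's (hdN) name is just its position in the drive order; bios_names is a hypothetical helper for illustration, not part of anaconda or grub.

```python
def bios_names(driveorder):
    """Map each device to the (hdN) name implied by its position in the order.

    Simplifying assumption: (hdN) numbering is the index in the drive order.
    """
    return {dev: "(hd%d)" % index for index, dev in enumerate(driveorder)}


# Drive order given to the F11 installer, according to the logs.
f11_order = ["sdc", "sdd", "sda", "sdb", "sde", "sdf"]
# Order implied by the present device.map quoted above.
current_order = ["sda", "sdb", "sdc", "sdd"]

print(bios_names(f11_order)["sdc"])      # (hd0)
print(bios_names(current_order)["sdc"])  # (hd2)
```

The same disk therefore gets a different grub name depending on which order is in effect, which is why a device.map written under one order can be wrong under the other.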

The log from upgrade says:

 and grub update failed using mapping /dev/sdc to (hd2):

    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> root (hd2,0)

Error 21: Selected disk does not exist
grub> install --stage2=/boot/grub/stage2 /grub/stage1 d (hd2) /grub/stage2 p (hd2,0)/grub/grub.conf

Error 12: Invalid device requested
grub>

Anaconda read from /etc/sysconfig/grub that grub (stage1) was installed on /dev/sdc, which, according to the driveorder detected by anaconda during upgrade, is mapped to (hd2). It seems that the grub shell in the log above was using the device.map generated when installing F11 (with driveorder sdc,sda,sdb,sde,sdf), where (hd2) would correspond to sdb. Anaconda should have updated the original device.map if it didn't correspond to the detected order, but I can't say whether that really happened correctly. Could you collect

/boot/grub/device.map.backup and
/boot/grub/device.map.rpmsave from your system if they are present? They should give me the information.

Comment 9 Justin Newman 2009-11-10 14:17:18 UTC
Created attachment 368398 [details]
rpmsave and backup devicemap

Comment 10 Radek Vykydal 2009-11-10 14:42:29 UTC
Now I see - the problem is that the original F11 install device.map (/boot/grub/device.map.rpmsave) is:

# this device map was generated by anaconda
(hd0)     /dev/sdc
(hd1)     /dev/sdd

Anaconda puts only the devices used during grub install into it. During upgrade, only the grub values already present in the file are updated, so I imagine the contents of the updated file would be something like this (/boot/grub/device.map.backup contains garbage, or is it binary?):

# file updated by anaconda
# this device map was generated by anaconda
(hd0)     /dev/sda
(hd1)     /dev/sdb

/dev/sdc is missing - I'll come up with a patch fixing the update of device.map that should solve the issue.

Comment 11 Adam Williamson 2009-11-10 16:17:16 UTC
Radek, is this an issue that will hit many people, or does it depend on having a complex disk layout? Is it an F12 regression, or would it have been the same in F11? Just trying to assess its blocker-ness. Thanks!

Comment 12 Adam Williamson 2009-11-10 16:30:29 UTC
So, Hans de Goede states that this will not hit many people; possibly several who have /boot on software RAID, but that is not a huge amount, and not all of them. He also states it's probably not a regression since F11. Given the above, I think we can not consider it a blocker for F12 release. Will check with jkeating/notting/jlaska etc what they think.

Comment 13 James Laska 2009-11-10 17:25:14 UTC
(In reply to comment #12)
> Given the above,
> I think we can not consider it a blocker for F12 release. Will check with
> jkeating/notting/jlaska etc what they think.

Nice job getting to the bottom of this.  I don't have any objections, let's toss it on Common_F12_Bugs for the users that will hit it.

Comment 14 Adam Williamson 2009-11-10 17:45:43 UTC
for that I'd need an explanation I can understand =) do you get the explanation, jlaska? If not, could a kind anaconda person provide an explanation ratcheted down to a level a dumb bug monkey can understand? thanks :)

Comment 15 James Laska 2009-11-10 18:25:11 UTC
(In reply to comment #14)
> for that I'd need an explanation I can understand =) do you get the
> explanation, jlaska? If not, could a kind anaconda person provide an
> explanation ratcheted down to a level a dumb bug monkey can understand? thanks
> :)

Same boat for me.  I've got a queue of entries to write, so I'll be happy to take this one.

Radek ... can you help outline the problem so that we can document this issue including details on:
 * how someone can determine if they're affected by this
 * and how to workaround the issue

Comment 16 Radek Vykydal 2009-11-11 09:43:43 UTC
(In reply to comment #15)

> 
> Radek ... can you help outline the problem so that we can document this issue
> including details on:

As Hans said, this bug is not a regression, but it is independent of having /boot on mdraid; the cause of the bug is a change of driveorder between install and upgrade (for specific configurations).

* Simple reproducer:
1) Install F11 on a machine with 2 drives, putting the /boot partition on the second disk (BIOS drive order) and the bootloader in the MBR of the second disk too. This can be achieved, for example, by specifying something like --driveorder=sdb,sda in ks.
2) Upgrade to F12.

* The cause and explanation:
An incomplete device.map file is the cause.
The file is generated by anaconda during install and contains only info for drives used for grub - that is, drives containing the /boot partition, drives where grub stage1 will go, and drives containing chainloaded bootloaders - so in the case of the reproducer, it contains only "(hd0) /dev/sdb".
When upgrading grub, if the detected driveorder is different (we can't take into account driveorder information from installation), the records are updated, so (hd0) becomes /dev/sda in the reproducer. Anaconda wants to upgrade grub, which is in /dev/sdb (this info is stored in /etc/sysconfig/grub), but there is no record for /dev/sdb in device.map, and so it fails.

* Fix:
Generate more complete device.map during upgrade.
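
The reproducer can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not anaconda's actual code; update_existing_entries and complete_map are hypothetical names, and the model assumes (hdN) is simply the position in the detected drive order.

```python
def update_existing_entries(device_map, detected_order):
    """Model of the buggy upgrade: keep only the (hdN) keys already in the
    file and point each one at the drive now found in that position."""
    updated = {}
    for bios_name in device_map:
        index = int(bios_name[3:-1])  # "(hd0)" -> 0
        updated[bios_name] = "/dev/" + detected_order[index]
    return updated


def complete_map(detected_order):
    """Model of the fix: write an entry for every detected drive."""
    return {"(hd%d)" % i: "/dev/" + dev for i, dev in enumerate(detected_order)}


installed_map = {"(hd0)": "/dev/sdb"}  # written at F11 install (--driveorder=sdb,sda)
detected_order = ["sda", "sdb"]        # order detected during the upgrade
boot_device = "/dev/sdb"               # boot= value in /etc/sysconfig/grub

broken = update_existing_entries(installed_map, detected_order)
fixed = complete_map(detected_order)

print(broken)                          # {'(hd0)': '/dev/sda'}
print(boot_device in broken.values())  # False
print(boot_device in fixed.values())   # True
```

With the partial update, /dev/sdb drops out of the map entirely, so grub has no BIOS drive to install to; with a complete map it is present as (hd1).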

>  * how someone can determine if they're affected by this

The upgraded system won't boot (probably just a cursor on screen, or the message from the description), and when investigating from rescue mode, the /mnt/sysimage/boot/grub/device.map file will not contain the <DEVICE> named on the boot=<DEVICE> line of /mnt/sysimage/etc/sysconfig/grub.

>  * and how to workaround the issue  

chroot /mnt/sysimage
grub-install --recheck <DEVICE>

Another symptom of this bug is that plain "grub-install <DEVICE>" without the --recheck option gives the
"<DEVICE> does not have any corresponding BIOS drive."
error message.
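
The check described above (is the boot= device listed in device.map?) could be scripted roughly as follows. This is a hedged sketch: the function names are hypothetical, the sample file contents are illustrative and modelled on comment 10, and the parsing assumes the simple "(hdN) /dev/xxx" and "boot=<DEVICE>" line formats quoted in this report.

```python
def parse_device_map(text):
    """Return the set of devices listed in device.map-style text."""
    devices = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2:
            devices.add(fields[1])  # second column is the device path
    return devices


def boot_device(sysconfig_text):
    """Return the value of the boot= line, or None if there is none."""
    for line in sysconfig_text.splitlines():
        line = line.strip()
        if line.startswith("boot="):
            return line[len("boot="):]
    return None


def is_affected(device_map_text, sysconfig_text):
    """True when the boot= device has no entry in device.map."""
    dev = boot_device(sysconfig_text)
    return dev is not None and dev not in parse_device_map(device_map_text)


# Illustrative contents only, modelled on comment 10.
sample_map = "# file updated by anaconda\n(hd0)     /dev/sda\n(hd1)     /dev/sdb\n"
sample_sysconfig = "boot=/dev/sdc\n"

print(is_affected(sample_map, sample_sysconfig))  # True
```

In rescue mode the two strings would instead be read from /mnt/sysimage/boot/grub/device.map and /mnt/sysimage/etc/sysconfig/grub; a True result points at the grub-install --recheck workaround above.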

Comment 17 Radek Vykydal 2009-11-18 16:39:34 UTC
This should be fixed in version 13.9 of anaconda.