Bug 989644

Summary: anaconda does not install grub on both raid drives when using btrfs raid1
Product: [Fedora] Fedora Reporter: Rudolf Kastl <che666>
Component: anaconda Assignee: David Shea <dshea>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: anaconda-maint-list, bugzilla, dshea, gczarcinski, g.kaviyarasu, jonathan, mkolman, sbueno, vanmeeuwen+fedora
Target Milestone: --- Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: anaconda-22.9-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-11-04 19:21:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 864198, 1094489    
Bug Blocks:    
Attachments:
Description Flags
shot of hung boot after removing disk 2 of 2-disk btrfs raid1 none

Description Rudolf Kastl 2013-07-29 16:58:58 UTC
Description of problem:
I am using btrfs raid1 with /boot, / and /home as subvolumes. Anaconda, though, only installed grub2 on one of the two raid drives (sda), not on the second one (sdb). I was told that the code for this is still missing and that it's worth reporting the issue in bugzilla as a reminder.

Steps to Reproduce:
1. install f19 with btrfs raid1
2. try to boot from your second drive (e.g. by selecting it in the BIOS as the boot disk)
3. watch it fail.

Comment 1 David Shea 2014-05-06 17:49:58 UTC
Actually, this bug is moot, for now, since we don't allow /boot on btrfs anymore. See bug 864198. But if that changes we should totally fix this.

Comment 2 Gene Czarcinski 2014-05-06 18:42:46 UTC
If the btrfs filesystem on the multi-device volume is configured for data and metadata being raid1, then you should get the multiple copies as this is a filesystem function and not anaconda.

Since I have fixed grubby.c and have a patch to re-enable booting off a btrfs subvolume or a btrfs volume, I will see if I can set up a test.

Comment 3 Gene Czarcinski 2014-05-06 18:45:54 UTC
It just occurred to me that you might be referring to anaconda's running grub2-install /dev/sda in which case you are correct BUT it is also simply remedied by running grub2-install /dev/sdb yourself.

What does anaconda do for other raid1 implementations?

Comment 4 David Shea 2014-05-06 18:48:17 UTC
(In reply to Gene Czarcinski from comment #2)
> If the btrfs filesystem on the multi-device volume is configured for data
> and metadata being raid1, then you should get the multiple copies as this is
> a filesystem function and not anaconda.

Well, sort-of. There's a chunk of code in pyanaconda/bootloader.py, under the install_targets property of GRUB, that handles installing the stage1 bootloader (the MBR part) to every disk in an array if stage2 is on an array. Right now the check only looks for mdarray types.
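
The check described above can be sketched roughly as follows. This is an illustrative model only, not the actual pyanaconda code: the `Device` class, its `type` and `parents` fields, and this `install_targets` function are hypothetical stand-ins. Treating a "btrfs volume" like an "mdarray" is the change this bug asks for, not existing behavior.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Device:
    """Hypothetical stand-in for an installer storage device."""
    name: str
    type: str  # e.g. "disk", "partition", "mdarray", "btrfs volume"
    parents: List["Device"] = field(default_factory=list)


def member_disks(dev: Device) -> List[Device]:
    """Walk up the parent chain and collect the underlying disks, in order."""
    if dev.type == "disk":
        return [dev]
    disks: List[Device] = []
    for parent in dev.parents:
        for disk in member_disks(parent):
            if disk not in disks:
                disks.append(disk)
    return disks


# Assumption: stage2 types that warrant one stage1 install per member disk.
# Per the thread, only md arrays are checked today; "btrfs volume" is the
# proposed addition.
ARRAY_TYPES = ("mdarray", "btrfs volume")


def install_targets(stage1: Device, stage2: Device) -> List[Tuple[Device, Device]]:
    """Return (stage1, stage2) pairs the bootloader should be installed to."""
    # Assumption: only fan out when stage1 is a whole disk (the MBR case).
    if stage2.type in ARRAY_TYPES and stage1.type == "disk":
        return [(disk, stage2) for disk in member_disks(stage2)]
    return [(stage1, stage2)]
```

With a two-disk btrfs volume as stage2, this sketch yields one target per member disk (sda and sdb); with a plain partition as stage2 it yields the single original pair. A /boot on a btrfs subvolume would need an extra step to resolve the subvolume to its parent volume, which is left out here.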

Comment 5 David Shea 2014-05-06 18:50:24 UTC
(In reply to Gene Czarcinski from comment #3)
> It just occurred to me that you might be referring to anaconda's running
> grub2-install /dev/sda in which case you are correct BUT it is also simply
> remedied by running grub2-install /dev/sdb yourself.
> 
> What does anaconda do for other raid1 implementations?

Yeah, you got there before I commented. See code referenced in comment 4, basically anaconda will do a separate grub install to the MBR of each drive in the array.

There's a bug in that chunk of code right now related to non-disk stage1 devices (e.g., ppc prepboot), but that shouldn't affect this.
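
The manual workaround from comment 3 (running grub2-install once per member disk) could be scripted along these lines. This is a minimal sketch: the helper names and the `dry_run` switch are made up for illustration, and the disk paths are the ones used in this report. Only the plain `grub2-install <disk>` invocation comes from the thread.

```python
import subprocess
from typing import Iterable, List


def grub_install_commands(disks: Iterable[str]) -> List[List[str]]:
    """Build one grub2-install argv per disk."""
    return [["grub2-install", disk] for disk in disks]


def run_grub_install(disks: Iterable[str], dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    commands = grub_install_commands(disks)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # Needs root; check=True raises CalledProcessError on a non-zero exit.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    run_grub_install(["/dev/sda", "/dev/sdb"])
```

The default is a dry run so that nothing touches an MBR until the printed commands have been reviewed.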

Comment 6 Chris Murphy 2014-05-06 19:33:55 UTC
For reference, the UEFI equivalent are bug 1048999 and bug 1022316.

Comment 7 Gene Czarcinski 2014-05-07 13:17:04 UTC
OK, so you know that anaconda does not update all MBRs in a btrfs multi-device.

1. Have you tried running grub2-install /dev/sdb post-install and then removing /dev/sda to see if it works?

2.  I assume that this will only work for data=raid1 metadata=raid1

3. Some of the syslinux bootloader developers have commented that grub2 is playing a bit fast & loose with how they support multi-device btrfs volumes and there may be situations where it will fail.

Comment 8 Gene Czarcinski 2014-05-11 11:48:48 UTC
OK, I ran some tests and this report should be closed as won't fix.

The reason is that it is a moot point.  Grub2 will NOT boot from a multi-device BTRFS unless all of the devices are present and working!

I created a qemu-kvm virtual system with two 10GB disks and formatted the disk under rescue mode mkfs.btrfs -f -d=1 -m=1 /dev/vda2 /dev/vdb2

I then installed with /boot & / & /home on btrfs subvols (I use an updated grubby and patched anaconda to support booting off btrfs).

After the install, I booted up to make sure the system worked and also to run grub2-install /dev/vdb.  Worked fine.  Shut the system down.  Remove Virt disk 1 and reboot: message from grub saying it is missing one or more devices.  Repeat with virt disk 2 and get the same result.

The above test was done with Fedora 20 and rawhide with the same results.

So I am closing this.  BTW, syslinux/extlinux do not support booting off multi-device BTRFS of any flavor although it supposedly will boot off btrfs.

Comment 9 Chris Murphy 2014-05-11 17:54:57 UTC
I'm not experiencing this effect with GRUB 2.00. With either drive disabled I get a grub menu, so clearly core.img and grub.cfg have been found. If drive 2 is missing, it still loads kernel and initramfs. If drive 1 is missing, it fails because the grub.cfg is wrongly setting a fixed grub device, hd0,msdos2, which obviously it can't find, and for some reason it isn't falling back to a search by uuid even though that code is in the grub.cfg. If I remove the set root and the if/then wrapping the search, and thus force the search by uuid, then it loads the kernel and initramfs.

So GRUB2 certainly can do this, and should, it's just that grub2-mkconfig is producing a mangled grub.cfg. And that's a new bug that probably needs cross posting in the redhat BZ and savannah.gnu.org and a heads up on grub-devel; and in the redhat bz the new bug blocks field set to this bug ID.

The other issue is that by default right now Btrfs does not do degraded mounts, so at /sysroot mount time it'll fail and drop to a dracut prompt. This is intentional because there are no elegant notifications for device failure yet with Btrfs so we get this crude notification method. Rebooting with rootflags degraded option then permits degraded booting.

Comment 10 Chris Murphy 2014-05-11 19:02:34 UTC
GRUB 2.02 grub.cfg looks the same and has the same set root as GRUB 2.00 grub.cfg, yet it doesn't get stuck. It properly finds the Btrfs volume by UUID without intervention on my part, loads both kernel and initramfs properly.

The subsequent problem is failure at basic.target due to /sysroot not mounting, even when I use rootflags=degraded. But that's a separate problem.

Comment 11 Gene Czarcinski 2014-05-12 06:49:37 UTC
Chris, thanks for the info as this explains a lot for me.  However, I am not sure that btrfs works.

I created a qemu-kvm virtual system with two 10GB disks and installed rawhide on it.  The two disks were partitioned identically with vda1 an ext4 partition for /boot, vda2 a swap partition, and vda3 a btrfs partition for / and /home.

mkfs.btrfs -f -L test -d raid1 -m raid1 /dev/vda3 /dev/vdb3

was used to create the filesystem, which was then manually configured during the install.

Rebooted and ran grub2-install /dev/vdb

Then shutdown and remove the second disk followed by a reboot.  From the grub2 menu, edited the linux16 line to add ",degraded" to rootflags.

The attached screenshot shows how far it got.
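
For reference, the command-line edit described above (appending ",degraded" to rootflags) amounts to something like the following sketch. The function name and the sample command line are illustrative assumptions; only the rootflags/degraded option itself comes from this thread.

```python
def add_degraded(cmdline: str) -> str:
    """Return the kernel command line with 'degraded' present in rootflags=."""
    prefix = "rootflags="
    args = cmdline.split()
    for i, arg in enumerate(args):
        if arg.startswith(prefix):
            # Keep the existing flags and append 'degraded' only if missing.
            flags = [f for f in arg[len(prefix):].split(",") if f]
            if "degraded" not in flags:
                flags.append("degraded")
            args[i] = prefix + ",".join(flags)
            return " ".join(args)
    # No rootflags= argument at all: add one.
    args.append("rootflags=degraded")
    return " ".join(args)
```

Applied to a hypothetical line such as `ro root=UUID=abc rootflags=subvol=root rhgb`, this turns the rootflags argument into `rootflags=subvol=root,degraded` and leaves the other arguments untouched.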

Comment 12 Gene Czarcinski 2014-05-12 06:51:39 UTC
Created attachment 894560
shot of hung boot after removing disk 2 of 2-disk btrfs raid1

Comment 13 Chris Murphy 2014-05-12 16:15:52 UTC
(In reply to Gene Czarcinski from comment #12)
> shot of hung boot after removing disk 2 of 2-disk btrfs raid1
Yes, that's not a GRUB or Btrfs problem at all, the kernel and initramfs are clearly loaded. It's either systemd or dracut related.

Both Fedora 20 and Rawhide fail at basic.target to mount /sysroot despite rootflags=degraded. And I know this used to work. And in fact it does work with a non-boot disk, so I know this isn't a Btrfs bug not honoring degraded mounts.

Fedora 20 systemd-209 at least eventually times out and drops to a dracut prompt, which is where I'll try to troubleshoot this failure. Rawhide on the other hand hangs at basic.target indefinitely which I think is a bug and I mentioned this on the systemd list here:
http://lists.freedesktop.org/archives/systemd-devel/2014-May/019086.html

Comment 14 Chris Murphy 2014-05-12 16:26:50 UTC
The bug for the indefinite hang in Rawhide:
https://bugzilla.redhat.com/show_bug.cgi?id=1096910

Comment 15 Gene Czarcinski 2014-10-21 17:24:43 UTC
Thank you David.  I have been doing some testing of raid1 recovery and, while it does work and grub2 handles things just fine, the recovery process is not ready for prime time ... I had to come up in rescue mode to recover the btrfs volume with the missing device.  I was able to fix it but...