Bug 1575797

Summary: grub2 does not support xfs filesystems with sparse inode allocation (causes anaconda crash or installed system boot fail if sparse xfs used for /boot)
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: grub2 Assignee: Peter Jones <pjones>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: rawhide CC: anaconda-maint-list, bugzilla, esandeen, fzatlouk, jonathan, kellin, lkundrak, normand, pjones, robatino, vanmeeuwen+fedora, v.podzimek+fedora, vponcova, wwoods
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: AcceptedBlocker
Fixed In Version: grub2-2.02-37.fc29 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-18 23:44:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1517011    
Attachments:
Description Flags
anaconda.log
none
program.log
none
storage.log
none
syslog
none
tarball of /var/log
none
strace grub2-probe xfsprogs 4.15.1 formatted boot
none
strace grub2-probe xfsprogs 4.16.0 formatted boot
none
strace grub2-probe xfsprogs 4.16.0 formatted boot
none

Description Adam Williamson 2018-05-08 00:27:29 UTC
Since compose Fedora-Rawhide-20180427.n.1 , all openQA tests that do partitioning via blivet-gui when installing are failing. The BIOS tests hit an explicit error during install: 'boot loader install failed'. The UEFI tests hit no explicit error during install, but the installed system fails to boot. grub shows an error:

"error: file `/vmlinuz-4.17.0-0.rc3.git4.1.fc29.x86_64' not found.
error: you need to load the kernel first."

I will attach the installer logs from a BIOS case; I don't have the logs from a UEFI case at hand, as openQA doesn't capture them on this particular failure path. Note that these tests run with the Server DVD image; it seems the /boot partition is created as XFS in the BIOS tests. I'm not sure if it was created as XFS prior to the 0427.n.1 compose, or if that's actually the bug.

The last compose where the tests passed was Fedora-Rawhide-20180423.n.0 - there were no successful composes between 20180423.n.0 and 20180427.n.1. anaconda-29.14-1.fc29 was built between 2018-04-23 and 2018-04-27. No grub2, blivet-gui or python-blivet build occurred during that window, so I'm assigning this to anaconda.

Proposing as a Beta blocker, criterion "When using both the installer-native and the blivet-gui-based custom partitioning flow, the installer must be able to:

    Correctly interpret, and modify as described below, any disk with a valid ms-dos or gpt disk label and partition table containing ext4 partitions, LVM and/or btrfs volumes, and/or software RAID arrays at RAID levels 0, 1 and 5 containing ext4 partitions..."

You have to judo the criterion a *bit* to make it actually apply here, but it seems pretty strongly implied by the criterion that, if you do any of the things that custom partitioning is supposed to support, the install should complete and the installed system should actually boot. Which, at present, is not the case.

Comment 1 Adam Williamson 2018-05-08 00:32:12 UTC
Created attachment 1432849 [details]
anaconda.log

Comment 2 Adam Williamson 2018-05-08 00:32:30 UTC
Created attachment 1432850 [details]
program.log

Comment 3 Adam Williamson 2018-05-08 00:32:51 UTC
Created attachment 1432851 [details]
storage.log

Comment 4 Adam Williamson 2018-05-08 00:33:07 UTC
Created attachment 1432852 [details]
syslog

Comment 5 Adam Williamson 2018-05-08 00:33:33 UTC
Created attachment 1432853 [details]
tarball of /var/log

Comment 6 Adam Williamson 2018-05-08 00:35:25 UTC
I suspect this is probably down to the introduction of the autopartition module in 29.14, some bad/missing interaction between that and blivet-gui partitioning?

Comment 7 Michel Normand 2018-05-09 11:00:45 UTC
FYI, this seems to be the same problem on ppc64/ppc64le architectures, as per https://openqa.stg.fedoraproject.org/tests/300539#step/_do_install_and_reboot/54

Comment 8 Chris Murphy 2018-05-14 02:48:36 UTC
syslog

20:08:44,933 WARNING kernel:DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
20:08:44,933 WARNING kernel:WARNING: CPU: 0 PID: 8473 at kernel/locking/rwsem.c:133 up_write+0x72/0x80
[...snip...]
20:08:44,934 WARNING kernel: thaw_super_locked+0xd0/0x100
20:08:44,934 WARNING kernel: thaw_super+0x1a/0x20

And anaconda program.log

16:10:02,595 INF program: Running... xfs_freeze -f /mnt/sysimage/boot
16:10:02,622 INF program: stderr:
16:10:02,623 INF program: b'/mnt/sysimage/boot: No such file or directory'
16:10:02,623 DBG program: Return code: 1

Ignoring the hours and trusting the minutes+seconds, I'm confused how the first (if it's really the first) xfs_freeze -f happens after the kernel warning and call trace that includes thaw_super_locked.

Anyway, grub2-install and mkconfig are failing because they don't know what filesystem is at /mnt/sysimage/boot, and xfs_freeze is failing for basically the same reason. So I'm suspicious of the kernel warning.

Adding esandeen.

Comment 9 Chris Murphy 2018-05-14 04:01:21 UTC
Dracut does its own fsfreeze for each initramfs built, and that's what instigates the kernel warning. But I'm unconvinced the warning is related, even though there are no trace points for freeze/thaw to know if either succeeds, let alone relative to anything else.

Inside the install environment, with the file system for sure thawed, and chrooted, I can reproduce the problem:

# grub2-probe /
btrfs
# grub2-probe /boot
grub2-probe: error: unknown filesystem.

Seems to be a grub2-probe and XFS issue.
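
A minimal sketch of scripting the check above across several mount points. The function name and the PROBE variable are hypothetical, introduced only for illustration; the only real command assumed is grub2-probe called with a path, as in the session above.

```shell
# PROBE is a hypothetical override so the check can be pointed at another
# binary (for example an upstream grub-probe); it defaults to grub2-probe.
PROBE=${PROBE:-grub2-probe}

# Print every given mount point whose filesystem the probe command cannot
# identify. Returns 0 if all were identified, 1 otherwise.
check_grub_readable() {
    status=0
    for mp in "$@"; do
        if ! "$PROBE" "$mp" >/dev/null 2>&1; then
            printf '%s\n' "$mp"
            status=1
        fi
    done
    return "$status"
}

# Example (run inside the chroot, as root):
# check_grub_readable / /boot
```

In the situation described in this comment, such a check would be expected to list /boot and nothing else.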

grub2-2.02-33.fc29   2018-04-11  well precedes the problem

xfsprogs-4.16.0-1.fc29   2018-04-26  right in the middle
xfsprogs-4.15.1-1.fc29   2018-02-27  well before

It might be that xfsprogs-4.16 has changed something that grub2-probe doesn't like.

Comment 10 Chris Murphy 2018-05-14 04:43:41 UTC
grub2-probe recognizes xfs when formatted using xfsprogs-4.15.1, but not 4.16.0.

I don't know how to determine whether grub2-probe's parsing is possibly flawed; or if the format change that's happened with 4.16.0 is unintended. Setting this to xfsprogs for now.

Comment 11 Chris Murphy 2018-05-14 04:44:27 UTC
Created attachment 1436007 [details]
strace grub2-probe xfsprogs 4.15.1 formatted boot

Comment 12 Chris Murphy 2018-05-14 04:44:44 UTC
Created attachment 1436008 [details]
strace grub2-probe xfsprogs 4.16.0 formatted boot

Comment 13 Chris Murphy 2018-05-14 04:51:06 UTC
Created attachment 1436009 [details]
strace grub2-probe xfsprogs 4.16.0 formatted boot

cleanup

Comment 14 Chris Murphy 2018-05-14 17:19:33 UTC
For what it's worth, this isn't just a problem with grub-probe. The GRUB core.img (whether the embedded BIOS variety, or the grubx64.efi UEFI variety) doesn't read xfsprogs 4.16.0 formatted volumes, but can read 4.15.1 formatted volumes. A format change in xfsprogs 4.16.0 from the perspective of GRUB explains both the anaconda behaviors on BIOS and UEFI systems, and also the post install behaviors for both.

Comment 15 Chris Murphy 2018-05-14 17:38:42 UTC
Upstream report
https://www.spinics.net/lists/linux-xfs/msg18438.html

I think it's the sparse inode feature, which is now the default in 4.16.0
https://www.spinics.net/lists/linux-xfs/msg16598.html

If I reformat using xfsprogs 4.15.1, mkfs.xfs -i sparse=1, I get the same problem as I do with xfsprogs 4.16.0 default mkfs. And if I use xfsprogs 4.16.0, mkfs.xfs -i sparse=0, the problem doesn't happen. So yeah, I bet GRUB has no idea how to deal with sparse inode chunk allocation. Open question if syslinux will be affected.
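
To tell whether an existing XFS filesystem was created with the feature, one option is to look for the sparse field in xfs_info output. This is a sketch under the assumption that the installed xfs_info prints a "sparse=0" or "sparse=1" field; older xfsprogs may not print it at all. The function name is hypothetical.

```shell
# Read xfs_info output on stdin; succeed only if it reports sparse=1.
# Assumption: the field is printed as "sparse=0" or "sparse=1".
has_sparse_inodes() {
    grep -Eq '(^|[^[:alnum:]_])sparse=1([^0-9]|$)'
}

# Example (needs a mounted XFS filesystem):
# xfs_info /boot | has_sparse_inodes && echo "/boot uses sparse inodes"
```

A filesystem for which this succeeds would be expected to hit the grub2-probe failure described in this bug until GRUB understands the format.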

Comment 16 Eric Sandeen 2018-05-14 17:53:00 UTC
Yes, this is a new on-disk format which was enabled in xfsprogs-4.16

If grub wants to read & parse on-disk formats, unfortunately it needs to keep up with these defaults.  That said, I suppose we should have coordinated better.

I have half a mind to just propose ext2 for /boot - it would eliminate all the log replay questions as well, and we'd never run into an incompatible disk format again.

On the other hand, we don't require a separate /boot today so ext2 may not always be a choice - so grub2 should probably learn about the new format in any case....

Comment 17 Eric Sandeen 2018-05-14 17:54:11 UTC
reassigning to grub2, this is not a bug in xfsprogs.

Comment 18 Chris Murphy 2018-05-14 19:01:39 UTC
I'll ping upstream GRUB and see if they can estimate a time frame. But in the meantime I think we should ask anaconda team to proscribe /boot on XFS which is a much simpler change than asking them to enable '-i sparse=0' as they're reluctant to use non-default mkfs options.

Comment 19 Adam Williamson 2018-05-14 19:38:58 UTC
That won't help all the existing systems which have /boot-on-xfs, presumably, i.e. all Fedora Server installs for several releases. I'm not sure it makes much sense to ask anaconda to make a change which will probably be inadequate for final release in any case. I'd say the obviously preferable fix is to fix grub; if we want a short-term workaround, I'd be more inclined to downgrade xfsprogs than change anaconda. That is in line with the general spirit of what we are trying to do in Fedora these days: not land changes until they don't break stuff. If we had Rawhide gating enabled, the xfsprogs change would have been rejected by it.

Comment 20 Chris Murphy 2018-05-14 20:34:45 UTC
a. Why do existing /boot-on-XFS systems need help? They won't have the feature enabled and are unaffected.
b. Fedora 28 Server and older default to /boot on ext4, and / on XFS.

Obviously fixing GRUB is preferred, but I can't estimate the scope of work or the time frame. Anaconda (blivet, probably), on the other hand, has code to proscribe filesystems for /boot that has variously been used for Btrfs and LVM, so it would be a one-line change with already-tested code. And then we don't have to withhold the sparse inode feature on /, which helps to avoid bogus ENOSPC, however much of an edge case that may be.

Eric would have to speak whether reflinking and COW increases the likelihood of higher free space fragmentation, or if there are other xfsprogs 4.16.x features that shouldn't be held back just to support /boot on XFS.

Comment 21 Adam Williamson 2018-05-14 20:42:31 UTC
Ah, so this only applies to *freshly created* xfs filesystems. OK. That makes disallowing it in anaconda more viable, sure.

If F28 and earlier used ext4 for /boot, why isn't Rawhide doing that? Was it an intentional change?

Comment 22 Chris Murphy 2018-05-14 21:14:19 UTC
Correct.

I don't know. I get nothing for these:
$ git log anaconda-28.9-1..HEAD | grep -i xfs
$ git log blivet-3.0.0..HEAD | grep -i xfs

Comment 23 Adam Williamson 2018-05-14 22:11:48 UTC
I'll have a poke around, then. Ultimately if we still *intend* to use ext4 /boot for Server, that should be the first thing to fix, and it'd render this significantly less important.

Comment 24 František Zatloukal 2018-05-14 22:38:13 UTC
Discussed during the 2018-05-14 blocker review meeting: [1]

The decision to classify this bug as an AcceptedBlocker was made as it violates the following blocker criteria:

"Custom partitioning criteria, which clearly imply that a system installed with custom partitioning must actually boot."

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-05-14/f29-blocker-review.2018-05-14-16.01.log.txt

Comment 25 Adam Williamson 2018-05-14 23:25:48 UTC
Well, duh. I'm being a bit silly. There isn't really a difference here - I forgot the bug was specific to using the blivet-gui custom partitioning. blivet-gui doesn't handle /boot specially like guided partitioning or 'regular' custom partitioning do; it just has a universal 'default' filesystem type which is pre-selected in the filesystem drop-down for each newly created filesystem. anaconda passes blivet-gui the default filesystem it would use itself, so for most Fedora installs, the default is set to ext4, for Fedora Server installs, it's set to xfs. None of this is different between F28 and Rawhide.

The openQA blivet-gui tests use the 'default' filesystem for the /boot partition, so when testing a Server image, /boot will wind up as XFS in the openQA tests. https://openqa.fedoraproject.org/tests/229766#step/disk_custom_blivet_ext3/9 shows that this was actually the same in F28 Final tests, you can see it's about to create /boot as xfs there (it's about to click on the OK button and the mount point is set to /boot and the Filesystem is set to xfs).

So in fact there was no change of behaviour in anaconda here, but we can clearly say the impact of the bug is restricted to cases where the user uses either 'regular' custom partitioning or blivet-gui custom partitioning and creates /boot as xfs, and shouldn't affect any upgraded systems.

We can also say that if the user uses regular custom partitioning and clicks the 'create partitions for me' button, /boot will be created as ext4, not xfs. If the user uses regular custom partitioning *without* using the 'create partitions for me' button, the way that UI works, the user is still guided to using ext4 for /boot (in regular custom partitioning, when you first create a new mount point, you don't specify a filesystem, one is selected for you, and ext4 is what will be selected for /boot; you have to then *change* this setting if you want it to be something else).

Another consideration, though: for RHEL, xfs is the *default* filesystem for /boot. So we definitely shouldn't put xfsprogs 4.16 into anything RHEL-y until this is sorted out :)

I'm experimenting with a proposal to handle this: a patch for blivet which makes it disable sparse inodes for XFS filesystems if the mount point is /boot. I'll see if I can make that fly.

Comment 26 Adam Williamson 2018-05-15 00:12:51 UTC
https://github.com/storaged-project/blivet/pull/693

works for me in basic testing, at least. https://www.happyassassin.net/updates/1575797.2.img is an updates image with that patch applied, for testing.

Comment 27 Chris Murphy 2018-05-15 22:21:41 UTC
Question. What if the user wants / on XFS, but does not specify a separate /boot volume? Does the patch still use '-i sparse=0' or do we still blow up?

Question for Eric: how badly do you want clean installs going forward to have sparse inodes enabled for root fs? Is it better to proscribe /boot on XFS in order to ensure / has sparse inodes enabled? Or is this a coin toss situation?

I just saw feedback from GRUB upstream and they don't have time to fix it now, but gave no time frame on when they'd get around to it.

Comment 28 Adam Williamson 2018-05-16 00:47:13 UTC
"Question. What if the user wants / on XFS, but does not specify a separate /boot volume? Does the patch still use '-i sparse=0' or do we still blow up?"

This would blow up. It wouldn't really be possible to extend this fix to handle that case, as this is operating at the point when a single filesystem is to be created, with knowledge about that filesystem - including its mount type - but not about the overall operation that's going on or any *other* filesystems.

Fixing that case in this general way ('make blivet pass -i sparse=0 when we know it needs to') would, I think, require a bit more heavy lifting, approximately speaking it'd have to be done with the actual logic on the anaconda side and anaconda passing the arguments into blivet.
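
The per-filesystem decision described above can be sketched in shell as follows. This is an illustration only, not the blivet patch (which lives in blivet's own code); the function name is hypothetical, and the only real option assumed is mkfs.xfs '-i sparse=0' as used earlier in this bug.

```shell
# Print extra mkfs.xfs options for a filesystem, given only its mount point.
# Like the workaround discussed here, it knows nothing about any other
# filesystem in the layout.
xfs_mkfs_extra_args() {
    case "$1" in
        /boot) printf '%s\n' '-i sparse=0' ;;  # keep /boot readable by GRUB
        *)     : ;;                            # everything else: mkfs defaults
    esac
}

# Example (destructive, so left commented out; the device name is made up):
# mkfs.xfs $(xfs_mkfs_extra_args /boot) /dev/vdX1
```

Because the decision keys on the mount point alone, a layout with / on XFS and no separate /boot gets the defaults, which is the case that would still blow up.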

Comment 29 Adam Williamson 2018-05-16 00:47:54 UTC
er, "including its mount type" was meant to read "including its type and mount point".

Comment 30 Eric Sandeen 2018-05-16 14:47:56 UTC
I've sent a patch to fix grub upstream; the best resolution would be to simply include it and not need to change anything else.  This was a shortcoming of grub which has now been fixed.

https://lists.gnu.org/archive/html/grub-devel/2018-05/msg00057.html

Updating grub code to handle the new format would be a far better solution than hacking up installer bits and pieces.

Thanks,
-Eric

Comment 31 Adam Williamson 2018-05-16 16:08:02 UTC
Eric: sure, we were only considering workarounds for the case where grub couldn't be fixed promptly. Thanks for the fix!

Comment 32 Adam Williamson 2018-05-18 23:44:39 UTC
openQA results for the latest compose confirm this is fixed, great.