Bug 1762596 - XFS mount failure
Summary: XFS mount failure
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 30
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-17 02:05 UTC by Tad
Modified: 2020-03-04 17:15 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-04 17:15:21 UTC
Type: Bug
Embargoed:


Description Tad 2019-10-17 02:05:51 UTC
Description of problem:
The recently released kernel 5.3.5 on Fedora 30 and the Fedora 31 images fail to mount XFS partitions.
If the partition is critical (/ or /home), the system drops into an emergency shell.
No out-of-tree kernel modules are in use.

Version-Release number of selected component (if applicable):
kernel 5.3.5-200 in Fedora 30
and kernel 5.3.4 in Fedora 31

How reproducible:
Every bootup.
Reproduced on four different systems (two F30, and two F31).

Steps to Reproduce:
1. Add an XFS partition to /etc/fstab
2. Reboot
or
1. Upgrade an F30 to F31 with an XFS partition already added/in use
2. Reboot

Actual results:
- If / or /home is XFS, you are dropped to an emergency shell
- If it's another partition, it ends up not mounted

Expected results:
All requested partitions mount successfully

Additional info:
Relevant dmesg:
XFS (dm-4): Mounting V5 Filesystem
XFS (dm-4): log recovery read I/O error at daddr 0x0 len 1 error -5
XFS (dm-4): empty log check failed
XFS (dm-4): log mount/recovery failed: error -5
XFS (dm-4): log mount failed

The partitions don't get corrupted, and they mount correctly when booted with an older kernel (e.g. 5.2.18).
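
A minimal sketch for spotting this failure signature in kernel log text, assuming the log is fed on stdin. The function name `xfs_failed_devices` is hypothetical, and it only matches the final "log mount failed" line shown in the dmesg excerpt.

```shell
# Hypothetical helper: read kernel log text on stdin and print each XFS
# device name whose log mount failed, one per line, without duplicates.
xfs_failed_devices() {
    sed -n 's/.*XFS (\([^)]*\)): log mount failed.*/\1/p' | sort -u
}
```

For example, `sudo dmesg | xfs_failed_devices` should list the affected devices on a failing boot and print nothing when every XFS mount was clean.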

Comment 2 Bill O'Donnell 2019-11-04 14:17:31 UTC
(In reply to tad from comment #1)
> Possible upstream fix
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?h=v5.4-rc3&id=3219e8cf0dade9884d3c6cb432d433b4ca56875d

I'm not sure how that patch is related. It deals with memory to disk leakage. Did you try it and does it affect the mount issue?

Comment 3 Eric Sandeen 2019-11-04 16:45:27 UTC
(In reply to tad from comment #0)

> Relevant dmesg:
> XFS (dm-4): Mounting V5 Filesystem
> XFS (dm-4): log recovery read I/O error at daddr 0x0 len 1 error -5
> XFS (dm-4): empty log check failed
> XFS (dm-4): log mount/recovery failed: error -5
> XFS (dm-4): log mount failed
> 
> The partitions don't get corrupted and mount correctly when booted using an
> older kernel (eg. 5.2.18).

Hi tad -

Can you please provide xfs_info output for the mountpoints under 5.2.18 when it successfully mounts?
And your fstab entries as well?
Also, what type of LVM device is dm-4, is it anything more interesting than a simple linear volume?

Can you spot-check the fedora 5.3.0 kernel to be sure this wasn't a -stable updates regression?

Comment 4 Tad 2019-11-04 17:38:00 UTC
I linked that commit because this only happens with slub_debug on.
Removing slub_debug=FZP from kernel command line allows XFS to mount successfully.
Seems I managed to not mention that in comment #1, apologies.

kernel-5.2.18.fc30
	mounts successfully with slub_debug
kernel-5.3.7-200.fc30
	mounts successfully without slub_debug
	fails to mount with slub_debug
kernel-5.4.0-0.rc5.git0.1.fc32
	mounts successfully with slub_debug

That patch doesn't apply successfully to 5.3.8 so I haven't tested it to be sure it is the fix.

I don't think it is anything specific to the drive/partition, as I was able to reproduce it on multiple systems.
It is just plain XFS under LUKS, and the fstab entries just use defaults.
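
Since the failure depends on slub_debug being set, a quick check for the option on a kernel command line might look like the sketch below. The helper name `slub_debug_setting` is hypothetical, and the command line is passed in as a string so the check also works on saved output.

```shell
# Hypothetical helper: print the slub_debug option found in the kernel
# command line string given as $1. Exits non-zero if the option is absent.
slub_debug_setting() {
    # Put each space-separated argument on its own line, then match either
    # a bare "slub_debug" or "slub_debug=<flags>".
    printf '%s\n' "$1" | tr -s ' ' '\n' | grep -E '^slub_debug(=.*)?$'
}
```

On a running system this could be called as `slub_debug_setting "$(cat /proc/cmdline)"`.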

Comment 5 Bill O'Donnell 2019-11-04 18:09:14 UTC
Hello tad,
These upstream commits are the more likely candidates. They guarantee at least 512-byte alignment of I/O buffers, even when memory debugging options are turned on.

f8f9ee479439c xfs: add kmem_alloc_io()
d916275aa4ddf xfs: get allocation alignment from the buftarg
0ad95687c3adb xfs: add kmem allocation trace points

Could you try these?

Thanks-
Bill
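
As a rough illustration of what a 512-byte alignment requirement means, the sketch below tests an address against a power-of-two alignment with a mask. The helper name `is_aligned` and the sample addresses are made up for illustration. This is not the kernel code from the commits listed above.

```shell
# Hypothetical helper: succeed when address $1 is a multiple of alignment $2.
# $2 must be a power of two, so (align - 1) is a mask of the low bits
# that have to be zero in an aligned address.
is_aligned() {
    [ $(( $1 & ($2 - 1) )) -eq 0 ]
}
```

With these made-up values, `is_aligned 0x1000 512` succeeds, while `is_aligned 0x1040 512` fails because the address sits 64 bytes past a 512-byte boundary.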

Comment 6 Bill O'Donnell 2019-11-04 18:13:23 UTC
sorry, also... 
72945d86ddec1 xfs: make mem_to_page available outside of xfs_buf.c

Comment 7 Tad 2019-11-04 22:21:46 UTC
Awesome

with 5.3.8 f30 branch:
72945d86ddec1 is already applied
0ad95687c3adb had minor rejects in fs/xfs/kmem.c
d916275aa4ddf, f8f9ee479439c, and 3219e8cf0dade apply successfully

Compiles, and mounts with slub_debug successfully! Thanks!

$ sudo dmesg | grep -i xfs
[   26.958645] SGI XFS with ACLs, security attributes, scrub, no debug enabled
[   26.963606] XFS (sdc1): Mounting V5 Filesystem
[   27.109630] XFS (sdc1): Ending clean mount
[   34.131877] XFS (dm-4): Mounting V5 Filesystem
[   34.335554] XFS (dm-4): Ending clean mount
[   39.350164] XFS (dm-5): Mounting V5 Filesystem
[   39.490198] XFS (dm-5): Ending clean mount
[   69.608252] XFS (dm-6): Mounting V5 Filesystem
[   69.757522] XFS (dm-6): Ending clean mount
$ cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.3.8-201.local.fc30.x86_64 [...] slub_debug=FZP

Comment 8 Eric Sandeen 2019-11-05 00:25:42 UTC
Thanks for testing.

I think LUKS started requiring the aligned IOs that this patchset fixed.

Might want to pull these into the fedora kernel and I suppose we should propose these for -stable kernels.

-eric

Comment 9 Justin M. Forbes 2020-03-03 16:32:36 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 30 kernel bugs.

Fedora 30 has now been rebased to 5.5.7-100.fc30.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 31, and are still experiencing this issue, please change the version to Fedora 31.

If you experience different issues, please open a new bug report for those.

Comment 10 Tad 2020-03-03 22:24:40 UTC
This was fixed in 5.4 and has been available for a while now.
Thanks

Comment 11 Justin M. Forbes 2020-03-04 17:15:21 UTC
Thanks for the update

