Bug 2296171 - XFS mount failure (timeout) after 6.9 upgrade
Summary: XFS mount failure (timeout) after 6.9 upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 39
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-07-07 10:41 UTC by David Woodhouse
Modified: 2024-09-10 02:53 UTC
CC List: 18 users

Fixed In Version: kernel-6.10.8-200.fc40 kernel-6.10.8-100.fc39
Clone Of:
Environment:
Last Closed: 2024-09-10 02:09:53 UTC
Type: Bug
Embargoed:


Attachments
Serial console output booting successfully with 6.8.10-200.fc39.x86_64 (137.72 KB, text/plain)
2024-07-07 10:41 UTC, David Woodhouse
Serial console output for failed boot with 6.9.7-100.fc39.x86_64 (including SysRq-T) (311.76 KB, text/plain)
2024-07-07 10:43 UTC, David Woodhouse
Serial console output for failed boot with rawhide 6.10.0-0.rc6.20240703gite9d22f7a6655.53.fc41 kernel (132.03 KB, text/plain)
2024-07-07 10:56 UTC, David Woodhouse

Description David Woodhouse 2024-07-07 10:41:08 UTC
Created attachment 2039189 [details]
Serial console output booting successfully with 6.8.10-200.fc39.x86_64

1. Please describe the problem:

After a kernel upgrade, my system no longer boots:

[  125.404282] XFS (md127): Ending clean mount
[  *** ] Job sysroot.mount/start running (2min / 2min 17s)
[  125.670498] XFS (md127): Unmounting Filesystem 1780b7fe-4500-4d0a-a3b3-fc3849983039
[FAILED] Failed to mount sysroot.mount - /sysroot.
See 'systemctl status sysroot.mount' for details.

...

/root# systemctl status sysroot.mount | cat
× sysroot.mount - /sysroot
     Loaded: loaded (/proc/cmdline; generated)
     Active: failed (Result: timeout) since Sun 2024-07-07 10:32:18 UTC; 44s ago
      Where: /sysroot
       What: /dev/disk/by-uuid/1780b7fe-4500-4d0a-a3b3-fc3849983039
       Docs: man:fstab(5)
             man:systemd-fstab-generator(8)
        CPU: 1.448s

Jul 07 10:30:19 desiato.infradead.org systemd[1]: Mounting sysroot.mount - /sysroot...
Jul 07 10:31:04 desiato.infradead.org systemd[1]: sysroot.mount: Mounting timed out. Terminating.
Jul 07 10:31:49 desiato.infradead.org systemd[1]: sysroot.mount: Mount process timed out. Killing.
Jul 07 10:31:49 desiato.infradead.org systemd[1]: sysroot.mount: Killing process 744 (mount) with signal SIGKILL.
Jul 07 10:32:18 desiato.infradead.org systemd[1]: sysroot.mount: Mount process exited, code=killed, status=15/TERM
Jul 07 10:32:18 desiato.infradead.org systemd[1]: sysroot.mount: Failed with result 'timeout'.
Jul 07 10:32:18 desiato.infradead.org systemd[1]: Failed to mount sysroot.mount - /sysroot.
Jul 07 10:32:18 desiato.infradead.org systemd[1]: sysroot.mount: Consumed 1.448s CPU time.


2. What is the Version-Release number of the kernel:

6.9.7-100.fc39.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes. 6.9.4-100.fc39.x86_64 was the first to fail; 6.8.10-200.fc39 is the latest I have working.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Boot any 6.9 kernel.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 David Woodhouse 2024-07-07 10:43:06 UTC
Created attachment 2039190 [details]
Serial console output for failed boot with 6.9.7-100.fc39.x86_64 (including SysRq-T)

[   47.560567] task:mount           state:D stack:0     pid:744   tgid:744   ppid:1      flags:0x00004002
[   47.560567] Call Trace:
[   47.592564]  <TASK>
[   47.592564]  __schedule+0x410/0x1710
[   47.592564]  schedule+0x27/0xf0
[   47.592564]  schedule_timeout+0x15d/0x170
[   47.592564]  __down_common+0x119/0x220
[   47.592564]  down+0x47/0x60
[   47.592564]  xfs_buf_lock+0x31/0xe0 [xfs]
[   47.592564]  xfs_buf_find_lock+0x5f/0x190 [xfs]
[   47.592564]  xfs_buf_get_map+0x1e5/0xb30 [xfs]
[   47.624590]  ? blk_finish_plug+0x28/0x40
[   47.624590]  xfs_buf_read_map+0x62/0x310 [xfs]
[   47.624590]  ? xfs_btree_read_buf_block+0xa7/0x120 [xfs]
[   47.637593]  xfs_trans_read_buf_map+0x12e/0x310 [xfs]
[   47.637593]  ? xfs_btree_read_buf_block+0xa7/0x120 [xfs]
[   47.637593]  ? __xfs_buf_submit+0xa7/0x1e0 [xfs]
[   47.637593]  xfs_btree_read_buf_block+0xa7/0x120 [xfs]
[   47.637593]  xfs_btree_lookup_get_block+0xa6/0x1f0 [xfs]
[   47.637593]  ? __pfx_xfs_btree_count_blocks_helper+0x10/0x10 [xfs]
[   47.670917]  xfs_btree_visit_block+0x101/0x170 [xfs]
[   47.670917]  ? __pfx_xfs_btree_count_blocks_helper+0x10/0x10 [xfs]
[   47.670917]  xfs_btree_visit_blocks+0xd4/0x150 [xfs]
[   47.670917]  xfs_finobt_calc_reserves+0xbe/0x170 [xfs]
[   47.670917]  ? xfs_refcountbt_calc_reserves+0xf6/0x120 [xfs]
[   47.670917]  xfs_ag_resv_init+0xb7/0x260 [xfs]
[   47.702570]  xfs_fs_reserve_ag_blocks+0x3d/0xf0 [xfs]
[   47.702570]  xfs_mountfs+0x63e/0xa50 [xfs]
[   47.702570]  ? __pfx_xfs_fstrm_free_func+0x10/0x10 [xfs]
[   47.702570]  xfs_fs_fill_super+0x6ae/0x970 [xfs]
[   47.702570]  ? __pfx_xfs_fs_fill_super+0x10/0x10 [xfs]
[   47.702570]  get_tree_bdev+0x123/0x1c0
[   47.702570]  vfs_get_tree+0x26/0xf0
[   47.734776]  vfs_cmd_create+0x65/0xf0
[   47.734776]  ? __do_sys_fsconfig+0x24d/0x620
[   47.734776]  __do_sys_fsconfig+0x43c/0x620
[   47.734776]  do_syscall_64+0x82/0x170
[   47.734776]  ? __do_sys_prctl+0x43/0xa10
[   47.734776]  ? syscall_exit_to_user_mode+0x75/0x230
[   47.734776]  ? do_syscall_64+0x8f/0x170
[   47.734776]  ? syscall_exit_to_user_mode+0x75/0x230
[   47.768591]  ? do_syscall_64+0x8f/0x170
[   47.768591]  ? do_user_addr_fault+0x30f/0x6c0
[   47.768591]  ? exc_page_fault+0x7e/0x180
[   47.781597]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   47.781597] RIP: 0033:0x7f1410dbbaae
[   47.781597] RSP: 002b:00007ffe895f8258 EFLAGS: 00000246 ORIG_RAX: 00000000000001af
[   47.781597] RAX: ffffffffffffffda RBX: 0000563a84497e30 RCX: 00007f1410dbbaae
[   47.804594] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
[   47.804594] RBP: 00007ffe895f83a0 R08: 0000000000000000 R09: 00007f1410e7db20
[   47.804594] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f1410f02b60
[   47.804594] R13: 0000000000000000 R14: 0000563a8449f500 R15: 0000000000000065
[   47.804594]  </TASK>

Comment 2 David Woodhouse 2024-07-07 10:44:00 UTC
[root@desiato ~]# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda         8:0    0   7.3T  0 disk  
├─sda1      8:1    0   512M  0 part  
│ └─md126   9:126  0 511.9M  0 raid1 /boot/efi
├─sda2      8:2    0     2G  0 part  [SWAP]
└─sda3      8:3    0   7.3T  0 part  
  └─md127   9:127  0  29.1T  0 raid5 /
sdb         8:16   0   7.3T  0 disk  
├─sdb1      8:17   0   512M  0 part  
│ └─md126   9:126  0 511.9M  0 raid1 /boot/efi
├─sdb2      8:18   0     2G  0 part  [SWAP]
└─sdb3      8:19   0   7.3T  0 part  
  └─md127   9:127  0  29.1T  0 raid5 /
sdc         8:32   0   7.3T  0 disk  
├─sdc1      8:33   0   512M  0 part  
│ └─md126   9:126  0 511.9M  0 raid1 /boot/efi
├─sdc2      8:34   0     2G  0 part  [SWAP]
└─sdc3      8:35   0   7.3T  0 part  
  └─md127   9:127  0  29.1T  0 raid5 /
sdd         8:48   0   7.3T  0 disk  
├─sdd1      8:49   0   512M  0 part  
│ └─md126   9:126  0 511.9M  0 raid1 /boot/efi
├─sdd2      8:50   0     2G  0 part  [SWAP]
└─sdd3      8:51   0   7.3T  0 part  
  └─md127   9:127  0  29.1T  0 raid5 /
sde         8:64   0   7.3T  0 disk  
├─sde1      8:65   0   512M  0 part  
│ └─md126   9:126  0 511.9M  0 raid1 /boot/efi
├─sde2      8:66   0     2G  0 part  [SWAP]
└─sde3      8:67   0   7.3T  0 part  
  └─md127   9:127  0  29.1T  0 raid5 /
[root@desiato ~]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] [raid1] 
md126 : active raid1 sda1[0] sde1[7] sdd1[8] sdc1[6] sdb1[5]
      524224 blocks super 1.0 [5/5] [UUUUU]
      
md127 : active raid5 sda3[7] sdc3[8] sde3[5] sdd3[9] sdb3[6]
      31245086720 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 7/59 pages [28KB], 65536KB chunk

unused devices: <none>

Comment 3 David Woodhouse 2024-07-07 10:56:21 UTC
Created attachment 2039191 [details]
Serial console output for failed boot with rawhide 6.10.0-0.rc6.20240703gite9d22f7a6655.53.fc41 kernel

5. Does this problem occur with the latest Rawhide kernel? 

Yes.

Comment 4 Anders Blomdell 2024-08-09 16:49:35 UTC
I have just bisected a similar (same?) problem; here are my findings:

With a filesystem that contains a very large number of hardlinks,
the time to mount the filesystem skyrockets to around 15 minutes
on 6.9.11-200.fc40.x86_64, compared to around 1 second on
6.8.10-300.fc40.x86_64. This of course makes booting drop
into emergency mode if the filesystem is in /etc/fstab. A git bisect
nails the offending commit as 14dd46cf31f4aaffcf26b00de9af39d01ec8d547.
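
Roughly, the bisect boils down to the following (illustrative commands, not a
verbatim transcript of the run):

```
# Illustrative bisect between the last good and first bad mainline releases.
git bisect start
git bisect bad v6.9        # mount takes ~15 minutes
git bisect good v6.8       # mount takes ~1 second
# For each commit git bisect checks out: build, install and boot that kernel,
# then time the mount of the affected filesystem, e.g.
#   time mount /dev/vg1/test /test
# and mark the result before moving to the next candidate:
git bisect good            # or: git bisect bad
# Repeating this converges on 14dd46cf31f4 ("xfs: split xfs_inobt_init_cursor").
```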

The filesystem is a collection of daily snapshots of a live filesystem,
collected over a number of years. It is organized as a store of unique files
that are reflinked to inodes carrying the actual {owner, group, permission,
mtime}, and these inodes are hardlinked into the daily snapshot trees.
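
Purely to illustrate that layout (paths and commands are made up; the real
tooling differs):

```
# Illustrative only.
# 1. The unique file data lives once in the store.
# 2. A per-version metadata inode reflinks that data and carries the real
#    owner/group/permissions/mtime.
cp --reflink=always /store/data/ab/abcd1234 /store/inodes/0000017
chown alice:staff /store/inodes/0000017
chmod 0644 /store/inodes/0000017
touch -d '2021-03-01 12:00' /store/inodes/0000017
# 3. That inode is hardlinked into every daily snapshot tree it appears in,
#    so the hardlink count grows with years of daily snapshots.
ln /store/inodes/0000017 /snapshots/2021-03-01/home/alice/report.txt
ln /store/inodes/0000017 /snapshots/2021-03-02/home/alice/report.txt
```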

The numbers for the filesystem are:

  Total file size:           3.6e+12 bytes
  Unique files:             12.4e+06
  Reflink inodes:           18.6e+06
  Hardlinks:                15.7e+09
  
Mount timings on the two kernels are:

  6.8.10-300.fc40.x86_64:

    # time mount /dev/vg1/test /test
    real	0m0.835s
    user	0m0.002s
    sys	        0m0.014s

  6.9.11-200.fc40.x86_64:

    # time mount /dev/vg1/test /test
    real	15m36.508s
    user	0m0.000s
    sys	        0m27.628s (iotop reports 1-4 MB/s mostly in two threads)

Comment 5 Anders Blomdell 2024-08-16 09:55:35 UTC
This patch probably fixes the problem:

https://lore.kernel.org/lkml/ZrslIPV6%2Fqk6cLVy@dread.disaster.area/

xfs: xfs_finobt_count_blocks() walks the wrong btree

From: Dave Chinner <dchinner>

As a result of the factoring in commit 14dd46cf31f4 ("xfs: split
xfs_inobt_init_cursor"), mount started taking a long time on a
user's filesystem.  For Anders, this made mount times regress from
under a second to over 15 minutes for a filesystem with only 30
million inodes in it.

Anders bisected it down to the above commit, but even then the bug
was not obvious. In this commit, over 20 calls to
xfs_inobt_init_cursor() were modified, and some were modified to call
a new function named xfs_finobt_init_cursor().

If that takes you a moment to reread those function names to see
what the rename was, then you have realised why this bug wasn't
spotted during review. And it wasn't spotted on inspection even
after the bisect pointed at this commit - a single missing "f" isn't
the easiest thing for a human eye to notice....

The result is that xfs_finobt_count_blocks() now incorrectly calls
xfs_inobt_init_cursor() so it is now walking the inobt instead of
the finobt. Hence when there are lots of allocated inodes in a
filesystem, mount takes a -long- time to run because it now walks the
massive allocated inode btrees instead of the small, nearly empty
free inode btrees. It also means all the finobt space reservations
are wrong, so mount could potentially give ENOSPC on kernel
upgrade.

In hindsight, commit 14dd46cf31f4 should have been two commits - the
first to convert the finobt callers to the new API, the second to
modify the xfs_inobt_init_cursor() API for the inobt callers. That
would have made the bug very obvious during review.

Fixes: 14dd46cf31f4 ("xfs: split xfs_inobt_init_cursor")
Reported-by: Anders Blomdell <anders.blomdell>
Signed-off-by: Dave Chinner <dchinner>
---
 fs/xfs/libxfs/xfs_ialloc_btree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 496e2f72a85b..797d5b5f7b72 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -749,7 +749,7 @@ xfs_finobt_count_blocks(
 	if (error)
 		return error;
 
-	cur = xfs_inobt_init_cursor(pag, tp, agbp);
+	cur = xfs_finobt_init_cursor(pag, tp, agbp);
 	error = xfs_btree_count_blocks(cur, tree_blocks);
 	xfs_btree_del_cursor(cur, error);
 	xfs_trans_brelse(tp, agbp);

Comment 6 David Woodhouse 2024-09-04 07:47:44 UTC
Thanks. This still isn't in the latest -stable kernels.

Comment 7 Anders Blomdell 2024-09-04 10:36:09 UTC
Notified -stable + GKH 2024-09-02, so will hopefully arrive in the stable tree soon (6.10):

https://lore.kernel.org/stable/348d0bbc-050e-4d3d-9956-2dc8888188f1@gmail.com/T/#t

Comment 8 Fedora Update System 2024-09-05 02:19:36 UTC
FEDORA-2024-c4b4460831 (kernel-6.10.8-200.fc40) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-c4b4460831

Comment 9 Fedora Update System 2024-09-05 02:19:38 UTC
FEDORA-2024-c2e066226b (kernel-6.10.8-100.fc39) has been submitted as an update to Fedora 39.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-c2e066226b

Comment 10 Anders Blomdell 2024-09-05 09:13:56 UTC
Patch still not applied in 6.10.8, so bug is still present.

Comment 11 Christopher Klooz 2024-09-05 12:19:21 UTC
> Patch still not applied in 6.10.8, so bug is still present.

It was applied to the Fedora build of 6.10.8: https://koji.fedoraproject.org/koji/buildinfo?buildID=2541219 (F40)

See its changelog:
```
* Wed Sep 04 2024 Augusto Caringi <acaringi> [6.10.8-0]
- Add to BugsFixed (Augusto Caringi)
- xfs: xfs_finobt_count_blocks() walks the wrong btree (Dave Chinner)
- Linux v6.10.8
...
...
...
```

-> it is "xfs: xfs_finobt_count_blocks() walks the wrong btree (Dave Chinner)"

It is also in the F39 build: https://koji.fedoraproject.org/koji/buildinfo?buildID=2541217
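
One way to confirm that locally once a build is installed (a sketch; whether
the changelog lives in the kernel or kernel-core package can vary by release):

```
# Check whether the installed kernel build's RPM changelog mentions the fix.
rpm -q --changelog kernel-core | grep -i finobt
# Expected output if the fix is included:
#   - xfs: xfs_finobt_count_blocks() walks the wrong btree (Dave Chinner)
```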

Comment 12 Anders Blomdell 2024-09-05 15:12:26 UTC
Thanks!

 https://koji.fedoraproject.org/koji/buildinfo?buildID=2541219 works!

Comment 13 Fedora Update System 2024-09-06 05:34:50 UTC
FEDORA-2024-c4b4460831 has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-c4b4460831`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-c4b4460831

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 14 Fedora Update System 2024-09-06 08:54:38 UTC
FEDORA-2024-c2e066226b has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-c2e066226b`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-c2e066226b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 15 Fedora Update System 2024-09-10 02:09:53 UTC
FEDORA-2024-c4b4460831 (kernel-6.10.8-200.fc40) has been pushed to the Fedora 40 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 16 Fedora Update System 2024-09-10 02:53:45 UTC
FEDORA-2024-c2e066226b (kernel-6.10.8-100.fc39) has been pushed to the Fedora 39 stable repository.
If the problem still persists, please make note of it in this bug report.

