Bug 2394998

Summary: during btrfs scrub, Freezing user space processes failed
Product: [Fedora] Fedora
Reporter: Chris Murphy <bugzilla>
Component: kernel
Assignee: fedora-kernel-btrfs
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 43
CC: acaringi, adscvr, airlied, daan.j.demeyer, dtardon, fedoraproject, hans, hdegoede, hpa, josef, kernel-maint, linville, lnykryn, masami256, mchehab, msekleta, ptalbert, steved, suraj.ghimire7, systemd-maint, yuwatana, zbyszek
Hardware: Unspecified
OS: Unspecified
Type: Bug
Attachments:
  full dmesg (flags: none)

Description Chris Murphy 2025-09-13 21:54:16 UTC
Created attachment 2106546 [details]
full dmesg

6.17.0-0.rc5.42.fc43.x86_64
btrfs-progs-6.16-1.fc42.x86_64

[ 8088.052124] kernel: BTRFS info (device dm-1): scrub: started on devid 1
[ 9662.647055] kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
[ 9662.689046] kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
[ 9662.793052] kernel: wlp0s20f3: deauthenticating from a4:22:49:b2:cb:a6 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 9727.984200] kernel: PM: suspend entry (deep)
[ 9727.991082] kernel: Filesystems sync: 0.007 seconds
[ 9748.172951] kernel: Freezing user space processes
[ 9748.173350] kernel: Freezing user space processes failed after 20.001 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 9748.173520] kernel: task:btrfs           state:D stack:0     pid:15156 tgid:15155 ppid:4043   task_flags:0x440140 flags:0x00004006
[ 9748.173653] kernel: Call Trace:
[ 9748.173768] kernel:  <TASK>
[ 9748.173884] kernel:  __schedule+0x2f9/0x7b0
[ 9748.174026] kernel:  schedule+0x27/0x80
[ 9748.174166] kernel:  io_schedule+0x46/0x70
[ 9748.174295] kernel:  blk_mq_get_tag+0x11d/0x2d0
[ 9748.174444] kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
[ 9748.174545] kernel:  __blk_mq_alloc_requests+0xb0/0x2b0
[ 9748.174651] kernel:  blk_mq_submit_bio+0x2c3/0x890
[ 9748.174764] kernel:  __submit_bio+0x74/0x280
[ 9748.174855] kernel:  __submit_bio_noacct+0x90/0x210
[ 9748.174925] kernel:  btrfs_submit_chunk+0x1a2/0x6c0
[ 9748.175027] kernel:  ? __pfx_scrub_read_endio+0x10/0x10
[ 9748.175118] kernel:  btrfs_submit_bbio+0x1a/0x30
[ 9748.175184] kernel:  submit_initial_group_read+0x8a/0x1d0
[ 9748.175264] kernel:  scrub_simple_mirror+0x26f/0x310
[ 9748.175372] kernel:  scrub_stripe+0x512/0x7a0
[ 9748.175445] kernel:  scrub_chunk+0xd0/0x170
[ 9748.175508] kernel:  scrub_enumerate_chunks+0x319/0x710
[ 9748.175571] kernel:  btrfs_scrub_dev+0x225/0x660
[ 9748.175641] kernel:  btrfs_ioctl+0xe77/0x15d0
[ 9748.175710] kernel:  __x64_sys_ioctl+0x94/0xe0
[ 9748.175779] kernel:  do_syscall_64+0x82/0x2c0
[ 9748.175848] kernel:  ? __lruvec_stat_mod_folio+0x85/0xd0
[ 9748.175919] kernel:  ? xas_load+0x11/0x100
[ 9748.176032] kernel:  ? xas_find+0x83/0x1b0
[ 9748.176116] kernel:  ? next_uptodate_folio+0xa0/0x350
[ 9748.176186] kernel:  ? filemap_map_pages+0x35c/0x5a0
[ 9748.176255] kernel:  ? memcg1_check_events+0x60/0x1d0
[ 9748.176325] kernel:  ? do_read_fault+0x107/0x260
[ 9748.176393] kernel:  ? handle_pte_fault+0x118/0x240
[ 9748.176461] kernel:  ? do_fault+0x150/0x260
[ 9748.176523] kernel:  ? __handle_mm_fault+0x551/0x6a0
[ 9748.176591] kernel:  ? count_memcg_events+0xd6/0x220
[ 9748.176670] kernel:  ? handle_mm_fault+0x248/0x360
[ 9748.176740] kernel:  ? do_user_addr_fault+0x21a/0x690
[ 9748.176803] kernel:  ? exc_page_fault+0x74/0x180
[ 9748.176873] kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 9748.176943] kernel: RIP: 0033:0x7f4a739060ed
[ 9748.176996] kernel: RSP: 002b:00007f4a737aec50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 9748.177102] kernel: RAX: ffffffffffffffda RBX: 000055e4140b79e0 RCX: 00007f4a739060ed
[ 9748.177181] kernel: RDX: 000055e4140b79e0 RSI: 00000000c400941b RDI: 0000000000000003
[ 9748.177251] kernel: RBP: 00007f4a737aeca0 R08: 0000000000000020 R09: 31203a6b63617473
[ 9748.177330] kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4a737af6c0
[ 9748.177399] kernel: R13: 00007ffe1aba7a10 R14: 00007f4a737afcdc R15: 00007ffe1aba7b17
[ 9748.177461] kernel:  </TASK>
[ 9748.177531] kernel: OOM killer enabled.
[ 9748.177593] kernel: Restarting tasks: Starting
[ 9748.177678] kernel: Restarting tasks: Done
[ 9748.177746] kernel: random: crng reseeded on system resumption
[ 9748.318065] kernel: PM: suspend exit
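The register dump can be checked against the call trace. On x86-64, ORIG_RAX 0x10 is syscall 16, which is ioctl(2). RDI holds the file descriptor (3) and RSI holds the ioctl request. RSI can be decoded with the asm-generic _IOC bit layout that x86 uses: number in bits 0-7, type in bits 8-15, size in bits 16-29, and direction in bits 30-31. The result should be type 0x94 (BTRFS_IOCTL_MAGIC), number 27, and a 1024-byte read/write argument. That matches BTRFS_IOC_SCRUB, assuming the constants in the btrfs uapi header, and it is consistent with btrfs_ioctl -> btrfs_scrub_dev in the trace. The following is a minimal sketch; `decode_ioctl` is a hypothetical helper name.

```python
# Decode an ioctl request number using the asm-generic _IOC layout (used on x86-64).
# decode_ioctl is a hypothetical helper, not part of any kernel or btrfs tool.
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS = 8, 8, 14
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS      # 8
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS  # 16
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS   # 30

# _IOC_WRITE = 1 and _IOC_READ = 2 in the asm-generic encoding.
DIRS = {0: "none", 1: "write", 2: "read", 3: "read|write"}


def decode_ioctl(cmd: int) -> dict:
    """Split an ioctl request number into direction, type, number, and size."""
    return {
        "dir": DIRS[(cmd >> IOC_DIRSHIFT) & 0x3],
        "type": hex((cmd >> IOC_TYPESHIFT) & 0xFF),
        "nr": (cmd >> IOC_NRSHIFT) & 0xFF,
        "size": (cmd >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1),
    }


# RSI from the register dump above.
print(decode_ioctl(0xC400941B))
# {'dir': 'read|write', 'type': '0x94', 'nr': 27, 'size': 1024}
```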


The storage stack is: USB flash drive -> dm-crypt -> Btrfs
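When triaging a freezer failure like the one logged above, it can help to extract only the task that refused to freeze and its reliable stack frames. The following Python sketch assumes the log format shown above. Each stuck task gets one `task:` header line, followed by frames up to `</TASK>`. Frames prefixed with `?` are treated as unreliable and skipped. The names `parse_freeze_failure` and `StuckTask` are hypothetical, and the sketch also assumes that task names contain no spaces.

```python
import re
from dataclasses import dataclass, field

# Header printed for each task that refused to freeze, e.g.
# "task:btrfs           state:D stack:0     pid:15156 tgid:15155 ppid:4043 ..."
# \bpid: skips "ppid:" and "tgid:" because it needs a word boundary before "pid".
TASK_RE = re.compile(r"task:(?P<comm>\S+)\s+state:(?P<state>\S+).*?\bpid:(?P<pid>\d+)")

# Call-trace frame such as "kernel:  blk_mq_get_tag+0x11d/0x2d0";
# a leading "? " marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"kernel:\s+(?P<unreliable>\?\s+)?(?P<func>[A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


@dataclass
class StuckTask:
    """Hypothetical container for one task reported by the freezer."""
    comm: str
    state: str
    pid: int
    frames: list = field(default_factory=list)


def parse_freeze_failure(log: str) -> list:
    """Return each stuck task with its reliable frames, innermost first."""
    tasks = []
    current = None
    for line in log.splitlines():
        m = TASK_RE.search(line)
        if m:
            current = StuckTask(m["comm"], m["state"], int(m["pid"]))
            tasks.append(current)
            continue
        if current is None:
            continue
        if "</TASK>" in line:
            # End of this task's stack dump.
            current = None
            continue
        f = FRAME_RE.search(line)
        if f and not f["unreliable"]:
            current.frames.append(f["func"])
    return tasks
```

Applied to the log above, this should report a single task, btrfs (pid 15156) in state D. Its reliable frames run from __schedule through blk_mq_get_tag and btrfs_scrub_dev down to entry_SYSCALL_64_after_hwframe.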


Upstream report: https://lore.kernel.org/linux-btrfs/d93b2a2d-6ad9-4c49-809f-11d769a6f30a@app.fastmail.com/T/#u

Comment 1 Chris Murphy 2025-09-15 23:53:08 UTC
Upstream is aware of the issue. It's not a regression, and a couple of possible solutions are being considered.

Comment 2 Chris Murphy 2025-10-12 22:16:54 UTC
Per the upstream response at https://lore.kernel.org/linux-btrfs/20251012085256.8628-1-safinaskar@gmail.com/, switching the component to systemd.

See also bug and patch: https://github.com/systemd/systemd/issues/38337

The problem doesn't happen on Fedora 43, but it does happen on Fedora 42 with systemd-257.9-2.fc42.

Comment 3 David Tardon 2025-10-16 13:20:58 UTC
(In reply to Chris Murphy from comment #2)
> See also bug and patch: https://github.com/systemd/systemd/issues/38337

The fix for this is already included in v257.8...

Comment 4 Chris Murphy 2025-10-24 01:47:20 UTC
OK, so systemd isn't the right component after all. Switching back to kernel.