Bug 2082022 - btrfs filesystem resize (shrink) sometimes fails, systemd-homework
Summary: btrfs filesystem resize (shrink) sometimes fails, systemd-homework
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 36
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:2120918b369ab9668e1554bac0f...
Depends On:
Blocks:
Reported: 2022-05-05 08:45 UTC by Robert Mader
Modified: 2022-08-14 17:27 UTC
CC List: 31 users

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-14 17:27:35 UTC
Type: ---
Embargoed:


Attachments
File: dmesg (100.32 KB, text/plain), 2022-05-05 08:45 UTC, Robert Mader
journalctl -b log (367.17 KB, text/plain), 2022-06-26 18:33 UTC, Robert Mader
Log according to comment 12 (858.03 KB, text/plain), 2022-06-30 08:11 UTC, Robert Mader


Links
System: GitHub, ID: kdave/btrfs-progs issue 271, Private: 0, Priority: None, Status: open, Summary: bogus min size estimates by 'btrfs inspect min', Last Updated: 2022-08-14 17:25:53 UTC
System: GitHub, ID: systemd/systemd issue 19398, Private: 0, Priority: None, Status: open, Summary: systemd-homed cannot resize home, Last Updated: 2022-08-14 17:25:55 UTC

Description Robert Mader 2022-05-05 08:45:33 UTC
Description of problem:
Apparently happened during booting.

Additional info:
reporter:       libreport-2.17.1
WARNING: CPU: 11 PID: 81628 at fs/btrfs/extent-tree.c:2159 btrfs_run_delayed_refs+0x196/0x1e0
Modules linked in: binfmt_misc tls uinput dm_crypt loop rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc vfat fat iwlmvm mac80211 intel_rapl_msr intel_rapl_common libarc4 snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd ledtrig_audio iwlwifi snd_hda_codec_hdmi btusb snd_hda_intel btrtl snd_intel_dspcfg btbcm snd_intel_sdw_acpi snd_usb_audio snd_hda_codec iwlmei btintel uvcvideo snd_hda_core snd_usbmidi_lib btmtk videobuf2_vmalloc snd_hwdep videobuf2_memops cfg80211 kvm bluetooth snd_seq snd_rawmidi videobuf2_v4l2 snd_seq_device videobuf2_common snd_pcm videodev snd_timer irqbypass rapl mc gigabyte_wmi wmi_bmof snd i2c_piix4 k10temp mei ecdh_generic rfkill soundcore acpi_cpufreq gpio_amdpt gpio_generic zram
 amdgpu crct10dif_pclmul crc32_pclmul drm_ttm_helper crc32c_intel nvme ttm ghash_clmulni_intel ccp r8169 iommu_v2 sp5100_tco nvme_core gpu_sched wmi ip6_tables ip_tables ipmi_devintf ipmi_msghandler fuse i2c_dev
CPU: 11 PID: 81628 Comm: systemd-homewor Not tainted 5.17.5-300.fc36.x86_64 #1
Hardware name: CSL-Computer GmbH & Co. KG 5946/B550 AORUS ELITE V2, BIOS F14e 10/13/2021
RIP: 0010:btrfs_run_delayed_refs+0x196/0x1e0
Code: 48 8d 91 48 0a 00 00 f0 48 0f ba 2a 03 72 20 83 f8 fb 74 39 83 f8 e2 74 34 89 c6 48 c7 c7 60 11 65 97 89 04 24 e8 e3 0f 7d 00 <0f> 0b 8b 04 24 89 c1 ba 6f 08 00 00 48 89 df 89 04 24 48 c7 c6 80
RSP: 0018:ffffba6e04ad7b68 EFLAGS: 00010296
RAX: 0000000000000026 RBX: ffff9f7d0c70a958 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff97665ad5 RDI: 00000000ffffffff
RBP: ffff9f7e0a146b78 R08: 0000000000000000 R09: ffffba6e04ad79a8
R10: ffffba6e04ad79a0 R11: 0000000000000003 R12: ffff9f7e0a146a00
R13: ffff9f7c881f3000 R14: ffff9f7c881f3000 R15: ffff9f7e0a146a00
FS:  00007f6105cf3b80(0000) GS:ffff9f8b7ecc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005593907b06d0 CR3: 000000013f79a000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
 <TASK>
 btrfs_commit_transaction+0x52/0xb00
 ? start_transaction+0xc3/0x5e0
 relocate_block_group+0x179/0x4c0
 btrfs_relocate_block_group+0x22e/0x3f0
 ? preempt_count_add+0x64/0x90
 btrfs_relocate_chunk+0x27/0xe0
 btrfs_shrink_device+0x255/0x570
 btrfs_ioctl_resize+0x2ed/0x400
 btrfs_ioctl+0x1a5a/0x2b80
 ? ioctl_has_perm.constprop.0.isra.0+0xaa/0xf0
 ? __seccomp_filter+0x27b/0x4c0
 __x64_sys_ioctl+0x8d/0xc0
 do_syscall_64+0x3a/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f610636da3f
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007ffcbe940430 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000029b04f5000 RCX: 00007f610636da3f
RDX: 00007ffcbe940520 RSI: 0000000050009403 RDI: 0000000000000004
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000075
R10: 00007ffcbe9401cc R11: 0000000000000246 R12: 00007ffcbe940520
R13: 0000000000000000 R14: 000000000000005d R15: 0000557c349231b0
 </TASK>

Comment 1 Robert Mader 2022-05-05 08:45:38 UTC
Created attachment 1877278
File: dmesg

Comment 2 Chris Murphy 2022-06-08 01:01:36 UTC
The attached dmesg appears to be from a different boot; it doesn't contain the call trace above. Can you get the full dmesg? Possibly from `journalctl -b-1 -k > dmesg.log`, where the negative value given to the -b flag is how many boots back you want to go. You could also run the command without the redirect, filtering the output to find the boot that contains the call trace:

journalctl -b-1 -k | grep -i btrfs
journalctl -b-2 -k | grep -i btrfs
journalctl -b-3 -k | grep -i btrfs

Comment 3 Chris Murphy 2022-06-08 01:13:23 UTC
Was a file system shrink or device removal happening at the time? The call trace suggests the problem happens during block group relocation while doing file system shrink.

Comment 4 Chris Murphy 2022-06-08 01:28:50 UTC
By any chance are you using systemd-homed? It can do fs shrink and grow during startup I think.

Comment 6 Robert Mader 2022-06-24 09:28:28 UTC
> By any chance are you using systemd-homed? It can do fs shrink and grow during startup I think.

Sorry for the late reply! I'm indeed using systemd-homed.

Comment 7 Chris Murphy 2022-06-25 21:44:03 UTC
OK, could you update to the 5.18 series? Kernel 5.18.6 is stable and available in the updates repo. Then, if you can reproduce, please include a full journal log so we can see the systemd-homed messages interleaved with kernel messages. That way we can find out whether it's complaining about the host or the homed fs, and what homed activity might be triggering the problem. Thanks.

Comment 8 Robert Mader 2022-06-26 18:29:41 UTC
Description of problem:
On boot/login using an encrypted btrfs systemd-homed home partition.

Version-Release number of selected component:
kernel-core-5.18.6-200.fc36

Additional info:
reporter:       libreport-2.17.1
cmdline:        BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.18.6-200.fc36.x86_64 root=UUID=bef85d3d-d314-4c06-82cd-329d7718744f ro rootflags=subvol=root rhgb quiet
crash_function: btrfs_commit_transaction
kernel:         5.18.6-200.fc36.x86_64
runlevel:       unknown
type:           Kerneloops

Truncated backtrace:
WARNING: CPU: 30 PID: 4261 at fs/btrfs/extent-tree.c:2151 btrfs_run_delayed_refs+0x196/0x1e0
Modules linked in: tls uinput dm_crypt loop rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc vfat fat iwlmvm snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi mac80211 intel_rapl_msr snd_hda_intel intel_rapl_common snd_intel_dspcfg libarc4 snd_intel_sdw_acpi snd_hda_codec snd_usb_audio edac_mce_amd iwlwifi uvcvideo btusb snd_usbmidi_lib snd_hda_core btrtl snd_rawmidi videobuf2_vmalloc snd_hwdep btbcm videobuf2_memops snd_seq btintel videobuf2_v4l2 iwlmei videobuf2_common btmtk snd_seq_device kvm bluetooth cfg80211 snd_pcm videodev irqbypass mc rapl snd_timer gigabyte_wmi wmi_bmof snd k10temp i2c_piix4 ecdh_generic mei rfkill soundcore acpi_cpufreq gpio_amdpt gpio_generic zram amdgpu
 drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul crc32c_intel iommu_v2 nvme gpu_sched ghash_clmulni_intel drm_dp_helper r8169 ccp nvme_core sp5100_tco wmi ip6_tables ip_tables ipmi_devintf ipmi_msghandler fuse i2c_dev
CPU: 30 PID: 4261 Comm: systemd-homewor Not tainted 5.18.6-200.fc36.x86_64 #1
Hardware name: CSL-Computer GmbH & Co. KG 5946/B550 AORUS ELITE V2, BIOS F14e 10/13/2021
RIP: 0010:btrfs_run_delayed_refs+0x196/0x1e0
Code: 48 8d 91 50 0a 00 00 f0 48 0f ba 2a 03 72 20 83 f8 fb 74 39 83 f8 e2 74 34 89 c6 48 c7 c7 98 9c 65 a8 89 04 24 e8 25 4b 7d 00 <0f> 0b 8b 04 24 89 c1 ba 67 08 00 00 48 89 df 89 04 24 48 c7 c6 80
RSP: 0018:ffffa90507a13ad0 EFLAGS: 00010286
RAX: 0000000000000026 RBX: ffff8fa30f5f8888 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffffa866ec04 RDI: 00000000ffffffff
RBP: ffff8fa33c3ba778 R08: 0000000000000000 R09: ffffa90507a13908
R10: 0000000000000003 R11: ffffffffa8f453e8 R12: ffff8fa33c3ba600
R13: ffff8fa3060d2000 R14: ffff8fa3060d2000 R15: ffff8fa33c3ba600
FS:  00007f6071c5fb80(0000) GS:ffff8fb1fc380000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f71e860ca50 CR3: 0000000108656000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
 <TASK>
 btrfs_commit_transaction+0x52/0xbc0
 ? start_transaction+0xc3/0x5e0
 relocate_block_group+0x179/0x4c0
 btrfs_relocate_block_group+0x22e/0x3f0
 ? preempt_count_add+0x64/0x90
 btrfs_relocate_chunk+0x3b/0xf0
 btrfs_shrink_device+0x255/0x570
 btrfs_ioctl_resize+0x2ed/0x400
 btrfs_ioctl+0xd34/0x2850
 ? __seccomp_filter+0x21b/0x4c0
 __x64_sys_ioctl+0x8d/0xc0
 do_syscall_64+0x5b/0x80
 ? handle_mm_fault+0xae/0x280
 ? do_user_addr_fault+0x1e2/0x670
 ? exc_page_fault+0x70/0x170
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f60722dd72f
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007ffef6010e30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000002b551be000 RCX: 00007f60722dd72f
RDX: 00007ffef6010f20 RSI: 0000000050009403 RDI: 0000000000000004
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000075
R10: 00007ffef6010bcc R11: 0000000000000246 R12: 00007ffef6010f20
R13: 0000000000000000 R14: 000000000000005d R15: 000056459506ce20
 </TASK>

Comment 9 Robert Mader 2022-06-26 18:33:39 UTC
Created attachment 1892814
journalctl -b log

Here's the requested log, assuming this was what you meant.

Comment 10 Chris Murphy 2022-06-27 16:59:04 UTC
Thanks, but I'm not seeing the call trace from comment 8 in the journalctl output from comment 9, so I don't think they match. You might need to use `journalctl -b-1` or -2 or -3 to go back to prior boots. I'm going to guess that the abrt complaint is about a prior boot.

Comment 11 Robert Mader 2022-06-27 17:21:09 UTC
Indeed, that confused me as well - that would also mean it does not necessarily happen on boot. Will get back once I see it again.

Comment 12 Chris Murphy 2022-06-27 18:08:14 UTC
You can search for it if you want by iterating:

`journalctl -b-1 | grep btrfs_relocate_chunk`
`journalctl -b-2 | grep btrfs_relocate_chunk`
`journalctl -b-3 | grep btrfs_relocate_chunk`

If you find that section of the call trace in any of those prior boots, you can then just redirect that boot to a file, e.g. `journalctl -b-1 -o short-monotonic --no-hostname > journal.log`
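
The search over prior boots can also be scripted. The following is only a sketch: `find_trace_boots` is a hypothetical helper name, and it assumes the journal still holds the boots being checked. It prints the boot offset of every prior boot whose log contains the given pattern.

```shell
# find_trace_boots PATTERN MAX_BOOTS  (hypothetical helper, not part of systemd)
# Prints "-N" for every prior boot N (1..MAX_BOOTS) whose journal
# contains PATTERN, e.g. a function name from the call trace.
find_trace_boots() {
    pattern=$1
    max=$2
    i=1
    while [ "$i" -le "$max" ]; do
        # Same invocation style as above: -b-1, -b-2, ...
        if journalctl "-b-$i" 2>/dev/null | grep -q -- "$pattern"; then
            echo "-$i"
        fi
        i=$((i + 1))
    done
}

# Example call (not executed here):
#   find_trace_boots btrfs_relocate_chunk 3
```

Whatever offset it reports can then be used in the redirect above in place of -1.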

Comment 13 Robert Mader 2022-06-30 08:09:32 UTC
Description of problem:
Likely happened when logging into a systemd-homed user with an encrypted btrfs home partition.

Version-Release number of selected component:
kernel-core-5.18.6-200.fc36

Additional info:
reporter:       libreport-2.17.1
cmdline:        BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.18.6-200.fc36.x86_64 root=UUID=bef85d3d-d314-4c06-82cd-329d7718744f ro rootflags=subvol=root rhgb quiet
crash_function: btrfs_commit_transaction
kernel:         5.18.6-200.fc36.x86_64
runlevel:       unknown
type:           Kerneloops

Truncated backtrace:
WARNING: CPU: 18 PID: 139605 at fs/btrfs/extent-tree.c:2151 btrfs_run_delayed_refs+0x196/0x1e0
Modules linked in: binfmt_misc tls uinput dm_crypt loop rfcomm snd_seq_dummy snd_hrtimer nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc vfat fat intel_rapl_msr intel_rapl_common iwlmvm snd_hda_codec_realtek snd_hda_codec_generic mac80211 edac_mce_amd ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi uvcvideo libarc4 snd_usb_audio snd_hda_codec btusb videobuf2_vmalloc videobuf2_memops btrtl snd_hda_core btbcm snd_usbmidi_lib videobuf2_v4l2 btintel snd_rawmidi iwlwifi snd_hwdep kvm videobuf2_common snd_seq iwlmei btmtk snd_seq_device videodev snd_pcm irqbypass bluetooth cfg80211 rapl snd_timer mc gigabyte_wmi wmi_bmof snd mei i2c_piix4 k10temp ecdh_generic rfkill soundcore gpio_amdpt gpio_generic acpi_cpufreq zram
 amdgpu drm_ttm_helper ttm iommu_v2 crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched ghash_clmulni_intel nvme drm_dp_helper ccp r8169 sp5100_tco nvme_core wmi ip6_tables ip_tables ipmi_devintf ipmi_msghandler fuse i2c_dev
CPU: 18 PID: 139605 Comm: systemd-homewor Not tainted 5.18.6-200.fc36.x86_64 #1
Hardware name: CSL-Computer GmbH & Co. KG 5946/B550 AORUS ELITE V2, BIOS F14e 10/13/2021
RIP: 0010:btrfs_run_delayed_refs+0x196/0x1e0
Code: 48 8d 91 50 0a 00 00 f0 48 0f ba 2a 03 72 20 83 f8 fb 74 39 83 f8 e2 74 34 89 c6 48 c7 c7 98 9c 65 a3 89 04 24 e8 25 4b 7d 00 <0f> 0b 8b 04 24 89 c1 ba 67 08 00 00 48 89 df 89 04 24 48 c7 c6 80
RSP: 0018:ffffae3f118bbb88 EFLAGS: 00010292
RAX: 0000000000000026 RBX: ffff93460a8ad0d0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffffa366ec04 RDI: 00000000ffffffff
RBP: ffff934b017c5978 R08: 0000000000000000 R09: ffffae3f118bb9c0
R10: 0000000000000003 R11: ffffffffa3f453e8 R12: ffff934b017c5800
R13: ffff934600c92000 R14: ffff934600c92000 R15: ffff934b017c5800
FS:  00007f4ceaff7b80(0000) GS:ffff9354fc080000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4ceb12010b CR3: 00000001af126000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
 <TASK>
 btrfs_commit_transaction+0x52/0xbc0
 ? start_transaction+0xc3/0x5e0
 relocate_block_group+0x179/0x4c0
 btrfs_relocate_block_group+0x22e/0x3f0
 ? preempt_count_add+0x64/0x90
 btrfs_relocate_chunk+0x3b/0xf0
 btrfs_shrink_device+0x255/0x570
 btrfs_ioctl_resize+0x2ed/0x400
 ? _copy_to_user+0x21/0x30
 btrfs_ioctl+0xd34/0x2850
 ? fd_statfs+0x1b/0x70
 ? security_file_ioctl+0x3c/0x60
 __x64_sys_ioctl+0x8d/0xc0
 do_syscall_64+0x5b/0x80
 ? exc_page_fault+0x70/0x170
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f4ceb67576f
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007fff921eebc0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000002b7d037000 RCX: 00007f4ceb67576f
RDX: 00007fff921eecb0 RSI: 0000000050009403 RDI: 0000000000000004
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000075
R10: 00007fff921ee95c R11: 0000000000000246 R12: 00007fff921eecb0
R13: 0000000000000000 R14: 000000000000005d R15: 000055ed54784bb0
 </TASK>

Comment 14 Robert Mader 2022-06-30 08:11:14 UTC
Created attachment 1893561
Log according to comment 12

Comment 15 Chris Murphy 2022-06-30 18:31:42 UTC
[24753.375101] systemd-homework[139605]: Discarded unused 9.4G.
[24753.423525] systemd-homework[139605]: Syncing completed.
[24753.423650] audit[139605]: AVC avc:  denied  { read write } for  pid=139605 comm="systemd-homewor" name="robert.home" dev="nvme0n1p3" ino=6332 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:home_root_t:s0 tclass=file permissive=1
[24753.423720] audit[139605]: AVC avc:  denied  { open } for  pid=139605 comm="systemd-homewor" path="/home/robert.home" dev="nvme0n1p3" ino=6332 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:home_root_t:s0 tclass=file permissive=1
[24754.064686] systemd-homework[139605]: Discovered used LUKS device /dev/mapper/home-robert, and validated password.
[24754.201065] systemd-homework[139605]: Successfully re-activated LUKS device.
[24754.201130] systemd-homework[139605]: Discovered used loopback device /dev/loop0.
[24754.201182] systemd-homework[139605]: offset = 1048576, size = 950680064000, image = 950681133056
[24754.364371] systemd-homework[139605]: Ready to resize image size 885.3G → 173.9G, partition size 885.3G → 173.9G, file system size 885.3G → 173.9G.
[24754.999338] systemd-homework[139605]: Discarded unused 9.4G.
[24754.806788] kernel: BTRFS info (device dm-0): relocating block group 208336322560 flags data
[24754.814172] kernel: BTRFS info (device dm-0): relocating block group 192230195200 flags data
[24754.826658] kernel: BTRFS info (device dm-0): relocating block group 206188838912 flags data
[24754.834296] kernel: BTRFS info (device dm-0): relocating block group 190082711552 flags metadata|dup
[24755.980565] systemd-homework[139605]: Failed to resize file system: Read-only file system


I suspect the problem is that 173.9G is just too small, and since the shrink operation is pass/fail, it has to fail. I'm not sure it's practical for the kernel to do a "best effort" shrink, because that would require handing back to the requesting process the size that was actually reached, so that systemd-homework can adjust the block device size (file or partition) to the actual shrink, not the requested one.

A related problem is https://github.com/kdave/btrfs-progs/issues/271 

On the one hand shrink estimation probably should exist in libbtrfs, but systemd would probably just copy it rather than depend on libbtrfs, so it could get stale until someone notices. That suggests it'd be better if there were a kernel ioctl for shrink estimation. *shrug*

(Probably unrelated, I'm not sure why discard unused happens twice. Also, I suspect discard should happen after shrink. Doing it before shrink is sorta wasted because we're just going to move bg's around during shrink, and thus increase backing file size.)

Comment 17 Chris Murphy 2022-06-30 19:07:39 UTC
@robert the next time you get an abrt notification of this shrink failure and call trace, can you post two things?

1. This line: [24754.364371] systemd-homework[139605]: Ready to resize image size 885.3G → 173.9G, partition size 885.3G → 173.9G, file system size 885.3G → 173.9G.
I want to see the latest sizes.

2. Output from
grep -R . /sys/fs/btrfs/$fsuuid/allocation

If you have more than one btrfs file system, there might be more than one UUID in /sys/fs/btrfs/; you can find the correct one with `lsblk -f`. $fsuuid is not literal: it's the UUID of your btrfs fs, a long random identifier. This output will give us much more detail about what the kernel thinks of the fs allocations, and maybe we can tell whether the shrink request is just too aggressive or something else is going on.
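
If looking up the UUID by hand is inconvenient, something like the following might work. It is a sketch under assumptions: `dump_btrfs_allocation` is a hypothetical helper, the `BTRFS_SYSFS` override exists only so the function can be tried against a copy of the tree, and the path passed in is assumed to be the mount point of the home file system while it is mounted.

```shell
# dump_btrfs_allocation MOUNTPOINT  (hypothetical helper)
# Looks up the file system UUID of MOUNTPOINT with findmnt and prints
# every allocation counter the kernel exposes for it in sysfs.
dump_btrfs_allocation() {
    mnt=$1
    # Assumption: BTRFS_SYSFS is only an override for testing.
    sysfs_root=${BTRFS_SYSFS:-/sys/fs/btrfs}
    fsuuid=$(findmnt -n -o UUID "$mnt") || return 1
    [ -n "$fsuuid" ] || return 1
    grep -R . "$sysfs_root/$fsuuid/allocation"
}

# Example call (not executed here; the path is a placeholder):
#   dump_btrfs_allocation /home/robert
```

If findmnt reports nothing for the path, falling back to `lsblk -f` as described above is still the safer route.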

Comment 18 Robert Mader 2022-08-05 09:30:19 UTC
Bug 2115749 appears to be a dupe with a slightly different trace. Here are the new numbers I got:

```
[22920.060902] systemd-homework[265152]: Discovered used loopback device /dev/loop0.
[22920.061026] systemd-homework[265152]: offset = 1048576, size = 950680064000, image = 950681133056
[22920.230910] systemd-homework[265152]: Ready to resize image size 885.3G → 175.1G, partition size 885.3G → 175.1G, file system size 885.3G → 175.0G.
[22921.572937] systemd-homework[265152]: Discarded unused 11.2G.
```

```
grep -R . /sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/disk_used:6384713728
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/bytes_pinned:32768
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/bytes_used:3192356864
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/dup/used_bytes:3192356864
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/dup/total_bytes:4294967296
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/disk_total:8589934592
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/total_bytes:4294967296
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/bytes_reserved:720896
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/bytes_readonly:65536
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/bytes_zone_unusable:0
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/bytes_may_use:449052672
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/flags:4
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/system/disk_used:98304
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/system/bytes_pinned:0
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/system/bytes_used:49152
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/system/dup/used_bytes:49152
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/system/dup/total_bytes:8388608
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/system/disk_total:16777216
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/system/total_bytes:8388608
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/system/bytes_reserved:0
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/system/bytes_readonly:0
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/system/bytes_zone_unusable:0
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/system/bytes_may_use:0
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/system/flags:2
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/global_rsv_reserved:387891200
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/disk_used:181216686080
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/bytes_pinned:0
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/bytes_used:181216686080
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/single/used_bytes:181216686080
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/single/total_bytes:191134433280
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/disk_total:191134433280
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/total_bytes:191134433280
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/bytes_reserved:303104
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/bytes_readonly:0
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/bytes_zone_unusable:0
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/bytes_may_use:8192
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/flags:1
/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/global_rsv_size:387891200
```

Comment 19 Chris Murphy 2022-08-05 13:52:18 UTC
Data: 94% full, 178G
>/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/single/used_bytes:181216686080
>/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/data/single/total_bytes:191134433280
Metadata: 74% full, 4G (8G on disk with dup)
>/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/dup/used_bytes:3192356864
>/sys/fs/btrfs/88a9c414-112b-41e3-ac7b-76f4b1ea033f/allocation/metadata/dup/total_bytes:4294967296

This was never going to work. The absolute minimum that might succeed is 190G, and even that could fail. Plus it's a lot of work to do such a shrink: the more aggressive the shrink, the more extents have to be migrated further in (to lower address values). Is that work really necessary? Especially on logout? Especially when FITRIM is already being used to return the home's unused blocks to the underlying file system? I think just FITRIM on logout is sufficient, then do the resize on login. That's way faster too. And I'm not sure what's gained by a shrink on logout; it's just a lot of extra IO that doesn't actually return more blocks to the underlying fs.
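
The fill percentages quoted above can be rechecked with plain shell arithmetic. This is only a sketch: `pct_used` and `allocated_gib` are hypothetical helper names, and the inputs are the sysfs values from comment 18.

```shell
# pct_used USED TOTAL: integer percentage of a space class in use.
pct_used() {
    echo $(( $1 * 100 / $2 ))
}

# allocated_gib BYTES...: sum of byte counts in whole GiB, rounded down.
allocated_gib() {
    total=0
    for n in "$@"; do
        total=$(( total + n ))
    done
    echo $(( total / 1073741824 ))
}

pct_used 181216686080 191134433280   # data/single used vs total -> 94
pct_used 3192356864 4294967296       # metadata/dup used vs total -> 74
# disk_total of data + metadata + system, i.e. what is allocated on disk
allocated_gib 191134433280 8589934592 16777216   # -> 186
```

So the 175.0G file system target from comment 18 is already below the roughly 186G of chunks currently allocated on disk, before leaving any headroom for relocation.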

Comment 20 Chris Murphy 2022-08-05 14:12:23 UTC
Switching this to systemd since the most immediate problem is that it requests a far too aggressive shrink.

