Bug 2224346 - fs/btrfs/relocation.c:1937 prepare_to_merge+0x41f/0x430, permanently wedged in filesystem
Summary: fs/btrfs/relocation.c:1937 prepare_to_merge+0x41f/0x430, permanently wedged i...
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 37
Hardware: Unspecified
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: fedora-kernel-btrfs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-07-20 13:17 UTC by Chris Murphy
Modified: 2023-07-20 13:46 UTC (History)
16 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:


Attachments (Terms of Use)
dmesg (9.79 KB, text/plain)
2023-07-20 13:19 UTC, Chris Murphy
no flags Details

Description Chris Murphy 2023-07-20 13:17:17 UTC
kernel 6.3.12-100.fc37.x86_64
btrfs-progs 6.3.2-1.fc37.x86_64


User reports attempt to convert from metadata single to dup using:
>btrfs balance start -mconvert=dup /

The convert fails half way through and the file system goes read only. (dmesg 1)

Reboot initial resolves the problem but soon it starts to fail simple tasks like "touch sample.txt" operations with no space errors. Then discover the systemd-journal is likewise reporting it can't write to disk due to out of space which then causes a feedback loop of more systemd errors, and more need to write more errors.

None of the following succeed:

>btrfs balance start -dusage=1 -dlimit=1 /
>btrfs balance start -mconvert=single,soft /

At this point we collect the following information:

# btrfs fi usage /btrfs_root/sda1
Overall:
    Device size:                  50.92GiB
    Device allocated:             50.92GiB
    Device unallocated:            1.00MiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                         29.20GiB
    Free (estimated):             20.49GiB      (min: 20.49GiB)
    Free (statfs, df):            20.49GiB
    Data ratio:                       1.00
    Metadata ratio:                   1.18
    Global reserve:              102.70MiB      (used: 0.00B)
    Multiple profiles:                 yes      (metadata, system)

Data,single: Size:47.98GiB, Used:27.48GiB (57.29%)
   /dev/sda1      47.98GiB

Metadata,single: Size:2.00GiB, Used:943.66MiB (46.08%)
   /dev/sda1       2.00GiB

Metadata,DUP: Size:463.00MiB, Used:408.33MiB (88.19%)
   /dev/sda1     926.00MiB

System,single: Size:32.00MiB, Used:0.00B (0.00%)
   /dev/sda1      32.00MiB

System,DUP: Size:6.00MiB, Used:16.00KiB (0.26%)
   /dev/sda1      12.00MiB

Unallocated:
   /dev/sda1       1.00MiB


sysfs

allocation/metadata/disk_used:1917665280
allocation/metadata/bytes_pinned:0
allocation/metadata/chunk_size:1073741824
allocation/metadata/bytes_used:1051197440
allocation/metadata/bg_reclaim_threshold:0
allocation/metadata/size_classes:none 0
allocation/metadata/size_classes:small 0
allocation/metadata/size_classes:medium 0
allocation/metadata/size_classes:large 0
allocation/metadata/single/used_bytes:184729600
allocation/metadata/single/total_bytes:2147483648
allocation/metadata/dup/used_bytes:866467840
allocation/metadata/dup/total_bytes:972029952
allocation/metadata/disk_total:4091543552
allocation/metadata/total_bytes:3119513600
allocation/metadata/bytes_reserved:0
allocation/metadata/bytes_readonly:1962754048
allocation/metadata/bytes_zone_unusable:0
allocation/metadata/bytes_may_use:105512960
allocation/metadata/flags:4
allocation/system/disk_used:32768
allocation/system/bytes_pinned:0
allocation/system/chunk_size:33554432
allocation/system/bytes_used:16384
allocation/system/bg_reclaim_threshold:0
allocation/system/size_classes:none 0
allocation/system/size_classes:small 0
allocation/system/size_classes:medium 0
allocation/system/size_classes:large 0
allocation/system/dup/used_bytes:16384
allocation/system/dup/total_bytes:67108864
allocation/system/disk_total:134217728
allocation/system/total_bytes:67108864
allocation/system/bytes_reserved:0
allocation/system/bytes_readonly:0
allocation/system/bytes_zone_unusable:0
allocation/system/bytes_may_use:0
allocation/system/flags:2
allocation/global_rsv_reserved:105512960
allocation/data/disk_used:29451214848
allocation/data/bytes_pinned:0
allocation/data/chunk_size:10737418240
allocation/data/bytes_used:29451214848
allocation/data/bg_reclaim_threshold:0
allocation/data/size_classes:none 5
allocation/data/size_classes:small 32
allocation/data/size_classes:medium 7
allocation/data/size_classes:large 5
allocation/data/single/used_bytes:29451214848
allocation/data/single/total_bytes:50453282816
allocation/data/disk_total:50453282816
allocation/data/total_bytes:50453282816
allocation/data/bytes_reserved:0
allocation/data/bytes_readonly:0
allocation/data/bytes_zone_unusable:0
allocation/data/bytes_may_use:0
allocation/data/flags:1
allocation/global_rsv_size:105512960




Reproducible: 
Once stuck in this situation, the bogus free space errors are 100% reproducible and persistent. To get it into this situation, it takes a file system aged just right to wedge itself in like this.

We don't know for sure what the block group mapping was immediately prior to the problem in order to reproduce the problem.


Actual Results:  
many no free space errors, despite a lot of unused space in both metadata and data block groups

file system was considered unrepairable, user reformatted

Expected Results:  
*shrug* not this. Whether the user space tools should disallow metadata conversion unless there's a minimum amount of unallocated space; or whether the kernel has a new bug that's making this old problem worse - no idea

Looks similar to:
https://lore.kernel.org/lkml/000000000000a3d67705ff730522@google.com/T/

Comment 1 Chris Murphy 2023-07-20 13:19:28 UTC
Created attachment 1976738 [details]
dmesg

[Thu Jul 20 00:19:49 2023] BTRFS info (device sda1): balance: start -mconvert=dup -sconvert=dup
[Thu Jul 20 00:19:49 2023] BTRFS info (device sda1): relocating block group 83480281088 flags metadata
[Thu Jul 20 00:20:05 2023] ------------[ cut here ]------------
[Thu Jul 20 00:20:05 2023] BTRFS: Transaction aborted (error -28)
[Thu Jul 20 00:20:05 2023] WARNING: CPU: 1 PID: 180121 at fs/btrfs/relocation.c:1937 prepare_to_merge+0x41f/0x430
[Thu Jul 20 00:20:05 2023] Modules linked in: ib_core bluetooth tun overlay nft_compat nf_tables veth nfsv3 nfs_acl nfs lockd grace fscache netfs tls sit tunnel4 ip_tunnel ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack xt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack bridge stp llc nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set dell_rbtn nfnetlink dell_laptop ebtable_filter rfkill ebtables dell_smm_hwmon iTCO_wdt ip6table_filter intel_pmc_bxt iptable_filter iTCO_vendor_support gpio_ich snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi coretemp snd_hda_codec qrtr kvm_intel snd_hda_core snd_hwdep snd_seq snd_seq_device uvcvideo kvm uvc videobuf2_vmalloc videobuf2_memops snd_pcm irqbypass videobuf2_v4l2 dell_wmi i2c_i801 dell_smbios ledtrig_audio sparse_keymap wmi_bmof dcdbas dell_wmi_descriptor
[Thu Jul 20 00:20:05 2023]  videobuf2_common i2c_smbus joydev videodev mc snd_timer snd sunrpc lpc_ich soundcore acpi_cpufreq loop zram hid_microsoft ff_memless binfmt_misc nouveau sdhci_pci drm_ttm_helper cqhci ttm sdhci i2c_algo_bit mxm_wmi firewire_ohci mmc_core drm_display_helper firewire_core sha512_ssse3 e1000e serio_raw r8169 crc_itu_t yenta_socket cec video wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
[Thu Jul 20 00:20:05 2023] CPU: 1 PID: 180121 Comm: btrfs Not tainted 6.3.12-100.fc37.x86_64 #1
[Thu Jul 20 00:20:05 2023] Hardware name: Dell Inc. Latitude E6500                  /0PP476, BIOS A29 06/04/2013
[Thu Jul 20 00:20:05 2023] RIP: 0010:prepare_to_merge+0x41f/0x430
[Thu Jul 20 00:20:05 2023] Code: ad e8 75 e1 04 00 eb e0 44 89 f6 48 c7 c7 b8 8e 90 ad e8 a4 07 ab ff 0f 0b eb b4 44 89 f6 48 c7 c7 b8 8e 90 ad e8 91 07 ab ff <0f> 0b eb ba e8 38 be 93 00 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[Thu Jul 20 00:20:05 2023] RSP: 0018:ffffb6a64b387af8 EFLAGS: 00010282
[Thu Jul 20 00:20:05 2023] RAX: 0000000000000000 RBX: ffff9ac5cc749000 RCX: 0000000000000027
[Thu Jul 20 00:20:05 2023] RDX: ffff9ac617d21548 RSI: 0000000000000001 RDI: ffff9ac617d21540
[Thu Jul 20 00:20:05 2023] RBP: ffff9ac5004e0680 R08: 0000000000000000 R09: ffffb6a64b387988
[Thu Jul 20 00:20:05 2023] R10: 0000000000000003 R11: ffffffffae146108 R12: 00000000ffffffe4
[Thu Jul 20 00:20:05 2023] R13: ffff9ac50a5f7000 R14: 00000000ffffffe4 R15: ffff9ac4a3464358
[Thu Jul 20 00:20:05 2023] FS:  00007f0d6fc1b900(0000) GS:ffff9ac617d00000(0000) knlGS:0000000000000000
[Thu Jul 20 00:20:05 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Jul 20 00:20:05 2023] CR2: 000000c0010a9008 CR3: 000000001a196000 CR4: 00000000000406e0
[Thu Jul 20 00:20:05 2023] Call Trace:
[Thu Jul 20 00:20:05 2023]  <TASK>
[Thu Jul 20 00:20:05 2023]  ? prepare_to_merge+0x41f/0x430
[Thu Jul 20 00:20:05 2023]  ? __warn+0x81/0x130
[Thu Jul 20 00:20:05 2023]  ? prepare_to_merge+0x41f/0x430
[Thu Jul 20 00:20:05 2023]  ? report_bug+0x171/0x1a0
[Thu Jul 20 00:20:05 2023]  ? prb_read_valid+0x1b/0x30
[Thu Jul 20 00:20:05 2023]  ? handle_bug+0x41/0x70
[Thu Jul 20 00:20:05 2023]  ? exc_invalid_op+0x17/0x70
[Thu Jul 20 00:20:05 2023]  ? asm_exc_invalid_op+0x1a/0x20
[Thu Jul 20 00:20:05 2023]  ? prepare_to_merge+0x41f/0x430
[Thu Jul 20 00:20:05 2023]  ? prepare_to_merge+0x41f/0x430
[Thu Jul 20 00:20:05 2023]  relocate_block_group+0x130/0x500
[Thu Jul 20 00:20:05 2023]  btrfs_relocate_block_group+0x296/0x430
[Thu Jul 20 00:20:05 2023]  btrfs_relocate_chunk+0x3f/0x160
[Thu Jul 20 00:20:05 2023]  btrfs_balance+0x905/0x1390
[Thu Jul 20 00:20:05 2023]  ? __kmem_cache_alloc_node+0x187/0x320
[Thu Jul 20 00:20:05 2023]  ? btrfs_ioctl+0x2435/0x2640
[Thu Jul 20 00:20:05 2023]  btrfs_ioctl+0x224e/0x2640
[Thu Jul 20 00:20:05 2023]  ? ioctl_has_perm.constprop.0.isra.0+0xdd/0x140
[Thu Jul 20 00:20:05 2023]  __x64_sys_ioctl+0x94/0xd0
[Thu Jul 20 00:20:05 2023]  do_syscall_64+0x5f/0x90
[Thu Jul 20 00:20:05 2023]  ? exit_to_user_mode_prepare+0x188/0x1f0
[Thu Jul 20 00:20:05 2023]  ? syscall_exit_to_user_mode+0x1b/0x40
[Thu Jul 20 00:20:05 2023]  ? do_syscall_64+0x6b/0x90
[Thu Jul 20 00:20:05 2023]  ? syscall_exit_to_user_mode+0x1b/0x40
[Thu Jul 20 00:20:05 2023]  ? do_syscall_64+0x6b/0x90
[Thu Jul 20 00:20:05 2023]  ? syscall_exit_to_user_mode+0x1b/0x40
[Thu Jul 20 00:20:05 2023]  ? do_syscall_64+0x6b/0x90
[Thu Jul 20 00:20:05 2023]  ? exc_page_fault+0x74/0x170
[Thu Jul 20 00:20:05 2023]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[Thu Jul 20 00:20:05 2023] RIP: 0033:0x7f0d6fd66d6f
[Thu Jul 20 00:20:05 2023] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[Thu Jul 20 00:20:05 2023] RSP: 002b:00007fff3c9bc290 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Thu Jul 20 00:20:05 2023] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f0d6fd66d6f
[Thu Jul 20 00:20:05 2023] RDX: 00007fff3c9bc390 RSI: 00000000c4009420 RDI: 0000000000000003
[Thu Jul 20 00:20:05 2023] RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000073
[Thu Jul 20 00:20:05 2023] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff3c9be3f4
[Thu Jul 20 00:20:05 2023] R13: 00007fff3c9bc390 R14: 0000000000000001 R15: 0000000000000000
[Thu Jul 20 00:20:05 2023]  </TASK>
[Thu Jul 20 00:20:05 2023] ---[ end trace 0000000000000000 ]---
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state A): dumping space info:
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state A): space_info DATA has 22002716672 free, is not full
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state A): space_info total=51514441728, used=29511708672, pinned=0, reserved=0, may_use=16384, readonly=0 zone_unusable=0
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state A): space_info METADATA has 242122752 free, is full
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state A): space_info total=2632974336, used=1418067968, pinned=53788672, reserved=5505024, may_use=365117440, readonly=548372480 zone_unusable=0
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state A): space_info SYSTEM has 39829504 free, is not full
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state A): space_info total=39845888, used=16384, pinned=0, reserved=0, may_use=0, readonly=0 zone_unusable=0
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state A): global_block_rsv: size 107692032 reserved 107692032
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state A): trans_block_rsv: size 0 reserved 0
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state A): chunk_block_rsv: size 0 reserved 0
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state A): delayed_block_rsv: size 655360 reserved 655360
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state A): delayed_refs_rsv: size 2482503680 reserved 254541824
[Thu Jul 20 00:20:05 2023] BTRFS: error (device sda1: state A) in prepare_to_merge:1937: errno=-28 No space left
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state EA): forced readonly
[Thu Jul 20 00:20:05 2023] BTRFS info (device sda1: state EA): balance: ended with status: -30


Note You need to log in before you can comment on or make changes to this bug.