Bug 1905618 - mount failure, WARNING at fs/btrfs/extent-tree.c:3060 __btrfs_free_extent.isra.0+0x5fd/0x8d0
Summary: mount failure, WARNING at fs/btrfs/extent-tree.c:3060 __btrfs_free_extent.isr...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 33
Hardware: x86_64
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: fedora-kernel-btrfs
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-12-08 16:39 UTC by Andreas Lorenz
Modified: 2020-12-28 12:46 UTC (History)
19 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-12-10 23:28:28 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
dmesg (251.02 KB, text/plain)
2020-12-08 16:39 UTC, Andreas Lorenz
no flags Details
journalctl -k (567.84 KB, text/plain)
2020-12-08 17:02 UTC, Andreas Lorenz
no flags Details

Description Andreas Lorenz 2020-12-08 16:39:14 UTC
Created attachment 1737686 [details]
dmesg

so was running 5.9.10, got some (3) unrecoverable errors, thought that it was kernel related (switched over to -mdup) but at the same time running 5.8.13.

after the pc ran for multiple days (2-3) I rebooted as I accumulated a few updates which needed restarts, and the system never came back up.

booting into a live fedora 33, tried to mount it, only working with "$ sudo btrfs rescue zero-log /dev/mapper/luks-66adbb73-1374-47ec-9c56-c430aecd957b".

Comment 1 Andreas Lorenz 2020-12-08 17:02:03 UTC
Created attachment 1737688 [details]
journalctl -k

Comment 2 Chris Murphy 2020-12-08 17:16:46 UTC
snipped from journalctl -k attachment...
either some kind of interruption during fsync from the previous mount? or may also be related to "corrupt 59" events?


Dec 08 10:03:18 localhost-live kernel: BTRFS: device label fedora_localhost-live devid 1 transid 356509 /dev/dm-2 scanned by systemd-udevd (3372)
Dec 08 10:03:25 localhost-live kernel: BTRFS info (device dm-2): disk space caching is enabled
Dec 08 10:03:25 localhost-live kernel: BTRFS info (device dm-2): has skinny extents
Dec 08 10:03:25 localhost-live kernel: BTRFS info (device dm-2): bdev /dev/mapper/luks-66adbb73-1374-47ec-9c56-c430aecd957b errs: wr 0, rd 0, flush 0, corrupt 59, gen 0
Dec 08 10:03:25 localhost-live kernel: BTRFS info (device dm-2): enabling ssd optimizations
Dec 08 10:03:25 localhost-live kernel: BTRFS info (device dm-2): start tree-log replay
Dec 08 10:03:25 localhost-live kernel: ------------[ cut here ]------------
Dec 08 10:03:25 localhost-live kernel: WARNING: CPU: 3 PID: 3232 at fs/btrfs/extent-tree.c:3060 __btrfs_free_extent.isra.0+0x5fd/0x8d0
Dec 08 10:03:25 localhost-live kernel: Modules linked in: dm_crypt uinput nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set rfkill nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad iw_cxgb4 ib_uverbs rdma_cm iw_cm ib_cm ib_core amdgpu snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec iommu_v2 edac_mce_amd snd_hda_core gpu_sched ttm snd_hwdep kvm_amd snd_seq drm_kms_helper kvm snd_seq_device snd_pcm cec snd_timer irqbypass snd soundcore pcspkr rapl wmi_bmof sp5100_tco i2c_piix4 k10temp gpio_amdpt gpio_generic
Dec 08 10:03:25 localhost-live kernel:  acpi_cpufreq drm zram ip_tables nls_utf8 isofs squashfs crct10dif_pclmul crc32_pclmul crc32c_intel igb nvme ghash_clmulni_intel nvme_core ccp uas i2c_algo_bit dca usb_storage wmi video pinctrl_amd sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi loop scsi_transport_iscsi fuse
Dec 08 10:03:25 localhost-live kernel: CPU: 3 PID: 3232 Comm: pool-udisksd Not tainted 5.8.15-301.fc33.x86_64 #1
Dec 08 10:03:25 localhost-live kernel: Hardware name: Gigabyte Technology Co., Ltd. B450 AORUS PRO/B450 AORUS PRO-CF, BIOS F60c 10/29/2020
Dec 08 10:03:25 localhost-live kernel: RIP: 0010:__btrfs_free_extent.isra.0+0x5fd/0x8d0
Dec 08 10:03:25 localhost-live kernel: Code: 24 e8 c7 50 03 00 4c 8b 04 24 4c 89 c7 e8 bb f8 00 00 8b 54 24 10 85 d2 0f 84 15 fe ff ff 48 8b 54 24 40 e9 db fd ff ff 0f 0b <0f> 0b 49 8b 3e e8 e9 5e 00 00 49 89 d9 4c 89 e1 48 89 ea ff b4 24
Dec 08 10:03:25 localhost-live kernel: RSP: 0018:ffffa96dc3bf38d8 EFLAGS: 00010246
Dec 08 10:03:25 localhost-live kernel: RAX: 00000000fffffffe RBX: 0000000000000000 RCX: 0000000000000000
Dec 08 10:03:25 localhost-live kernel: RDX: 00000000fffffffe RSI: 00000000000000c8 RDI: ffff9bb2a2b57398
Dec 08 10:03:25 localhost-live kernel: RBP: 0000002575a34000 R08: 0000000000000000 R09: ffff9bb2a2b2c9a0
Dec 08 10:03:25 localhost-live kernel: R10: 00012ce817e7a320 R11: 0000000000000000 R12: 0000000000000000
Dec 08 10:03:25 localhost-live kernel: R13: 00000000fffffffe R14: ffff9bb2a2b2c9a0 R15: ffff9bb330951068
Dec 08 10:03:25 localhost-live kernel: FS:  00007f47df7fe640(0000) GS:ffff9bb3b06c0000(0000) knlGS:0000000000000000
Dec 08 10:03:25 localhost-live kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 08 10:03:25 localhost-live kernel: CR2: 00007f48561f9000 CR3: 00000003a7892000 CR4: 00000000003406e0
Dec 08 10:03:25 localhost-live kernel: Call Trace:
Dec 08 10:03:25 localhost-live kernel:  ? __btrfs_run_delayed_refs+0xfbc/0x1000
Dec 08 10:03:25 localhost-live kernel:  ? btrfs_merge_delayed_refs+0x19f/0x1d0
Dec 08 10:03:25 localhost-live kernel:  __btrfs_run_delayed_refs+0x6c9/0x1000
Dec 08 10:03:25 localhost-live kernel:  ? iput.part.0+0x5d/0x1c0
Dec 08 10:03:25 localhost-live kernel:  btrfs_run_delayed_refs+0x73/0x200
Dec 08 10:03:25 localhost-live kernel:  commit_cowonly_roots+0xd4/0x2d0
Dec 08 10:03:25 localhost-live kernel:  ? btrfs_qgroup_account_extents+0xb2/0x220
Dec 08 10:03:25 localhost-live kernel:  btrfs_commit_transaction+0x4b1/0xa10
Dec 08 10:03:25 localhost-live kernel:  ? btrfs_recover_log_trees+0x334/0x370
Dec 08 10:03:25 localhost-live kernel:  ? kmem_cache_free+0x83/0x190
Dec 08 10:03:25 localhost-live kernel:  btrfs_recover_log_trees+0x33c/0x370
Dec 08 10:03:25 localhost-live kernel:  ? replay_dir_deletes+0x2b0/0x2b0
Dec 08 10:03:25 localhost-live kernel:  open_ctree+0x12de/0x17a5
Dec 08 10:03:25 localhost-live kernel:  btrfs_mount_root.cold+0x12/0xea
Dec 08 10:03:25 localhost-live kernel:  ? selinux_fs_context_parse_param+0x27/0x60
Dec 08 10:03:25 localhost-live kernel:  legacy_get_tree+0x27/0x40
Dec 08 10:03:25 localhost-live kernel:  vfs_get_tree+0x25/0xb0
Dec 08 10:03:25 localhost-live kernel:  vfs_kern_mount.part.0+0x71/0xb0
Dec 08 10:03:25 localhost-live kernel:  btrfs_mount+0x119/0x36d
Dec 08 10:03:25 localhost-live kernel:  ? cred_has_capability.isra.0+0x68/0x100
Dec 08 10:03:25 localhost-live kernel:  legacy_get_tree+0x27/0x40
Dec 08 10:03:25 localhost-live kernel:  vfs_get_tree+0x25/0xb0
Dec 08 10:03:25 localhost-live kernel:  do_mount+0x7c3/0xaf0
Dec 08 10:03:25 localhost-live kernel:  ? memdup_user+0x4e/0x90
Dec 08 10:03:25 localhost-live kernel:  __x64_sys_mount+0x8e/0xd0
Dec 08 10:03:25 localhost-live kernel:  do_syscall_64+0x4d/0x90
Dec 08 10:03:25 localhost-live kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 08 10:03:25 localhost-live kernel: RIP: 0033:0x7f47f04007de
Dec 08 10:03:25 localhost-live kernel: Code: 48 8b 0d 9d 26 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6a 26 0c 00 f7 d8 64 89 01 48
Dec 08 10:03:25 localhost-live kernel: RSP: 002b:00007f47df7fd338 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
Dec 08 10:03:25 localhost-live kernel: RAX: ffffffffffffffda RBX: 00007f47df7fd4b0 RCX: 00007f47f04007de
Dec 08 10:03:25 localhost-live kernel: RDX: 00007f47cc00c600 RSI: 00007f47e400cfd0 RDI: 00007f47cc003d20
Dec 08 10:03:25 localhost-live kernel: RBP: 00007f47cc005490 R08: 0000000000000000 R09: 00007f47cc000080
Dec 08 10:03:25 localhost-live kernel: R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
Dec 08 10:03:25 localhost-live kernel: R13: 00007f47cc003d20 R14: 00007f47cc00c600 R15: 0000000000000000
Dec 08 10:03:25 localhost-live kernel: ---[ end trace 1b51d2524043b144 ]---

Comment 3 Andreas Lorenz 2020-12-08 17:21:43 UTC
$ sudo btrfs check --readonly /dev/mapper/luks-66adbb73-1374-47ec-9c56-c430aecd957b
Opening filesystem to check...
Checking filesystem on /dev/mapper/luks-66adbb73-1374-47ec-9c56-c430aecd957b
UUID: 7d999a7d-6d7d-4a45-814a-30dd69d9509e
[1/7] checking root items
[2/7] checking extents
tree backref 160506118144 parent 257 root 257 not found in extent tree
backref 160506118144 root 4398046511361 not referenced back 0x559743a79340
incorrect global backref count on 160506118144 found 2 wanted 1
backpointer mismatch on [160506118144 16384]
tree backref 160887422976 parent 7 root 7 not found in extent tree
backref 160887422976 root 4398046511111 not referenced back 0x5597438cb380
incorrect global backref count on 160887422976 found 2 wanted 1
backpointer mismatch on [160887422976 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 230738829312 bytes used, error(s) found
total csum bytes: 197964660
total tree bytes: 658178048
total fs tree bytes: 382402560
total extent tree bytes: 35241984
btree space waste bytes: 106420684
file data blocks allocated: 1418050183168
 referenced 229519929344

Comment 4 Chris Murphy 2020-12-08 17:35:21 UTC
Scrub is failing

        $ journalctl -fk
        -- Logs begin at Tue 2020-12-08 12:06:20 EST. --
        Dec 08 12:22:32 localhost-live kernel:  kthread+0x11b/0x140
        Dec 08 12:22:32 localhost-live kernel:  ? __kthread_bind_mask+0x60/0x60
        Dec 08 12:22:32 localhost-live kernel:  ret_from_fork+0x22/0x30
        Dec 08 12:22:32 localhost-live kernel: ---[ end trace 6e22b497c50379ce ]---
        Dec 08 12:22:32 localhost-live kernel: BTRFS: error (device dm-2) in __btrfs_free_extent:3066: errno=-2 No such entry
        Dec 08 12:22:32 localhost-live kernel: BTRFS info (device dm-2): forced readonly
        Dec 08 12:22:32 localhost-live kernel: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2173: errno=-2 No such entry
        Dec 08 12:22:32 localhost-live kernel: BTRFS warning (device dm-2): Skipping commit of aborted transaction.
        Dec 08 12:22:32 localhost-live kernel: BTRFS: error (device dm-2) in cleanup_transaction:1898: errno=-2 No such entry
        Dec 08 12:22:32 localhost-live kernel: BTRFS info (device dm-2): scrub: not finished on devid 1 with status: -125

Comment 5 Andreas Lorenz 2020-12-08 17:42:57 UTC
$ sudo btrfs check --check-data-csum /dev/mapper/luks-66adbb73-1374-47ec-9c56-c430aecd957b
Opening filesystem to check...
Checking filesystem on /dev/mapper/luks-66adbb73-1374-47ec-9c56-c430aecd957b
UUID: 7d999a7d-6d7d-4a45-814a-30dd69d9509e
[1/7] checking root items
[2/7] checking extents
tree backref 160506118144 parent 257 root 257 not found in extent tree
backref 160506118144 root 4398046511361 not referenced back 0x56299bf60340
incorrect global backref count on 160506118144 found 2 wanted 1
backpointer mismatch on [160506118144 16384]
tree backref 160887422976 parent 7 root 7 not found in extent tree
backref 160887422976 root 4398046511111 not referenced back 0x56299bdb2380
incorrect global backref count on 160887422976 found 2 wanted 1
backpointer mismatch on [160887422976 16384]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking csums against data
mirror 1 bytenr 2700337152 csum 33 expected csum 225
mirror 1 bytenr 2831822848 csum 132 expected csum 9
mirror 1 bytenr 3146625024 csum 207 expected csum 185
mirror 1 bytenr 7110606848 csum 18 expected csum 18
ERROR: errors found in csum tree
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 230738829312 bytes used, error(s) found
total csum bytes: 197964660
total tree bytes: 658178048
total fs tree bytes: 382402560
total extent tree bytes: 35241984
btree space waste bytes: 106420684
file data blocks allocated: 1418050183168
 referenced 229519929344

Comment 6 Andreas Lorenz 2020-12-08 17:54:21 UTC
$ smartctl -i /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.15-301.fc33.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/nvme0n1 failed: Permission denied
[liveuser@localhost-live ~]$ sudo smartctl -i /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.15-301.fc33.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO 500GB
Serial Number:                      S466NB0K630255X
Firmware Version:                   2B2QEXE7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 500,107,862,016 [500 GB]
Unallocated NVM Capacity:           0
Controller ID:                      4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          500,107,862,016 [500 GB]
Namespace 1 Utilization:            237,226,782,720 [237 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5681b093f8
Local Time is:                      Tue Dec  8 12:53:33 2020 EST

Comment 7 Josef Bacik 2020-12-08 20:01:16 UTC
First things first, mismatched csums so you may have some bad memory, run a memtest and get those replaced because otherwise they could wreak more havoc.

Get a backup of the fs, and then run btrfs check --repair to repair the extent tree issues, and then re-check to make sure it fixed everything, then you should be good to go.

Comment 8 Andreas Lorenz 2020-12-10 23:28:17 UTC
As it turned out, my RAM was failing.

Comment 9 Chris Murphy 2020-12-11 00:00:54 UTC
Sorry about that, but yeah good catch!

It's a bit tricky to find this one in the attached journal. It's right after the call trace, in the leaf dump.

Dec 08 10:03:25 localhost-live kernel: ---[ end trace 1b51d2524043b144 ]---
Dec 08 10:03:25 localhost-live kernel: BTRFS info (device dm-2): leaf 161062354944 gen 356510 total ptrs 226 free space 3175 owner 2
Dec 08 10:03:25 localhost-live kernel:         item 0 key (160887422976 169 0) itemoff 16250 itemsize 33
Dec 08 10:03:25 localhost-live kernel:                 extent refs 1 gen 356504 flags 2
Dec 08 10:03:25 localhost-live kernel:                 ref#0: tree block backref root 4398046511111
Dec 08 10:03:25 localhost-live kernel:         item 1 key (160887472128 169 0) itemoff 16217 itemsize 33
Dec 08 10:03:25 localhost-live kernel:                 extent refs 1 gen 356504 flags 2
Dec 08 10:03:25 localhost-live kernel:                 ref#0: tree block backref root 7

The suspicious line is the gigantic root in item 0.

$ printf '%x %x\n' 4398046511111
40000000007 0

That's hex and is a bit flip. More obvious in binary.
1000000000000000000000000000000000000000111

Flip that far left bit back to 0 and it's 7, which is the correct valure. I think btrfs --repair should be able to fix this, and then hopefully you can do a scrub. The online scrub will print out the full path to corrupt files. Just delete those and then replace with known good copies and you should be good.

Comment 10 Chris Murphy 2020-12-21 17:48:01 UTC
Andreas, could you confirm the history of this file system? The oldest kernel that was writing to it is 5.8 series? Thanks!

Comment 11 Andreas Lorenz 2020-12-28 12:46:10 UTC
Yes, as this was a new installed system. Before that I had Ubuntu 20.04 with its 5.4 kernel.


Note You need to log in before you can comment on or make changes to this bug.