Created attachment 1737686 [details] dmesg so was running 5.9.10, got some (3) unrecoverable errors, thought that it was kernel related (switched over to -mdup) but at the same time running 5.8.13. after the pc ran for multiple days (2-3) I rebooted as I accumulated a few updates which needed restarts, and the system never came back up. booting into a live fedora 33, tried to mount it, only working with "$ sudo btrfs rescue zero-log /dev/mapper/luks-66adbb73-1374-47ec-9c56-c430aecd957b".
Created attachment 1737688 [details] journalctl -k
snipped from journalctl -k attachment... either some kind of interruption during fsync from the previous mount? or may also be related to "corrupt 59" events? Dec 08 10:03:18 localhost-live kernel: BTRFS: device label fedora_localhost-live devid 1 transid 356509 /dev/dm-2 scanned by systemd-udevd (3372) Dec 08 10:03:25 localhost-live kernel: BTRFS info (device dm-2): disk space caching is enabled Dec 08 10:03:25 localhost-live kernel: BTRFS info (device dm-2): has skinny extents Dec 08 10:03:25 localhost-live kernel: BTRFS info (device dm-2): bdev /dev/mapper/luks-66adbb73-1374-47ec-9c56-c430aecd957b errs: wr 0, rd 0, flush 0, corrupt 59, gen 0 Dec 08 10:03:25 localhost-live kernel: BTRFS info (device dm-2): enabling ssd optimizations Dec 08 10:03:25 localhost-live kernel: BTRFS info (device dm-2): start tree-log replay Dec 08 10:03:25 localhost-live kernel: ------------[ cut here ]------------ Dec 08 10:03:25 localhost-live kernel: WARNING: CPU: 3 PID: 3232 at fs/btrfs/extent-tree.c:3060 __btrfs_free_extent.isra.0+0x5fd/0x8d0 Dec 08 10:03:25 localhost-live kernel: Modules linked in: dm_crypt uinput nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set rfkill nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad iw_cxgb4 ib_uverbs rdma_cm iw_cm ib_cm ib_core amdgpu snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec iommu_v2 edac_mce_amd snd_hda_core gpu_sched ttm snd_hwdep kvm_amd snd_seq drm_kms_helper kvm snd_seq_device snd_pcm cec snd_timer irqbypass snd soundcore pcspkr rapl wmi_bmof sp5100_tco i2c_piix4 k10temp gpio_amdpt gpio_generic Dec 08 10:03:25 localhost-live kernel: acpi_cpufreq drm zram ip_tables nls_utf8 isofs squashfs crct10dif_pclmul crc32_pclmul crc32c_intel igb nvme ghash_clmulni_intel nvme_core ccp uas i2c_algo_bit dca usb_storage wmi video pinctrl_amd sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi loop scsi_transport_iscsi fuse Dec 08 10:03:25 localhost-live kernel: CPU: 3 PID: 3232 Comm: pool-udisksd Not tainted 5.8.15-301.fc33.x86_64 #1 Dec 08 10:03:25 localhost-live kernel: Hardware name: Gigabyte Technology Co., Ltd. B450 AORUS PRO/B450 AORUS PRO-CF, BIOS F60c 10/29/2020 Dec 08 10:03:25 localhost-live kernel: RIP: 0010:__btrfs_free_extent.isra.0+0x5fd/0x8d0 Dec 08 10:03:25 localhost-live kernel: Code: 24 e8 c7 50 03 00 4c 8b 04 24 4c 89 c7 e8 bb f8 00 00 8b 54 24 10 85 d2 0f 84 15 fe ff ff 48 8b 54 24 40 e9 db fd ff ff 0f 0b <0f> 0b 49 8b 3e e8 e9 5e 00 00 49 89 d9 4c 89 e1 48 89 ea ff b4 24 Dec 08 10:03:25 localhost-live kernel: RSP: 0018:ffffa96dc3bf38d8 EFLAGS: 00010246 Dec 08 10:03:25 localhost-live kernel: RAX: 00000000fffffffe RBX: 0000000000000000 RCX: 0000000000000000 Dec 08 10:03:25 localhost-live kernel: RDX: 00000000fffffffe RSI: 00000000000000c8 RDI: ffff9bb2a2b57398 Dec 08 10:03:25 localhost-live kernel: RBP: 0000002575a34000 R08: 0000000000000000 R09: ffff9bb2a2b2c9a0 Dec 08 10:03:25 localhost-live kernel: R10: 00012ce817e7a320 R11: 0000000000000000 R12: 0000000000000000 Dec 08 10:03:25 localhost-live kernel: R13: 00000000fffffffe R14: ffff9bb2a2b2c9a0 R15: ffff9bb330951068 Dec 08 10:03:25 localhost-live kernel: FS: 00007f47df7fe640(0000) GS:ffff9bb3b06c0000(0000) knlGS:0000000000000000 Dec 08 10:03:25 localhost-live kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 08 10:03:25 localhost-live kernel: CR2: 00007f48561f9000 CR3: 00000003a7892000 CR4: 00000000003406e0 Dec 08 10:03:25 localhost-live kernel: Call Trace: Dec 08 10:03:25 localhost-live kernel: ? __btrfs_run_delayed_refs+0xfbc/0x1000 Dec 08 10:03:25 localhost-live kernel: ? btrfs_merge_delayed_refs+0x19f/0x1d0 Dec 08 10:03:25 localhost-live kernel: __btrfs_run_delayed_refs+0x6c9/0x1000 Dec 08 10:03:25 localhost-live kernel: ? iput.part.0+0x5d/0x1c0 Dec 08 10:03:25 localhost-live kernel: btrfs_run_delayed_refs+0x73/0x200 Dec 08 10:03:25 localhost-live kernel: commit_cowonly_roots+0xd4/0x2d0 Dec 08 10:03:25 localhost-live kernel: ? btrfs_qgroup_account_extents+0xb2/0x220 Dec 08 10:03:25 localhost-live kernel: btrfs_commit_transaction+0x4b1/0xa10 Dec 08 10:03:25 localhost-live kernel: ? btrfs_recover_log_trees+0x334/0x370 Dec 08 10:03:25 localhost-live kernel: ? kmem_cache_free+0x83/0x190 Dec 08 10:03:25 localhost-live kernel: btrfs_recover_log_trees+0x33c/0x370 Dec 08 10:03:25 localhost-live kernel: ? replay_dir_deletes+0x2b0/0x2b0 Dec 08 10:03:25 localhost-live kernel: open_ctree+0x12de/0x17a5 Dec 08 10:03:25 localhost-live kernel: btrfs_mount_root.cold+0x12/0xea Dec 08 10:03:25 localhost-live kernel: ? selinux_fs_context_parse_param+0x27/0x60 Dec 08 10:03:25 localhost-live kernel: legacy_get_tree+0x27/0x40 Dec 08 10:03:25 localhost-live kernel: vfs_get_tree+0x25/0xb0 Dec 08 10:03:25 localhost-live kernel: vfs_kern_mount.part.0+0x71/0xb0 Dec 08 10:03:25 localhost-live kernel: btrfs_mount+0x119/0x36d Dec 08 10:03:25 localhost-live kernel: ? cred_has_capability.isra.0+0x68/0x100 Dec 08 10:03:25 localhost-live kernel: legacy_get_tree+0x27/0x40 Dec 08 10:03:25 localhost-live kernel: vfs_get_tree+0x25/0xb0 Dec 08 10:03:25 localhost-live kernel: do_mount+0x7c3/0xaf0 Dec 08 10:03:25 localhost-live kernel: ? memdup_user+0x4e/0x90 Dec 08 10:03:25 localhost-live kernel: __x64_sys_mount+0x8e/0xd0 Dec 08 10:03:25 localhost-live kernel: do_syscall_64+0x4d/0x90 Dec 08 10:03:25 localhost-live kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Dec 08 10:03:25 localhost-live kernel: RIP: 0033:0x7f47f04007de Dec 08 10:03:25 localhost-live kernel: Code: 48 8b 0d 9d 26 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6a 26 0c 00 f7 d8 64 89 01 48 Dec 08 10:03:25 localhost-live kernel: RSP: 002b:00007f47df7fd338 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 Dec 08 10:03:25 localhost-live kernel: RAX: ffffffffffffffda RBX: 00007f47df7fd4b0 RCX: 00007f47f04007de Dec 08 10:03:25 localhost-live kernel: RDX: 00007f47cc00c600 RSI: 00007f47e400cfd0 RDI: 00007f47cc003d20 Dec 08 10:03:25 localhost-live kernel: RBP: 00007f47cc005490 R08: 0000000000000000 R09: 00007f47cc000080 Dec 08 10:03:25 localhost-live kernel: R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000 Dec 08 10:03:25 localhost-live kernel: R13: 00007f47cc003d20 R14: 00007f47cc00c600 R15: 0000000000000000 Dec 08 10:03:25 localhost-live kernel: ---[ end trace 1b51d2524043b144 ]---
$ sudo btrfs check --readonly /dev/mapper/luks-66adbb73-1374-47ec-9c56-c430aecd957b Opening filesystem to check... Checking filesystem on /dev/mapper/luks-66adbb73-1374-47ec-9c56-c430aecd957b UUID: 7d999a7d-6d7d-4a45-814a-30dd69d9509e [1/7] checking root items [2/7] checking extents tree backref 160506118144 parent 257 root 257 not found in extent tree backref 160506118144 root 4398046511361 not referenced back 0x559743a79340 incorrect global backref count on 160506118144 found 2 wanted 1 backpointer mismatch on [160506118144 16384] tree backref 160887422976 parent 7 root 7 not found in extent tree backref 160887422976 root 4398046511111 not referenced back 0x5597438cb380 incorrect global backref count on 160887422976 found 2 wanted 1 backpointer mismatch on [160887422976 16384] ERROR: errors found in extent allocation tree or chunk allocation [3/7] checking free space cache [4/7] checking fs roots [5/7] checking only csums items (without verifying data) [6/7] checking root refs [7/7] checking quota groups skipped (not enabled on this FS) found 230738829312 bytes used, error(s) found total csum bytes: 197964660 total tree bytes: 658178048 total fs tree bytes: 382402560 total extent tree bytes: 35241984 btree space waste bytes: 106420684 file data blocks allocated: 1418050183168 referenced 229519929344
Scrub is failing $ journalctl -fk -- Logs begin at Tue 2020-12-08 12:06:20 EST. -- Dec 08 12:22:32 localhost-live kernel: kthread+0x11b/0x140 Dec 08 12:22:32 localhost-live kernel: ? __kthread_bind_mask+0x60/0x60 Dec 08 12:22:32 localhost-live kernel: ret_from_fork+0x22/0x30 Dec 08 12:22:32 localhost-live kernel: ---[ end trace 6e22b497c50379ce ]--- Dec 08 12:22:32 localhost-live kernel: BTRFS: error (device dm-2) in __btrfs_free_extent:3066: errno=-2 No such entry Dec 08 12:22:32 localhost-live kernel: BTRFS info (device dm-2): forced readonly Dec 08 12:22:32 localhost-live kernel: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2173: errno=-2 No such entry Dec 08 12:22:32 localhost-live kernel: BTRFS warning (device dm-2): Skipping commit of aborted transaction. Dec 08 12:22:32 localhost-live kernel: BTRFS: error (device dm-2) in cleanup_transaction:1898: errno=-2 No such entry Dec 08 12:22:32 localhost-live kernel: BTRFS info (device dm-2): scrub: not finished on devid 1 with status: -125
$ sudo btrfs check --check-data-csum /dev/mapper/luks-66adbb73-1374-47ec-9c56-c430aecd957b Opening filesystem to check... Checking filesystem on /dev/mapper/luks-66adbb73-1374-47ec-9c56-c430aecd957b UUID: 7d999a7d-6d7d-4a45-814a-30dd69d9509e [1/7] checking root items [2/7] checking extents tree backref 160506118144 parent 257 root 257 not found in extent tree backref 160506118144 root 4398046511361 not referenced back 0x56299bf60340 incorrect global backref count on 160506118144 found 2 wanted 1 backpointer mismatch on [160506118144 16384] tree backref 160887422976 parent 7 root 7 not found in extent tree backref 160887422976 root 4398046511111 not referenced back 0x56299bdb2380 incorrect global backref count on 160887422976 found 2 wanted 1 backpointer mismatch on [160887422976 16384] ERROR: errors found in extent allocation tree or chunk allocation [3/7] checking free space cache [4/7] checking fs roots [5/7] checking csums against data mirror 1 bytenr 2700337152 csum 33 expected csum 225 mirror 1 bytenr 2831822848 csum 132 expected csum 9 mirror 1 bytenr 3146625024 csum 207 expected csum 185 mirror 1 bytenr 7110606848 csum 18 expected csum 18 ERROR: errors found in csum tree [6/7] checking root refs [7/7] checking quota groups skipped (not enabled on this FS) found 230738829312 bytes used, error(s) found total csum bytes: 197964660 total tree bytes: 658178048 total fs tree bytes: 382402560 total extent tree bytes: 35241984 btree space waste bytes: 106420684 file data blocks allocated: 1418050183168 referenced 229519929344
$ smartctl -i /dev/nvme0n1 smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.15-301.fc33.x86_64] (local build) Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org Smartctl open device: /dev/nvme0n1 failed: Permission denied [liveuser@localhost-live ~]$ sudo smartctl -i /dev/nvme0n1 smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.15-301.fc33.x86_64] (local build) Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Number: Samsung SSD 970 EVO 500GB Serial Number: S466NB0K630255X Firmware Version: 2B2QEXE7 PCI Vendor/Subsystem ID: 0x144d IEEE OUI Identifier: 0x002538 Total NVM Capacity: 500,107,862,016 [500 GB] Unallocated NVM Capacity: 0 Controller ID: 4 Number of Namespaces: 1 Namespace 1 Size/Capacity: 500,107,862,016 [500 GB] Namespace 1 Utilization: 237,226,782,720 [237 GB] Namespace 1 Formatted LBA Size: 512 Namespace 1 IEEE EUI-64: 002538 5681b093f8 Local Time is: Tue Dec 8 12:53:33 2020 EST
First things first, mismatched csums so you may have some bad memory, run a memtest and get those replaced because otherwise they could wreak more havoc. Get a backup of the fs, and then run btrfs check --repair to repair the extent tree issues, and then re-check to make sure it fixed everything, then you should be good to go.
As it turned out, my RAM was failing.
Sorry about that, but yeah good catch! It's a bit tricky to find this one in the attached journal. It's right after the call trace, in the leaf dump. Dec 08 10:03:25 localhost-live kernel: ---[ end trace 1b51d2524043b144 ]--- Dec 08 10:03:25 localhost-live kernel: BTRFS info (device dm-2): leaf 161062354944 gen 356510 total ptrs 226 free space 3175 owner 2 Dec 08 10:03:25 localhost-live kernel: item 0 key (160887422976 169 0) itemoff 16250 itemsize 33 Dec 08 10:03:25 localhost-live kernel: extent refs 1 gen 356504 flags 2 Dec 08 10:03:25 localhost-live kernel: ref#0: tree block backref root 4398046511111 Dec 08 10:03:25 localhost-live kernel: item 1 key (160887472128 169 0) itemoff 16217 itemsize 33 Dec 08 10:03:25 localhost-live kernel: extent refs 1 gen 356504 flags 2 Dec 08 10:03:25 localhost-live kernel: ref#0: tree block backref root 7 The suspicious line is the gigantic root in item 0. $ printf '%x %x\n' 4398046511111 40000000007 0 That's hex and is a bit flip. More obvious in binary. 1000000000000000000000000000000000000000111 Flip that far left bit back to 0 and it's 7, which is the correct valure. I think btrfs --repair should be able to fix this, and then hopefully you can do a scrub. The online scrub will print out the full path to corrupt files. Just delete those and then replace with known good copies and you should be good.
Andreas, could you confirm the history of this file system? The oldest kernel that was writing to it is 5.8 series? Thanks!
Yes, as this was a new installed system. Before that I had Ubuntu 20.04 with its 5.4 kernel.