Bug 1892166
Summary: btrfsck segfaults on "root 5 missing its root dir, recreating"
Product: [Fedora] Fedora
Component: btrfs-progs
Version: 33
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Fixed In Version: btrfs-progs-5.10-1.fc33 btrfs-progs-5.10-1.fc32
Last Closed: 2021-01-22 01:33:03 UTC
Type: Bug
Severity: unspecified
Priority: unspecified
Reporter: Sandino Araico Sánchez <sandino>
Assignee: Josef Bacik <josef>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bugzilla, esandeen, igor.raits, josef, ngompa13
Description
Sandino Araico Sánchez
2020-10-28 04:59:50 UTC
Created attachment 1724696
btrfsck log
btrfsck --init-extent-tree --backup --progress /dev/sdc4 > btrfsck.log 2>&1
Core dump in https://upload.1101.mx/core.btrfsck.0.980c24f22eca4a508bd80e5975729f24.1663.1603859962000000.zst

214 [root@fedora src] gdb /usr/sbin/btrfsck core.btrfsck.0.980c24f22eca4a508bd80e5975729f24.1663.1603859962000000
GNU gdb (GDB) Fedora 9.2-7.fc33
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/btrfsck...
Reading symbols from /usr/lib/debug/usr/sbin/btrfs-5.7-5.fc33.x86_64.debug...
warning: core file may not match specified executable file.
[New LWP 1663]
[New LWP 1672]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `btrfsck --init-extent-tree --backup --progress /dev/sdc4'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 btrfs_search_slot (trans=0x5567c68fba70, root=0x0, key=0x7ffc3af03480, p=0x5567d96a20a0, ins_len=0, cow=0) at ctree.c:1275
1275 struct btrfs_fs_info *fs_info = root->fs_info;
[Current thread is 1 (Thread 0x7fb0d68b58c0 (LWP 1663))]
(gdb) bt
#0 btrfs_search_slot (trans=0x5567c68fba70, root=0x0, key=0x7ffc3af03480, p=0x5567d96a20a0, ins_len=0, cow=0) at ctree.c:1275
#1 0x00005567c4f13f53 in btrfs_device_avail_bytes (avail_bytes=<synthetic pointer>, device=0x5567c5fa3d80, trans=0x5567c68fba70) at volumes.c:936
#2 btrfs_alloc_chunk (trans=0x5567c68fba70, info=0x5567c5fabc40, start=0x7ffc3af03518, num_bytes=0x7ffc3af03510, type=20) at volumes.c:1122
#3 0x00005567c4f070b9 in do_chunk_alloc (trans=trans@entry=0x5567c68fba70, fs_info=fs_info@entry=0x5567c5fabc40, alloc_bytes=alloc_bytes@entry=2113536, flags=flags@entry=20) at extent-tree.c:1728
#4 0x00005567c4f072b1 in btrfs_reserve_extent (trans=trans@entry=0x5567c68fba70, root=root@entry=0x5567c60466d0, num_bytes=16384, empty_size=empty_size@entry=0, hint_byte=hint_byte@entry=5443871047680, search_end=search_end@entry=18446744073709551615, ins=0x7ffc3af03730, is_data=false) at extent-tree.c:2337
#5 0x00005567c4f07f45 in alloc_tree_block (flags=0, search_end=18446744073709551615, generation=<optimized out>, ins=0x7ffc3af03730, hint_byte=5443871047680, empty_size=0, level=0, key=0x7ffc3af037d0, root_objectid=5, num_bytes=<optimized out>, root=0x5567c60466d0, trans=0x5567c68fba70) at extent-tree.c:2451
#6 btrfs_alloc_free_block (trans=0x5567c68fba70, root=0x5567c60466d0, blocksize=<optimized out>, root_objectid=5, key=0x7ffc3af037d0, level=0, hint=5443871047680, empty_size=0) at extent-tree.c:2529
#7 0x00005567c4efa6d3 in __btrfs_cow_block (trans=trans@entry=0x5567c68fba70, root=root@entry=0x5567c60466d0, buf=buf@entry=0x5567c6046920, parent=parent@entry=0x0, parent_slot=0, cow_ret=cow_ret@entry=0x7ffc3af03a48, search_start=5443871047680, empty_size=0) at ctree.c:428
#8 0x00005567c4efaf42 in btrfs_cow_block (trans=trans@entry=0x5567c68fba70, root=root@entry=0x5567c60466d0, buf=0x5567c6046920, parent=0x0, parent_slot=<optimized out>, cow_ret=cow_ret@entry=0x7ffc3af03a48) at ctree.c:521
#9 0x00005567c4efd049 in btrfs_search_slot (trans=<optimized out>, root=root@entry=0x5567c60466d0, key=key@entry=0x7ffc3af03c20, p=p@entry=0x5567c5fa5480, ins_len=ins_len@entry=185, cow=cow@entry=1) at ctree.c:1288
#10 0x00005567c4eff159 in btrfs_insert_empty_items (trans=trans@entry=0x5567c68fba70, root=root@entry=0x5567c60466d0, path=path@entry=0x5567c5fa5480, cpu_key=cpu_key@entry=0x7ffc3af03c20, data_size=data_size@entry=0x7ffc3af03bd4, nr=nr@entry=1) at ctree.c:2742
#11 0x00005567c4eff578 in btrfs_insert_empty_item (data_size=<optimized out>, key=0x7ffc3af03c20, path=0x5567c5fa5480, root=0x5567c60466d0, trans=0x5567c68fba70) at ctree.h:2695
#12 btrfs_insert_item (trans=0x5567c68fba70, root=0x5567c60466d0, cpu_key=0x7ffc3af03c20, data=0x7ffc3af03c40, data_size=160) at ctree.c:2841
#13 0x00005567c4f1f464 in btrfs_insert_inode (inode_item=0x7ffc3af03c40, objectid=256, root=0x5567c60466d0, trans=0x5567c68fba70) at kernel-shared/inode-item.c:143
#14 btrfs_make_root_dir (trans=0x5567c68fba70, root=0x5567c60466d0, objectid=256) at common/utils.c:89
#15 0x00005567c4ec5441 in check_inode_recs (root=0x5567c60466d0, inode_cache=<optimized out>) at check/main.c:3015
#16 0x00005567c4ecf81e in check_fs_root (wc=0x7ffc3af04020, root_cache=0x7ffc3af03ed8, root=<optimized out>) at check/main.c:3692
#17 check_fs_roots (root_cache=0x7ffc3af03ed8, fs_info=<optimized out>) at check/main.c:3771
#18 do_check_fs_roots (root_cache=0x7ffc3af03ed8, fs_info=<optimized out>) at check/main.c:3888
#19 cmd_check (cmd=<optimized out>, argc=<optimized out>, argv=<optimized out>) at check/main.c:10393
#20 0x00005567c4e9edd0 in cmd_execute (argv=0x7ffc3af04458, argc=5, cmd=0x5567c4f761e0 <cmd_struct_check>) at cmds/commands.h:125
#21 main (argc=5, argv=0x7ffc3af04458) at btrfs.c:402

I've reported this upstream.
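Frame #0 is the immediate cause of the SIGSEGV: btrfs_device_avail_bytes() (frame #1, volumes.c:936) passed root=0x0 into btrfs_search_slot(), whose first statement at ctree.c:1275 reads root->fs_info. The frames above it show how the check got there: check_inode_recs() recreated the root directory of root 5 (frame #14), the inode insert COWed a tree block, and that needed a new chunk (frames #2 to #4). The C sketch below only illustrates what "fail safely and gracefully" could look like for that dereference. The stub types and the functions search_slot_checked() and device_avail_bytes_checked() are hypothetical; this is not btrfs-progs code and not the fix that shipped in btrfs-progs 5.10.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for btrfs-progs types. */
struct fs_info_stub {
    int unused;
};

struct root_stub {
    struct fs_info_stub *fs_info;
};

/*
 * Hypothetical tree search. Frame #0 crashed because the real function
 * reads root->fs_info before anything checks root; this version checks
 * first and hands an error back to the caller instead.
 */
static int search_slot_checked(struct root_stub *root)
{
    if (root == NULL) {
        fprintf(stderr, "tree root not loaded, refusing to search\n");
        return -ENOENT;
    }
    struct fs_info_stub *fs_info = root->fs_info; /* safe: root != NULL */
    (void)fs_info; /* the real tree walk would use this */
    return 0;
}

/*
 * Hypothetical caller shaped like btrfs_device_avail_bytes() in frame #1:
 * on error it leaves *avail untouched and propagates the code upward.
 */
static int device_avail_bytes_checked(struct root_stub *root,
                                      unsigned long long *avail)
{
    assert(avail != NULL);
    int ret = search_slot_checked(root);
    if (ret < 0)
        return ret;
    *avail = 0; /* placeholder: real code would add up free space */
    return 0;
}
```

With this shape, the chunk allocation path would receive a negative error code and could abort the transaction with a message rather than the whole process dying. Returning -ENOENT is an assumption made for the sketch; a real fix might pick a different code or detect the missing root much earlier.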
It's a bug and shouldn't crash; at the very least it should fail safely and gracefully. However, I can't tell what the history of the file system is, what happened, or what was attempted prior to --init-extent-tree, which is a big hammer.

>parent transid verify failed on 7715066789888 wanted 101433 found 102169

These are usually the result of a device failing to honor btrfs write ordering. There are over 700 commits separating them (102169 - 101433 = 736). Was there a crash or power failure preceding the need to run this command? These errors can also happen if one device is missing writes and needs to be caught up with a scrub.

>warning, device 6 is missing

Is this device in a degraded state now? It's possible this will prevent self-healing, since there may be only one copy of metadata. 'btrfs dev stats' for each device (or just the mount point, if it will at least mount -o ro,degraded) would be useful, as well as dmesg for any failed mount attempt.

If this file system mounts successfully with -o degraded (i.e. read-write), my suggestion is to 'btrfs replace start 6 /dev/newdevice /mnt' and get it replaced. Then, once it has finished rebuilding the missing device, scrub the file system. After that's done, it can be unmounted and checked with 'btrfs check --readonly', but I do not recommend --repair until it's known what the problems are.

Upstream thread: https://lore.kernel.org/linux-btrfs/CAJCQCtQqJY7vSVsQeRP82K1x9VtSYUHK1zmnpfXrtJKFbcYxJQ@mail.gmail.com/T/#u

It's a very damaged filesystem: there were power failures in the middle of a device-remove rebalance, and repeated device failures in the middle of attempts to revert the rebalance; the superblock is damaged and the filesystem does not mount any more. The failed disk image was only partially recovered, so I will not include it for now; however, I would like to see btrfs scrub doing its magic on such a damaged scenario. The filesystem contains backups. Some of them are old backups I would like to get back, but I can live without them.
I have no problem with fsck losing unfixable parts of the filesystem as long as it gets me to a consistent (mountable) state.

When you get a chance, please attach the output from three things:

1. btrfs check --readonly /dev/sdXY
   ## any of the devices for this volume; only one needs to be checked, the check tool finds all the others
2. btrfs check --mode lowmem --readonly /dev/sdXY
   ## this will take a long time, possibly more than a day with a file system of this size; but it's a different implementation than the normal mode
3. mount -o ro,usebackuproot,degraded /dev/sdXY
   dmesg
   ## Disconnect the faulty device that was being removed; I guess that's devid 6, which is already missing. Try to mount with these options, and if it fails, attach dmesg for the failure.

The best chance for recovery in the near future is 'btrfs restore'. It's quite tedious to use, but has a high success rate. It amounts to an offline scrape of your data; once satisfied, start over with a new file system. There's often help on #fedora and #btrfs on irc.freenode.

A more tolerant read-only rescue mount option for getting data off with normal tools is a future feature tentatively planned for kernel 5.11. Significantly damaged Btrfs file systems are difficult to repair, so I wouldn't count on that happening soon. Of course, crashing is still a bug.

Created attachment 1725257
btrfs check --readonly /dev/sdXY

2. btrfs check --mode lowmem --readonly /dev/sdXY: 4.9 GB file uploaded to upload.1101.mx/btrfs-check--mode-lowmem--readonly-sdc4

Created attachment 1725393
mount -o ro,usebackuproot,degraded /dev/sdXY
OK, so it looks like ~12000+ snapshots. That complicates the repair, in particular --init-extent-tree, since all of those file btrees have to be walked to reconstruct the extent tree. Depending on the RAM and metadata (block group) size, it could indeed take a long time.

What's the status of the missing drive? Have you tried cloning it to a new drive? Can you then 'mount' without any options? If not, I'd like to see dmesg for that failure. Also try 'mount -o ro,usebackuproot', and post dmesg for that too if it doesn't work. The thing is, as long as there aren't two bad copies of the same fs metadata, Btrfs can recover and self-heal. There might still be something useful on the other drive.

Also, when was the file system last scrubbed? It should get a scrub after each power failure or crash to make sure any corruption or stale metadata is fixed up.

>[69325.537853] BTRFS error (device sdd4): chunk 7716140482560 has missing dev extent, have 0 expect 2
That's a problem. It's possible the only copy of this missing dev extent is on the missing drive.
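The dmesg line compares two counts for chunk 7716140482560: how many dev extents were found backing the chunk ("have 0") and how many its stripe count requires ("expect 2", for instance a two-stripe chunk such as RAID1 or DUP). The C sketch below models that comparison under our simplified reading of the check; the types stripe_stub and dev_extent_stub and the function count_verified_stripes() are hypothetical and are not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one chunk stripe: where a copy should live. */
struct stripe_stub {
    uint64_t devid;
    uint64_t physical;
};

/* Hypothetical dev extent, as found while scanning the device tree. */
struct dev_extent_stub {
    uint64_t devid;
    uint64_t physical;
    uint64_t chunk_offset; /* logical start of the owning chunk */
};

/*
 * Count how many of a chunk's stripes are backed by a matching dev
 * extent. A result below num_stripes corresponds to the
 * "has missing dev extent, have N expect M" condition.
 */
static int count_verified_stripes(uint64_t chunk_start,
                                  const struct stripe_stub *stripes,
                                  int num_stripes,
                                  const struct dev_extent_stub *extents,
                                  size_t num_extents)
{
    assert(num_stripes >= 0);
    int verified = 0;
    for (int i = 0; i < num_stripes; i++) {
        for (size_t j = 0; j < num_extents; j++) {
            if (extents[j].chunk_offset == chunk_start &&
                extents[j].devid == stripes[i].devid &&
                extents[j].physical == stripes[i].physical) {
                verified++;
                break; /* this stripe is accounted for */
            }
        }
    }
    return verified;
}
```

A count of 0 out of 2 means no copy of that chunk could be located through the dev extents that were readable, which is consistent with the concern that the only copy may sit on the missing drive.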
Can you post the output from:
$ btrfs rescue super -v /dev/any
## only needs to be run on one device for this file system, it'll find the others
$ btrfs insp dump-s -Ffa /dev/each
## this needs to be run one time on every device in the file system; they can all get dumped in one text file though and attached to the bug
FWIW these are all read-only commands and do not make changes to the file system. Changes can make things worse.
btrfs rescue super -v /dev/sdc4
All Devices:
    Device: id = 1, name = /dev/sdd4
    Device: id = 7, name = /dev/sdb4
    Device: id = 8, name = /dev/sdc4

Before Recovering:
    [All good supers]:
        device name = /dev/sdd4
        superblock bytenr = 65536

        device name = /dev/sdd4
        superblock bytenr = 67108864

        device name = /dev/sdd4
        superblock bytenr = 274877906944

        device name = /dev/sdb4
        superblock bytenr = 65536

        device name = /dev/sdb4
        superblock bytenr = 67108864

        device name = /dev/sdb4
        superblock bytenr = 274877906944

        device name = /dev/sdc4
        superblock bytenr = 65536

        device name = /dev/sdc4
        superblock bytenr = 67108864

        device name = /dev/sdc4
        superblock bytenr = 274877906944

    [All bad supers]:

All supers are valid, no need to recover

Created attachment 1727280
btrfs insp dump-s -Ffa /dev/each
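The three "superblock bytenr" values reported for every device (65536, 67108864 and 274877906944, i.e. 64 KiB, 64 MiB and 256 GiB) are the fixed locations of the primary superblock and its two mirrors. As far as we understand the on-disk format, mirror i (for i of 1 or more) sits at 16 KiB shifted left by 12 times i bits, with at most three copies in total. The sketch below reproduces that arithmetic; the macro names and sb_offset() are hypothetical, chosen for illustration rather than copied from btrfs-progs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values as we understand the btrfs on-disk format. */
#define SB_PRIMARY_OFFSET (64ULL * 1024) /* primary copy at 64 KiB */
#define SB_MIRROR_BASE    (16ULL * 1024) /* base for mirror offsets */
#define SB_MIRROR_SHIFT   12             /* each mirror shifts 12 more bits */
#define SB_MIRROR_MAX     3              /* primary plus two mirrors */

/* Byte offset of superblock copy `mirror` (0 = primary, 1 and 2 = mirrors). */
static uint64_t sb_offset(int mirror)
{
    assert(mirror >= 0 && mirror < SB_MIRROR_MAX);
    if (mirror == 0)
        return SB_PRIMARY_OFFSET;
    return SB_MIRROR_BASE << (SB_MIRROR_SHIFT * mirror);
}
```

Looping sb_offset() over 0, 1 and 2 yields exactly the three offsets listed per device above; a copy can only exist if the device is large enough to hold it, so all three appearing on sdd4, sdb4 and sdc4 implies each of those partitions is larger than 256 GiB.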
FEDORA-2021-3c30d1d273 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-3c30d1d273

FEDORA-2021-3c30d1d273 has been pushed to the Fedora 33 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-3c30d1d273`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-3c30d1d273

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2021-002891b7f1 has been pushed to the Fedora 32 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-002891b7f1`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-002891b7f1

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2021-3c30d1d273 has been pushed to the Fedora 33 stable repository.
If problem still persists, please make note of it in this bug report.

I am sorry, I had to use the disks with the corrupted filesystem for another project. I am no longer able to verify the fixed version against the corrupted filesystem. I will try to reproduce the corruption when I get the disks back.

FEDORA-2021-002891b7f1 has been pushed to the Fedora 32 stable repository.
If problem still persists, please make note of it in this bug report.