Description of problem:

HDD developed a few bad sectors. Trying to xfs_repair (both the original disc and a dd_rescue'd copy) crashes xfs_repair:

…
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
rebuilding directory inode 134218179
bad hash table for directory inode 134644569 (no data entry): rebuilding
rebuilding directory inode 134644569
rebuilding directory inode 135789656
xfs_repair: phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed.
Aborted (core dumped)

Backtrace:

(gdb) bt
#0  0x00007ffff7dac5cf in raise () from /lib64/libc.so.6
#1  0x00007ffff7d96895 in abort () from /lib64/libc.so.6
#2  0x00007ffff7d96769 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007ffff7da49f6 in __assert_fail () from /lib64/libc.so.6
#4  0x00005555555797ef in longform_dir2_rebuild (hashtab=<optimized out>, ino_offset=24,
    irec=<optimized out>, ip=0x5555556f5270, ino=135789656, mp=<optimized out>) at phase6.c:1410
#5  longform_dir2_entry_check (hashtab=<optimized out>, ino_offset=24, irec=<optimized out>,
    need_dot=0x7fffffffd608, num_illegal=0x7fffffffd610, ip=0x5555556f5270, ino=135789656,
    mp=<optimized out>) at phase6.c:2481
#6  process_dir_inode (mp=<optimized out>, agno=agno@entry=1, irec=irec@entry=0x7fffd818a310,
    ino_offset=ino_offset@entry=24) at phase6.c:2983
#7  0x0000555555579ab2 in traverse_function (wq=0x7fffffffdb00, agno=1, arg=0x55555567f050)
    at phase6.c:3254
#8  0x000055555557dff5 in prefetch_ag_range (work=0x7fffffffdb00, start_ag=<optimized out>, end_ag=4,
    dirs_only=true, func=0x555555579a10 <traverse_function>) at prefetch.c:964
#9  0x000055555557fa25 in do_inode_prefetch (mp=0x7fffffffdf70, stride=0,
    func=0x555555579a10 <traverse_function>, check_cache=<optimized out>, dirs_only=true)
    at prefetch.c:1027
#10 0x000055555557acd4 in traverse_ags (mp=0x7fffffffdf70) at phase6.c:3372
#11 phase6 (mp=0x7fffffffdf70) at phase6.c:3372
#12 0x000055555555ac2e in main (argc=<optimized out>, argv=<optimized out>) at xfs_repair.c:949

Version-Release number of selected component (if applicable):
xfsprogs-4.18.0-1.fc30.x86_64

I'm happy to run any further diagnostics, but I cannot share the disk image.
Any chance you can share an xfs_metadump image, which obfuscates nearly all metadata and zeros out unused portions of sectors?
(and contains no data blocks at all)
Created attachment 1495716 [details]
vdb5.xfs_metadump.xz

% xfs_metadump -g /dev/vdb5 vdb5.xfs_metadump
Copied 90112 of 328832 inodes (1 of 4 AGs)
Metadata corruption detected at 0x55a4cd81925e, xfs_inode block 0x4b68340/0x8000
Copied 144832 of 328832 inodes (1 of 4 AGs)
Unknown directory buffer type!
Zeroing clean log

Uncompresses to 265M.
Ok, got it. The filesystem looks heavily damaged, FWIW. I do see the

xfs_repair: phase6.c:1362: longform_dir2_rebuild: Assertion `done' failed.

error, though. I'll look into why it failed; out of curiosity, how bad was the disk you rescued from?
Sent this patch to the list; I forgot to cc: you, sorry -- I'll bounce it to you.

====

xfs_repair: continue after xfs_bunmapi deadlock avoidance

After commit:

15a8bcc xfs: fix multi-AG deadlock in xfs_bunmapi

xfs_bunmapi can legitimately return before all work is done. Sadly,
nobody told xfs_repair, so it fires an assert:

phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed.

Fix this by calling back in until all work is done, as we do in the
kernel.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1641116
Reported-by: Tomasz Torcz <tomek>
Signed-off-by: Eric Sandeen <sandeen>
---

diff --git a/repair/phase6.c b/repair/phase6.c
index e017326..b87c751 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1317,7 +1317,7 @@ longform_dir2_rebuild(
 	xfs_fileoff_t		lastblock;
 	xfs_inode_t		pip;
 	dir_hash_ent_t		*p;
-	int			done;
+	int			done = 0;
 
 	/*
 	 * trash directory completely and rebuild from scratch using the
@@ -1352,12 +1352,25 @@ longform_dir2_rebuild(
 			error);
 
 	/* free all data, leaf, node and freespace blocks */
-	error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA, 0,
-			&done);
-	if (error) {
-		do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
-		goto out_bmap_cancel;
-	}
+	while (!done) {
+		error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA,
+				0, &done);
+		if (error) {
+			do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
+			goto out_bmap_cancel;
+		}
+		error = xfs_defer_finish(&tp);
+		if (error) {
+			do_warn(_("defer_finish failed -- error - %d\n"), error);
+			goto out_bmap_cancel;
+		}
+		/*
+		 * Close out trans and start the next one in the chain.
+		 */
+		error = xfs_trans_roll_inode(&tp, ip);
+		if (error)
+			goto out_bmap_cancel;
+	}
 
 	ASSERT(done);
I've tested your V2 patch with success: xfs_repair was able to fix enough problems to mount the partition, and I can now proceed with copying the home directory off it. Thank you, Eric!

The disk itself was a few-years-old HDD used in a laptop; dd_rescue reported 26 read errors (IIRC) during the copy. The XFS partition served as the rootfs (Fedora 28) on that laptop, and before I got it, the laptop had been powered on a couple of times. Each boot ended with the initrd unable to mount the rootfs, after which the laptop was forced off. Before I discovered the read errors, I also tried a couple of unsuccessful xfs_repair runs. So nothing special, just a worn-out HDD.
Ok, I'm going to close this as an upstream bug. I think it's solved your problem, and Fedora will inherit the fix with normal updates; I don't think there's any reason to push this patch to the Fedora packages ahead of upstream. If you disagree, let me know and reopen. Thanks, -eric