Bug 1641116
| Summary: | xfs_repair crashes at phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed. | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Tomasz Torcz <tomek> |
| Component: | xfsprogs | Assignee: | Eric Sandeen <esandeen> |
| Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rawhide | CC: | esandeen |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Last Closed: | 2019-01-09 17:22:28 UTC | Type: | Bug |
Any chance you can share an xfs_metadump image? It obfuscates nearly all metadata, zeros out unused portions of sectors, and contains no data blocks at all.

Created attachment 1495716
vdb5.xfs_metadump.xz
% xfs_metadump -g /dev/vdb5 vdb5.xfs_metadump
Copied 90112 of 328832 inodes (1 of 4 AGs) Metadata corruption detected at 0x55a4cd81925e, xfs_inode block 0x4b68340/0x8000
Copied 144832 of 328832 inodes (1 of 4 AGs) Unknown directory buffer type!
Zeroing clean log
Uncompresses to 265M.
Ok, got it. The filesystem looks heavily damaged FWIW. I do see the

xfs_repair: phase6.c:1362: longform_dir2_rebuild: Assertion `done' failed.

error though. I'll look into why it failed, but how bad was the disk you rescued from, just out of curiosity?

Sent this patch to the list; forgot to cc: you, sorry. I'll bounce it to you.

====

xfs_repair: continue after xfs_bunmapi deadlock avoidance

After commit:

15a8bcc xfs: fix multi-AG deadlock in xfs_bunmapi

xfs_bunmapi can legitimately return before all work is done. Sadly nobody told xfs_repair, so it fires an assert:

phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed.

Fix this by calling back in until all work is done, as we do in the kernel.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1641116
Reported-by: Tomasz Torcz <tomek>
Signed-off-by: Eric Sandeen <sandeen>
---

diff --git a/repair/phase6.c b/repair/phase6.c
index e017326..b87c751 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1317,7 +1317,7 @@ longform_dir2_rebuild(
 	xfs_fileoff_t		lastblock;
 	xfs_inode_t		pip;
 	dir_hash_ent_t		*p;
-	int			done;
+	int			done = 0;
 
 	/*
 	 * trash directory completely and rebuild from scratch using the
@@ -1352,12 +1352,25 @@ longform_dir2_rebuild(
 			error);
 
 	/* free all data, leaf, node and freespace blocks */
-	error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA, 0,
-				&done);
-	if (error) {
-		do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
-		goto out_bmap_cancel;
-	}
+	while (!done) {
+		error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA,
+					0, &done);
+		if (error) {
+			do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
+			goto out_bmap_cancel;
+		}
+		error = xfs_defer_finish(&tp);
+		if (error) {
+			do_warn(("defer_finish failed -- error - %d\n"), error);
+			goto out_bmap_cancel;
+		}
+		/*
+		 * Close out trans and start the next one in the chain.
+		 */
+		error = xfs_trans_roll_inode(&tp, ip);
+		if (error)
+			goto out_bmap_cancel;
+	}
 
 	ASSERT(done);

I've tested your V2 patch with success.
xfs_repair was able to fix enough problems to mount the partition. I can now proceed with copying the home directory from it. Thank you, Eric!

The disk itself was a few-year-old HDD, used in a laptop. dd_rescue displayed 26 read errors (IIRC) during the copying. The XFS partition served as rootfs (Fedora 28) on this laptop, and before I got it, the laptop was powered on a couple of times. Each boot ended with the initrd unable to mount rootfs, and then the laptop was forced off. Before I discovered the read errors, I had tried a couple of unsuccessful xfs_repair runs. So nothing quite special, just a worn-out HDD.

Ok, I'm going to close this as an upstream bug. I think it's solved your problem, and Fedora will inherit it with normal updates. I don't think there's any reason to push this patch to the Fedora packages ahead of upstream. If you disagree, let me know & reopen.

Thanks,
-eric
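For readers following the patch above, here is a minimal standalone sketch of the "call back in until all work is done" pattern it describes. Nothing here is xfsprogs code: `toy_file`, `toy_unmap`, `toy_unmap_all` and `TOY_MAX_PER_CALL` are hypothetical names invented for illustration, and the per-call limit only imitates a call that may legitimately stop early. The real patch additionally finishes deferred operations and rolls the transaction on each pass, which this sketch only notes in a comment.

```c
#include <assert.h>

/* Hypothetical stand-in for a file's mapped extents; not an xfsprogs type. */
struct toy_file {
	int extents_left;	/* extents still mapped */
};

/* Hypothetical cap on work per call, imitating a call that may stop early. */
#define TOY_MAX_PER_CALL 2

/*
 * Hypothetical analogue of an unmap call that can return before all
 * extents are gone. Returns 0 on success and sets *done to nonzero
 * only once nothing remains to unmap.
 */
static int toy_unmap(struct toy_file *f, int *done)
{
	int n = f->extents_left < TOY_MAX_PER_CALL ?
			f->extents_left : TOY_MAX_PER_CALL;

	f->extents_left -= n;
	*done = (f->extents_left == 0);
	return 0;
}

/*
 * Keep calling until the callee reports completion.
 * Returns the number of calls made, or -1 on error.
 */
static int toy_unmap_all(struct toy_file *f)
{
	int done = 0;	/* initialized so the loop is entered at least once */
	int calls = 0;

	while (!done) {
		int error = toy_unmap(f, &done);

		if (error)
			return -1;
		calls++;
		/* The real patch finishes deferred ops and rolls the
		 * transaction at this point before looping again. */
	}

	/* Holds by construction here, unlike after a single call. */
	assert(done);
	return calls;
}
```

With five extents and a limit of two per call, `toy_unmap_all` needs three passes; a single call followed by `assert(done)` would abort after the first pass, which is the shape of the failure reported in this bug.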
Description of problem:
HDD developed a few bad sectors. Trying to xfs_repair (both the original disk and a dd_rescue'd copy) crashes xfs_repair:

…
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
rebuilding directory inode 134218179
bad hash table for directory inode 134644569 (no data entry): rebuilding
rebuilding directory inode 134644569
rebuilding directory inode 135789656
xfs_repair: phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed.
Aborted (core dumped)

Backtrace:
(gdb) bt
#0  0x00007ffff7dac5cf in raise () from /lib64/libc.so.6
#1  0x00007ffff7d96895 in abort () from /lib64/libc.so.6
#2  0x00007ffff7d96769 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007ffff7da49f6 in __assert_fail () from /lib64/libc.so.6
#4  0x00005555555797ef in longform_dir2_rebuild (hashtab=<optimized out>, ino_offset=24, irec=<optimized out>, ip=0x5555556f5270, ino=135789656, mp=<optimized out>) at phase6.c:1410
#5  longform_dir2_entry_check (hashtab=<optimized out>, ino_offset=24, irec=<optimized out>, need_dot=0x7fffffffd608, num_illegal=0x7fffffffd610, ip=0x5555556f5270, ino=135789656, mp=<optimized out>) at phase6.c:2481
#6  process_dir_inode (mp=<optimized out>, agno=agno@entry=1, irec=irec@entry=0x7fffd818a310, ino_offset=ino_offset@entry=24) at phase6.c:2983
#7  0x0000555555579ab2 in traverse_function (wq=0x7fffffffdb00, agno=1, arg=0x55555567f050) at phase6.c:3254
#8  0x000055555557dff5 in prefetch_ag_range (work=0x7fffffffdb00, start_ag=<optimized out>, end_ag=4, dirs_only=true, func=0x555555579a10 <traverse_function>) at prefetch.c:964
#9  0x000055555557fa25 in do_inode_prefetch (mp=0x7fffffffdf70, stride=0, func=0x555555579a10 <traverse_function>, check_cache=<optimized out>, dirs_only=true) at prefetch.c:1027
#10 0x000055555557acd4 in traverse_ags (mp=0x7fffffffdf70) at phase6.c:3372
#11 phase6 (mp=0x7fffffffdf70) at phase6.c:3372
#12 0x000055555555ac2e in main (argc=<optimized out>, argv=<optimized out>) at xfs_repair.c:949

Version-Release number of selected component (if applicable):
xfsprogs-4.18.0-1.fc30.x86_64

I'm happy to run any further diagnostics, but I cannot share the disk image.