Bug 1641116
Summary: | xfs_repair crashes at phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed. | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Tomasz Torcz <tomek> |
Component: | xfsprogs | Assignee: | Eric Sandeen <esandeen> |
Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | rawhide | CC: | esandeen |
Last Closed: | 2019-01-09 17:22:28 UTC | Type: | Bug |
Description
Tomasz Torcz
2018-10-19 16:36:37 UTC
Any chance you can share an xfs_metadump image, which obfuscates nearly all metadata and zeros out unused portions of sectors (and contains no data blocks at all)?

Created attachment 1495716
vdb5.xfs_metadump.xz
% xfs_metadump -g /dev/vdb5 vdb5.xfs_metadump
Copied 90112 of 328832 inodes (1 of 4 AGs)
Metadata corruption detected at 0x55a4cd81925e, xfs_inode block 0x4b68340/0x8000
Copied 144832 of 328832 inodes (1 of 4 AGs)
Unknown directory buffer type!
Zeroing clean log
Uncompresses to 265M.

Ok, got it. The filesystem looks heavily damaged FWIW. I do see the

xfs_repair: phase6.c:1362: longform_dir2_rebuild: Assertion `done' failed.

error though. I'll look into why it failed, but how bad was the disk you rescued from, just out of curiosity?

Sent this patch to the list, forgot to cc: you, sorry. I'll bounce it to you.

====

xfs_repair: continue after xfs_bunmapi deadlock avoidance

After commit:

15a8bcc xfs: fix multi-AG deadlock in xfs_bunmapi

xfs_bunmapi can legitimately return before all work is done. Sadly nobody told xfs_repair, so it fires an assert:

phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed.

Fix this by calling back in until all work is done, as we do in the kernel.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1641116
Reported-by: Tomasz Torcz <tomek>
Signed-off-by: Eric Sandeen <sandeen>
---

diff --git a/repair/phase6.c b/repair/phase6.c
index e017326..b87c751 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1317,7 +1317,7 @@ longform_dir2_rebuild(
 	xfs_fileoff_t		lastblock;
 	xfs_inode_t		pip;
 	dir_hash_ent_t		*p;
-	int			done;
+	int			done = 0;
 
 	/*
 	 * trash directory completely and rebuild from scratch using the
@@ -1352,12 +1352,25 @@ longform_dir2_rebuild(
 			error);
 
 	/* free all data, leaf, node and freespace blocks */
-	error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA, 0,
-				&done);
-	if (error) {
-		do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
-		goto out_bmap_cancel;
-	}
+	while (!done) {
+		error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA,
+					0, &done);
+		if (error) {
+			do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
+			goto out_bmap_cancel;
+		}
+		error = xfs_defer_finish(&tp);
+		if (error) {
+			do_warn(("defer_finish failed -- error - %d\n"), error);
+			goto out_bmap_cancel;
+		}
+		/*
+		 * Close out trans and start the next one in the chain.
+		 */
+		error = xfs_trans_roll_inode(&tp, ip);
+		if (error)
+			goto out_bmap_cancel;
+	}
 
 	ASSERT(done);

I've tested your V2 patch with success.
xfs_repair was able to fix enough problems to mount the partition. I can now proceed with copying the home directory from it. Thank you, Eric!

The disk itself was a few-years-old HDD, used in a laptop. dd_rescue displayed 26 read errors (IIRC) during the copying. The XFS partition served as rootfs (Fedora 28) on this laptop, and before I got it, the laptop was powered on a couple of times. Each boot ended in the initrd not being able to mount rootfs, and then the laptop was forced off. Before I discovered the read errors, I tried a couple of unsuccessful xfs_repair runs. So nothing quite special, just a worn-out HDD.

Ok, I'm going to close this as an upstream bug. I think it's solved your problem, and Fedora will inherit it with normal updates. I don't think there's any reason to push this patch to the Fedora packages ahead of upstream. If you disagree, let me know & reopen.

Thanks,
-eric
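
For readers following the patch in this report: the fix rests on a "call back in until done" loop around a callee that may legitimately stop early. Below is a minimal, self-contained C sketch of that pattern only. The names `toy_file`, `toy_unmap`, `toy_unmap_all` and `TOY_MAX_PER_CALL` are hypothetical and invented for illustration; they are not part of xfsprogs or libxfs, and the sketch does not model transactions, deferred operations, or real extent handling.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a file's mapped blocks; not an XFS type. */
struct toy_file {
	int blocks_left;	/* blocks still mapped */
};

/* Assumed per-call limit, mimicking a callee that returns before all work is done. */
#define TOY_MAX_PER_CALL 2

/*
 * Hypothetical stand-in for an unmap call: frees at most TOY_MAX_PER_CALL
 * blocks, sets *done to 1 once nothing is left, and returns 0 on success
 * or a negative value on error.
 */
static int toy_unmap(struct toy_file *f, int *done)
{
	int n;

	if (f->blocks_left < 0)
		return -1;	/* inconsistent state, report an error */

	n = f->blocks_left < TOY_MAX_PER_CALL ? f->blocks_left : TOY_MAX_PER_CALL;
	f->blocks_left -= n;
	*done = (f->blocks_left == 0);
	return 0;
}

/*
 * Free everything by calling back in until the callee reports completion.
 * Returns the number of calls made, or a negative value on error.
 */
static int toy_unmap_all(struct toy_file *f)
{
	int done = 0;	/* must start at 0 so the loop body runs at least once */
	int calls = 0;

	while (!done) {
		int error = toy_unmap(f, &done);

		if (error) {
			fprintf(stderr, "toy_unmap failed -- error - %d\n", error);
			return error;
		}
		calls++;
		/* The patch finishes deferred work and rolls the transaction at this point. */
	}

	assert(done);	/* now holds by construction, unlike a single call */
	return calls;
}
```

With five blocks and a limit of two per call, `toy_unmap_all` needs three calls. A single call followed by `assert(done)` would abort in that case, which is the same shape of failure as the assertion reported here.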