Bug 1910384
| Summary: | [xfstests xfs/291] xfs_repair abort malloc(): invalid size (unsorted) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Zorro Lang <zlang> |
| Component: | xfsprogs | Assignee: | Bill O'Donnell <billodo> |
| Status: | CLOSED ERRATA | QA Contact: | Zorro Lang <zlang> |
| Severity: | medium | Priority: | medium |
| Version: | 8.2 | CC: | billodo, hsiangkao, xzhou |
| Target Milestone: | rc | Keywords: | Regression, Triaged |
| Target Release: | 8.4 | Flags: | pm-rhel: mirror+ |
| Hardware: | All | OS: | Linux |
| Last Closed: | 2021-05-18 15:07:52 UTC | Type: | Bug |
xfs/031 can trigger this bug too
# cat xfs/031.full
...
...
...
Repairing, round 0
Phase 1 - find and verify superblock...
Phase 2 - using <TYPEOF> log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
Phase 5 - rebuild AG headers and trees...
malloc(): invalid size (unsorted)
./common/xfs: line 295: 2054013 Aborted (core dumped) $XFS_REPAIR_PROG $SCRATCH_OPTIONS $* $SCRATCH_DEV
Repairing, iteration 1
15c15
< ./common/xfs: line 295: 2054013 Aborted (core dumped) $XFS_REPAIR_PROG $SCRATCH_OPTIONS $* $SCRATCH_DEV
---
> ./common/xfs: line 295: 2054123 Aborted (core dumped) $XFS_REPAIR_PROG $SCRATCH_OPTIONS $* $SCRATCH_DEV
ERROR: repair round 1 differs to round 0 (see /var/lib/xfstests/results//xfs/031.full)
BTW, tested with kernel-debug-4.18.0-265.el8.dt2.

Hmm... and xfs/137 can reproduce this bug too:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
clearing reflink flag on inode 135
clearing reflink flag on inode 136
...
...
clearing reflink flag on inode 2807
clearing reflink flag on inode 2815
clearing reflink flag on inode 12679
Phase 5 - rebuild AG headers and trees...
malloc(): invalid size (unsorted)
./common/xfs: line 295: 2229223 Aborted (core dumped) $XFS_REPAIR_PROG $SCRATCH_OPTIONS $* $SCRATCH_DEV
Zorro, can you attach the core file to the bug?

The problematic patch is xfsprogs-5.7.0-xfs_repair-fix-rebuilding-btree-block-less-than-minr.patch:

6df28d1 xfs_repair: fix rebuilding btree block less than minrecs

It appears that it was broken at this point upstream as well, but never discovered because

dc9f4f5 xfs_repair: rebuild reverse mapping btrees with bulk loader

inadvertently resolved the bug before the point release.

I have a patch that I think will fix this; we are using the wrong min/max values for the rmap btree.

Created attachment 1741632 [details]
proposed patch

This makes xfs/031 work for me w/ rmapbt enabled; I have not done a full regression test.

(In reply to Eric Sandeen from comment #6)
> Created attachment 1741632 [details]
> proposed patch
>
> This makes xfs/031 work for me w/ rmapbt enabled, I have not done a full
> regression test.

Thanks Eric. I didn't upload a core file when I reported this bug because I think it is very easy to reproduce, and easy to get a core file from too. I'll do a scratch build of xfsprogs and give your patch a tier1 regression test. You might also need to add this bug to the xfsprogs errata.

Thanks,
Zorro

(In reply to Eric Sandeen from comment #6)
> Created attachment 1741632 [details]
> proposed patch
>
> This makes xfs/031 work for me w/ rmapbt enabled, I have not done a full
> regression test.

Hi Eric,

I made a scratch build of xfsprogs with your patch:
http://brew-task-repos.usersys.redhat.com/repos/scratch/zlang/xfsprogs/5.0.0/99.el8/

The Tier1 regression test didn't find any regressions, and this bug disappeared, so the patch looks good to me. Feel free to add this bug to the xfsprogs erratum and fix it in time.

Thanks,
Zorro

Thanks Zorro - yes that's fine that you didn't upload the core, you're right that it's very easy to reproduce when testing w/ rmapbt.

I will ask Gao Xiang to review my attached patch for RHEL; it will never go upstream because that code no longer exists.

Thanks,
-Eric

(In reply to Eric Sandeen from comment #9)
> Thanks Zorro - yes that's fine that you didn't upload the core, you're right
> that it's very easy to reproduce when testing w/ rmapbt.
>
> I will ask Gao Xiang to review my attached patch for RHEL; it will never go
> upstream because that code no longer exists.
>
> Thanks,
> -Eric

Hi Eric,

The attached patch looks OK to me. Very sorry that I didn't notice the different name pair (m_rmap_mxr/m_rmap_mnr) and didn't test with rmapbt enabled in xfstests at that time...

Thanks,
Gao Xiang

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (xfsprogs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1690
Description of problem:
Although xfs/291 has a known failure, an xfs_repair abort like the one below is not expected when XFS rmapbt is enabled. This is a regression in xfsprogs-5.0.0-7.el8, since xfsprogs-5.0.0-4.el8 does not reproduce it.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 2
- agno = 1
- agno = 3
Phase 5 - rebuild AG headers and trees...
malloc(): invalid size (unsorted)
./common/xfs: line 295: 2783225 Aborted (core dumped) $XFS_REPAIR_PROG $SCRATCH_OPTIONS $* $SCRATCH_DEV
xfs_repair failed

Version-Release number of selected component (if applicable):
xfsprogs-5.0.0-7.el8

How reproducible:
Nearly 100%

Steps to Reproduce:
Run xfs/291 on xfs with reflink=1,rmapbt=1

Actual results:

Expected results:

Additional info:
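The reproduction step above could be set up roughly as follows. This is only a sketch, not the reporter's actual configuration: the device paths and mount points are placeholders chosen for illustration, and the file name `local.config.example` is likewise an assumption. It writes an xfstests-style config that passes `-m reflink=1,rmapbt=1` to mkfs through `MKFS_OPTIONS`, then prints the command to run the test.

```shell
#!/bin/sh
# Sketch: generate an xfstests config that enables reflink and rmapbt.
# All device paths and mount points below are placeholders (assumptions),
# not values taken from this bug report.
write_config() {
    cat <<'EOF'
export FSTYP=xfs
export TEST_DEV=/dev/vdb1        # placeholder
export TEST_DIR=/mnt/test        # placeholder
export SCRATCH_DEV=/dev/vdb2     # placeholder
export SCRATCH_MNT=/mnt/scratch  # placeholder
export MKFS_OPTIONS="-m reflink=1,rmapbt=1"
EOF
}

# Write to an example file rather than overwriting a real local.config.
write_config > local.config.example

echo "Copy local.config.example to local.config in the xfstests tree, then run:"
echo "./check xfs/291"
```

According to the comments in this report, xfs/031 and xfs/137 hit the same abort, so `./check xfs/031 xfs/137 xfs/291` with the same options should exercise it as well on an affected xfsprogs build.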