Bug 1910384 - [xfstests xfs/291] xfs_repair abort malloc(): invalid size (unsorted)
Summary: [xfstests xfs/291] xfs_repair abort malloc(): invalid size (unsorted)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: xfsprogs
Version: 8.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.4
Assignee: Bill O'Donnell
QA Contact: Zorro Lang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-23 16:50 UTC by Zorro Lang
Modified: 2021-05-18 15:08 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-18 15:07:52 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments:
proposed patch (3.03 KB, patch), 2020-12-23 23:07 UTC, Eric Sandeen

Description Zorro Lang 2020-12-23 16:50:37 UTC
Description of problem:

Although xfs/291 has a known failure, it is not expected to hit an xfs_repair abort like the one below when xfs rmapbt is enabled. This is a regression in xfsprogs-5.0.0-7.el8, since xfsprogs-5.0.0-4.el8 does not reproduce it.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
malloc(): invalid size (unsorted)
./common/xfs: line 295: 2783225 Aborted                 (core dumped) $XFS_REPAIR_PROG $SCRATCH_OPTIONS $* $SCRATCH_DEV
xfs_repair failed

Version-Release number of selected component (if applicable):
xfsprogs-5.0.0-7.el8

How reproducible:
Nearly 100%

Steps to Reproduce:
Run xfs/291 on xfs with reflink=1,rmapbt=1

Actual results:


Expected results:


Additional info:

Comment 1 Zorro Lang 2020-12-23 17:04:20 UTC
xfs/031 can trigger this bug too
# cat xfs/031.full
...
...
...
Repairing, round 0
Phase 1 - find and verify superblock...
Phase 2 - using <TYPEOF> log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
Phase 5 - rebuild AG headers and trees...
malloc(): invalid size (unsorted)
./common/xfs: line 295: 2054013 Aborted                 (core dumped) $XFS_REPAIR_PROG $SCRATCH_OPTIONS $* $SCRATCH_DEV
Repairing, iteration 1
15c15
< ./common/xfs: line 295: 2054013 Aborted                 (core dumped) $XFS_REPAIR_PROG $SCRATCH_OPTIONS $* $SCRATCH_DEV
---
> ./common/xfs: line 295: 2054123 Aborted                 (core dumped) $XFS_REPAIR_PROG $SCRATCH_OPTIONS $* $SCRATCH_DEV
ERROR: repair round 1 differs to round 0 (see /var/lib/xfstests/results//xfs/031.full)

Comment 2 Zorro Lang 2020-12-23 17:06:27 UTC
BTW, this was tested with kernel-debug-4.18.0-265.el8.dt2

Comment 3 Zorro Lang 2020-12-23 17:11:58 UTC
Hmm... and xfs/137 can reproduce this bug too
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
clearing reflink flag on inode 135
clearing reflink flag on inode 136
...
...
clearing reflink flag on inode 2807
clearing reflink flag on inode 2815
clearing reflink flag on inode 12679
Phase 5 - rebuild AG headers and trees...
malloc(): invalid size (unsorted)
./common/xfs: line 295: 2229223 Aborted                 (core dumped) $XFS_REPAIR_PROG $SCRATCH_OPTIONS $* $SCRATCH_DEV

Comment 4 Eric Sandeen 2020-12-23 20:14:24 UTC
Zorro, can you attach the core file to the bug?

Comment 5 Eric Sandeen 2020-12-23 23:04:21 UTC
The problematic patch is 

xfsprogs-5.7.0-xfs_repair-fix-rebuilding-btree-block-less-than-minr.patch

6df28d1 xfs_repair: fix rebuilding btree block less than minrecs

It appears that it was broken at this point upstream as well, but never discovered because 

dc9f4f5 xfs_repair: rebuild reverse mapping btrees with bulk loader

inadvertently resolved the bug before the point release.

I have a patch that I think will fix this: we are using the wrong min/max values for the rmap btree.

Comment 6 Eric Sandeen 2020-12-23 23:07:22 UTC
Created attachment 1741632 [details]
proposed patch

This makes xfs/031 work for me w/ rmapbt enabled; I have not done a full regression test.

Comment 7 Zorro Lang 2020-12-24 02:57:19 UTC
(In reply to Eric Sandeen from comment #6)
> Created attachment 1741632 [details]
> proposed patch
> 
> This makes xfs/031 work for me w/ rmapbt enabled; I have not done a full
> regression test.

Thanks Eric. I didn't upload a core file when I reported this bug because it is very easy to reproduce, and a core file is easy to get too.
I'll do a scratch build of xfsprogs and run a tier1 regression test with your patch. You might also need to add this bug to the xfsprogs erratum.

Thanks,
Zorro

Comment 8 Zorro Lang 2020-12-25 11:19:47 UTC
(In reply to Eric Sandeen from comment #6)
> Created attachment 1741632 [details]
> proposed patch
> 
> This makes xfs/031 work for me w/ rmapbt enabled; I have not done a full
> regression test.

Hi Eric, I made a scratch build of xfsprogs with your patch:
http://brew-task-repos.usersys.redhat.com/repos/scratch/zlang/xfsprogs/5.0.0/99.el8/

The Tier1 regression test didn't find any regressions, and this bug no longer reproduces.
So this patch looks good to me. Feel free to add this bug to the xfsprogs erratum and fix it in time.

Thanks,
Zorro

Comment 9 Eric Sandeen 2020-12-28 15:52:36 UTC
Thanks Zorro - yes that's fine that you didn't upload the core, you're right that it's very easy to reproduce when testing w/ rmapbt.

I will ask Gao Xiang to review my attached patch for RHEL; it will never go upstream because that code no longer exists.

Thanks,
-Eric

Comment 10 Gao Xiang 2020-12-29 12:57:01 UTC
(In reply to Eric Sandeen from comment #9)
> Thanks Zorro - yes that's fine that you didn't upload the core, you're right
> that it's very easy to reproduce when testing w/ rmapbt.
> 
> I will ask Gao Xiang to review my attached patch for RHEL; it will never go
> upstream because that code no longer exists.
> 
> Thanks,
> -Eric

Hi Eric,

The attached patch looks OK to me. I'm very sorry that I didn't notice
the different name pair (m_rmap_mxr/m_rmap_mnr) and didn't test with rmapbt
enabled with xfstests at that time...

Thanks,
Gao Xiang

Comment 15 errata-xmlrpc 2021-05-18 15:07:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (xfsprogs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1690

