Bug 1878352

Summary: mkfs.xfs asserts in align_ag_geometry
Product: [Fedora] Fedora
Reporter: Zdenek Kabelac <zkabelac>
Component: xfsprogs
Assignee: Eric Sandeen <esandeen>
Status: CLOSED UPSTREAM
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: esandeen
Last Closed: 2020-10-08 15:31:24 UTC
Type: Bug

Description Zdenek Kabelac 2020-09-12 07:58:50 UTC
While running some lvm2 tests, I'm noticing several core dumps from mkfs.xfs.

#0  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007f2eca0ea8a4 in __GI_abort () at abort.c:79
#2  0x00007f2eca0ea789 in __assert_fail_base (fmt=0x7f2eca257ea0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x558b0315a7b7 "cfg->agcount != 0", file=0x558b0315a7ac "xfs_mkfs.c", line=2834, function=<optimized out>) at assert.c:92
#3  0x00007f2eca0f9fb6 in __GI___assert_fail (assertion=0x558b0315a7b7 "cfg->agcount != 0", file=0x558b0315a7ac "xfs_mkfs.c", line=2834, 
    function=0x558b0315ae20 <__PRETTY_FUNCTION__.1> "align_ag_geometry") at assert.c:101
#4  0x0000558b03120410 in align_ag_geometry (cfg=0x7ffef7bdf550) at xfs_mkfs.c:2834
#5  0x0000558b03116f89 in main (argc=<optimized out>, argv=<optimized out>) at xfs_mkfs.c:3761

xfsprogs-5.8.0-1.fc34.x86_64

Comment 1 Eric Sandeen 2020-09-12 14:52:43 UTC
Could you please provide a hint at a reproducer?  What test are you running?  On what sort of device, with what type of geometry?  How is mkfs.xfs invoked, with defaults or with options?  Was this device tiny?

Comment 2 Eric Sandeen 2020-09-12 14:53:30 UTC
Perhaps you could install xfsprogs-debuginfo and provide the core dump itself?

Comment 3 Eric Sandeen 2020-09-12 14:58:24 UTC
Oh I guess maybe that was from debuginfo and we've just optimized away the useful info.  In any case, knowing how this was triggered, on what type of device, will be helpful.

Thanks,
-Eric

Comment 4 Zdenek Kabelac 2020-09-12 18:06:11 UTC
So I've looked over the tests run on the box for a while and figured out that the result comes from
an unreleased test case, which can be easily reproduced with these commands
(modify the names to suit your system):

# vgcreate -s 4K vgname /dev/device_to_play_with

# lvcreate -T -L10  vgname/poolname  -n lvname  -V10200K

# mkfs.xfs /dev/vgname/lvname

(running on kernel 5.9-rc4)

Interestingly, for an LV created as a regular linear device (lvcreate -L10200K -n lvname vgname),
mkfs.xfs exits with the expected error message:

# mkfs.xfs  /dev/vg/lv
agsize (2550 blocks) too small, need at least 4096 blocks


So it is just a corner case: the core dump is annoying, but the device would not be usable anyway.

Comment 5 Eric Sandeen 2020-09-12 20:19:10 UTC
Thanks - so,

# blockdev --getiomin --getioopt --getsize64 /dev/vgname/lvname
65536
65536
10444800

So it's a very small device with "interesting" geometry.

Ok, so this will do the same:

# truncate --size=10444800 testfile
# mkfs.xfs -dsu=65536,sw=1 testfile 
mkfs.xfs: xfs_mkfs.c:2834: align_ag_geometry: Assertion `cfg->agcount != 0' failed.

Even a more typical stripe geometry such as

# mkfs.xfs -dsu=32768,sw=2 testfile 
mkfs.xfs: xfs_mkfs.c:2834: align_ag_geometry: Assertion `cfg->agcount != 0' failed.

or

# mkfs.xfs -dsu=16384,sw=4 testfile 
mkfs.xfs: xfs_mkfs.c:2834: align_ag_geometry: Assertion `cfg->agcount != 0' failed.

fails as well.  So we can't just ignore it for iomin==ioopt.

As you say, omitting stripe geometry leads to simply failing the min size check:
# mkfs.xfs testfile
agsize (2550 blocks) too small, need at least 4096 blocks

so we're trying to set up stripe geometry before we validate the device size.
I think we can just remove the assert and move the check into validate_ag_geometry() to avoid the abort.
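
The 2550-block figure in the error message follows directly from the device size. A minimal shell sketch of that arithmetic, assuming the default 4096-byte filesystem block size (an assumption; the report does not state the block size, although 2550 blocks is consistent with it) and using hypothetical variable names:

```shell
# Sizes reported by blockdev earlier in this comment.
dev_bytes=10444800
su_bytes=65536
# Assumption: default mkfs.xfs block size of 4096 bytes.
block_bytes=4096
# Minimum AG size quoted in the error message, in blocks.
min_ag_blocks=4096

dev_blocks=$((dev_bytes / block_bytes))        # device size in blocks
su_blocks=$((su_bytes / block_bytes))          # stripe unit in blocks
min_ag_bytes=$((min_ag_blocks * block_bytes))  # minimum AG size in bytes

echo "device: $dev_blocks blocks"
echo "stripe unit: $su_blocks blocks"
echo "minimum AG: $min_ag_bytes bytes"
if [ "$dev_blocks" -lt "$min_ag_blocks" ]; then
    echo "device is smaller than one minimum-sized AG"
fi
```

Under that assumption the device is 2550 blocks, well short of the 4096 blocks (16 MiB) needed for even a single AG, so no stripe geometry can make it usable.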

Thanks for the report!
-Eric

Comment 6 Eric Sandeen 2020-09-12 20:27:45 UTC
I'll send a patch to only try to reduce the agcount if it's > 1, and then be more explicit that the device itself is simply too small:

# mkfs/mkfs.xfs -f -dsu=16384,sw=4 testfile 
device (2550 blocks) too small, need at least 4096 blocks
# mkfs/mkfs.xfs -f testfile
device (2550 blocks) too small, need at least 4096 blocks
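
The guard described above can be modelled roughly as follows. This is an illustrative shell sketch, not the xfsprogs code: the function name trim_short_last_ag, its arguments, the simplified "too small" rule, and the 2560-block AG size (2550 rounded up to a 16-block stripe unit) are all hypothetical.

```shell
# Hypothetical model: trim a too-short last AG only when more than one AG exists.
# Arguments: total device blocks, AG size in blocks, minimum AG size in blocks.
# Prints the resulting AG count.
trim_short_last_ag() {
    dblocks=$1
    agsize=$2
    min_ag=$3
    # Round up: a partial last AG still counts as an AG.
    agcount=$(( (dblocks + agsize - 1) / agsize ))
    # Size of the partial last AG, or 0 if the device divides evenly.
    last=$(( dblocks % agsize ))
    if [ "$agcount" -gt 1 ] && [ "$last" -ne 0 ] && [ "$last" -lt "$min_ag" ]; then
        agcount=$((agcount - 1))
    fi
    echo "$agcount"
}

# Single short AG, as in this bug: the count stays at 1 instead of dropping to 0.
trim_short_last_ag 2550 2560 4096
# Two AGs with a short tail: the tail is trimmed, leaving 1.
trim_short_last_ag 10000 8192 4096
```

With the count left at 1, the later size validation can report that the device is too small instead of the assertion firing.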

-Eric

Comment 7 Eric Sandeen 2020-10-08 15:31:24 UTC
This has been fixed upstream, btw:

commit 97a4059660b27a9b0e3d8cdde5dbef8712685865
Author: Pavel Reichl <preichl>
Date:   Mon Sep 28 17:31:18 2020 -0400

    mkfs.xfs: fix ASSERT on too-small device with stripe geometry
    
    When a too-small device is created with stripe geometry, we hit an
    assert in align_ag_geometry():
    
    mkfs.xfs: xfs_mkfs.c:2834: align_ag_geometry: Assertion `cfg->agcount != 0' failed.
    
    This is because align_ag_geometry() finds that the size of the last
    (only) AG is too small, and attempts to trim it off.  Obviously 0
    AGs is invalid, and we hit the ASSERT.
    
    Reported-by: Zdenek Kabelac <zkabelac>
    Suggested-by: Dave Chinner <dchinner>
    Signed-off-by: Pavel Reichl <preichl>
    Reviewed-by: Christoph Hellwig <hch>
    Reviewed-by: Carlos Maiolino <cmaiolino>
    Signed-off-by: Eric Sandeen <sandeen>

I'm guessing this doesn't need to be pushed quickly to Fedora so closing as upstream; if that's wrong please let me know.

Thanks for the report!