730433 – Error message when allocation group size too big is misleading

Summary: Error message when allocation group size too big is misleading

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	xfsprogs
Sub Component:
Version:	6.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Eric Sandeen
QA Contact:	Boris Ranto
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2011-08-12 22:22 UTC by linuxteer
Modified:	2013-02-21 11:00 UTC
CC List:	3 users
Fixed In Version:	xfsprogs-3.1.1-8.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-02-21 11:00:44 UTC
Target Upstream Version:
Embargoed:

Attachments
Output for mkfs.xfs when agcount set to primes < 37 (23.79 KB, text/plain) 2011-08-15 21:30 UTC, linuxteer
Output for mkfs.xfs when agcount set to primes >= 37 (23.79 KB, text/plain) 2011-08-15 21:31 UTC, linuxteer
Output for mkfs.xfs when agcount set to primes < 37 (3.19 KB, text/plain) 2011-08-15 21:37 UTC, linuxteer

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Linux Kernel	41052	0	None	None	None	Never
Red Hat Product Errata	RHBA-2013:0481	0	normal	SHIPPED_LIVE	xfsprogs bug fix and enhancement update	2013-02-21 12:44:16 UTC

Description linuxteer 2011-08-12 22:22:02 UTC

Description of problem:
In my test system I created a FS where the default allocation group size happened to be about 1 TiB:

[root@localhost ~]# mkfs.xfs -L nss6_1 -f -d su=512k,sw=20 -l sunit=512,size=64m /dev/sdc
meta-data=/dev/sdc               isize=256    agcount=37, agsize=268435328 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=9764864000, imaxpct=5
         =                       sunit=128    swidth=2560 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

However, if I specify fewer allocation groups than the default, instead of getting a message complaining that the AG size is too big I get this error message:

[root@localhost ~]# mkfs.xfs -L nss6_1 -f -d agcount=31,su=512k,sw=20 -l sunit=512,size=64m ${dev1}
Allocation group size (314995613) is not a multiple of the stripe unit (128)
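
As a quick sanity check of the sizes involved, here is a minimal sketch. It assumes the 4096-byte block size shown as bsize=4096 in the mkfs.xfs output above; the names BLOCK_SIZE and blocks_to_tib are illustrative only. It converts the default agsize and the agsize from the error message into TiB.

```python
BLOCK_SIZE = 4096   # bytes per filesystem block, from "bsize=4096" above
TIB = 2 ** 40       # bytes in one TiB


def blocks_to_tib(blocks: int) -> float:
    """Convert a count of filesystem blocks to TiB (illustrative helper)."""
    return blocks * BLOCK_SIZE / TIB


default_agsize = 268435328   # agsize reported with the default agcount=37
failed_agsize = 314995613    # agsize reported in the agcount=31 error

print(f"default agcount: {blocks_to_tib(default_agsize):.4f} TiB per AG")
print(f"agcount=31:      {blocks_to_tib(failed_agsize):.4f} TiB per AG")
```

The default agsize sits just under 1 TiB, while the agsize computed for agcount=31 is above it.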


Version-Release number of selected component (if applicable):

I am using a RHEL 6.1 server with kernel 2.6.32-131.0.15.el6.x86_64 and XFS packages:
xfsprogs-devel-3.1.1-4.el6.x86_64
xfsprogs-3.1.1-4.el6.x86_64
xfsdump-3.0.4-2.el6.x86_64
xfsprogs-qa-devel-3.1.1-4.el6.x86_64

LANG=en_US.UTF-8


How reproducible:

100%


Steps to Reproduce:
1. Try to create an XFS filesystem with striping and an agcount low enough that the allocation group size exceeds 1 TiB.
2. Inspect the misleading error message.

Actual results:
"Allocation group size (314995613) is not a multiple of the stripe unit (128)"

Expected results:
"Allocation group size (314995613 blks) is over maximum allocation groups size of 1 TiB (268435328 blks)"

Or a message related to the AG size being over the maximum allowed.
Adding units to the values displayed in the message (blocks in this case) would also be a nice improvement.

Comment 2 Eric Sandeen 2011-08-12 22:33:07 UTC

Another good reason to stick with the defaults, and not fiddle with things like agcount? :)

But agreed, it could be a better error message.

Comment 3 Dave Chinner 2011-08-15 06:03:48 UTC

(In reply to comment #0)
> Description of problem:
> In my test system I created a FS where the default allocation group size
> happened to be about 1 TiB:
> 
> [root@localhost ~]# mkfs.xfs -L nss6_1 -f -d su=512k,sw=20 -l
> sunit=512,size=64m /dev/sdc
> meta-data=/dev/sdc               isize=256    agcount=37, agsize=268435328 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=9764864000, imaxpct=5
>          =                       sunit=128    swidth=2560 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=16384, version=2
>          =                       sectsz=512   sunit=64 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> However, if I specify fewer allocation groups than the default, instead of
> getting a message complaining that the AG size is too big I get this error message:
> 
> [root@localhost ~]# mkfs.xfs -L nss6_1 -f -d agcount=31,su=512k,sw=20 -l
> sunit=512,size=64m ${dev1}
> Allocation group size (314995613) is not a multiple of the stripe unit (128)

That's not misleading - it's correctly detecting an error with the configuration you specified - but it's not the error you -expected-.

All it means is that the alignment checks are done before the size checks. Indeed, we have to do the checking that way, because when we cater for AG alignment during automatic sizing it changes the size of the AGs. Hence the size checks must be done after the alignment checks. You've just triggered an alignment check failure before the size checks are done.
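
To illustrate why a size check has to follow the alignment step, here is a minimal sketch; it is not the mkfs.xfs source. The helper round_up is hypothetical, the maximum of 268435455 blocks is taken from the "too big" message quoted later in this report, and the input value 268435400 is made up for illustration. A size that is within bounds before alignment can exceed the maximum once it is rounded up to a stripe-unit multiple.

```python
AG_MAX_BLOCKS = 268435455  # maximum agsize in blocks, per the "too big" message
STRIPE_UNIT = 128          # sunit in blocks, from the report


def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple (hypothetical helper)."""
    return -(-value // multiple) * multiple


raw = 268435400                       # made-up unaligned size just under the maximum
aligned = round_up(raw, STRIPE_UNIT)  # size after stripe-unit alignment

print("before alignment, within bounds:", raw <= AG_MAX_BLOCKS)
print("after alignment, within bounds: ", aligned <= AG_MAX_BLOCKS)
```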

> Actual results:
> "Allocation group size (314995613) is not a multiple of the stripe unit (128)"
> 
> Expected results:
> "Allocation group size (314995613 blks) is over maximum allocation groups size
> of 1 TiB (268435328 blks)"

Just because you are trying to trigger a specific error doesn't mean that it is the only error that can occur with the given configuration. The fact that a different error occurs doesn't necessarily mean there is a bug in the program.

> Or a message related to the AG size being over maximum allowed.
> Also a nice improvement is to add the units for values displayed in the message
> (blocks in this case).

Yes, I agree the error message could be more verbose and mention units, but that is a secondary issue, unrelated to your (incorrect) expectation of what error should be detected given the input configuration.

Comment 4 linuxteer 2011-08-15 21:30:54 UTC

Created attachment 518343 [details]
Output for mkfs.xfs when agcount set to primes < 37

Comment 5 linuxteer 2011-08-15 21:31:28 UTC

Created attachment 518344 [details]
Output for mkfs.xfs when agcount set to primes >= 37

Comment 6 linuxteer 2011-08-15 21:34:51 UTC

Yes Dave,
I did not check if alignment was a problem, since I was supplying mkfs.xfs with agcount, not agsize. AGsize was calculated by mkfs.xfs.
Incidentally, I used prime numbers on agcount to avoid getting the warning:
"Warning: AG size is a multiple of stripe width. This can cause performance
problems by aligning all AGs on the same disk..."

To me, after checking that the agsize with agcount=37 was close to 1 TB, the "obvious" issue was that a smaller count forced the agsize over the 1 TB limit.

Prime agcount values below 37 all gave the same error, except 2 and 5, which generate multiples of 128 and display what I thought was the correct message (which, BTW, includes units):
"agsize (1952972800b) too big, maximum is 268435455 blocks"
Providing agcount with primes 37 and over (up to 257), all worked fine.

Moreover, the agsize in the alignment error message (314995613) seems to be the ceiling of the number of blocks in the data section (9764864000) divided by the agcount provided (for primes below 37, except 2 and 5).
However, when specifying agcount with primes from 37 to 257, agsize was always adjusted to the next multiple of 128 (only 149 generates a multiple of 128 on its own), but for primes under 37 it was not.
See the attached files for details.

When the defaults are allowed, the number of data blocks divided by 37 (default agcount) = 263915243.24 and default agsize selected was 268435328, the highest multiple of 128 below the 1 TB limit. So it seems to be optimizing for max agsize and then getting agcount.
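
The arithmetic in this comment can be reproduced without running mkfs.xfs. This is only a sketch: it assumes mkfs.xfs starts from the ceiling of the data blocks divided by agcount, as the observations above suggest, and is_prime and initial_agsize are illustrative helper names rather than anything from xfsprogs.

```python
DATA_BLOCKS = 9764864000   # "blocks=" from the data section
STRIPE_UNIT = 128          # sunit in blocks
AG_MAX_BLOCKS = 268435455  # maximum from the "too big" message


def is_prime(n: int) -> bool:
    """Trial-division primality test, sufficient for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def initial_agsize(agcount: int) -> int:
    """Ceiling of DATA_BLOCKS / agcount (assumed starting value)."""
    return -(-DATA_BLOCKS // agcount)


# One line per prime agcount: size, alignment to sunit, and bound check.
for p in filter(is_prime, range(2, 258)):
    size = initial_agsize(p)
    print(p, size,
          "aligned" if size % STRIPE_UNIT == 0 else "unaligned",
          "too big" if size > AG_MAX_BLOCKS else "ok")

# Highest multiple of the stripe unit that still fits under the maximum.
largest_aligned = AG_MAX_BLOCKS // STRIPE_UNIT * STRIPE_UNIT
print("largest aligned agsize:", largest_aligned)
```

The last value can be compared with the default agsize reported by mkfs.xfs.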

Without checking the source code, to a user of mkfs.xfs trying to manipulate agcount (in my case, for parallel scalability optimization purposes), it looks like the agsize is correctly selected/adjusted to be aligned to sunit.
But when using agcount values smaller than the default, it looks like the wrong error message is displayed, especially given that the "agsize too big" error message appears when agcount is set to 2 or 5.

Comment 7 linuxteer 2011-08-15 21:37:05 UTC

Created attachment 518347 [details]
Output for mkfs.xfs when agcount set to primes < 37

Selected the wrong file in the original upload. Apologies.

Comment 8 RHEL Program Management 2011-10-07 16:12:43 UTC

Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 9 Eric Sandeen 2012-03-28 18:29:46 UTC

Dave, I do find this interesting, in the mkfs code:

                        if ((tmp_agsize >= XFS_AG_MIN_BLOCKS(blocklog)) &&
                            (tmp_agsize <= XFS_AG_MAX_BLOCKS(blocklog))) {
                                ...
                        } else {
                                if (nodsflag) {
                                        dsunit = dswidth = 0;
                                } else {
                                        fprintf(stderr,
_("Allocation group size (%lld) is not a multiple of the stripe unit (%d)\n"),
                                                (long long)agsize, dsunit);
                                        exit(1);
                                }
                        }

At that point we have tried to round the agsize up and down to align it, and have found it to be too large in both cases.  At the exit(1) point, it seems like it'd make some sense to point that out in the error message.

agsize wasn't specified, it was calculated given the specified agcount.  There were efforts to fix agsize up w.r.t. stripe geometry, but no efforts to make it fit within the maximum size; hence I tend to agree that if one specifies agcount so small that agsize is out of bounds, that does seem like a reasonable first error message to provide.
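
The branch described above can be modeled outside mkfs.xfs. The following is a simplified sketch based only on that description, not a port of the xfsprogs code: try_align is a hypothetical name, only the upper bound is checked (the minimum is ignored), and the maximum of 268435455 blocks comes from the "too big" message in comment 6.

```python
from typing import Optional

AG_MAX_BLOCKS = 268435455  # maximum agsize in blocks, per the "too big" message


def try_align(agsize: int, dsunit: int) -> Optional[int]:
    """Round an unaligned agsize to a stripe-unit multiple.

    Returns None when both roundings exceed the maximum, which models
    the else-branch that prints the alignment error.
    """
    rounded = -(-agsize // dsunit) * dsunit   # round up first
    if rounded > AG_MAX_BLOCKS:
        rounded = agsize // dsunit * dsunit   # fall back to rounding down
    return rounded if rounded <= AG_MAX_BLOCKS else None


print(try_align(314995613, 128))  # agsize computed for agcount=31
print(try_align(263915244, 128))  # ceiling of 9764864000 / 37
```

In this model, the agcount=31 case fails purely because of size: no stripe-unit multiple near 314995613 fits under the maximum, even though the message printed on that path talks about alignment.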

Having said all that, this seems like the sort of thing which could be tweaked upstream, but doesn't necessarily rise to the level of requiring a RHEL package update...

-Eric

Comment 10 Eric Sandeen 2012-05-14 15:34:13 UTC

commit ddf12ea5dc56a728f24d24c5d7403c3412b40b86
Author: Eric Sandeen <sandeen>
Date:   Wed Mar 28 22:23:11 2012 -0500

    mkfs.xfs: print std info if agcount makes agsize out of bounds
    
    When specifying a too-small agcount with stripe geometry,
    mkfs.xfs can fail with a somewhat unexpected message:
    
    $ mkfs.xfs -f -d file,name=fsfile,size=9764864000b,agcount=31,su=512k,sw=20
    Allocation group size (314995613) is not a multiple of the stripe unit (128)
    
    This strikes me as especially odd because normally, mkfs.xfs
    tries to fix up the agsize to be a stripe multiple.  The only way
    we get to the above error message is if ag _size_ is out of bounds;
    exiting with an error about alignment rather than about size
    seems odd.
    
    Maybe below is too clever, but if by the time we've decided that
    agsize is out of bounds after rounding it both up and down,
    as necessary, to get to a stripe-width multiple, calling
    validate_ag_geometry() will give us the same standard message as
    if we had specified no stripe geometry:
    
    $ mkfs/mkfs.xfs -f -d file,name=fsfile,size=9764864000b,agcount=31,su=512k,sw=20
    agsize (314995613b) too big, maximum is 268435455 blocks
    Usage: mkfs.xfs
    ...
    
    $ mkfs/mkfs.xfs -f -d file,name=fsfile,size=9764864000b,agcount=31
    agsize (314995613b) too big, maximum is 268435455 blocks
    Usage: mkfs.xfs
    ...
    
    Also, tidy up error message to explicitly state "blocks" not "b"
    
    Signed-off-by: Eric Sandeen <sandeen>
    Reviewed-by: Dave Chinner <dchinner>

Comment 11 RHEL Program Management 2012-09-13 05:39:19 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 17 errata-xmlrpc 2013-02-21 11:00:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0481.html
