Bug 730433 - Error message when allocation group size too big is misleading
Summary: Error message when allocation group size too big is misleading
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: xfsprogs
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Eric Sandeen
QA Contact: Boris Ranto
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-08-12 22:22 UTC by linuxteer
Modified: 2013-02-21 11:00 UTC (History)

Fixed In Version: xfsprogs-3.1.1-8.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-21 11:00:44 UTC


Attachments
Output for mkfs.xfs when agcount set to primes < 37 (23.79 KB, text/plain)
2011-08-15 21:30 UTC, linuxteer
Output for mkfs.xfs when agcount set to primes >= 37 (23.79 KB, text/plain)
2011-08-15 21:31 UTC, linuxteer
Output for mkfs.xfs when agcount set to primes < 37 (3.19 KB, text/plain)
2011-08-15 21:37 UTC, linuxteer


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:0481 normal SHIPPED_LIVE xfsprogs bug fix and enhancement update 2013-02-21 12:44:16 UTC
Linux Kernel 41052 None None None Never

Description linuxteer 2011-08-12 22:22:02 UTC
Description of problem:
In my test system I created a file system whose default allocation group size happened to be about 1 TiB:

[root@localhost ~]# mkfs.xfs -L nss6_1 -f -d su=512k,sw=20 -l sunit=512,size=64m /dev/sdc
meta-data=/dev/sdc               isize=256    agcount=37, agsize=268435328 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=9764864000, imaxpct=5
         =                       sunit=128    swidth=2560 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

However, if I specify fewer allocation groups than the default, instead of a message complaining that the AG size is too big I get this error message:

[root@localhost ~]# mkfs.xfs -L nss6_1 -f -d agcount=31,su=512k,sw=20 -l sunit=512,size=64m ${dev1}
Allocation group size (314995613) is not a multiple of the stripe unit (128)


Version-Release number of selected component (if applicable):

I am using a RHEL 6.1 server with kernel 2.6.32-131.0.15.el6.x86_64 and XFS packages:
xfsprogs-devel-3.1.1-4.el6.x86_64
xfsprogs-3.1.1-4.el6.x86_64
xfsdump-3.0.4-2.el6.x86_64
xfsprogs-qa-devel-3.1.1-4.el6.x86_64

LANG=en_US.UTF-8


How reproducible:

100%


Steps to Reproduce:
1. Create an XFS file system whose allocation group size would exceed 1 TiB (for example, by passing a small agcount).
2. Inspect the misleading error message.

Actual results:
"Allocation group size (314995613) is not a multiple of the stripe unit (128)"

Expected results:
"Allocation group size (314995613 blks) is over maximum allocation groups size of 1 TiB (268435328 blks)"

Or any message indicating that the AG size is over the maximum allowed.
It would also be a nice improvement to add units to the values displayed in the message (blocks in this case).

Comment 2 Eric Sandeen 2011-08-12 22:33:07 UTC
Another good reason to stick with the defaults, and not fiddle with things like agcount? :)

But agreed, it could be a better error message.

Comment 3 Dave Chinner 2011-08-15 06:03:48 UTC
(In reply to comment #0)
> Description of problem:
> In my test system I created a file system whose default allocation group size
> happened to be about 1 TiB:
> 
> [root@localhost ~]# mkfs.xfs -L nss6_1 -f -d su=512k,sw=20 -l
> sunit=512,size=64m /dev/sdc
> meta-data=/dev/sdc               isize=256    agcount=37, agsize=268435328 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=9764864000, imaxpct=5
>          =                       sunit=128    swidth=2560 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=16384, version=2
>          =                       sectsz=512   sunit=64 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> However, if I specify fewer allocation groups than the default, instead of
> a message complaining that the AG size is too big I get this error message:
> 
> [root@localhost ~]# mkfs.xfs -L nss6_1 -f -d agcount=31,su=512k,sw=20 -l
> sunit=512,size=64m ${dev1}
> Allocation group size (314995613) is not a multiple of the stripe unit (128)

That's not misleading - it's correctly detecting an error with the configuration you specified - but it's not the error you -expected-.

All it means is that the alignment checks are done before the size checks. 
Indeed, we have to do the checking that way, because when we cater for AG alignment during automatic sizing it changes the size of the AGs. Hence the size checks must be done after the alignment checks. You've just triggered an alignment check failure before the size checks are done.
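
The ordering described above can be sketched in C. This is only an illustration, not the mkfs.xfs source: `first_error`, `enum geom_error` and `SKETCH_AG_MAX_BLOCKS` are hypothetical names, and the limit is assumed to be the 268435455-block maximum that mkfs.xfs reports for 4096-byte blocks.

```c
#include <stdint.h>

/* Hypothetical result codes; these are not xfsprogs identifiers. */
enum geom_error { GEOM_OK, GEOM_UNALIGNED, GEOM_TOO_BIG };

/* Assumed maximum AG size in 4096-byte blocks (just under 1 TiB). */
#define SKETCH_AG_MAX_BLOCKS 268435455ULL

/*
 * Report the first problem found. Alignment is tested before size, so an
 * AG size that is both unaligned and oversized is reported as unaligned.
 */
enum geom_error first_error(uint64_t agsize, uint64_t sunit)
{
    if (sunit != 0 && agsize % sunit != 0)
        return GEOM_UNALIGNED;
    if (agsize > SKETCH_AG_MAX_BLOCKS)
        return GEOM_TOO_BIG;
    return GEOM_OK;
}
```

With the values from comment 0, `first_error(314995613, 128)` yields `GEOM_UNALIGNED` even though that size also exceeds the maximum, which matches the message the reporter saw.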

> Actual results:
> "Allocation group size (314995613) is not a multiple of the stripe unit (128)"
> 
> Expected results:
> "Allocation group size (314995613 blks) is over maximum allocation groups size
> of 1 TiB (268435328 blks)"

Just because you are trying to trigger a specific error doesn't mean that it is the only error that can occur from the given configuration. A different error occurring doesn't necessarily mean there is a bug in the program.

> Or a message related to the AG size being over maximum allowed.
> Also a nice improvement is to add the units for values displayed in the message
> (blocks in this case).

Yes, I agree the error message could be more verbose and mention units, but that is a secondary issue and unrelated to your (incorrect) expectation of what error should be detected given the input configuration.

Comment 4 linuxteer 2011-08-15 21:30:54 UTC
Created attachment 518343 [details]
Output for mkfs.xfs when agcount set to primes < 37

Comment 5 linuxteer 2011-08-15 21:31:28 UTC
Created attachment 518344 [details]
Output for mkfs.xfs when agcount set to primes >= 37

Comment 6 linuxteer 2011-08-15 21:34:51 UTC
Yes Dave,
I did not check if alignment was a problem, since I was supplying mkfs.xfs with agcount, not agsize. AGsize was calculated by mkfs.xfs.
Incidentally, I used prime numbers on agcount to avoid getting the warning:
"Warning: AG size is a multiple of stripe width.  This can cause performance
problems by aligning all AGs on the same disk..."

To me, after checking that the agsize with agcount=37 was close to 1 TiB, the "obvious" issue was that a smaller count forced the agsize over the 1 TiB limit.

All prime agcount values below 37 gave the same error, except 2 and 5, which produce agsize values that are multiples of 128 and display what I thought was the correct message (which, by the way, includes units):
"agsize (1952972800b) too big, maximum is 268435455 blocks"
Providing agcount with primes 37 and over (up to 257), all worked fine.

Moreover, the agsize in the alignment error message (314995613) appears to be the ceiling of the number of data-section blocks (9764864000) divided by the supplied agcount (for primes below 37, except 2 and 5).
However, when specifying agcount with primes from 37 to 257, agsize was always adjusted to the next multiple of 128 (only 149 generates a multiple of 128 without adjustment), but for primes under 37 it was not.
See the attached files for details.
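
The arithmetic in the two preceding paragraphs can be checked with a short C sketch. The helper names `agsize_for_count` and `is_stripe_aligned` are hypothetical and are not taken from mkfs.xfs; the block count is the one printed in comment 0.

```c
#include <stdint.h>

/* Data-section size in 4096-byte blocks, from the mkfs.xfs output in comment 0. */
#define DATA_BLOCKS 9764864000ULL

/* Ceiling of blocks / agcount; agcount must be nonzero. */
uint64_t agsize_for_count(uint64_t blocks, uint64_t agcount)
{
    return (blocks + agcount - 1) / agcount;
}

/* Nonzero when agsize is a whole number of stripe units; sunit must be nonzero. */
int is_stripe_aligned(uint64_t agsize, uint64_t sunit)
{
    return agsize % sunit == 0;
}
```

For agcount=31 this gives 314995613, which is not a multiple of 128. The block count factors as 2^19 * 5^3 * 149, so 2, 5 and 149 are the only primes that divide it evenly, and for each of them the quotient is already a multiple of 128. That is consistent with the behaviour observed for those three values.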

When the defaults are allowed, the number of data blocks divided by 37 (default agcount) = 263915243.24 and the default agsize selected was 268435328, the highest multiple of 128 below the 1 TiB limit. So it seems to maximize agsize first and then derive agcount from it.
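
A small C sketch reproduces those default numbers under that hypothesis. It is not a claim about how mkfs.xfs actually picks its defaults; `largest_aligned_agsize` and `agcount_for_size` are hypothetical helpers, and 268435455 is the maximum block count taken from the "too big" message quoted above.

```c
#include <stdint.h>

/* Largest multiple of sunit that does not exceed max_blocks; sunit must be nonzero. */
uint64_t largest_aligned_agsize(uint64_t max_blocks, uint64_t sunit)
{
    return max_blocks - (max_blocks % sunit);
}

/* Number of AGs of the given size needed to cover the data section. */
uint64_t agcount_for_size(uint64_t blocks, uint64_t agsize)
{
    return (blocks + agsize - 1) / agsize;
}
```

Calling `largest_aligned_agsize(268435455, 128)` gives 268435328, and `agcount_for_size(9764864000, 268435328)` gives 37, the same agsize and agcount that the default run printed.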

Without checking the source code, to a user of mkfs.xfs trying to manipulate agcount (in my case, for parallel scalability optimization purposes) it looks like the agsize is correctly selected/adjusted to be aligned to sunit.
But when using agcount values smaller than the default, it looks like the wrong error message is displayed, especially given the "agsize too big" error message shown for agcount set to 2 or 5.

Comment 7 linuxteer 2011-08-15 21:37:05 UTC
Created attachment 518347 [details]
Output for mkfs.xfs when agcount set to primes < 37

Selected the wrong file in the original upload. Apologies.

Comment 8 RHEL Product and Program Management 2011-10-07 16:12:43 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 9 Eric Sandeen 2012-03-28 18:29:46 UTC
Dave, I do find this interesting, in the mkfs code:

                        if ((tmp_agsize >= XFS_AG_MIN_BLOCKS(blocklog)) &&
                            (tmp_agsize <= XFS_AG_MAX_BLOCKS(blocklog))) {
                                ...
                        } else {
                                if (nodsflag) {
                                        dsunit = dswidth = 0;
                                } else {
                                        fprintf(stderr,
_("Allocation group size (%lld) is not a multiple of the stripe unit (%d)\n"),
                                                (long long)agsize, dsunit);
                                        exit(1);
                                }
                        }

At that point we have tried to round the agsize up and down to align it, and have found it to be too large in both cases.  At the exit(1) point, it seems like it'd make some sense to point that out in the error message.

agsize wasn't specified, it was calculated given the specified agcount.  There were efforts to fix agsize up w.r.t. stripe geometry, but no efforts to make it fit within the maximum size; hence I tend to agree that if one specifies agcount so small that agsize is out of bounds, that does seem like a reasonable first error message to provide.

Having said all that, this seems like the sort of thing which could be tweaked upstream, but doesn't necessarily rise to the level of requiring a RHEL package update...

-Eric
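
As a rough illustration of the round-up-then-round-down step discussed in this comment, here is a minimal C sketch. It is not the mkfs.xfs code: `align_agsize` and `SKETCH_AG_MAX_BLOCKS` are hypothetical names, the minimum-size bound is ignored, and a return value of 0 is simply this sketch's way of saying that neither candidate fits.

```c
#include <stdint.h>

/* Assumed maximum AG size in 4096-byte blocks, as quoted in comment 6. */
#define SKETCH_AG_MAX_BLOCKS 268435455ULL

/*
 * Round agsize up to a multiple of sunit; if that is too large, round it
 * down instead. Returns the aligned size, or 0 when neither candidate is
 * within the maximum, which is the case where a "too big" message would
 * be more informative than an alignment message. sunit must be nonzero.
 */
uint64_t align_agsize(uint64_t agsize, uint64_t sunit)
{
    uint64_t up = ((agsize + sunit - 1) / sunit) * sunit;
    uint64_t down = (agsize / sunit) * sunit;

    if (up <= SKETCH_AG_MAX_BLOCKS)
        return up;
    if (down > 0 && down <= SKETCH_AG_MAX_BLOCKS)
        return down;
    return 0;
}
```

For the reported case, `align_agsize(314995613, 128)` returns 0 because both rounded candidates exceed the maximum, so the underlying problem is the size rather than the alignment.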

Comment 10 Eric Sandeen 2012-05-14 15:34:13 UTC
commit ddf12ea5dc56a728f24d24c5d7403c3412b40b86
Author: Eric Sandeen <sandeen@redhat.com>
Date:   Wed Mar 28 22:23:11 2012 -0500

    mkfs.xfs: print std info if agcount makes agsize out of bounds
    
    When specifying a too-small agcount with stripe geometry,
    mkfs.xfs can fail with a somewhat unexpected message:
    
    $ mkfs.xfs -f -d file,name=fsfile,size=9764864000b,agcount=31,su=512k,sw=20
    Allocation group size (314995613) is not a multiple of the stripe unit (128)
    
    This strikes me as especially odd because normally, mkfs.xfs
    tries to fix up the agsize to be a stripe multiple.  The only way
    we get to the above error message is if ag _size_ is out of bounds;
    exiting with an error about alignment rather than about size
    seems odd.
    
    Maybe below is too clever, but if by the time we've decided that
    agsize is out of bounds after rounding it both up and down,
    as necessary, to get to a stripe-width multiple, calling
    validate_ag_geometry() will give us the same standard message as
    if we had specified no stripe geometry:
    
    $ mkfs/mkfs.xfs -f -d file,name=fsfile,size=9764864000b,agcount=31,su=512k,sw=20
    agsize (314995613b) too big, maximum is 268435455 blocks
    Usage: mkfs.xfs
    ...
    
    $ mkfs/mkfs.xfs -f -d file,name=fsfile,size=9764864000b,agcount=31
    agsize (314995613b) too big, maximum is 268435455 blocks
    Usage: mkfs.xfs
    ...
    
    Also, tidy up error message to explicitly state "blocks" not "b"
    
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>

Comment 11 RHEL Product and Program Management 2012-09-13 05:39:19 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 17 errata-xmlrpc 2013-02-21 11:00:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0481.html

