961501 – mkfs.xfs: go into multidisk mode when geometry is on cmdline



Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	xfsprogs
Sub Component:
Version:	6.4
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Eric Sandeen
QA Contact:	Boris Ranto
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	968418 971698

Reported:	2013-05-09 18:23 UTC by Eric Sandeen
Modified:	2013-11-21 21:19 UTC
CC List:	8 users
Fixed In Version:	xfsprogs-3.1.1-11.el6
Doc Type:	Bug Fix
Doc Text:	When stripe geometry was specified manually to the mkfs.xfs utility, mkfs.xfs did not properly select "multidisk mode" as it does when stripe geometry is automatically detected. As a result, a less than optimal number of allocation groups were created. With this update, multidisk mode is selected properly, and a larger number of allocation groups are created.
Clone Of:
Clones:	968418
Environment:
Last Closed:	2013-11-21 21:19:38 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2013:1657	0	normal	SHIPPED_LIVE	xfsprogs bug fix update	2013-11-20 21:53:13 UTC

Description Eric Sandeen 2013-05-09 18:23:48 UTC

RHS needs this commit to properly create more AGs on striped storage:

commit 0cfda29e1fded3975562824895a95e1a4dfe4cbc
Author: Eric Sandeen <sandeen>
Date:   Thu Dec 6 15:52:54 2012 -0600

    mkfs.xfs: go into multidisk mode when geometry is on cmdline
    
    In the course of some other investigations, I found that
    calc_default_ag_geometry() doesn't go into "multidisk" mode
    unless stripe geometry is *detected* (i.e. by the blkid routines).
    
    Specifying a geometry on the cmdline is *not* sufficient, because
    we test (ft.dsunit | ft.dswidth) which are not set by the cmdline
    options.
    
    If we move the AG calculations to after we have set dsunit & dswidth,
    then we'll pick up either cmdline-specified or blkid-detected
    geometry, and go into "multidisk" mode for AG size/count
    calculations in both cases.
    
    So now for a ~5T fs, for example, we'd make several more
    AGs:
    
    # truncate --size=5t fsfile
    # mkfs.xfs -N -d su=128k,sw=8 fsfile | grep agcount
    meta-data=fsfile                 isize=256    agcount=5, agsize=268435424 blks
    # mkfs/mkfs.xfs -N -d su=128k,sw=8 fsfile | grep agcount
    meta-data=fsfile                 isize=256    agcount=32, agsize=41943008 blks
    
    Signed-off-by: Eric Sandeen <sandeen>
    Reviewed-by: Christoph Hellwig <hch>
    Signed-off-by: Ben Myers <bpm>
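
The ordering problem described in the commit message can be illustrated with a small sketch. This is not the xfsprogs code: the `Geometry` type, both function names, and the block units are hypothetical, chosen only to show why testing the detected values alone misses geometry given on the command line, and why merging first and testing afterwards catches both cases.

```python
from dataclasses import dataclass


@dataclass
class Geometry:
    """Stripe unit and width; 0 means "not set". Hypothetical type."""
    dsunit: int = 0
    dswidth: int = 0


def multidisk_before_fix(detected: Geometry, cmdline: Geometry) -> bool:
    # Old order: the AG calculation ran first and tested only the
    # blkid-detected values, so cmdline geometry was ignored.
    return bool(detected.dsunit | detected.dswidth)


def multidisk_after_fix(detected: Geometry, cmdline: Geometry) -> bool:
    # New order: cmdline values (if given) are merged in first,
    # then the same test is applied to the final geometry.
    dsunit = cmdline.dsunit or detected.dsunit
    dswidth = cmdline.dswidth or detected.dswidth
    return bool(dsunit | dswidth)


# An image file has no detectable geometry; su=128k,sw=8 comes from -d.
# Units here are 4 KiB blocks (an assumption for illustration).
detected = Geometry()
cmdline = Geometry(dsunit=32, dswidth=256)

print(multidisk_before_fix(detected, cmdline))  # False
print(multidisk_after_fix(detected, cmdline))   # True
```

In the commit's example, the first case corresponds to the agcount=5 result and the second to the agcount=32 result.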

Comment 1 Eric Sandeen 2013-05-10 03:04:31 UTC

xfstests xfs/292 tests this:

# FS QA Test No. 292
#
# Ensure mkfs with stripe geometry goes into multidisk mode
# which results in more AGs

Comment 3 Ben England 2013-05-24 16:43:35 UTC

The perf team is tracking this for RHS 2.1 support. Thanks, Eric, for requesting this.

Comment 4 Ben England 2013-06-04 15:23:43 UTC

Is this fix making it into 6.4 z-stream?  We need RHS to start using this.  I did not see the above xfsprogs version in the RHEL6.5 nightly build dated June 4th, nor did I find it in http://download.lab.bos.redhat.com/composes/nightly/latest-RHEL6.5/6.5/Server/x86_64/os/Packages/, am I looking in the right place?

I reproduced an XFS problem on RHEL6.4 with glusterfs-3.4.0.8 when the filesystem has only 5 allocation groups, using the smallfile benchmark (Peter Portante reported it earlier with the Catalyst workload). I think this fix would have prevented the problem by forcing more allocation groups; I will try the xfsprogs version that has the fix.

The "sync" command hangs for at least 15 minutes after I append to a bunch of small files, and the perf utility shows that xfsalloc threads are spending time waiting on a spin lock, which I think is associated with an allocation group. This happened on multiple servers, so it is not a hardware problem.

 19.77%  [kernel]            [k] _spin_lock
 15.91%  [xfs]               [k] xfs_alloc_busy_trim
 12.96%  [xfs]               [k] xfs_btree_get_rec
  8.34%  [xfs]               [k] xfs_alloc_get_rec
  5.30%  [xfs]               [k] xfs_alloc_ag_vextent_near
  5.16%  [xfs]               [k] xfs_btree_get_block
  4.54%  [xfs]               [k] xfs_btree_increment
  3.88%  [xfs]               [k] xfs_alloc_compute_aligned
  3.43%  [xfs]               [k] xfs_btree_readahead
  3.29%  [xfs]               [k] xfs_btree_rec_offset
  2.26%  [xfs]               [k] _xfs_buf_find
  2.19%  [xfs]               [k] xfs_btree_decrement
  2.01%  [xfs]               [k] xfs_btree_rec_addr
  0.99%  [xfs]               [k] xfs_trans_buf_item_match

I'm getting stack traces like this in /var/log/messages:

Jun  3 19:00:47 gprfs048 kernel: INFO: task sync:25314 blocked for more than 120 seconds.

[root@gprfs048 ~]# xfs_info /mnt/brick0
meta-data=/dev/mapper/vg_brick0-lv isize=512    agcount=5, agsize=268435392 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=1167851520, imaxpct=5
         =                       sunit=64     swidth=640 blks
naming   =version 2              bsize=8192   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@gprfs048 ~]# mount | grep xfs
/dev/mapper/vg_brick0-lv on /mnt/brick0 type xfs (rw,noatime,inode64)


Workload:

After this command completes successfully, run the "sync" command:

13-06-04-09-07-04 : command 2352: cd /root/smallfile-v1.9.13 ; ./smallfile_cli.py  --top /mnt/glusterfs/smf.d-pass2 --host-set gprfc088,gprfc089,gprfc090,gprfc091,gprfc092,gprfc094,gprfc095,gprfc096 --operation append --threads 4 --file-size 4 --record-size 0 --files-per-dir 100 --dirs-per-dir 10 --files 32768 --response-times Y --stonewall N --pause 500 > /shared/benchmarks/gluster_test/logs/13-06-02-20-02-16/smallfile.13-06-04-09-07-04

Comment 5 Eric Sandeen 2013-06-04 15:28:30 UTC

zstream has not yet been granted.  Patience... the Wheels of Process must turn.

For testing you can grab it from here:
http://download.devel.redhat.com/brewroot/packages/xfsprogs/3.1.1/11.el6/

Thanks,
-Eric

Comment 6 Ben England 2013-06-05 23:34:35 UTC

When I use mkfs.xfs ... -d agcount=32,sw=10... I get a warning from mkfs. 32 is the agcount that your mkfs patch generates when sw=10 is specified (bz 961501). The warning goes away if I use 31 instead of 32.

Also, the warning seems slightly off: it says that the AG size is a multiple of the stripe width, but it is not, right? Maybe it wants an agcount value that has no common factors with sw so that there is even wear on the drives?

Does it matter?  Example:

# mkfs -t xfs -f -i size=512 -n size=8192 -d agcount=32,su=256k,sw=10 -L RHSbrick0 /dev/vg_brick0/lv

Warning: AG size is a multiple of stripe width.  This can cause performance
problems by aligning all AGs on the same disk.  To avoid this, run mkfs with
an AG size that is one stripe unit smaller, for example 36495296.
meta-data=/dev/vg_brick0/lv      isize=512    agcount=32, agsize=36495360 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=1167851520, imaxpct=5
         =                       sunit=64     swidth=640 blks
naming   =version 2              bsize=8192   ascii-ci=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Comment 7 Eric Sandeen 2013-06-06 04:15:01 UTC

It is a multiple, yes:

In blocks: 36495360 / (256*1024/4096 * 10) = 57024.0


By default, in this version and earlier versions, mkfs slightly lowers the agsize when needed to avoid such a multiple. Specifying agcount changes when the calculations are done, and mkfs then sometimes issues this warning. If you think that is a problem that must be fixed in RHEL, it needs a new bug.

The whole point of this change was to not *need* to specify an agcount on the cmdline in order to go into "multidisk mode."  So behavior when specifying agcount is not relevant or related to this bug, nor is it changed behavior with this patch.
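
The arithmetic above can be checked with a short sketch. The helper names are hypothetical; the numbers are taken from the mkfs output in Comment 6 (bsize=4096, su=256k, sw=10, agsize=36495360, suggested agsize 36495296).

```python
BLOCK_SIZE = 4096  # bsize from the mkfs output


def stripe_geometry_blocks(su_bytes, sw, block_size=BLOCK_SIZE):
    """Return (sunit, swidth) in filesystem blocks."""
    sunit = su_bytes // block_size
    return sunit, sunit * sw


def is_multiple(agsize_blocks, unit_blocks):
    """True if the AG size is an exact multiple of the given unit."""
    return agsize_blocks % unit_blocks == 0


sunit, swidth = stripe_geometry_blocks(256 * 1024, 10)
print(sunit, swidth)                  # 64 640
print(36495360 // swidth)             # 57024
print(is_multiple(36495360, swidth))  # True: triggers the warning
print(is_multiple(36495296, swidth))  # False: the suggested agsize
print(is_multiple(36495296, sunit))   # True: still stripe-unit aligned
```

The suggested value is exactly one stripe unit (64 blocks) smaller, which keeps each AG aligned to the stripe unit while breaking the alignment to the full stripe width.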

Comment 11 Ben England 2013-10-12 19:07:43 UTC

Is this change in RHEL6.5 as well? When I install xfsprogs with yum from http://download.lab.bos.redhat.com/nightly/latest-RHEL6.5/6.5/Server/x86_64/os

I get this:

[root@perf88 ~]# rpm -q xfsprogs
xfsprogs-3.1.1-4.el6.x86_64

and mkfs does this:

[root@perf88 network-scripts]# lvcreate --name brick1 --size 2750G vg_bricks /dev/sdb
  Logical volume "brick1" created

[root@perf88 network-scripts]# mkfs -t xfs -L perf88-brk1 -i size=512 -n size=8192 -d su=256k,sw=10 /dev/vg_bricks/brick1
meta-data=/dev/vg_bricks/brick1  isize=512    agcount=4, agsize=180223936 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=720895744, imaxpct=5
         =                       sunit=64     swidth=640 blks
naming   =version 2              bsize=8192   ascii-ci=0
log      =internal log           bsize=4096   blocks=352000, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1

Comment 12 Ben England 2013-10-12 19:42:35 UTC

Sorry, user brain damage on my part. I updated the pointer in the yum repo file for RHEL65 but not for ScalableFileSystem, so I didn't get the right xfsprogs. Never mind.

Comment 13 errata-xmlrpc 2013-11-21 21:19:38 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1657.html
