Bug 1212655

Summary: mkfs.xfs does not read/detect lvm raid layout when stripe unit is 4k
Product: Red Hat Enterprise Linux 7 Reporter: lejeczek <peljasz>
Component: xfsprogs Assignee: Eric Sandeen <esandeen>
Status: CLOSED NOTABUG QA Contact: Filesystem QE <fs-qe>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.0 CC: peljasz
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-04-22 16:27:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description lejeczek 2015-04-16 22:52:05 UTC
Description of problem:

If I remember correctly, mkfs.xfs would look into the device and take care of strips and stripes. From my old notes, a long time ago:

  --- Segments ---
  Logical extent 0 to 228929:
    Type    striped
    Stripes   2
    Stripe size   64.00 KiB
    Stripe 0:
      Physical volume /dev/sds
      Physical extents  0 to 114464
    Stripe 1:
      Physical volume /dev/sdt
      Physical extents  0 to 114464

# and mkfs.xfs does find the underlying geometry correctly!!

mkfs.xfs /dev/mapper/h200Internal-0
meta-data=/dev/mapper/h200Internal-0 isize=256    agcount=32, agsize=7325744 blks
         =                       sectsz=4096  attr=2, projid32bit=0
data     =                       bsize=4096   blocks=234423808, imaxpct=25
         =                       sunit=16     swidth=32 blks # <------------ HERE

Today I created an LV like this:

lvcreate --type raid5 -i 9 -I 4 -n raid5 -l 100%pv 5tb.Toshiba-Lot $(echo /dev/sd{g..p})

so it uses 10 drives/PVs, then the usual mkfs.xfs, but then:

meta-data=/dev/5tb.Toshiba-Lot/raid5 isize=256    agcount=41, agsize=268435455 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=10988467200, imaxpct=5
         =                       sunit=0      swidth=0 blks # <--HERE ?????
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

is this a bug??


Version-Release number of selected component (if applicable):

xfsprogs-3.2.1-6.el7.x86_64
3.10.0-229.1.2.el7.x86_64

Comment 2 Eric Sandeen 2015-04-16 22:59:20 UTC
Can you provide the output of:

# blockdev --getss --getpbsz --getiomin --getioopt --getbsz /dev/5tb.Toshiba-Lot/raid5

please?

thanks,
-Eric

Comment 3 lejeczek 2015-04-16 23:08:07 UTC
blockdev --getss --getpbsz --getiomin --getioopt --getbsz /dev/5tb.Toshiba-Lot/raid5
512
4096
4096
36864
4096

Comment 4 Eric Sandeen 2015-04-16 23:13:40 UTC
So:

Sector size: 512
Physical block size: 4096
Minimum IO size: 4096
Optimal IO size: 36k
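
To make the mapping between the `blockdev` flags and the printed numbers explicit, here is a minimal Python sketch. It assumes `blockdev` prints one value per line in the same order as the flags given, which is how the values are read above; the helper name `label_blockdev_output` is hypothetical and the input is the output pasted in comment 3.

```python
# Flags in the order they were passed to blockdev in comment 2.
BLOCKDEV_FLAGS = ["--getss", "--getpbsz", "--getiomin", "--getioopt", "--getbsz"]


def label_blockdev_output(text, flags=BLOCKDEV_FLAGS):
    """Pair each printed value with the flag that produced it.

    Hypothetical helper: assumes one integer per line, in flag order.
    """
    values = [int(token) for token in text.split()]
    if len(values) != len(flags):
        raise ValueError("expected %d values, got %d" % (len(flags), len(values)))
    return dict(zip(flags, values))


# Output pasted by the reporter in comment 3.
report = label_blockdev_output("512\n4096\n4096\n36864\n4096\n")
print(report["--getiomin"], "bytes minimum I/O size")
print(report["--getioopt"] // 1024, "KiB optimal I/O size")
```

The optimal I/O size of 36864 bytes is 9 x 4096, which is consistent with nine data stripes of 4 KiB each.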

mkfs uses minimum and optimal sizes for stripe unit and stripe width:

        val = blkid_topology_get_minimum_io_size(tp);
        *sunit = val;
        val = blkid_topology_get_optimal_io_size(tp);
        *swidth = val;

but a stripe unit which matches the physical block size is not a stripe unit at all:

        /*
         * If the reported values are the same as the physical sector size
         * do not bother to report anything.  It will only cause warnings
         * if people specify larger stripe units or widths manually.
         */
        if (*sunit == *psectorsize || *swidth == *psectorsize) {
                *sunit = 0;
                *swidth = 0;
        }

That's why it's coming up zero.
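
The check can be replayed with the numbers from this device. The following is a minimal Python sketch of the logic quoted above, not the xfsprogs source: the function name `detect_stripe_geometry` and its byte-valued parameters are hypothetical, and the inputs are the `blockdev` values from comment 3.

```python
def detect_stripe_geometry(min_io, opt_io, psectorsize):
    """Return (sunit, swidth) in bytes, mimicking the quoted mkfs.xfs check.

    Hypothetical helper for illustration; all arguments are in bytes.
    """
    sunit, swidth = min_io, opt_io
    # Same condition as the quoted C snippet: a value equal to the physical
    # sector size is treated as "no stripe geometry reported".
    if sunit == psectorsize or swidth == psectorsize:
        sunit, swidth = 0, 0
    return sunit, swidth


# Reporter's device: minimum IO 4096, optimal IO 36864, physical block 4096.
print(detect_stripe_geometry(4096, 36864, 4096))  # prints (0, 0)
```

With a minimum IO size larger than the physical sector size, the same function would return the reported values unchanged.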

And sunit == psectorsize because that's what you specified on the lvcreate commandline, with -I 4:

       -I, --stripesize StripeSize
              Gives the number of kilobytes for the granularity of the stripes.

so the problem here is your lvcreate commandline, I think.  Did you really want a 4k stripe unit?

-Eric

Comment 5 Eric Sandeen 2015-04-17 02:11:46 UTC
So, I'm inclined to close this NOTABUG.  mkfs.xfs intentionally ignores a stripe unit of the size you specified... thoughts?

Comment 6 lejeczek 2015-04-17 06:03:30 UTC
Yes, it seems OK with, e.g., -I 8:

mkfs.xfs /dev/5tb.Toshiba-Lot/raid5 
meta-data=/dev/5tb.Toshiba-Lot/raid5 isize=256    agcount=41, agsize=268435454 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=10988467200, imaxpct=5
         =                       sunit=2      swidth=18 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
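
As a cross-check of the sunit/swidth figures in this output, here is a minimal Python sketch of the arithmetic. It assumes that mkfs.xfs prints both values in filesystem blocks (the "blks" suffix) and that the stripe width is the stripe size multiplied by the number of data stripes given with -i; the helper name `stripe_in_blocks` is hypothetical.

```python
def stripe_in_blocks(stripe_kib, data_stripes, block_size=4096):
    """Convert an lvcreate -I/-i pair into (sunit, swidth) in fs blocks.

    Hypothetical helper: stripe_kib is the -I value in KiB, data_stripes
    is the -i value, block_size is the filesystem block size in bytes.
    """
    sunit_bytes = stripe_kib * 1024
    swidth_bytes = sunit_bytes * data_stripes
    return sunit_bytes // block_size, swidth_bytes // block_size


print(stripe_in_blocks(8, 9))   # raid5 LV with -I 8 -i 9: prints (2, 18)
print(stripe_in_blocks(64, 2))  # old 2-stripe LV, 64 KiB: prints (16, 32)
```

Both results agree with the mkfs.xfs output quoted in this report: sunit=2, swidth=18 here, and sunit=16, swidth=32 in the old notes.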

But what's wrong with a stripe size of 4 KB? I thought arrays spanning a larger number of drives are better off with smaller stripe sizes, so I went with 4 KB.

Also (or maybe this is a separate thing), this happens with my ext4 on the same VG; this time it's a simple stripe:

lvcreate -i 4 -I 4 -n raid0

and dumpe2fs -h

Inode blocks per group:   128
RAID stripe width:        4
Flex block group size:    16
Filesystem created:       Thu Apr 16 23:56:14 2015

So it seems that 4K is the rule across the OS. What's the reasoning behind it?

Comment 7 Eric Sandeen 2015-04-17 19:15:13 UTC
The reason mkfs.xfs rejects/ignores stripe unit == physical sector size as a valid stripe geometry is that some non-striped storage reports the physical sector size as its "optimal IO size":

commit 3dc7147f03cdd4cfe689d78d4ca4b2650c49a263
Author: Eric Sandeen <sandeen>
Date:   Wed Dec 12 17:26:24 2012 -0600

    mkfs.xfs: don't detect geometry values <= psectorsize
    
    blkid_get_topology() ignores devices which report 512
    as their minimum & optimal IO size, but we should ignore
    anything up to the physical sector size; otherwise hard-4k
    sector devices will report a "stripe size" of 4k, and warn
    if anything larger is specified:
    
    # modprobe scsi_debug physblk_exp=3 num_parts=2 dev_size_mb=128
    # mdadm --create /dev/md1 --level=0 --raid-devices=2  -c 4 /dev/sdb1 /dev/sdb2
    # mkfs.xfs -f -d su=16k,sw=2 /dev/md1
    mkfs.xfs: Specified data stripe unit 32 is not the same as the volume stripe unit 
    mkfs.xfs: Specified data stripe width 64 is not the same as the volume stripe widt
    ...

Generally you'll want stripe units larger than 4k, which is a single filesystem block.  Unless there's a compelling reason to need a 4k stripe unit, I think your best path forward is to just create your raid with a larger stripe unit.

Comment 8 Eric Sandeen 2015-04-22 16:27:03 UTC
Closing NOTABUG; this is working as designed.