Bug 1612058 - Arbiter brick can contain less files than expected when we specify average file size explicitly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.10
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.10
Assignee: Raghavendra Talur
QA Contact: Valerii Ponomarov
URL:
Whiteboard:
Depends On:
Blocks: 1568862
 
Reported: 2018-08-03 11:30 UTC by Valerii Ponomarov
Modified: 2019-02-11 10:17 UTC
CC: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Heketi passed incorrect options to the XFS format command when creating bricks. This meant that inode tables on arbiter bricks were being sized as though they were for data bricks, and therefore reserved a large number of inodes for data that would never be written to the arbiter brick. Heketi now passes the correct options and the inode tables of arbiter bricks are sized to ensure they can hold metadata for the rest of the volume or volume set.
Clone Of:
Environment:
Last Closed: 2018-09-12 09:23:49 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github heketi heketi issues 1296 0 None closed Consider different xfs formatting options for arbiter brick 2020-10-13 11:57:53 UTC
Red Hat Product Errata RHEA-2018:2686 0 None None None 2018-09-12 09:25:05 UTC

Description Valerii Ponomarov 2018-08-03 11:30:06 UTC
Description of problem: We expect that if we specify an average file size for an arbiter volume, the arbiter brick should be able to hold a number of files equal to "(volume_size / avg_file_size)". In practice, this is not the case. The underlying logical volumes have the expected size, but the resulting XFS filesystem can hold significantly fewer files than expected.


Version-Release number of selected component (if applicable):
Heketi Image:          rhgs3/rhgs-volmanager-rhel7:v3.10
Heketi Image ID:       docker-pullable://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-volmanager-rhel7@sha256:51fabf34dfd0dd795fb47b9923a67d32cf866e4a1b8ed8482b0f83dfd9aa1fa8

$ oc version
oc v3.10.14
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://vp-ansible-v310-14-2-master-0:8443
openshift v3.10.14
kubernetes v1.10.0+b81c8f8

sh-4.2# gluster --version
glusterfs 3.8.4 built on Jul 12 2018 12:36:38
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

How reproducible: 100%

Steps to Reproduce:
1. Create an arbiter volume of 13 GiB, specifying the average file size as 500 KB:
   $ heketi-cli volume create --name=500kb-avg --size=13 --gluster-volume-options='user.heketi.arbiter true,user.heketi.average-file-size 500'
2. Go to the arbiter brick
3. Create as many files as you can

Actual results:
Only 14373 files could be created.

Expected results:
27262 files are expected to be created.

Comments: the real minimum available space we get for the arbiter brick is 9.3 MB instead of the expected ~16 MB. The same holds for arbiter volumes with an average file size of 900 KB or more.
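The expected file count above comes from dividing the volume size by the average file size. A quick sanity check of the arithmetic (a sketch; assumes heketi's KB-based accounting, where 13 GiB = 13631488 KB):

```shell
# Expected arbiter file count = volume size / average file size.
volume_kb=$((13 * 1024 * 1024))    # 13 GiB volume expressed in KB (13631488)
avg_file_kb=500                    # user.heketi.average-file-size option
echo $((volume_kb / avg_file_kb))  # prints 27262
```

This matches the 27262 figure in "Expected results", so the shortfall must come from the brick's filesystem, not from the size calculation.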

Additional info:

$ heketi-cli volume list
Id:9e56f7819a4d6a524cca496d269440ec    Cluster:ed3af51a8e17862a2ba7b886d4f9ad9e    Name:heketidbstorage

$ heketi-cli volume create --name=500kb-avg --size=13 --gluster-volume-options='user.heketi.arbiter true,user.heketi.average-file-size 500'
Name: 500kb-avg
Size: 13
Volume Id: 92931fc40faad24a65e86f1b46ee5e2f
Cluster Id: ed3af51a8e17862a2ba7b886d4f9ad9e
Mount: 10.70.46.6:500kb-avg
Mount Options: backup-volfile-servers=10.70.47.64,10.70.47.164
Block: false
Free Size: 0
Reserved Size: 0
Block Volumes: []
Durability Type: replicate
Distributed+Replica: 3

$ heketi-cli volume list
Id:92931fc40faad24a65e86f1b46ee5e2f    Cluster:ed3af51a8e17862a2ba7b886d4f9ad9e    Name:500kb-avg
Id:9e56f7819a4d6a524cca496d269440ec    Cluster:ed3af51a8e17862a2ba7b886d4f9ad9e    Name:heketidbstorage

$ oc get pods
NAME                                      READY     STATUS    RESTARTS   AGE
glusterblock-cns-provisioner-dc-1-9wpzc   1/1       Running   0          13h
glusterfs-cns-9fzvj                       1/1       Running   0          13h
glusterfs-cns-glz7g                       1/1       Running   0          13h
glusterfs-cns-pblkc                       1/1       Running   0          13h
heketi-cns-1-sqhh6                        1/1       Running   0          13h

$ oc rsh glusterfs-cns-9fzvj
sh-4.2# gluster v info

Volume Name: 500kb-avg
Type: Replicate
Volume ID: 450a54ee-2325-4228-89d2-b0124a12a30e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.64:/var/lib/heketi/mounts/vg_e503be91d5ab04a216fc7f2083233c25/brick_cbda8ce01f3d356d51a9ef57b1e099eb/brick
Brick2: 10.70.47.164:/var/lib/heketi/mounts/vg_ac4f3b15e121b39e15a210d22e3b409c/brick_29972828bec6b8a6b7f917010cba01c0/brick
Brick3: 10.70.46.6:/var/lib/heketi/mounts/vg_4b472ca392527038bbceaf6cb5c6c134/brick_c6ce97aa651332654482d01979e19e0c/brick (arbiter)
Options Reconfigured:
user.heketi.average-file-size: 500
user.heketi.arbiter: true
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: on

Volume Name: heketidbstorage
Type: Replicate
Volume ID: 993af0bc-24ae-4e4c-81c6-95bf6e9567c1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.164:/var/lib/heketi/mounts/vg_f19d67fa640cc26e007b1fd1766aeff9/brick_ca13d9d9e7b122e39234772483e7ae3f/brick
Brick2: 10.70.46.6:/var/lib/heketi/mounts/vg_ecf35005c0da09ab36325168e793334d/brick_fd370dd25f4f1a2c93aaea2d27f0d6eb/brick
Brick3: 10.70.47.64:/var/lib/heketi/mounts/vg_d922736ca831714ed37240c05385c14d/brick_3ee556751d5dfcff2b526a1bd7bd21f9/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: on

sh-4.2# df -Th
Filesystem                                                                             Type      Size  Used Avail Use% Mounted on
overlay                                                                                overlay    40G  1.8G   39G   5% /
/dev/sdc                                                                               xfs        40G   42M   40G   1% /run
devtmpfs                                                                               devtmpfs   16G     0   16G   0% /dev
shm                                                                                    tmpfs      64M     0   64M   0% /dev/shm
/dev/mapper/docker--vol-dockerlv                                                       xfs        40G  1.8G   39G   5% /run/secrets
/dev/mapper/rhel_dhcp46--210-root                                                      xfs        35G  2.4G   33G   7% /etc/ssl
tmpfs                                                                                  tmpfs      16G  5.1M   16G   1% /run/lvm
tmpfs                                                                                  tmpfs      16G     0   16G   0% /sys/fs/cgroup
tmpfs                                                                                  tmpfs      16G   16K   16G   1% /run/secrets/kubernetes.io/serviceaccount
/dev/mapper/vg_ecf35005c0da09ab36325168e793334d-brick_fd370dd25f4f1a2c93aaea2d27f0d6eb xfs       2.0G   33M  2.0G   2% /var/lib/heketi/mounts/vg_ecf35005c0da09ab36325168e793334d/brick_fd370dd25f4f1a2c93aaea2d27f0d6eb
/dev/mapper/vg_4b472ca392527038bbceaf6cb5c6c134-brick_c6ce97aa651332654482d01979e19e0c xfs        22M  1.6M   20M   8% /var/lib/heketi/mounts/vg_4b472ca392527038bbceaf6cb5c6c134/brick_c6ce97aa651332654482d01979e19e0c

sh-4.2# cd /var/lib/heketi/mounts/vg_4b472ca392527038bbceaf6cb5c6c134/brick_c6ce97aa651332654482d01979e19e0c

sh-4.2# touch file{1..14374}
touch: cannot touch 'file14374': No space left on device

sh-4.2# ls -l | grep -c file
14373

sh-4.2# pwd
/var/lib/heketi/mounts/vg_4b472ca392527038bbceaf6cb5c6c134/brick_c6ce97aa651332654482d01979e19e0c

sh-4.2# df -Th /var/lib/heketi/mounts/vg_4b472ca392527038bbceaf6cb5c6c134/brick_c6ce97aa651332654482d01979e19e0c
Filesystem                                                                             Type  Size  Used Avail Use% Mounted on
/dev/mapper/vg_4b472ca392527038bbceaf6cb5c6c134-brick_c6ce97aa651332654482d01979e19e0c xfs    22M  9.1M   13M  43% /var/lib/heketi/mounts/vg_4b472ca392527038bbceaf6cb5c6c134/brick_c6ce97aa651332654482d01979e19e0c


sh-4.2# lvdisplay /dev/mapper/vg_4b472ca392527038bbceaf6cb5c6c134-brick_c6ce97aa651332654482d01979e19e0c
  --- Logical volume ---
  LV Path                /dev/vg_4b472ca392527038bbceaf6cb5c6c134/brick_c6ce97aa651332654482d01979e19e0c
  LV Name                brick_c6ce97aa651332654482d01979e19e0c
  VG Name                vg_4b472ca392527038bbceaf6cb5c6c134
  LV UUID                T0H0N0-fd1y-u5Df-uUSG-62Th-m1CY-kTW2vj
  LV Write Access        read/write
  LV Creation host, time vp-ansible-v310-14-2-app-cns-2, 2018-08-03 11:14:17 +0000
  LV Pool name           tp_c6ce97aa651332654482d01979e19e0c
  LV Status              available
  # open                 1
  LV Size                28.00 MiB
  Mapped size            54.46%
  Current LE             7
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:12

Comment 2 Raghavendra Talur 2018-08-03 11:57:50 UTC
I am working on this.

Comment 3 Raghavendra Talur 2018-08-08 18:57:37 UTC
Analysis:

When a 13G volume with an average file size of 500 KB is created, the arbiter calculation yields: bricksize: 13631488, filesize: 500, arbiterbricksize: 27262
mkfs command: mkfs.xfs -i size=512 -n size=8192 /dev/mapper/vg_8b7da1a300b44823878dfab9c4112224-brick_bbc42f67c168b1b8d1e6f9b176088491

Hence, with each file taking up 1 KB in the inode block, we should be able to create 27262 files.
However, the stat info is:

  File: ‘/var/lib/heketi/mounts/vg_8b7da1a300b44823878dfab9c4112224/brick_bbc42f67c168b1b8d1e6f9b176088491’
  Size: 19              Blocks: 0          IO Block: 4096   directory
Device: fc04h/64516d    Inode: 14336       Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:unlabeled_t:s0
Access: 1970-01-01 05:30:00.000000000 +0530
Modify: 2018-08-08 23:03:07.302437334 +0530
Change: 2018-08-08 23:03:07.302437334 +0530
 Birth: -


We can see only 14336 inodes available.


Formatting the spare partition with the XFS inode option maxpct=100 gives us about 38K inodes. Since 38K is greater than 27262, this satisfies our requirement.
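That ~38K figure is consistent with a rough upper bound (a back-of-the-envelope sketch; assumes the ~22 MiB brick filesystem size from the df output in the description and the 512-byte inode size from the mkfs command above):

```shell
# With maxpct=100, nearly the whole filesystem may be used for inodes.
# Theoretical ceiling for a ~22 MiB filesystem with 512-byte inodes:
fs_bytes=$((22 * 1024 * 1024))    # brick filesystem size from df
inode_size=512                    # from: mkfs.xfs -i size=512 ...
echo $((fs_bytes / inode_size))   # prints 45056
```

The observed ~38K inodes sits below this ceiling (XFS still needs blocks for its own metadata) but comfortably above the required 27262.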

We get more inodes than heketi's calculation requires because heketi also accounted for the blocks needed by those 27262 files' extended attributes. To test this, I created an arbiter volume and created a file on it. I then copied that file to an XFS partition formatted with maxpct=100, 27262 times. This test worked.

Other tests performed:
I added an extended attribute to all files using: setfattr -n user.rtalur1 -v a10byteval
I added a second extended attribute to all files using: setfattr -n user.rtalur2 -v a10byteval

At this point, we still have about 75% of the partition free. I think this is a satisfactory test.

Hence, I will create a patch that changes the XFS format options for arbiter bricks.
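A minimal sketch of what such an invocation could look like (hypothetical: the device path is illustrative, and the exact option set is whatever the patch settles on):

```shell
# Hypothetical mkfs invocation for an arbiter brick: same inode and
# directory-block sizes as before, plus maxpct=100 so the inode table
# may grow to cover the whole filesystem.
mkfs.xfs -i size=512,maxpct=100 -n size=8192 /dev/mapper/VG_NAME-BRICK_NAME
```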

Comment 4 Raghavendra Talur 2018-08-08 19:09:57 UTC
(In reply to Raghavendra Talur from comment #3)
> Analysis:
> 
> when a 13G volume with average size of 500 KB is created:
> we get an arbiter calculation of: bricksize: 13631488, filesize: 500,
> arbiterbricksize: 27262
> mkfs command: mkfs.xfs -i size=512 -n size=8192
> /dev/mapper/vg_8b7da1a300b44823878dfab9c4112224-
> brick_bbc42f67c168b1b8d1e6f9b176088491
> 
> Hence, with each file taking up 1K in inode block, we should be able to
> create 27262 files.
> However, the stat info is
> 
>   File:
> ‘/var/lib/heketi/mounts/vg_8b7da1a300b44823878dfab9c4112224/
> brick_bbc42f67c168b1b8d1e6f9b176088491’
>   Size: 19              Blocks: 0          IO Block: 4096   directory
> Device: fc04h/64516d    Inode: 14336       Links: 3
> Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
> Context: system_u:object_r:unlabeled_t:s0
> Access: 1970-01-01 05:30:00.000000000 +0530
> Modify: 2018-08-08 23:03:07.302437334 +0530
> Change: 2018-08-08 23:03:07.302437334 +0530
>  Birth: -
> 
> 
> We can see only 14336 inodes available.

I should have pasted the df output, not stat. Below is df output showing about 14K inodes.

xfs     14K    26   14K    1%   22M  1.6M   20M   8% -

> 
> 
> Formatting the space partition with xfs inode option of maxpct=100 gives us
> about 38K inodes. 38K being greater than 27262 satisfies our requirement.
> 
> We get more inodes than the calculation in heketi because in heketi we also
> accounted for blocks needed by those 27262 files' extended attributes. To
> test the same, I create a arbiter volume and created a file. Then I copied
> over the file to xfs partition with maxpct=100, 27262 times.  This test
> worked. 
> 
> Other tests performed:
> I added a extended attribute to all files using setfattr -n user.rtalur1 -v
> a10byteval
> I added another set of ea to all files using setfattr -n user.rtalur2 -v
> a10byteval
> 
> At this point, we still have about 75% of the partition free. I think this
> is a satisfactory test.
> 
> Hence, I will create a patch which changes xfs partition options for arbiter
> brick.

Comment 9 Raghavendra Talur 2018-08-10 17:28:38 UTC
Patch posted at https://github.com/heketi/heketi/pull/1306

Comment 11 Valerii Ponomarov 2018-08-17 15:46:03 UTC
Verified using "rhgs3/rhgs-volmanager-rhel7:3.4.0-3": we can now create a number of files on arbiter bricks equal to "volume size / avg file size".

Comment 12 Anjana KD 2018-08-31 00:18:27 UTC
Updated doc text in the Doc Text field. Please review for technical accuracy.

Comment 13 Anjana KD 2018-09-07 12:29:26 UTC
Updated doc text in the Doc Text field. Please review for technical accuracy.

Comment 14 John Mulligan 2018-09-07 17:40:50 UTC
Doc Text looks OK

Comment 16 errata-xmlrpc 2018-09-12 09:23:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686

