Hello, could you try the experiment on a volume without sharding enabled and see if there is a difference in write performance between 2x2 and 2x(2+1) volumes?
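For reference, a minimal sketch of the two volume layouts being compared; the host names and brick paths below are placeholders, not taken from the reporter's setup:

# 2x2: distributed-replicate, replica 2, four bricks
gluster volume create testvol replica 2 \
    host1:/data/brick1 host2:/data/brick1 \
    host3:/data/brick1 host4:/data/brick1

# 2x(2+1): the same layout plus one arbiter brick per replica set
gluster volume create testvol-arb replica 3 arbiter 1 \
    host1:/data/brick1 host2:/data/brick1 host3:/data/brick1-arbiter \
    host3:/data/brick1 host4:/data/brick1 host1:/data/brick1-arbiter

gluster volume start testvol
gluster volume start testvol-arb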
Hi, I think I have the experiment you were looking for. Indeed, it seems that setting features.shard to off brings the performance back up again.

Regards,
Max

Experiment:

[root@localhost ~]# gluster volume create storage replica 3 arbiter 1 192.168.122.14:/data/brick1 192.168.122.15:/data/brick1 192.168.122.167:/data/brick1-arbiter 192.168.122.167:/data/brick1 192.168.122.230:/data/brick1 192.168.122.14:/data/brick1-arbiter force
volume create: storage: success: please start the volume to access data
[root@localhost ~]# gluster volume start storage
volume start: storage: success
[root@localhost ~]# gluster volume info

Volume Name: storage
Type: Distributed-Replicate
Volume ID: c601be67-2857-4bfd-a226-504e8d1f3c5b
Status: Started
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: 192.168.122.14:/data/brick1
Brick2: 192.168.122.15:/data/brick1
Brick3: 192.168.122.167:/data/brick1-arbiter (arbiter)
Brick4: 192.168.122.167:/data/brick1
Brick5: 192.168.122.230:/data/brick1
Brick6: 192.168.122.14:/data/brick1-arbiter (arbiter)
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@localhost ~]# gluster volume set storage features.shard on
volume set: success
[root@localhost ~]# gluster volume set storage features.shard-block-size 16MB
volume set: success
[root@localhost ~]# mkdir /srv/storage
[root@localhost ~]# mount -t glusterfs 127.0.0.1:storage /srv/storage/
[root@localhost ~]# cd /srv/storage/
[root@localhost storage]# df -h /srv/storage/
Filesystem         Size  Used  Avail  Use%  Mounted on
127.0.0.1:storage  20G   2,0G  18G    11%   /srv/storage
[root@localhost storage]# dd if=/dev/zero of=testfile count=1 bs=10M
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 11,6287 s, 902 kB/s
[root@localhost storage]# gluster volume set storage features.shard off
volume set: success
[root@localhost storage]# dd if=/dev/zero of=testfile count=1 bs=10M
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 0,0328133 s, 320 MB/s
[root@localhost storage]# gluster volume set storage features.shard on
volume set: success
[root@localhost storage]# dd if=/dev/zero of=testfile count=1 bs=10M
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 11,2339 s, 933 kB/s
[root@localhost storage]# gluster volume remove-brick storage replica 2 192.168.122.167:/data/brick1-arbiter 192.168.122.14:/data/brick1-arbiter force
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit force: success
[root@localhost storage]# dd if=/dev/zero of=testfile count=1 bs=10M
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 0,0843147 s, 124 MB/s
[root@localhost storage]# gluster volume set storage features.shard off
[root@localhost storage]# dd if=/dev/zero of=testfile count=1 bs=10M
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 0,0365119 s, 287 MB/s
[root@localhost storage]#
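For what it's worth, the toggle test above can be made repeatable with a small script. This is only a sketch: it assumes the "storage" volume and /srv/storage mount from the transcript, and it overwrites "testfile" on each pass. Note also the caveat in the next comment about toggling features.shard off on a volume that already holds sharded data you care about.

#!/bin/bash
# Compare write throughput with sharding on vs. off on the same volume.
VOL=storage
MNT=/srv/storage
for state in on off; do
    gluster volume set "$VOL" features.shard "$state"
    echo "features.shard=$state:"
    # dd prints its throughput summary on stderr; keep only the last line
    dd if=/dev/zero of="$MNT/testfile" count=1 bs=10M 2>&1 | tail -1
    rm -f "$MNT/testfile"
done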
According to the experiment, it seems that it is a bad idea to run

[root@localhost storage]# gluster volume set storage features.shard off

because then all existing sharded files appear shrunk to the shard-block-size.
(In reply to Max Raba from comment #3)
> According to the experiment, it seems that it is a bad idea to run
>
> [root@localhost storage]# gluster volume set storage features.shard off
>
> because then all existing sharded files appear shrunk to the shard-block-size.

Yes, I was only trying to isolate the cause. I'm able to recreate the issue. I'll update once I find out what the issue is.

Note: making the BZ description private upon the reporter's request, as it contains some sensitive IP information.
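The shrinking effect described above is consistent with how sharding stores data: everything past the first block lives as separate shard files under the hidden .shard directory on the bricks, so with the shard translator disabled the client sees only the base file. A quick way to check on one of the brick hosts (a sketch; the brick path is the one from the transcript, and the shard pieces are named after the base file's GFID):

# On a brick host: the base file holds only the first shard-block-size bytes
ls -lh /data/brick1/testfile
# The remaining pieces, if any, sit in the hidden .shard directory
ls -lh /data/brick1/.shard/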
REVIEW: http://review.gluster.org/15647 (afr: Take full locks in arbiter only for data transactions) posted (#1) for review on release-3.8 by Ravishankar N (ravishankar)
COMMIT: http://review.gluster.org/15647 committed in release-3.8 by Pranith Kumar Karampuri (pkarampu)
------
commit a1ee61051c6d284b9e632b975227c07cb4dda93d
Author: Ravishankar N <ravishankar>
Date:   Fri Oct 14 16:09:08 2016 +0530

    afr: Take full locks in arbiter only for data transactions

    Problem:
    Sharding exposed a bug in arbiter config. where `dd` throughput was
    extremely slow. Shard xlator was sending a fxattrop to update the file
    size immediately after a writev. Arbiter was incorrectly overriding the
    LLONG_MAX-1 start offset (for metadata domain locks) for this fxattrop,
    causing the inodelk to be taken on the data domain. And since the
    preceding writev hadn't released the lock (afr does a 'lazy' unlock if
    write succeeds on all bricks), this degraded to a blocking lock causing
    extra lock/unlock calls and delays.

    Fix:
    Modify flock.l_len and flock.l_start to take full locks only for data
    transactions.

    > Reviewed-on: http://review.gluster.org/15641
    > Smoke: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    (cherry picked from commit 3a97486d7f9d0db51abcb13dcd3bc9db935e3a60)

    Change-Id: I906895da2f2d16813607e6c906cb4defb21d7c3b
    BUG: 1375125
    Signed-off-by: Ravishankar N <ravishankar>
    Reported-by: Max Raba <max.raba>
    Reviewed-on: http://review.gluster.org/15647
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
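For anyone wanting to see the behaviour described in the commit message on their own setup, the volume profiler can make the extra lock traffic visible. A sketch, assuming the "storage" volume and mount point from the earlier transcript; on an affected build, FINODELK/FXATTROP counts and latencies should be disproportionately high relative to WRITE:

gluster volume profile storage start
dd if=/dev/zero of=/srv/storage/testfile count=1 bs=10M
gluster volume profile storage info
gluster volume profile storage stop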
Hi, I installed the glusterfs nightly build RPMs (2016-10-25), downloaded from http://artifacts.ci.centos.org/gluster/nightly/release-3.8/7/x86_64/?C=M;O=D, and created a replica 3 arbiter 1 volume with features.shard enabled (set to "enable" or "on"). Info as follows:

[root@horeba ~]# gluster --version
glusterfs 3.8.5 built on Oct 25 2016 02:09:23
[root@horeba ~]# gluster volume info data_volume3

Volume Name: data_volume3
Type: Distributed-Replicate
Volume ID: cd5f4322-11e3-4f18-a39d-f0349b8d2a0c
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: 192.168.10.71:/data_sdaa/brick
Brick2: 192.168.10.72:/data_sdaa/brick
Brick3: 192.168.10.73:/data_sdaa/brick (arbiter)
Brick4: 192.168.10.71:/data_sdc/brick
Brick5: 192.168.10.73:/data_sdc/brick
Brick6: 192.168.10.72:/data_sdc/brick (arbiter)
Brick7: 192.168.10.72:/data_sde/brick
Brick8: 192.168.10.73:/data_sde/brick
Brick9: 192.168.10.71:/data_sde/brick (arbiter)
Brick10: 192.168.10.71:/data_sde/brick1
Brick11: 192.168.10.72:/data_sdc/brick1
Brick12: 192.168.10.73:/data_sdaa/brick1 (arbiter)
Options Reconfigured:
server.allow-insecure: on
features.shard: enable
features.shard-block-size: 512MB
storage.owner-gid: 36
storage.owner-uid: 36
nfs.disable: on
cluster.data-self-heal-algorithm: full
auth.allow: *
network.ping-timeout: 10
performance.low-prio-threads: 32
performance.io-thread-count: 32
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on

I mounted the volume with the glusterfs client on one host and ran a dd test. The results are:

[root@horebb test6]# for i in `seq 3`; do dd if=/dev/zero of=./file bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 55.9329 s, 19.2 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 54.8481 s, 19.6 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 57.9079 s, 18.5 MB/s

Then I disabled the features.shard option and tested again:

[root@horeba ~]# gluster volume reset data_volume3 features.shard
volume reset: success: reset volume successful
[root@horebb test6]# for i in `seq 3`; do dd if=/dev/zero of=./filetest bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.25607 s, 855 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.18359 s, 907 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.29374 s, 830 MB/s

I also cloned the master source from 2016-10-25 (git clone https://github.com/gluster/glusterfs), built RPMs from it, and installed them; the test result was the same as with the nightly build. So the problem of bad performance with the shard feature enabled still exists. Please look into how to resolve it; as we know, sharding is important for glusterfs usage.
Hi humaorong, can you try the same test on a normal replica-3 volume with and without sharding enabled, and check whether you see similar performance differences?
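In case it helps, a minimal sketch of the requested comparison volume; the brick paths are placeholders, and any three empty brick directories on the three hosts from the previous comment would do:

# Plain replica 3, no arbiter
gluster volume create data_volume replica 3 \
    192.168.10.71:/data_sdc/brick3 \
    192.168.10.72:/data_sdc/brick3 \
    192.168.10.73:/data_sdc/brick3
gluster volume start data_volume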
Hi Ravishankar N, I tested as follows. The gluster volume is replica 3, no arbiter:

[root@horeba ~]# gluster volume info data_volume

Volume Name: data_volume
Type: Replicate
Volume ID: 48d74735-db85-44e8-b0d2-1c8cf651418c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.10.71:/data_sdc/brick3
Brick2: 192.168.10.72:/data_sdc/brick3
Brick3: 192.168.10.73:/data_sdc/brick3

With features.shard disabled, performance is OK:

[root@horebc mnt]# for i in `seq 3`; do dd if=/dev/zero of=./file bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.85306 s, 579 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.85131 s, 580 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.85037 s, 580 MB/s

With features.shard enabled, performance is also OK:

[root@horebc mnt]# for i in `seq 3`; do dd if=/dev/zero of=./filetest bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.84995 s, 580 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.87079 s, 574 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.85104 s, 580 MB/s

So at this point it looks like only a replica 3 arbiter 1 volume with features.shard enabled has bad performance.
I am sorry, I made a mistake in comment 9 ("humaorong 2016-10-26 01:41:39 EDT"): none of those results actually had sharding enabled. Now with sharding enabled:

[root@horeba ~]# gluster volume set data_volume features.shard on
volume set: success
[root@horeba ~]# gluster volume set data_volume features.shard-block-size 512MB
volume set: success
[root@horebc mnt]# for i in `seq 3`; do dd if=/dev/zero of=./filetest2 bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 5.91316 s, 182 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 6.00505 s, 179 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 5.92659 s, 181 MB/s
So, per these tests, write performance is bad whenever sharding is enabled; both arbiter and non-arbiter volumes show this problem.
Right, can you raise a separate bug with the component set to replicate? We can take it from there. Also, for O_DIRECT writes to be honoured, you will need to disable network.remote-dio and enable performance.strict-o-direct.
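For reference, a sketch of those two option changes, assuming the data_volume name from the earlier comments:

gluster volume set data_volume network.remote-dio disable
gluster volume set data_volume performance.strict-o-direct on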
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.6, please open a new bug report. glusterfs-3.8.6 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] http://www.gluster.org/pipermail/packaging/2016-November/000217.html [2] https://www.gluster.org/pipermail/gluster-users/