Description of problem:

On a GlusterFS replica 3 or replica 3 arbiter 1 volume, enabling sharding (features.shard on) makes write performance very bad. A similar report exists for arbiter volumes with shard enabled: https://bugzilla.redhat.com/show_bug.cgi?id=1375125 . I also tested a nightly build (2016-10-25), http://artifacts.ci.centos.org/gluster/nightly/release-3.8/7/x86_64/?C=M;O=D , and the problem is still there, so I think the real problem is in shard, not in arbiter.

Version-Release number of selected component (if applicable):

Tested on GlusterFS 3.7, 3.8 and 3.9.

How reproducible:

Steps to Reproduce:

1. Create a replica 3 volume and a replica 3 arbiter 1 volume:

[root@horeba ~]# gluster volume info data_volume

Volume Name: data_volume
Type: Replicate
Volume ID: 48d74735-db85-44e8-b0d2-1c8cf651418c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.10.71:/data_sdc/brick3
Brick2: 192.168.10.72:/data_sdc/brick3
Brick3: 192.168.10.73:/data_sdc/brick3
Options Reconfigured:
features.shard-block-size: 512MB
features.shard: on
nfs.disable: on
cluster.data-self-heal-algorithm: full
server.allow-insecure: on
auth.allow: *
network.ping-timeout: 10
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on

[root@horeba ~]# gluster v info data_volume3

Volume Name: data_volume3
Type: Distributed-Replicate
Volume ID: cd5f4322-11e3-4f18-a39d-f0349b8d2a0c
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: 192.168.10.71:/data_sdaa/brick
Brick2: 192.168.10.72:/data_sdaa/brick
Brick3: 192.168.10.73:/data_sdaa/brick (arbiter)
Brick4: 192.168.10.71:/data_sdc/brick
Brick5: 192.168.10.73:/data_sdc/brick
Brick6: 192.168.10.72:/data_sdc/brick (arbiter)
Brick7: 192.168.10.72:/data_sde/brick
Brick8: 192.168.10.73:/data_sde/brick
Brick9: 192.168.10.71:/data_sde/brick (arbiter)
Brick10: 192.168.10.71:/data_sde/brick1
Brick11: 192.168.10.72:/data_sdc/brick1
Brick12: 192.168.10.73:/data_sdaa/brick1 (arbiter)
Options Reconfigured:
features.shard: on
server.allow-insecure: on
features.shard-block-size: 512MB
storage.owner-gid: 36
storage.owner-uid: 36
nfs.disable: on
cluster.data-self-heal-algorithm: full
auth.allow: *
network.ping-timeout: 10
performance.low-prio-threads: 32
performance.io-thread-count: 32
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on

2. Mount each volume on one host and run a dd test. Both volumes have features.shard on, as shown above.

replica 3 arbiter 1, shard enabled:

[root@horebc test]# for i in `seq 3`; do dd if=/dev/zero of=./file bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 56.3563 s, 19.1 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 56.8704 s, 18.9 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 54.8892 s, 19.6 MB/s

replica 3, shard enabled:

[root@horebc test2]# for i in `seq 3`; do dd if=/dev/zero of=./file bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 6.46174 s, 166 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 6.39413 s, 168 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 6.36879 s, 169 MB/s

Now reset sharding on both volumes:

[root@horeba ~]# gluster v reset data_volume3 features.shard
volume reset: success: reset volume successful
[root@horeba ~]# gluster v reset data_volume features.shard
volume reset: success: reset volume successful

replica 3, shard disabled:

[root@horebc test2]# for i in `seq 3`; do dd if=/dev/zero of=./file bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.85271 s, 580 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.85781 s, 578 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.85364 s, 579 MB/s

replica 3 arbiter 1, shard disabled:

[root@horebc test]# for i in `seq 3`; do dd if=/dev/zero of=./file1 bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.40569 s, 764 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.33287 s, 806 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.32026 s, 813 MB/s

3. Actual results: with features.shard enabled, write performance on the plain replicate volume is bad, and on the arbiter volume it is extremely bad.

Additional info:
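For anyone re-running this comparison, the toggle between the shard-on and shard-off cases above can be scripted roughly as follows. This is a sketch only: VOL and the mount point are taken from this report and are placeholders to adjust for your own cluster.

```
# Sketch of the benchmark toggle used in this report (adjust names).
VOL=data_volume        # the plain replica 3 volume from this report
MNT=/mnt/test          # hypothetical client-side FUSE mount point

# shard enabled
gluster volume set $VOL features.shard on
dd if=/dev/zero of=$MNT/file bs=1G count=1 oflag=direct

# shard back to its default (off)
gluster volume reset $VOL features.shard
dd if=/dev/zero of=$MNT/file bs=1G count=1 oflag=direct
```

Running the dd pair on both the replica 3 and the arbiter volume reproduces the four measurements shown in the steps above.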
Hi,

https://bugzilla.redhat.com/show_bug.cgi?id=1384906 is a similar bug. Ravi, who works on arbiter, has fixed it, and the patch has made it into 3.7, 3.8 and 3.9. I am therefore closing this bug as a duplicate of that one. Ravi should be able to tell you the exact .x releases the patch has made it into.

-Krutika

*** This bug has been marked as a duplicate of bug 1384906 ***
Krutika, this is not a duplicate. humaorong did try the arbiter fix (https://bugzilla.redhat.com/show_bug.cgi?id=1375125#c11), but the problem was seen on plain replicate volumes too. The performance impact he observed comes from the shard size xattr being updated on appending writes, amplified by replication. You might want to confirm that this is expected behaviour and not a bug per se in sharding (unless there is some form of delayed size update we can do in the happy path for the shard size xattr).
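To give a sense of the amplification described above, here is back-of-the-envelope arithmetic only, under two assumptions not stated in this report: that the 1 GiB dd write reaches the client-side xlators as 128 KiB requests (a common default maximum FUSE write size), and that shard updates its file-size xattr once per write.

```shell
# Rough count of write calls for one dd run from this report, assuming the
# kernel splits the single 1 GiB write into 128 KiB FUSE requests.
file_size=$((1024 * 1024 * 1024))   # bs=1G count=1, as in the dd tests above
fuse_write=$((128 * 1024))          # assumed FUSE max write size
writes=$(( file_size / fuse_write ))
echo "$writes"                      # 8192 writes per 1 GiB file
```

If this hypothesis holds, each of those writes is followed by a synchronous size-xattr update on every brick of the replica set, which would explain why the slowdown scales so badly with replication. The xattrs can be inspected directly on a brick, e.g. with `getfattr -d -m . -e hex <brick-path>/<file>` (look for the trusted.glusterfs.shard.* keys; exact names depend on the build).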
I confirm I can reproduce the issue under the exact same conditions:

* Replica 2 with or without sharding: OK
* Replica 3 + 1 arbiter without sharding: OK
* Replica 3 + 1 arbiter **with** sharding: NOT OK (1 MB/s against ~100 MB/s for 1 and 2)
(In reply to Olivier LAMBERT from comment #3)
> I confirm I can reproduce the issue under the exact same conditions:
>
> * Replica 2 with or without sharding: OK
> * Replica 3 + 1 arbiter without sharding: OK
> * Replica 3 + 1 arbiter **with** sharding: NOT OK (1 MB/s against ~100 MB/s
> for 1 and 2)

How do

* replica 3 without arbiter and without sharding
* replica 3 without arbiter and with sharding

compare with the data in comment #3? Could you share that information as well?

-Krutika
Sadly, I only have a 2-node setup at the moment, so I can't create a "real" replica 3.