Description of problem:

On a GlusterFS replica 3 or replica 3 arbiter 1 volume, enabling sharding (features.shard on) makes write performance very bad. A similar report exists for arbiter volumes with shard enabled: https://bugzilla.redhat.com/show_bug.cgi?id=1375125 . I also tested a nightly build (2016-10-25), http://artifacts.ci.centos.org/gluster/nightly/release-3.8/7/x86_64/?C=M;O=D , and the problem is still there, so I think the real problem is in shard, not in arbiter.

Version-Release number of selected component (if applicable):

Tested on GlusterFS 3.7, 3.8 and 3.9.

How reproducible:

Steps to Reproduce:

1. Create a replica 3 volume and a replica 3 arbiter 1 volume:

[root@horeba ~]# gluster volume info data_volume

Volume Name: data_volume
Type: Replicate
Volume ID: 48d74735-db85-44e8-b0d2-1c8cf651418c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.10.71:/data_sdc/brick3
Brick2: 192.168.10.72:/data_sdc/brick3
Brick3: 192.168.10.73:/data_sdc/brick3
Options Reconfigured:
features.shard-block-size: 512MB
features.shard: on
nfs.disable: on
cluster.data-self-heal-algorithm: full
server.allow-insecure: on
auth.allow: *
network.ping-timeout: 10
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on

[root@horeba ~]# gluster v info data_volume3

Volume Name: data_volume3
Type: Distributed-Replicate
Volume ID: cd5f4322-11e3-4f18-a39d-f0349b8d2a0c
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: 192.168.10.71:/data_sdaa/brick
Brick2: 192.168.10.72:/data_sdaa/brick
Brick3: 192.168.10.73:/data_sdaa/brick (arbiter)
Brick4: 192.168.10.71:/data_sdc/brick
Brick5: 192.168.10.73:/data_sdc/brick
Brick6: 192.168.10.72:/data_sdc/brick (arbiter)
Brick7: 192.168.10.72:/data_sde/brick
Brick8: 192.168.10.73:/data_sde/brick
Brick9: 192.168.10.71:/data_sde/brick (arbiter)
Brick10: 192.168.10.71:/data_sde/brick1
Brick11: 192.168.10.72:/data_sdc/brick1
Brick12: 192.168.10.73:/data_sdaa/brick1 (arbiter)
Options Reconfigured:
features.shard: on
server.allow-insecure: on
features.shard-block-size: 512MB
storage.owner-gid: 36
storage.owner-uid: 36
nfs.disable: on
cluster.data-self-heal-algorithm: full
auth.allow: *
network.ping-timeout: 10
performance.low-prio-threads: 32
performance.io-thread-count: 32
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on

2. Mount each volume on one host and run a dd test. Both volumes have features.shard on, as shown above.

replica 3 arbiter 1, shard enabled:

[root@horebc test]# for i in `seq 3`; do dd if=/dev/zero of=./file bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 56.3563 s, 19.1 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 56.8704 s, 18.9 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 54.8892 s, 19.6 MB/s

replica 3, shard enabled:

[root@horebc test2]# for i in `seq 3`; do dd if=/dev/zero of=./file bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 6.46174 s, 166 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 6.39413 s, 168 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 6.36879 s, 169 MB/s

Now reset sharding on both volumes:

[root@horeba ~]# gluster v reset data_volume3 features.shard
volume reset: success: reset volume successful
[root@horeba ~]# gluster v reset data_volume features.shard
volume reset: success: reset volume successful

replica 3, shard disabled:

[root@horebc test2]# for i in `seq 3`; do dd if=/dev/zero of=./file bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.85271 s, 580 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.85781 s, 578 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.85364 s, 579 MB/s

replica 3 arbiter 1, shard disabled:

[root@horebc test]# for i in `seq 3`; do dd if=/dev/zero of=./file1 bs=1G count=1 oflag=direct ; done
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.40569 s, 764 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.33287 s, 806 MB/s
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.32026 s, 813 MB/s

3. Actual results: with features.shard enabled, write performance on the plain replicate volume is bad, and on the arbiter volume it is extremely bad.

Additional info:
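For anyone re-running this comparison, the toggle between the shard-on and shard-off cases above can be scripted roughly as follows. This is a sketch only: VOL and the mount point are taken from this report and are placeholders to adjust for your own cluster.

```
# Sketch of the benchmark toggle used in this report (adjust names).
VOL=data_volume        # the plain replica 3 volume from this report
MNT=/mnt/test          # hypothetical client-side FUSE mount point

# shard enabled
gluster volume set $VOL features.shard on
dd if=/dev/zero of=$MNT/file bs=1G count=1 oflag=direct

# shard back to its default (off)
gluster volume reset $VOL features.shard
dd if=/dev/zero of=$MNT/file bs=1G count=1 oflag=direct
```

Running the dd pair on both the replica 3 and the arbiter volume reproduces the four measurements shown in the steps above.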
Hi,

https://bugzilla.redhat.com/show_bug.cgi?id=1384906 is a similar bug. Ravi, who works on arbiter, has fixed it, and the patch has made it into 3.7, 3.8 and 3.9. I am therefore closing this bug as a duplicate of that one. Ravi should be able to tell you the exact .x releases the patch has made it into.

-Krutika

*** This bug has been marked as a duplicate of bug 1384906 ***
Krutika, this is not a duplicate. humaorong did try the arbiter fix (https://bugzilla.redhat.com/show_bug.cgi?id=1375125#c11), but the problem was seen on plain replicate volumes too. The performance impact he observed comes from the shard size xattr being updated on appending writes, amplified by replication. You might want to confirm that this is expected behaviour and not a bug per se in sharding (unless there is some form of delayed size update we can do in the happy path for the shard size xattr).
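To give a sense of the amplification described above, here is back-of-the-envelope arithmetic only, under two assumptions not stated in this report: that the 1 GiB dd write reaches the client-side xlators as 128 KiB requests (a common default maximum FUSE write size), and that shard updates its file-size xattr once per write.

```shell
# Rough count of write calls for one dd run from this report, assuming the
# kernel splits the single 1 GiB write into 128 KiB FUSE requests.
file_size=$((1024 * 1024 * 1024))   # bs=1G count=1, as in the dd tests above
fuse_write=$((128 * 1024))          # assumed FUSE max write size
writes=$(( file_size / fuse_write ))
echo "$writes"                      # 8192 writes per 1 GiB file
```

If this hypothesis holds, each of those writes is followed by a synchronous size-xattr update on every brick of the replica set, which would explain why the slowdown scales so badly with replication. The xattrs can be inspected directly on a brick, e.g. with `getfattr -d -m . -e hex <brick-path>/<file>` (look for the trusted.glusterfs.shard.* keys; exact names depend on the build).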
I confirm I can reproduce the issue under the exact same conditions:

* Replica 2 with or without sharding: OK
* Replica 3 + 1 arbiter without sharding: OK
* Replica 3 + 1 arbiter **with** sharding: NOT OK (1 MB/s against ~100 MB/s for 1 and 2)
(In reply to Olivier LAMBERT from comment #3)
> I confirm I can reproduce the issue under the exact same conditions:
>
> * Replica 2 with or without sharding: OK
> * Replica 3 + 1 arbiter without sharding: OK
> * Replica 3 + 1 arbiter **with** sharding: NOT OK (1 MB/s against ~100 MB/s
> for 1 and 2)

How do

* replica 3 without arbiter and without sharding
* replica 3 without arbiter and with sharding

compare with the data in comment #3? Could you share that information as well?

-Krutika
Sadly, I only have a 2-node setup at the moment, so I can't create a "real" replica 3.