Bug 1418011

Summary: [RFE] disable client.io-threads on replica volume creation
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Atin Mukherjee <amukherj>
Component: replicate
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: unspecified
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, asoman, rcyriac, rhs-bugs, storage-qa-internal
Target Milestone: ---
Keywords: FutureFeature
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.8.4-14
Clones: 1418014
Last Closed: 2017-03-23 06:04:25 UTC
Type: Bug
Bug Depends On: 1418014, 1419305    
Bug Blocks: 1351503    

Description Atin Mukherjee 2017-01-31 15:30:30 UTC
Description of problem:

performance.client-io-threads is turned on by default in rhgs-3.2.0. While this tunable has significantly improved EC (disperse) volume performance, it degrades replicate volume performance. The following BZs were filed by perf QE while validating this option:


> https://bugzilla.redhat.com/show_bug.cgi?id=1413512 : [Perf] Inconsistent sequential writes on FUSE due to client-io-threads

> https://bugzilla.redhat.com/show_bug.cgi?id=1404113 : [Perf] 12% Drop in sequential reads on SMB v1.0

> https://bugzilla.redhat.com/show_bug.cgi?id=1397854 : [Perf] 10% and 20% drop in sequential writes on SMB v1 and V3 with RHEL 6.8

> https://bugzilla.redhat.com/show_bug.cgi?id=1395204 : 34% drop in Random Writes from 3.1.3 to 3.2 on FUSE


These issues cannot be addressed within the rhgs-3.2.0 timeline because they stem from a design limitation of AFR (the replicate translator), so it was decided to turn this option off when a replicate volume is created.

Comment 4 Atin Mukherjee 2017-01-31 15:45:53 UTC
upstream patch : https://review.gluster.org/16492

Comment 5 Atin Mukherjee 2017-02-02 05:15:43 UTC
(In reply to Atin Mukherjee from comment #4)
> upstream patch : https://review.gluster.org/16492

An alternative approach was also devised, and a patch for it, https://review.gluster.org/#/c/16502/, was put up for review.

Comment 6 Atin Mukherjee 2017-02-05 07:20:44 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/96886/

Comment 8 Nag Pavan Chilakam 2017-02-07 12:26:07 UTC
QA validation:
===========
Observed on 3.8.4-14:
New distribute-only volume ===> performance.client-io-threads is ON
New distrep (distributed-replicate) volume ===> performance.client-io-threads is OFF
New replicate volume ===> performance.client-io-threads is OFF
New EC volume ===> performance.client-io-threads is ON

Converted a pure distribute volume to a 2x2 volume ===> performance.client-io-threads is turned off as part of the conversion ===> expected ===> PASS
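The defaults observed above follow a rule keyed on volume type. The Python sketch below is a hypothetical model, not glusterd's implementation: `VolumeLayout`, `volume_type` and `default_client_io_threads` are illustrative names, and the rule simply encodes the defaults seen on glusterfs-3.8.4-14 (OFF for replicate and distributed-replicate, ON for distribute and EC). It also shows why the 2x2 conversion turns the option off: the new layout classifies as Distributed-Replicate.

```python
from dataclasses import dataclass

# Hypothetical model of a volume layout; field names are illustrative,
# not taken from glusterd's source.
@dataclass(frozen=True)
class VolumeLayout:
    brick_count: int
    replica_count: int = 1   # 1 means no replication
    disperse_count: int = 0  # 0 means not an EC (disperse) volume

def volume_type(layout: VolumeLayout) -> str:
    """Classify a layout using Gluster's volume type names."""
    if layout.replica_count > 1:
        subvol, base = layout.replica_count, "Replicate"
    elif layout.disperse_count > 0:
        subvol, base = layout.disperse_count, "Disperse"
    else:
        return "Distribute"
    if layout.brick_count % subvol != 0:
        raise ValueError("brick count must be a multiple of the set size")
    # More than one replica/disperse set means the sets are distributed.
    return base if layout.brick_count == subvol else f"Distributed-{base}"

def default_client_io_threads(layout: VolumeLayout) -> str:
    """Default observed in QA on 3.8.4-14: off for replicate-type volumes."""
    return "off" if "Replicate" in volume_type(layout) else "on"

# Pure distribute (2 bricks) converted to 2x2 via add-brick with replica 2.
before = VolumeLayout(brick_count=2)
after = VolumeLayout(brick_count=4, replica_count=2)
print(volume_type(before), default_client_io_threads(before))
print(volume_type(after), default_client_io_threads(after))
```

The 2 x (4 + 2) = 12 volume from the test logs corresponds to `VolumeLayout(12, disperse_count=6)`, which this model classifies as Distributed-Disperse with the option ON.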



Upgrade from 3.8.4-13 to 3.8.4-14:
New distrep volume (default was on) ===> performance.client-io-threads is OFF
New replicate volume (default was on) ===> performance.client-io-threads is OFF

Observations/Questions:
===========
Do we want to keep performance.client-io-threads "ON" for pure distribute volumes?
What about a customer who had deliberately turned it on? We would be explicitly turning it off for them, right?

Comment 9 Nag Pavan Chilakam 2017-02-07 12:26:46 UTC
(In reply to nchilaka from comment #8)

Atin, can you confirm the questions below?
> 
> Observations/Questions:
> ===========
> Do we want to keep performance.client-io-threads "ON" for pure distribute
> volumes?
> What about a customer who had turned it on for a purpose, then we are
> explicitly turning off, right?

Comment 10 Nag Pavan Chilakam 2017-02-07 12:31:58 UTC
test logs
[root@dhcp35-116 ~]# gluster v get x2 all|grep client
diagnostics.client-log-level            INFO                                    
diagnostics.client-sys-log-level        CRITICAL                                
diagnostics.client-logger               (null)                                  
diagnostics.client-log-format           (null)                                  
diagnostics.client-log-buf-size         5                                       
diagnostics.client-log-flush-timeout    120                                     
client.event-threads                    2                                       
client.send-gids                        on                                      
performance.client-io-threads           off                                     
client.bind-insecure                    (null)                                  
[root@dhcp35-116 ~]# 
[root@dhcp35-116 ~]# gluster v info disperse
Volume Name: disperse
Type: Distributed-Disperse
Volume ID: f92ad68e-2cc8-41a5-9911-576a27b9b8ca
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick1/disperse
Brick2: 10.70.35.116:/rhs/brick1/disperse
Brick3: 10.70.35.239:/rhs/brick1/disperse
Brick4: 10.70.35.135:/rhs/brick1/disperse
Brick5: 10.70.35.8:/rhs/brick1/disperse
Brick6: 10.70.35.196:/rhs/brick1/disperse
Brick7: 10.70.35.37:/rhs/brick2/disperse
Brick8: 10.70.35.116:/rhs/brick2/disperse
Brick9: 10.70.35.239:/rhs/brick2/disperse
Brick10: 10.70.35.135:/rhs/brick2/disperse
Brick11: 10.70.35.8:/rhs/brick2/disperse
Brick12: 10.70.35.196:/rhs/brick2/disperse
Options Reconfigured:
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
nfs.rdirplus: on
[root@dhcp35-116 ~]# gluster v get disperse all|grep performance.client-io-threads
performance.client-io-threads           on
[root@dhcp35-116 ~]# rpm -qa|grep gluster
glusterfs-libs-3.8.4-14.el7rhgs.x86_64
glusterfs-events-3.8.4-14.el7rhgs.x86_64
glusterfs-fuse-3.8.4-14.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-14.el7rhgs.x86_64
glusterfs-server-3.8.4-14.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.8.4-14.el7rhgs.x86_64
glusterfs-cli-3.8.4-14.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-14.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-api-3.8.4-14.el7rhgs.x86_64
python-gluster-3.8.4-14.el7rhgs.noarch
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-14.el7rhgs.x86_64
glusterfs-rdma-3.8.4-14.el7rhgs.x86_64
[root@dhcp35-116 ~]#

Comment 11 Atin Mukherjee 2017-02-07 12:45:00 UTC
(In reply to nchilaka from comment #8)
> Observations/Questions:
> ===========
> Do we want to keep performance.client-io-threads "ON" for pure distribute
> volumes?
> What about a customer who had turned it on for a purpose, then we are
> explicitly turning off, right?


client-io-threads was enabled by default irrespective of volume type. With this patch, the option is not loaded into the graph if the volume is of replicate type (distrep or replicate), so for distribute-only volumes it remains turned on.

If a customer turns this option off for an EC or distribute volume, the change takes effect. For replicate volumes, turning it on will not be reflected in the graph (although the new value will show up in gluster v info), because we do not want to reintroduce the regression this fix was made to address.
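The rule described above separates the configured value (what gluster v info reports) from whether the io-threads translator actually ends up in the client graph. A minimal Python sketch of that rule follows; `io_threads_loaded` is a hypothetical helper, not glusterd code, and the set of accepted boolean spellings is an assumption for illustration.

```python
# Boolean spellings assumed to be accepted for the option value
# (Gluster accepts several; this set is an illustrative assumption).
TRUE_VALUES = {"on", "yes", "true", "enable", "1"}

# Volume types treated as "replicate nature" (distrep or replicate).
REPLICATE_TYPES = {"Replicate", "Distributed-Replicate"}

def io_threads_loaded(volume_type: str, configured: str) -> bool:
    """Hypothetical model: is the io-threads xlator placed in the client graph?

    `configured` is the value reported by `gluster v info` / `gluster v get`.
    For replicate-type volumes the translator is kept out of the graph
    regardless of the configured value; other volume types honor it.
    """
    if volume_type in REPLICATE_TYPES:
        return False
    return configured.strip().lower() in TRUE_VALUES

# A user turning the option on for a distrep volume: the value is recorded,
# but the graph does not change. EC and distribute volumes follow the value.
for vtype, value in [
    ("Distributed-Replicate", "on"),
    ("Distributed-Disperse", "on"),
    ("Distribute", "off"),
]:
    print(f"{vtype:22} configured={value:3} loaded={io_threads_loaded(vtype, value)}")
```

Keeping the check on volume type ahead of the configured value is what prevents a stale or user-set "on" from bringing back the replicate regression.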

Comment 12 Nag Pavan Chilakam 2017-02-07 12:56:57 UTC
Thanks, Atin.


Based on my testing in https://bugzilla.redhat.com/show_bug.cgi?id=1418011#c8, moving to VERIFIED.

Comment 14 errata-xmlrpc 2017-03-23 06:04:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html