Description of problem:

client-io-threads (performance.client-io-threads) is turned on by default in rhgs-3.2.0. While this tunable has improved EC performance significantly, it has adverse effects on replicate volume performance. The following BZs were filed by perf QE while validating this option:

> https://bugzilla.redhat.com/show_bug.cgi?id=1413512 : [Perf] Inconsistent sequential writes on FUSE due to client-io-threads
> https://bugzilla.redhat.com/show_bug.cgi?id=1404113 : [Perf] 12% Drop in sequential reads on SMB v1.0
> https://bugzilla.redhat.com/show_bug.cgi?id=1397854 : [Perf] 10% and 20% drop in sequential writes on SMB v1 and V3 with RHEL 6.8
> https://bugzilla.redhat.com/show_bug.cgi?id=1395204 : 34% drop in Random Writes from 3.1.3 to 3.2 on FUSE

As these issues stem from a design limitation of AFR and cannot be addressed within the rhgs-3.2.0 timeline, it was decided to turn this option off when a replicate volume is created.
upstream patch : https://review.gluster.org/16492
(In reply to Atin Mukherjee from comment #4)
> upstream patch : https://review.gluster.org/16492

An alternative approach was worked out, and a patch for it, https://review.gluster.org/#/c/16502/, was put up for review.
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/96886/
QA validation:
===========
Observed on 3.8.4-14:
a new distribute only volume ===> performance.client-io-threads is ON
new distrep ===> performance.client-io-threads is OFF
new replicate ===> performance.client-io-threads is OFF
new ec volume ===> performance.client-io-threads is ON

Converted a pure distribute to 2x2 volume ===> performance.client-io-threads is turned off as part of conversion ===> expected ===> PASS

Upgrade from 3.8.4-13 to 3.8.4-14:
new distrep (default was on) ===> performance.client-io-threads is OFF
new replicate (default was on) ===> performance.client-io-threads is OFF

Observations/Questions:
===========
Do we want to keep performance.client-io-threads "ON" for pure distribute volumes?
What about a customer who had turned it on for a purpose? We would then be explicitly turning it off, right?
(In reply to nchilaka from comment #8)
> QA validation:
> ===========
> observed that on 3.8.4-14:
> a new distribute only volume ===>performance.client-io-threads is ON
> new distrep ===>performance.client-io-threads is OFF
> new replicate ===>performance.client-io-threads is OFF
> new ec volume ===>performance.client-io-threads is ON
>
> Converted a pure distribute to 2x2 volume ===>performance.client-io-threads
> is turned off as part of conversion ====>expected ===>PASS
>
> Upgrade from 3.8.4-13 to 3.8.4-14
> new distrep (default was on)===>performance.client-io-threads is OFF
> new replicate (default was on)===>performance.client-io-threads is OFF
>
> Atin, Can you confirm with below questions
>
> Observations/Questions:
> ===========
> Do we want to keep performance.client-io-threads "ON" for pure distribute
> volumes?
> What about a customer who had turned it on for a purpose, then we are
> explicitly turning off, right?
test logs

[root@dhcp35-116 ~]# gluster v get x2 all|grep client
diagnostics.client-log-level            INFO
diagnostics.client-sys-log-level        CRITICAL
diagnostics.client-logger               (null)
diagnostics.client-log-format           (null)
diagnostics.client-log-buf-size         5
diagnostics.client-log-flush-timeout    120
client.event-threads                    2
client.send-gids                        on
performance.client-io-threads           off
client.bind-insecure                    (null)
[root@dhcp35-116 ~]#
[root@dhcp35-116 ~]# gluster v info disperse gl

Volume Name: disperse
Type: Distributed-Disperse
Volume ID: f92ad68e-2cc8-41a5-9911-576a27b9b8ca
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick1/disperse
Brick2: 10.70.35.116:/rhs/brick1/disperse
Brick3: 10.70.35.239:/rhs/brick1/disperse
Brick4: 10.70.35.135:/rhs/brick1/disperse
Brick5: 10.70.35.8:/rhs/brick1/disperse
Brick6: 10.70.35.196:/rhs/brick1/disperse
Brick7: 10.70.35.37:/rhs/brick2/disperse
Brick8: 10.70.35.116:/rhs/brick2/disperse
Brick9: 10.70.35.239:/rhs/brick2/disperse
Brick10: 10.70.35.135:/rhs/brick2/disperse
Brick11: 10.70.35.8:/rhs/brick2/disperse
Brick12: 10.70.35.196:/rhs/brick2/disperse
Options Reconfigured:
nfs.disable: off
performance.readdir-ahead: on
transport.address-family: inet
nfs.rdirplus: on
[root@dhcp35-116 ~]# gluster v get disperse all|grep performance.client-io-threads
rpm -qa|performance.client-io-threads   on
[root@dhcp35-116 ~]# rpm -qa|grep gluster
glusterfs-libs-3.8.4-14.el7rhgs.x86_64
glusterfs-events-3.8.4-14.el7rhgs.x86_64
glusterfs-fuse-3.8.4-14.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-14.el7rhgs.x86_64
glusterfs-server-3.8.4-14.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.8.4-14.el7rhgs.x86_64
glusterfs-cli-3.8.4-14.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-14.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.1.el7rhgs.noarch
glusterfs-api-3.8.4-14.el7rhgs.x86_64
python-gluster-3.8.4-14.el7rhgs.noarch
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-14.el7rhgs.x86_64
glusterfs-rdma-3.8.4-14.el7rhgs.x86_64
[root@dhcp35-116 ~]#
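
For anyone who wants to repeat the check above across several volumes, here is a minimal sketch that parses saved `gluster v get <vol> all` output into a dictionary. It is only an illustration: the function name `parse_volume_options` and the `SAMPLE` text are made up for this comment (the sample values are copied from the log above), and the sketch assumes each output line is an option name followed by whitespace and a value, with an optional "Option / Value" header and a dashed separator line. It does not call the gluster CLI itself, so pass it only the command output, not the shell prompt lines.

```python
def parse_volume_options(output: str) -> dict:
    """Parse `gluster v get <vol> all`-style text into {option: value}.

    Assumes one "name <whitespace> value" pair per line (an assumption
    based on the log in this bug, not on any documented format).
    """
    options = {}
    for line in output.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # blank line or a line without a value
        name, value = parts[0], parts[1].strip()
        # Skip an "Option  Value" header row and a dashed separator, if present.
        if name == "Option" or name.startswith("-"):
            continue
        options[name] = value
    return options


# Hypothetical sample, copied from the x2 volume output in the test logs.
SAMPLE = """\
client.event-threads                    2
client.send-gids                        on
performance.client-io-threads           off
client.bind-insecure                    (null)
"""

if __name__ == "__main__":
    opts = parse_volume_options(SAMPLE)
    print(opts["performance.client-io-threads"])
```

Values are kept as strings (including "(null)"), so a caller can compare them directly against the ON/OFF expectations listed in the QA validation.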
(In reply to nchilaka from comment #8)
> QA validation:
> ===========
> observed that on 3.8.4-14:
> a new distribute only volume ===>performance.client-io-threads is ON
> new distrep ===>performance.client-io-threads is OFF
> new replicate ===>performance.client-io-threads is OFF
> new ec volume ===>performance.client-io-threads is ON
>
> Converted a pure distribute to 2x2 volume ===>performance.client-io-threads
> is turned off as part of conversion ====>expected ===>PASS
>
> Upgrade from 3.8.4-13 to 3.8.4-14
> new distrep (default was on)===>performance.client-io-threads is OFF
> new replicate (default was on)===>performance.client-io-threads is OFF
>
> Observations/Questions:
> ===========
> Do we want to keep performance.client-io-threads "ON" for pure distribute
> volumes?
> What about a customer who had turned it on for a purpose, then we are
> explicitly turning off, right?

client-io-threads used to be enabled by default irrespective of volume type. With this patch, the option is not loaded into the graph if the volume is of replicate nature (distrep or rep). So for a distribute-only volume the option stays turned on. If a customer turns this option off for an EC or distribute volume, that change is reflected. For replicate volumes, turning it on will not be reflected in the graph (although the change will show up in gluster v info), as we do not want to bring back the regression that this fix was made for.
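
To make the rule described above easier to reason about, here is a small model of it. This is a sketch of the behaviour as stated in this comment and in the QA results, not the glusterd code: the names `client_io_threads_state`, `IoThreadsState` and `REPLICATE_TYPES` are hypothetical, and the volume-type strings are assumed labels chosen for the example.

```python
from typing import NamedTuple, Optional

# Assumed labels for volumes "of replicate nature" (rep or distrep).
REPLICATE_TYPES = {"replicate", "distributed-replicate"}


class IoThreadsState(NamedTuple):
    reported: str   # value shown for performance.client-io-threads
    in_graph: bool  # whether the option is loaded into the client graph


def client_io_threads_state(volume_type: str,
                            user_setting: Optional[str] = None) -> IoThreadsState:
    """Model the behaviour described in this bug (illustrative only)."""
    is_replicate = volume_type in REPLICATE_TYPES
    # Default observed by QA: OFF for rep/distrep, ON for distribute and EC.
    default = "off" if is_replicate else "on"
    reported = user_setting if user_setting is not None else default
    # For replicate volumes the option is never loaded into the graph,
    # even when the user sets it to "on".
    in_graph = (not is_replicate) and reported == "on"
    return IoThreadsState(reported, in_graph)


if __name__ == "__main__":
    for vol_type in ("distribute", "disperse", "replicate", "distributed-replicate"):
        print(vol_type, client_io_threads_state(vol_type))
    print("replicate, user sets on:", client_io_threads_state("replicate", "on"))
```

The last call shows the case raised in the question: the setting is reported as "on" but is still kept out of the graph for a replicate volume.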
Thanks, Atin. Based on my testing in https://bugzilla.redhat.com/show_bug.cgi?id=1418011#c8, moving to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html