+++ This bug was initially created as a clone of Bug #1408395 +++
+++ This bug was initially created as a clone of Bug #1408112 +++

Description of problem:
When both data bricks are up, writes proceed at optimal speed; after killing a data brick, writes slow down drastically.

Version-Release number of selected component (if applicable):
Gluster version: 3.8.4-9

How reproducible:
100%

Logs and volume profiles are placed at rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

Steps to Reproduce:
1. For comparison, create a 1x(2+1) arbiter volume.
2. Write 2 GB of data using FIO with the command below:
   fio /randomwritejob.ini --client=/clients.list
3. Now kill a data brick and write the same data again using fio; writing the same 2 GB takes a very long time to complete.

Expected results:
There should be no difference in writing the same data in both scenarios.

Additional info:
[root@dhcp46-206 /]# cat /randomwritejob.ini
[global]
rw=randrw
io_size=1g
fsync_on_close=1
size=1g
bs=64k
rwmixread=20
openfiles=1
startdelay=0
ioengine=sync
verify=md5

[write]
directory=/mnt/samsung
nrfiles=1
filename_format=f.$jobnum.$filenum
numjobs=2
[root@dhcp46-206 /]#

--- Additional comment from Karan Sandha on 2016-12-23 02:43:36 EST ---

Tested the above steps on replica 2 and replica 3 volumes; the slowdown does not occur there, so the issue appears to be specific to arbiter.

Thanks & Regards
Karan Sandha

--- Additional comment from Ravishankar N on 2016-12-23 04:21:45 EST ---

RCA: afr_replies_interpret() used the 'readable' matrix to trigger client-side heals after inode refresh. But for arbiter, readable is always zero. So when `dd` is run with a data brick down, spurious data heals are triggered repeatedly. These heals open an fd, causing eager lock to be disabled (open fd count > 1) in afr transactions, leading to extra LOCK + FXATTROP calls and slowing the throughput.
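To make the RCA concrete, here is a minimal, hypothetical Python sketch of the buggy decision, not the actual AFR C code; the `readable` list and the function name are simplified assumptions for illustration only:

```python
# Hypothetical sketch (NOT the real afr_replies_interpret()).
# In an arbiter volume the arbiter brick stores no file data, so its
# entry in the 'readable' matrix is always 0 -- the old check below
# then flags a data heal on every inode refresh, even when nothing
# is actually pending.

def heal_needed_old(readable):
    """Old logic (simplified): heal if any brick is not readable."""
    return any(bit == 0 for bit in readable)

# Plain replica 3, all bricks up: no heal flagged.
print(heal_needed_old([1, 1, 1]))   # False

# Arbiter volume, all bricks healthy: the arbiter's readable bit is
# still 0, so a spurious heal is flagged on every refresh. Each such
# heal opens an fd, and an open fd count > 1 disables eager lock.
print(heal_needed_old([1, 1, 0]))   # True
```

Under this (assumed) model, every spurious heal keeps an extra fd open, which is exactly the eager-lock-disabling condition the RCA describes.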
--- Additional comment from Worker Ant on 2016-12-23 04:36:42 EST ---

REVIEW: http://review.gluster.org/16277 (afr: use accused matrix instead of readable matrix for deciding heals) posted (#1) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2016-12-27 01:34:05 EST ---

COMMIT: http://review.gluster.org/16277 committed in master by Pranith Kumar Karampuri (pkarampu)

------

commit 5a7c86e578f5bbd793126a035c30e6b052177a9f
Author: Ravishankar N <ravishankar>
Date: Fri Dec 23 07:11:13 2016 +0000

afr: use accused matrix instead of readable matrix for deciding heals

Problem:
afr_replies_interpret() used the 'readable' matrix to trigger client-side heals after inode refresh. But for arbiter, readable is always zero. So when `dd` is run with a data brick down, spurious data heals are triggered. These heals open an fd, causing eager lock to be disabled (open fd count > 1) in afr transactions, leading to extra FXATTROPs.

Fix:
Use the accused matrix (derived from interpreting the afr pending xattrs) to decide whether we can start heal or not.

Change-Id: Ibbd56c9aed6026de6ec42422e60293702aaf55f9
BUG: 1408395
Signed-off-by: Ravishankar N <ravishankar>
Reviewed-on: http://review.gluster.org/16277
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Smoke: Gluster Build System <jenkins.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
Tested-by: Pranith Kumar Karampuri <pkarampu>
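The fix above swaps the decision input from readability to accusation. A hypothetical Python sketch of that idea, where the per-brick pending counts and the function name are illustrative assumptions rather than the actual AFR implementation:

```python
# Hypothetical sketch of the fixed decision (NOT the real AFR C code).
# A brick is healed only when the AFR pending xattrs actually accuse
# it of missed writes, regardless of whether it is readable (the
# arbiter never is, for data).

def heal_needed_new(pending):
    """New logic (simplified): heal only if some brick is accused,
    i.e. has a non-zero pending-operation count recorded against it."""
    return any(count > 0 for count in pending)

# Arbiter volume, all bricks healthy: nothing pending, no spurious
# heal, so eager lock stays enabled and throughput is unaffected.
print(heal_needed_new([0, 0, 0]))   # False

# One data brick down during writes: pending counts accumulate
# against it, so a real heal is correctly triggered on refresh.
print(heal_needed_new([0, 7, 0]))   # True
```

In this simplified model the arbiter's permanent non-readability no longer feeds the heal decision at all, which is why the spurious heals (and the resulting extra LOCK + FXATTROP traffic) disappear.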
REVIEW: http://review.gluster.org/16299 (afr: use accused matrix instead of readable matrix for deciding heals) posted (#1) for review on release-3.7 by Ravishankar N (ravishankar)
COMMIT: http://review.gluster.org/16299 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu)

------

commit 2892fb43027b2c3c39b9e3b32ea99a3a090c0297
Author: Ravishankar N <ravishankar>
Date: Fri Dec 23 07:11:13 2016 +0000

afr: use accused matrix instead of readable matrix for deciding heals

Problem:
afr_replies_interpret() used the 'readable' matrix to trigger client-side heals after inode refresh. But for arbiter, readable is always zero. So when `dd` is run with a data brick down, spurious data heals are triggered. These heals open an fd, causing eager lock to be disabled (open fd count > 1) in afr transactions, leading to extra FXATTROPs.

Fix:
Use the accused matrix (derived from interpreting the afr pending xattrs) to decide whether we can start heal or not.

> Reviewed-on: http://review.gluster.org/16277
> NetBSD-regression: NetBSD Build System <jenkins.org>
> CentOS-regression: Gluster Build System <jenkins.org>
> Smoke: Gluster Build System <jenkins.org>
> Reviewed-by: Pranith Kumar Karampuri <pkarampu>
> Tested-by: Pranith Kumar Karampuri <pkarampu>

(cherry picked from commit 5a7c86e578f5bbd793126a035c30e6b052177a9f)

Change-Id: Ibbd56c9aed6026de6ec42422e60293702aaf55f9
BUG: 1408820
Signed-off-by: Ravishankar N <ravishankar>
Reviewed-on: http://review.gluster.org/16299
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.7.19, please open a new bug report.

glusterfs-3.7.19 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/gluster-users/2017-January/029623.html
[2] https://www.gluster.org/pipermail/gluster-users/