Bug 1408772 - [Arbiter] After killing a brick, writes drastically slow down
Summary: [Arbiter] After killing a brick, writes drastically slow down
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: arbiter
Version: 3.8
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On: 1408112 1408395 1408820
Blocks: 1408770
 
Reported: 2016-12-27 06:42 UTC by Ravishankar N
Modified: 2017-01-16 12:27 UTC

Fixed In Version: glusterfs-3.8.8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1408395
Environment:
Last Closed: 2017-01-16 12:27:41 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Ravishankar N 2016-12-27 06:42:00 UTC
+++ This bug was initially created as a clone of Bug #1408395 +++

+++ This bug was initially created as a clone of Bug #1408112 +++

Description of problem:
When all bricks are up, writes proceed at optimal speed; after a data brick is killed, writes drastically slow down.

Version-Release number of selected component (if applicable):
Gluster version:- 3.8.4-9

How reproducible:
100%
Logs and volume profiles are available at:
 rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

Steps to Reproduce:
1. As a baseline, create a 1 x (2+1) arbiter volume (see the sketch after this list).
2. Write 2 GiB of data using FIO with the command below:
    fio /randomwritejob.ini --client=/clients.list
3. Now kill a data brick and write the same data using fio again;
   writing the 2 GiB of data takes very long to complete.
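
A minimal sketch of the volume setup and the brick kill, assuming
hypothetical hostnames (server1..server3), brick paths, and the volume
name "testvol"; adjust to the actual environment:

    # Create a 1 x (2+1) arbiter volume: two data bricks plus one arbiter.
    gluster volume create testvol replica 3 arbiter 1 \
        server1:/bricks/b1 server2:/bricks/b2 server3:/bricks/arb
    gluster volume start testvol

    # Mount it on the client used for the fio run.
    mount -t glusterfs server1:/testvol /mnt/samsung

    # Find the PID of one data brick and kill it to simulate the failure.
    gluster volume status testvol
    kill -9 <pid-of-data-brick-process>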

Expected results:
There should be no significant difference in write performance between the two scenarios.

Additional info:
[root@dhcp46-206 /]# vim /randomwritejob.ini
[root@dhcp46-206 /]# cat /randomwritejob.ini
[global]
rw=randrw
io_size=1g
fsync_on_close=1
size=1g
bs=64k
rwmixread=20
openfiles=1
startdelay=0
ioengine=sync
verify=md5
[write]
directory=/mnt/samsung
nrfiles=1
filename_format=f.$jobnum.$filenum
numjobs=2
[root@dhcp46-206 /]#
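
For reference, the command in step 2 uses fio's client/server mode; a
minimal sketch of how such a run is typically wired up (the hostnames in
/clients.list are assumptions):

    # On each load-generating client, start fio in server mode:
    fio --server

    # /clients.list lists one client hostname per line, e.g.:
    #   client1.example.com
    #   client2.example.com

    # From the controlling node, submit the job file to all clients:
    fio /randomwritejob.ini --client=/clients.list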


--- Additional comment from Karan Sandha on 2016-12-23 02:43:36 EST ---

Tested the above steps on replica 2 and replica 3 volumes; the issue seems to be specific to arbiter.

Thanks & Regards
Karan Sandha

--- Additional comment from Ravishankar N on 2016-12-23 04:21:45 EST ---

RCA:
afr_replies_interpret() used the 'readable' matrix to trigger client-side
heals after inode refresh. But for arbiter, readable is always zero. So
when `dd` is run with a data brick down, spurious data heals are
triggered repeatedly. These heals open an fd, causing eager lock to be
disabled (open fd count > 1) in afr transactions, leading to extra
LOCK + FXATTROP calls that slow down throughput.
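
The extra lock and xattr operations described above can be observed with
the volume profiler; a minimal sketch, assuming the hypothetical volume
name "testvol":

    # Enable profiling, run the workload with a data brick down,
    # then dump the per-FOP statistics.
    gluster volume profile testvol start
    fio /randomwritejob.ini --client=/clients.list
    gluster volume profile testvol info

    # With the bug present, the INODELK and FXATTROP call counts on the
    # surviving bricks are far higher than in the all-bricks-up run.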

--- Additional comment from Worker Ant on 2016-12-23 04:36:42 EST ---

REVIEW: http://review.gluster.org/16277 (afr: use accused matrix instead of readable matrix for deciding heals) posted (#1) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2016-12-27 01:34:05 EST ---

COMMIT: http://review.gluster.org/16277 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 5a7c86e578f5bbd793126a035c30e6b052177a9f
Author: Ravishankar N <ravishankar>
Date:   Fri Dec 23 07:11:13 2016 +0000

    afr: use accused matrix instead of readable matrix for deciding heals
    
    Problem:
    afr_replies_interpret() used the 'readable' matrix to trigger client
    side heals after inode refresh. But for arbiter, readable is always
    zero. So when `dd` is run with a data brick down, spurious data heals
    are triggered. These heals open an fd, causing eager lock to be
    disabled (open fd count >1) in afr transactions, leading to extra FXATTROPS
    
    Fix:
    Use the accused matrix (derived from interpreting the afr pending
    xattrs) to decide whether we can start heal or not.
    
    Change-Id: Ibbd56c9aed6026de6ec42422e60293702aaf55f9
    BUG: 1408395
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/16277
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
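
As background for the fix, the "accused" matrix is derived from the AFR
pending xattrs stored on each brick; a hedged sketch of inspecting them
directly on a brick (the brick path, volume name, and file name are
assumptions):

    # On a server node, dump the AFR pending xattrs of a file on a brick.
    getfattr -d -m . -e hex /bricks/b1/f.0.0

    # Non-zero trusted.afr.<volname>-client-<N> values "accuse" the
    # corresponding brick of holding stale data, i.e. it needs heal:
    #   trusted.afr.testvol-client-1=0x000000020000000000000000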

Comment 1 Worker Ant 2016-12-27 06:43:15 UTC
REVIEW: http://review.gluster.org/16291 (afr: use accused matrix instead of readable matrix for deciding heals) posted (#1) for review on release-3.8 by Ravishankar N (ravishankar)

Comment 2 Worker Ant 2016-12-28 09:13:26 UTC
COMMIT: http://review.gluster.org/16291 committed in release-3.8 by Pranith Kumar Karampuri (pkarampu) 
------
commit 034ee769a55099c343b00cdc39896fe74e44068d
Author: Ravishankar N <ravishankar>
Date:   Fri Dec 23 07:11:13 2016 +0000

    afr: use accused matrix instead of readable matrix for deciding heals
    
    Problem:
    afr_replies_interpret() used the 'readable' matrix to trigger client
    side heals after inode refresh. But for arbiter, readable is always
    zero. So when `dd` is run with a data brick down, spurious data heals
    are triggered. These heals open an fd, causing eager lock to be
    disabled (open fd count >1) in afr transactions, leading to extra FXATTROPS
    
    Fix:
    Use the accused matrix (derived from interpreting the afr pending
    xattrs) to decide whether we can start heal or not.
    
    > Reviewed-on: http://review.gluster.org/16277
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Smoke: Gluster Build System <jenkins.org>
    > Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    > Tested-by: Pranith Kumar Karampuri <pkarampu>
    (cherry picked from commit 5a7c86e578f5bbd793126a035c30e6b052177a9f)
    
    Change-Id: Ibbd56c9aed6026de6ec42422e60293702aaf55f9
    BUG: 1408772
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/16291
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

Comment 3 Niels de Vos 2017-01-16 12:27:41 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.8.8, please open a new bug report.

glusterfs-3.8.8 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-January/000064.html
[2] https://www.gluster.org/pipermail/gluster-users/

