Bug 1408112 - [Arbiter] After Killing a brick writes drastically slow down
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: arbiter
Version: 3.2
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assigned To: Ravishankar N
QA Contact: Karan Sandha
Docs Contact:
Depends On:
Blocks: 1351528 1408395 1408770 1408772 1408820
 
Reported: 2016-12-22 02:22 EST by Karan Sandha
Modified: 2017-03-23 01:59 EDT
CC: 6 users

See Also:
Fixed In Version: glusterfs-3.8.4-11
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1408395
Environment:
Last Closed: 2017-03-23 01:59:06 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID: Red Hat Product Errata RHSA-2017:0486
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update
Last Updated: 2017-03-23 05:18:45 EDT

Description Karan Sandha 2016-12-22 02:22:11 EST
Description of problem:
When all bricks are up, writes proceed at optimal speed; after killing a data brick, writes slow down drastically.

Version-Release number of selected component (if applicable):
Gluster version: 3.8.4-9

How reproducible:
100%
Logs and volume profiles are available at:
 rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

Steps to Reproduce:
1. As a baseline, create a 1x(2+1) arbiter volume (a sketch follows the steps below).
2. Write 2 GiB of data using fio with the command below:
    fio /randomwritejob.ini  --client=/clients.list
3. Kill a data brick and write the same data again using fio.
   Writing the same 2 GiB now takes far longer to complete.
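
A minimal sketch of steps 1 and 3, assuming hypothetical hostnames (server1-server3), brick paths, and volume name; the exact layout used for this report is not recorded here:

# Step 1: one 2+1 subvolume -- two data bricks plus one arbiter brick
gluster volume create testvol replica 3 arbiter 1 \
    server1:/bricks/data1 server2:/bricks/data2 server3:/bricks/arbiter
gluster volume start testvol
mount -t glusterfs server1:/testvol /mnt/samsung

# Step 3: kill the glusterfsd process serving one data brick
gluster volume status testvol   # note the PID column for a data brick
kill -9 <PID>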

Expected results:
There should be no difference in write performance between the two scenarios.

Additional info:
[root@dhcp46-206 /]# vim /randomwritejob.ini
[root@dhcp46-206 /]# cat /randomwritejob.ini
[global]
rw=randrw
io_size=1g
fsync_on_close=1
size=1g
bs=64k
rwmixread=20
openfiles=1
startdelay=0
ioengine=sync
verify=md5
[write]
directory=/mnt/samsung
nrfiles=1
filename_format=f.$jobnum.$filenum
numjobs=2
[root@dhcp46-206 /]#
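
For reference, the --client invocation in step 2 runs the job in fio's client/server mode; a minimal sketch, assuming /clients.list holds one client hostname per line:

# On each client host listed in /clients.list:
fio --server

# On the controlling node, fan the job out to every client:
fio /randomwritejob.ini --client=/clients.list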
Comment 5 Ravishankar N 2016-12-23 04:21:45 EST
RCA:
afr_replies_interpret() uses the 'readable' matrix to trigger client-side
heals after an inode refresh. But for the arbiter brick, readable is always
zero, so when `dd` is run with a data brick down, spurious data heals are
triggered repeatedly. These heals open an fd, causing eager lock to be
disabled (open fd count > 1) in AFR transactions, leading to extra LOCK +
FXATTROP calls that slow the throughput.
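
A simplified illustration of the two decisions described in the RCA; this is not the actual AFR source, and all names below are made up for the sketch:

/* Heal check after inode refresh: with the old logic, the arbiter brick's
 * readable flag is always 0, so this misfires on every refresh while a
 * data brick is down, queueing a spurious heal each time. */
static int needs_client_side_heal(const int *readable, int child_count)
{
        for (int i = 0; i < child_count; i++)
                if (!readable[i])
                        return 1;   /* spurious when i is the arbiter */
        return 0;
}

/* Each spurious heal opens an extra fd; eager lock is only used while a
 * single fd is open, so every write now pays a LOCK + FXATTROP round trip. */
static int can_use_eager_lock(int open_fd_count)
{
        return open_fd_count <= 1;
}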
Comment 6 Ravishankar N 2016-12-23 04:38:36 EST
Upstream patch: http://review.gluster.org/#/c/16277/
Comment 8 Ravishankar N 2016-12-27 01:51:15 EST
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/93735
Comment 12 errata-xmlrpc 2017-03-23 01:59:06 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
