Description of problem:
When both data bricks are up, writes proceed at optimal speed, but after killing one data brick the writes slow down drastically.
Version-Release number of selected component (if applicable):
Gluster version: 3.8.4-9
Logs and Volume profiles are placed at
Steps to Reproduce:
1. Create a 1 x (2+1) arbiter volume as the baseline for comparison.
2. Write 2 GB of data using fio with the command below:
fio /randomwritejob.ini --client=/clients.list
3. Now kill a data brick and write the same data again using fio (an illustrative transcript follows the results below).

Actual results:
Writing the same 2 GB of data takes a very long time to complete.

Expected results:
There should be no difference in write performance between the two scenarios.
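
For reference, the reproduction might look like the transcript below; the volume name, hostnames, and brick paths are illustrative, not taken from the report:

# Create a 1 x (2+1) volume: two data bricks plus one arbiter brick.
gluster volume create arbvol replica 3 arbiter 1 \
    server1:/bricks/data1 server2:/bricks/data2 server3:/bricks/arbiter
gluster volume start arbvol

# Baseline run with all bricks up.
fio /randomwritejob.ini --client=/clients.list

# Kill one data brick (its PID is in the last column of 'volume status'),
# then repeat the same fio run.
gluster volume status arbvol
kill -9 <pid-of-server1:/bricks/data1>
fio /randomwritejob.ini --client=/clients.list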
[root@dhcp46-206 /]# vim /randomwritejob.ini
[root@dhcp46-206 /]# cat /randomwritejob.ini
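
The job file's contents are not reproduced above; a minimal random-write fio job of the kind referenced might look like the following, where every parameter is illustrative rather than taken from the report:

[global]
; Asynchronous libaio engine with direct I/O to bypass the page cache.
ioengine=libaio
direct=1
; Random 4 KiB writes, 2 GB in total, matching the reproduction steps.
rw=randwrite
bs=4k
size=2g

[randwrite]
; Gluster fuse mount point on each client (illustrative path).
directory=/mnt/arbvol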
afr_replies_interpret() used the 'readable' matrix to trigger client-side
heals after inode refresh. But for arbiter volumes, readable is always
zero, so when `dd` is run with a data brick down, spurious data heals
are triggered repeatedly. These heals open an fd, which disables eager
lock (open fd count > 1) in AFR transactions, leading to extra LOCK and
FXATTROP calls that slow the throughput.
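
A simplified sketch in C of the faulty decision described above (illustrative only, not the actual afr_replies_interpret() source):

/* Illustrative sketch, not the real GlusterFS code. Per the
 * description above, heals were keyed off the 'readable' matrix,
 * which is always zero for arbiter volumes, so this check fires
 * on every inode refresh while a data brick is down. */
static int
spurious_heal_check (const int *readable, int child_count)
{
        int i;

        for (i = 0; i < child_count; i++) {
                if (!readable[i])
                        return 1;   /* launches a client-side data heal */
        }
        return 0;
}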
Upstream patch http://review.gluster.org/#/c/16277/
Downstream patch https://code.engineering.redhat.com/gerrit/#/c/93735
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.