Bug 1640581

Summary: [AFR] : Start crawling indices and healing only if both data bricks are UP in replica 2 (thin-arbiter)
Product: [Community] GlusterFS
Reporter: Ashish Pandey <aspandey>
Component: replicate
Assignee: Ashish Pandey <aspandey>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Cloned To: 1644645
Environment:
Last Closed: 2018-11-20 08:46:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1644645

Description Ashish Pandey 2018-10-18 11:08:15 UTC
Description of problem:

Problem: 

Currently, in a replica 2 volume, if one brick is down and we write to or
create files on the mount point, index entries are created for the pending
heals. SHD will also keep crawling these index entries while the brick is
down, which makes no sense: with only one brick UP it can never heal them.

In a thin-arbiter volume, which is also a replica 2 volume, this causes
inode lock contention, which in turn sends upcalls to all the clients to
release their notify locks, even though SHD cannot do anything for healing.

This slows down client performance and defeats the purpose of keeping
in-memory information about the bad brick.
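
A minimal standalone sketch of the condition involved, in plain C with
purely illustrative names (this is not the actual GlusterFS code): healing
an entry in replica 2 needs at least two UP bricks, one as the heal source
and one as the sink, so crawling with a single UP brick is pure overhead.

    #include <stdio.h>
    #include <stdbool.h>

    /* Count how many bricks of the replica set are currently UP. */
    static int count_up_children(const bool child_up[], int child_count)
    {
        int up = 0;
        for (int i = 0; i < child_count; i++)
            if (child_up[i])
                up++;
        return up;
    }

    int main(void)
    {
        bool child_up[2] = { true, false }; /* replica 2, one brick down */

        if (count_up_children(child_up, 2) < 2)
            printf("skip index crawl: nothing can be healed\n");
        else
            printf("start index crawl\n");
        return 0;
    }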




Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Worker Ant 2018-10-18 11:58:38 UTC
REVIEW: https://review.gluster.org/21448 (cluster/afr : Check for UP bricks before starting heal) posted (#1) for review on master by Ashish Pandey

Comment 2 Worker Ant 2018-10-24 00:56:53 UTC
COMMIT: https://review.gluster.org/21448 committed in master by "Ravishankar N" <ravishankar> with the commit message: cluster/afr : Check for UP bricks before starting heal

Problem:
Currently, for a replica volume, SHD will keep crawling
index entries even if only one brick is UP and it cannot
heal anything.

In a thin-arbiter volume, which is also a replica 2 volume,
this causes inode lock contention, which in turn sends
upcalls to all the clients to release notify locks, even
though it cannot do anything for healing.

This slows down client performance and defeats the
purpose of keeping in-memory information about the bad
brick.

Solution: Before starting a heal, or even the crawl, check
whether a sufficient number of children are UP and
available to examine and heal entries.

Change-Id: I011c9da3b37cae275f791affd56b8f1c1ac9255d
updates: bz#1640581
Signed-off-by: Ashish Pandey <aspandey>
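
For illustration, a minimal sketch of where such a check would sit, again
with hypothetical names rather than the real afr/self-heald internals: the
guard runs at the top of each healer iteration, before any crawl, inode
lock, or upcall happens.

    #include <stdio.h>
    #include <stdbool.h>

    struct replica_state {
        bool child_up[2];   /* per-brick UP/DOWN flags              */
        int  child_count;   /* 2 for a replica 2 / thin-arbiter set */
    };

    static int up_count(const struct replica_state *st)
    {
        int n = 0;
        for (int i = 0; i < st->child_count; i++)
            n += st->child_up[i];
        return n;
    }

    /* Stand-in for the real index crawl; it only reports that it ran. */
    static void crawl_indices_and_heal(struct replica_state *st)
    {
        (void)st;
        printf("crawling indices and healing entries\n");
    }

    static void healer_iteration(struct replica_state *st)
    {
        /* The commit's idea: bail out before the crawl unless enough
         * bricks are UP to provide both a heal source and a sink, so
         * no locks are taken and no upcalls are sent when healing is
         * impossible anyway. */
        if (up_count(st) < 2) {
            printf("skipping crawl: only %d brick(s) UP\n", up_count(st));
            return;
        }
        crawl_indices_and_heal(st);
    }

    int main(void)
    {
        struct replica_state st = { .child_up = { true, false },
                                    .child_count = 2 };

        healer_iteration(&st);  /* one brick down: crawl is skipped */
        st.child_up[1] = true;
        healer_iteration(&st);  /* both bricks UP: crawl proceeds   */
        return 0;
    }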

Comment 3 Shyamsundar 2019-03-25 16:31:24 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/