Bug 865825

Summary: Self-heal checks skip pending counts that they shouldn't
Product: [Community] GlusterFS Reporter: Jeff Darcy <jdarcy>
Component: replicate Assignee: Jeff Darcy <jdarcy>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: mainline CC: gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: All   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 867360 867364 Environment:
Last Closed: 2013-07-24 17:19:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 867360, 867364    

Description Jeff Darcy 2012-10-12 14:21:37 UTC
The afr_sh_has_*_pending functions, which were clearly copied from one another, all share the same problem: if they get an error from dict_get_ptr, they return immediately.  This means they never check the pending counts for the remaining peers, any of which should trigger self-heal if it is non-zero.  Failing to trigger self-heal when we should makes this a high-priority issue.
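
To make the failure mode concrete, here is a minimal standalone sketch of the two loop shapes. It is not the actual AFR code: xattrs_t, get_count, has_pending_buggy and has_pending_fixed are hypothetical names, and get_count is only a mock standing in for the dict_get_ptr lookup (0 on success, -1 when the entry is absent).

```c
#include <stddef.h>
#include <stdint.h>

#define PEERS 3

/* Hypothetical model of one xattr reply: one pending counter per peer,
 * with NULL meaning the xattr for that peer was absent. */
typedef struct {
    const int32_t *counts[PEERS];
} xattrs_t;

/* Mock lookup standing in for dict_get_ptr (an assumption, not the real
 * API): returns 0 and sets *out on success, -1 if the entry is missing. */
static int get_count(const xattrs_t *x, int peer, const int32_t **out)
{
    if (x->counts[peer] == NULL)
        return -1;
    *out = x->counts[peer];
    return 0;
}

/* Buggy shape: the first failed lookup ends the whole scan, so non-zero
 * counts for later peers are never examined. */
static int has_pending_buggy(const xattrs_t *x)
{
    for (int i = 0; i < PEERS; i++) {
        const int32_t *c;
        if (get_count(x, i, &c) != 0)
            return 0;           /* gives up on all remaining peers */
        if (*c != 0)
            return 1;
    }
    return 0;
}

/* Fixed shape: a missing entry is skipped and the scan continues. */
static int has_pending_fixed(const xattrs_t *x)
{
    for (int i = 0; i < PEERS; i++) {
        const int32_t *c;
        if (get_count(x, i, &c) != 0)
            continue;           /* absent xattr: check the next peer */
        if (*c != 0)
            return 1;
    }
    return 0;
}
```

With the entry for the first peer absent and a non-zero count for the second, the buggy version reports nothing pending while the fixed version reports that self-heal is needed.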

Comment 1 Jeff Darcy 2012-10-19 13:36:41 UTC
http://review.gluster.org/#change,4070 should address this (along with some performance/readability issues).

Comment 2 Vijay Bellur 2012-11-27 05:34:44 UTC
CHANGE: http://review.gluster.org/4070 (replicate: don't stop checking xattrs because one was absent) merged in master by Anand Avati (avati)

Comment 3 Vijay Bellur 2012-11-28 06:42:33 UTC
CHANGE: http://review.gluster.org/4242 (tests/bug-865825: turn stat-prefetch off before doing any afr self-heal related tests.) merged in master by Vijay Bellur (vbellur)

Comment 4 Vijay Bellur 2012-12-03 08:30:02 UTC
CHANGE: http://review.gluster.org/4253 (tests: Perform self-heal in foreground) merged in master by Vijay Bellur (vbellur)