Bug 1493415

Summary: self-heal daemon stuck
Product: [Community] GlusterFS Reporter: Ravishankar N <ravishankar>
Component: replicate Assignee: Ravishankar N <ravishankar>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline CC: bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.13.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1499202 Environment:
Last Closed: 2017-12-08 17:41:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1492782, 1499202    

Description Ravishankar N 2017-09-20 07:18:27 UTC
Problem:
    If a brick crashes after an entry (file or dir) is created but before
    gfid is assigned, the good bricks will have pending entry heal xattrs
    but the heal won't complete because afr_selfheal_recreate_entry() tries
    to create the entry again and it fails with EEXIST.
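
The EEXIST failure itself can be reproduced outside GlusterFS with plain filesystem calls. The sketch below is only an illustration, not AFR code: the temporary directory stands in for a brick, and `recreate_entry` is a hypothetical helper that retries a directory creation the way a naive heal would.

```python
import errno
import os
import tempfile


def recreate_entry(brick_dir, name):
    """Try to create `name` under `brick_dir`.

    Hypothetical stand-in for a heal that blindly re-creates the entry.
    Returns 0 on success, or the errno value on failure.
    """
    try:
        os.mkdir(os.path.join(brick_dir, name))
        return 0
    except OSError as exc:
        return exc.errno


def demo():
    # The temporary directory plays the role of the brick that crashed.
    with tempfile.TemporaryDirectory() as brick_dir:
        first = recreate_entry(brick_dir, "dir1")   # entry created before the crash
        second = recreate_entry(brick_dir, "dir1")  # heal retries the same creation
        return first, second


if __name__ == "__main__":
    first, second = demo()
    print(first, errno.errorcode.get(second))
```

The second call fails because the name already exists on disk, regardless of whether a gfid was ever attached to it.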

Comment 1 Worker Ant 2017-09-20 07:19:23 UTC
REVIEW: https://review.gluster.org/18326 (afr: heal gfid as a part of entry heal) posted (#1) for review on master by Ravishankar N (ravishankar)

Comment 2 Worker Ant 2017-09-20 11:30:37 UTC
REVIEW: https://review.gluster.org/18326 (afr: heal gfid as a part of entry heal) posted (#2) for review on master by Ravishankar N (ravishankar)

Comment 3 Worker Ant 2017-09-24 11:21:26 UTC
REVIEW: https://review.gluster.org/18326 (afr: heal gfid as a part of entry heal) posted (#3) for review on master by Ravishankar N (ravishankar)

Comment 4 Worker Ant 2017-10-03 07:21:54 UTC
REVIEW: https://review.gluster.org/18326 (afr: heal gfid as a part of entry heal) posted (#4) for review on master by Ravishankar N (ravishankar)

Comment 5 Worker Ant 2017-10-05 10:19:22 UTC
REVIEW: https://review.gluster.org/18326 (afr: heal gfid as a part of entry heal) posted (#5) for review on master by Ravishankar N (ravishankar)

Comment 6 Worker Ant 2017-10-06 07:17:12 UTC
REVIEW: https://review.gluster.org/18326 (afr: heal gfid as a part of entry heal) posted (#6) for review on master by Ravishankar N (ravishankar)

Comment 7 Worker Ant 2017-10-09 06:23:12 UTC
COMMIT: https://review.gluster.org/18326 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265
Author: Ravishankar N <ravishankar>
Date:   Wed Sep 20 12:16:06 2017 +0530

    afr: heal gfid as a part of entry heal
    
    Problem:
    If a brick crashes after an entry (file or dir) is created but before
    gfid is assigned, the good bricks will have pending entry heal xattrs
    but the heal won't complete because afr_selfheal_recreate_entry() tries
    to create the entry again and it fails with EEXIST.
    
    Fix:
    We could have fixed posix_mknod/mkdir etc. to assign the gfid if the file
    already exists but the right thing to do seems to be to trigger a lookup
    on the bad brick and let it heal the gfid instead of winding an
    mknod/mkdir in the first place.
    
    Change-Id: I82f76665a7541f1893ef8d847b78af6466aff1ff
    BUG: 1493415
    Signed-off-by: Ravishankar N <ravishankar>
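
The difference between the two approaches named in the commit message can be sketched with a toy model. Everything below is hypothetical and is not the AFR implementation: `Brick`, `heal_by_recreate`, and `heal_by_lookup` are made-up names, a brick is reduced to a dict from entry name to gfid, and the create-on-ENOENT branch is an assumption about how a heal would handle an entry that is genuinely missing.

```python
import errno


class Brick:
    """Toy brick: maps entry name -> gfid (None = entry exists, gfid never assigned)."""

    def __init__(self):
        self.entries = {}

    def mknod(self, name, gfid):
        # Creating an entry that already exists fails, as on a real filesystem.
        if name in self.entries:
            return -errno.EEXIST
        self.entries[name] = gfid
        return 0

    def lookup(self, name, gfid_req):
        # Assumed behaviour: a lookup carrying the wanted gfid fills in a missing one.
        if name not in self.entries:
            return -errno.ENOENT
        if self.entries[name] is None:
            self.entries[name] = gfid_req
        return 0


def heal_by_recreate(sink, name, gfid):
    """Old behaviour as described in the report: always wind a create."""
    return sink.mknod(name, gfid)


def heal_by_lookup(sink, name, gfid):
    """Sketch of the fix: look up first, create only if the entry is absent."""
    ret = sink.lookup(name, gfid)
    if ret == -errno.ENOENT:
        ret = sink.mknod(name, gfid)
    return ret


if __name__ == "__main__":
    sink = Brick()
    sink.entries["file1"] = None  # brick crashed before the gfid was assigned
    print(heal_by_recreate(sink, "file1", "gfid-1"))  # negative EEXIST: heal is stuck
    print(heal_by_lookup(sink, "file1", "gfid-1"))    # 0: gfid healed via lookup
```

In this model the lookup path handles both the half-created entry and the missing entry, which is why no change to the create path is needed.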

Comment 8 Shyamsundar 2017-12-08 17:41:21 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/