Bug 1213358 - Implement directory heal for ec
Summary: Implement directory heal for ec
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-04-20 11:25 UTC by Pranith Kumar K
Modified: 2016-06-16 12:53 UTC
CC List: 1 user

Fixed In Version: glusterfs-3.8rc2
Clone Of:
Environment:
Last Closed: 2016-06-16 12:53:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pranith Kumar K 2015-04-20 11:25:24 UTC
Description of problem:
Implement directory heal for ec

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Anand Avati 2015-04-22 03:14:47 UTC
REVIEW: http://review.gluster.org/10298 (cluster/ec: metadata/name/entry heal implementation for ec) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anand Avati 2015-04-22 07:21:18 UTC
REVIEW: http://review.gluster.org/10240 (libglusterfs: Implement cluster-syncop) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anand Avati 2015-04-22 07:21:20 UTC
REVIEW: http://review.gluster.org/10298 (cluster/ec: metadata/name/entry heal implementation for ec) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 4 Anand Avati 2015-04-22 10:02:19 UTC
REVIEW: http://review.gluster.org/10240 (libglusterfs: Implement cluster-syncop) posted (#6) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 5 Anand Avati 2015-04-22 10:02:22 UTC
REVIEW: http://review.gluster.org/10298 (cluster/ec: metadata/name/entry heal implementation for ec) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 6 Anand Avati 2015-04-22 10:07:12 UTC
REVIEW: http://review.gluster.org/10240 (libglusterfs: Implement cluster-syncop) posted (#7) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 7 Anand Avati 2015-04-22 10:07:14 UTC
REVIEW: http://review.gluster.org/10298 (cluster/ec: metadata/name/entry heal implementation for ec) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 8 Anand Avati 2015-04-23 04:27:25 UTC
REVIEW: http://review.gluster.org/10240 (libglusterfs: Implement cluster-syncop) posted (#8) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 9 Anand Avati 2015-04-23 04:27:27 UTC
REVIEW: http://review.gluster.org/10298 (cluster/ec: metadata/name/entry heal implementation for ec) posted (#6) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 10 Anand Avati 2015-04-23 18:24:23 UTC
REVIEW: http://review.gluster.org/10240 (libglusterfs: Implement cluster-syncop) posted (#9) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 11 Anand Avati 2015-04-23 18:24:25 UTC
REVIEW: http://review.gluster.org/10298 (cluster/ec: metadata/name/entry heal implementation for ec) posted (#7) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 12 Anand Avati 2015-04-25 05:28:47 UTC
REVIEW: http://review.gluster.org/10240 (libglusterfs: Implement cluster-syncop) posted (#10) for review on master by Vijay Bellur (vbellur)

Comment 13 Anand Avati 2015-04-25 11:56:58 UTC
COMMIT: http://review.gluster.org/10240 committed in master by Vijay Bellur (vbellur) 
------
commit 557ea3781e984f5f3cf206dd4b8d0a81c8cbdb58
Author: Pranith Kumar K <pkarampu>
Date:   Tue Apr 14 13:45:33 2015 +0530

    libglusterfs: Implement cluster-syncop
    
    This patch implements a syncop equivalent for a cluster of xlators. The
    xlators on which the fop needs to be performed are taken as input arguments
    to the functions, and the responses are gathered and provided as the output.
    
    This idea is taken from afr-v2 self-heal implementation by Avati.
    
    Change-Id: I2b568f4340cf921a65054b8ab0df7edc4478b5ca
    BUG: 1213358
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/10240
    Reviewed-by: Krutika Dhananjay <kdhananj>
    Tested-by: NetBSD Build System
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
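
To illustrate the pattern this commit describes, here is a minimal, self-contained C sketch of performing one operation on a list of subvolumes and gathering every reply before the caller acts on them. The names (reply, cluster_fop, dummy_lookup) and the integer "subvolumes" are invented for the example; this is not the cluster-syncop API itself:

/*
 * Hypothetical sketch, not the glusterfs cluster-syncop API: perform one
 * operation on a list of subvolumes and gather every reply so the caller
 * can inspect them together.
 */
#include <stdio.h>

#define NSUBVOLS 3

struct reply {
    int op_ret;   /* 0 on success, -1 on failure */
    int op_errno; /* errno-style code when op_ret is -1 */
};

/* stand-in for a per-subvolume synchronous fop */
typedef struct reply (*fop_fn)(int subvol);

/* run 'fop' on every subvolume in 'on' and collect the replies */
static int
cluster_fop(const int *on, int count, fop_fn fop, struct reply *replies)
{
    int success = 0;

    for (int i = 0; i < count; i++) {
        replies[i] = fop(on[i]);
        if (replies[i].op_ret == 0)
            success++;
    }
    return success; /* caller decides what to do with partial success */
}

/* dummy lookup that pretends subvolume 2 is down */
static struct reply
dummy_lookup(int subvol)
{
    struct reply r = { 0, 0 };
    if (subvol == 2) {
        r.op_ret = -1;
        r.op_errno = 107; /* ENOTCONN */
    }
    return r;
}

int
main(void)
{
    int on[NSUBVOLS] = { 0, 1, 2 };
    struct reply replies[NSUBVOLS];
    int ok = cluster_fop(on, NSUBVOLS, dummy_lookup, replies);

    printf("lookup succeeded on %d of %d subvolumes\n", ok, NSUBVOLS);
    return 0;
}

The real code works on xlators and syncop calls rather than plain integers and function pointers, but the control flow is the same: fan the fop out to the listed subvolumes, collect all the responses, then let the caller decide.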

Comment 14 Anand Avati 2015-04-26 16:07:31 UTC
REVIEW: http://review.gluster.org/10298 (cluster/ec: metadata-heal implementation for ec) posted (#8) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 15 Anand Avati 2015-04-26 20:43:13 UTC
REVIEW: http://review.gluster.org/10298 (cluster/ec: metadata-heal implementation for ec) posted (#9) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 16 Anand Avati 2015-04-27 06:37:02 UTC
REVIEW: http://review.gluster.org/10298 (cluster/ec: metadata-heal implementation for ec) posted (#10) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 17 Anand Avati 2015-04-30 04:52:20 UTC
REVIEW: http://review.gluster.org/10298 (cluster/ec: metadata-heal implementation for ec) posted (#11) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 18 Anand Avati 2015-05-06 21:34:30 UTC
REVIEW: http://review.gluster.org/10298 (cluster/ec: metadata/name/entry heal implementation for ec) posted (#12) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 19 Anand Avati 2015-05-07 10:30:39 UTC
REVIEW: http://review.gluster.org/10298 (cluster/ec: metadata/name/entry heal implementation for ec) posted (#13) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 20 Anand Avati 2015-05-08 12:06:12 UTC
COMMIT: http://review.gluster.org/10298 committed in master by Vijay Bellur (vbellur) 
------
commit 33fdc310700da74a4142dab48d00c4753100904b
Author: Pranith Kumar K <pkarampu>
Date:   Thu Apr 16 09:25:31 2015 +0530

    cluster/ec: metadata/name/entry heal implementation for ec
    
    Metadata self-heal:
    1) Take an inode lock in domain 'this->name' on the 0-0 range (full file)
    2) Perform lookup and get the xattrs on all the bricks
    3) Choose the brick with the highest version as the source
    4) Setattr uid/gid/permissions
    5) Removexattr stale xattrs
    6) Setxattr existing/new xattrs
    7) Xattrop with negative values for 'dirty' and, for the version xattr, the
       difference between the highest version and each brick's own version
    8) Unlock the lock acquired in step 1
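
A minimal, self-contained C sketch of step 3 above, picking the brick with the highest version as the heal source. The brick_state structure and the sample values are invented for illustration and are not the ec data structures:

/* Hypothetical sketch: choose the reply with the highest version xattr
 * as the metadata-heal source. */
#include <stdint.h>
#include <stdio.h>

struct brick_state {
    int      alive;   /* lookup succeeded on this brick */
    uint64_t version; /* version xattr reported by the brick */
};

/* return the index of the usable brick with the highest version, or -1 */
static int
pick_source(const struct brick_state *b, int count)
{
    int src = -1;

    for (int i = 0; i < count; i++) {
        if (!b[i].alive)
            continue;
        if (src == -1 || b[i].version > b[src].version)
            src = i;
    }
    return src;
}

int
main(void)
{
    struct brick_state bricks[3] = {
        { .alive = 1, .version = 7 },
        { .alive = 0, .version = 0 }, /* brick down */
        { .alive = 1, .version = 9 },
    };

    printf("heal source: brick %d\n", pick_source(bricks, 3));
    return 0;
}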
    
    Entry self-heal:
    1) Take a directory lock in domain 'this->name:self-heal' on 'NULL' to
       prevent more than one self-heal
    2) Take a directory lock in domain 'this->name' on 'NULL'
    3) Perform lookup on version and dirty and remember the values
    4) Unlock the lock acquired in step 2
    5) Readdir on all the bricks and trigger name heals
    6) Xattrop with negative values for 'dirty' and, for the version xattr, the
       difference between the highest version and each brick's own version
    7) Unlock the lock acquired in step 1
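
A minimal, self-contained C sketch of the xattrop arithmetic in step 6 above (and step 7 of the metadata self-heal): each brick's 'dirty' counter is decremented by its own value and its version is raised by the gap to the highest version. The arrays and values are made up for illustration:

/* Hypothetical sketch of the xattrop deltas: -dirty per brick, and
 * (highest version - own version) per brick for the version xattr. */
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint64_t version[3] = { 9, 9, 5 }; /* per-brick version xattr */
    uint64_t dirty[3]   = { 1, 1, 0 }; /* per-brick dirty xattr   */

    uint64_t highest = 0;
    for (int i = 0; i < 3; i++)
        if (version[i] > highest)
            highest = version[i];

    for (int i = 0; i < 3; i++) {
        int64_t version_delta = (int64_t)(highest - version[i]);
        int64_t dirty_delta   = -(int64_t)dirty[i];

        printf("brick %d: xattrop version %+lld, dirty %+lld\n", i,
               (long long)version_delta, (long long)dirty_delta);
    }
    return 0;
}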
    
    Name heal:
    1) Take a 'name' lock in domain 'this->name' on 'NULL'
    2) Perform lookup on 'name' and get the stat and xattr structures
    3) Build a gfid_db recording, for each gfid, which subvolumes/bricks have
       a file with 'name'
    4) Delete all the stale files, i.e. files that do not exist on more than
       ec->redundancy bricks
    5) On all the subvolumes/bricks with a missing entry, create 'name' with
       the same type, gfid, permissions, etc.
    6) Unlock the lock acquired in step 1
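
A minimal, self-contained C sketch of the gfid_db idea in steps 3-4 above: record, per gfid, which bricks hold 'name', and treat an entry as stale when it is present on no more than the redundancy count of bricks. The bitmask representation and the sample values are invented for illustration:

/* Hypothetical sketch: per-gfid record of which bricks hold 'name',
 * used to decide which entries are stale. */
#include <stdint.h>
#include <stdio.h>

struct gfid_entry {
    const char *gfid;   /* printable gfid for the example */
    uint32_t    bricks; /* bit i set => brick i has 'name' with this gfid */
};

/* count how many bricks hold this gfid */
static int
brick_count(uint32_t mask)
{
    int n = 0;

    while (mask) {
        n += mask & 1;
        mask >>= 1;
    }
    return n;
}

int
main(void)
{
    int redundancy = 1; /* e.g. a 2+1 volume */
    struct gfid_entry db[2] = {
        { "gfid-a", 0x6 }, /* present on bricks 1 and 2 */
        { "gfid-b", 0x1 }, /* present only on brick 0   */
    };

    for (int i = 0; i < 2; i++) {
        int n = brick_count(db[i].bricks);

        printf("%s on %d brick(s): %s\n", db[i].gfid, n,
               n <= redundancy ? "stale, delete"
                               : "keep, recreate where missing");
    }
    return 0;
}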
    Known limitation: with the present design, name heal conservatively
    preserves 'name' when it cannot decide whether to delete it. This can
    happen in the following scenario:
    1) We have a 3=2+1 (bricks: A, B, C) ec volume and one brick is down (say A)
    2) rename d1/f1 -> d2/f2 is performed, but the rename succeeds on only one
       of the bricks (say B)
    3) Name self-heal on d1 and d2 would then re-create the file in both
       directories, resulting in d1/f1 and d2/f2.
    
    Because we wanted to prevent data loss in the case above, the following
    scenario is not healable, i.e. it needs manual intervention:
    1) We have a 3=2+1 (bricks: A, B, C) ec volume and one brick is down (say A)
    2) We have two hard links, d1/a and d2/b, and another file d3/c, all created
       before the brick went down
    3) rename d3/c -> d2/b is performed
    4) Name self-heal on d2/b does not heal, because d2/b with the older gfid
       will not be deleted. One might ask why not delete the link when there is
       more than one hard link, but that leads to a similar data-loss issue:
    Scenario:
    1) We have a 3=2+1 (bricks: A, B, C) ec volume and one brick is down (say A)
    2) We have two hard links: d1/a and d2/b
    3) rename d1/a -> d3/c and rename d2/b -> d4/d are performed, and both
       operations succeed on only one of the bricks (say B)
    4) Name self-heals on the names above, which can run in parallel, may each
       decide to delete the file thinking it still has two links; after all the
       self-heals perform their unlinks, we are left with data loss.
    
    Change-Id: I3a68218a47bb726bd684604efea63cf11cfd11be
    BUG: 1213358
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/10298
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 21 Niels de Vos 2016-06-16 12:53:42 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

