Bug 1207085 - ec heal improvements
Summary: ec heal improvements
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-03-30 08:17 UTC by Pranith Kumar K
Modified: 2015-05-14 17:46 UTC
2 users

Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-14 17:29:25 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pranith Kumar K 2015-03-30 08:17:42 UTC
Description of problem:
1) ec_manager_heal performs xattr healing before data rebuilding. If it sets EC_XATTR_VERSION to the latest value on the bad copies and then fails during data rebuilding, there is no way to detect that some of the copies have still not been rebuilt.

We need to make healing re-entrant: data rebuilding must happen first, and only after it completes successfully should xattr healing take place.
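
A minimal, self-contained sketch of that ordering (all names and types here are hypothetical placeholders, not the actual ec code): the version xattr of a bad copy is bumped only after its data has been rebuilt, so a heal that dies half-way leaves the copy still marked bad and a later heal run simply retries it.

#include <stdbool.h>
#include <stdio.h>

/* One copy (fragment) of the file on one brick. */
struct ec_copy {
    int      id;
    bool     bad;        /* copy still needs healing                 */
    unsigned version;    /* stands in for the EC_XATTR_VERSION xattr */
};

/* Hypothetical helper: rebuild the data of one bad copy from the good
 * copies.  In the real xlator this is the data-rebuild loop; here it is
 * a stub that could be made to fail for illustration. */
static bool ec_rebuild_data(struct ec_copy *copy)
{
    (void)copy;
    return true;
}

static void ec_heal_copy(struct ec_copy *copy, unsigned good_version)
{
    if (!copy->bad)
        return;

    /* 1. Data heal first. */
    if (!ec_rebuild_data(copy)) {
        /* The version xattr is left stale, so a later heal run still
         * sees this copy as bad and retries: heal stays re-entrant. */
        fprintf(stderr, "copy %d: data rebuild failed, xattrs untouched\n",
                copy->id);
        return;
    }

    /* 2. Only after a successful rebuild mark the copy as good. */
    copy->version = good_version;
    copy->bad     = false;
}

int main(void)
{
    struct ec_copy c = { .id = 0, .bad = true, .version = 3 };

    ec_heal_copy(&c, 7);
    printf("copy %d: bad=%d version=%u\n", c.id, c.bad, c.version);
    return 0;
}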

2) To implement the solution suggested in 1), we also need the locking to be correct, i.e. we do not want multiple self-heals to run in parallel.
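
As a single-process illustration of that constraint (the real xlator serializes heals cluster-wide with inode locks on the bricks; the names below are made up), a per-inode try-lock lets a second would-be healer detect a heal in progress and back off:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Per-inode context holding the heal serialization lock. */
struct inode_ctx {
    pthread_mutex_t heal_lock;
};

/* Become the single healer of this inode, or back off if another heal
 * is already running. */
static bool try_begin_heal(struct inode_ctx *ctx)
{
    return pthread_mutex_trylock(&ctx->heal_lock) == 0;
}

static void end_heal(struct inode_ctx *ctx)
{
    pthread_mutex_unlock(&ctx->heal_lock);
}

int main(void)
{
    struct inode_ctx ctx;

    pthread_mutex_init(&ctx.heal_lock, NULL);

    if (try_begin_heal(&ctx)) {
        printf("first healer runs\n");
        if (!try_begin_heal(&ctx))          /* second attempt sees EBUSY */
            printf("second healer backs off\n");
        end_heal(&ctx);
    }

    pthread_mutex_destroy(&ctx.heal_lock);
    return 0;
}

(Build with -pthread; this only models the "one heal at a time" rule, not the distributed lock itself.)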

3) EC can already do 'name' healing: given a filename, it heals the entry correctly, either deleting it or creating it, depending on what the 'version' of the parent directory indicates is correct. The remaining code needed for full directory healing is to readdir on all the subvolumes of ec, one by one, and initiate this name heal for each entry. This is going to be a very big and time-consuming operation for an ec xlator with a lot of subvolumes, e.g. 4+2 (6 bricks), 8+3 (11 bricks), 8+4 (12 bricks). For this release, this is the best we can do.
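
A rough model of that loop, with hypothetical data structures standing in for readdir on each brick and for the existing name heal, shows why the cost scales with the number of subvolumes:

#include <stdio.h>

#define MAX_ENTRIES 16

/* Simplified view of one subvolume (brick): just a list of entry names. */
struct subvolume {
    const char *name;
    const char *entries[MAX_ENTRIES];   /* NULL-terminated */
};

/* Stand-in for the existing ec name heal: decide per name whether the
 * entry must be created or deleted based on the parent's version. */
static void name_heal(const char *entry)
{
    printf("name-heal: %s\n", entry);
}

/* Full directory heal: readdir every subvolume one by one and run name
 * heal on each entry.  Cost grows with the brick count, which is why
 * wide volumes (4+2 = 6, 8+3 = 11, 8+4 = 12 bricks) are expensive; the
 * same name may also be visited once per brick. */
static void dir_heal(struct subvolume *subvols, int count)
{
    for (int i = 0; i < count; i++)
        for (int j = 0; subvols[i].entries[j] != NULL; j++)
            name_heal(subvols[i].entries[j]);
}

int main(void)
{
    struct subvolume vols[] = {
        { "brick-0", { "a.txt", "b.txt", NULL } },
        { "brick-1", { "a.txt", "stale.txt", NULL } },
    };

    dir_heal(vols, 2);
    return 0;
}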

To achieve the things outlined above, I am planning to break the existing ec_manager_heal into 4 different heals: 1) name heal, 2) data heal, 3) metadata heal, 4) directory heal.
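
A hedged sketch of that split (placeholder names only, not the actual ec interfaces): the single heal state machine becomes a dispatcher over four independent heal types.

#include <stdio.h>

enum ec_heal_type {
    EC_HEAL_NAME,        /* heal one directory entry            */
    EC_HEAL_DATA,        /* rebuild file contents               */
    EC_HEAL_METADATA,    /* heal ownership, permissions, xattrs */
    EC_HEAL_DIRECTORY    /* readdir + name heal for every entry */
};

/* Each heal type can now be run (and retried) on its own. */
static void ec_heal_dispatch(enum ec_heal_type type)
{
    switch (type) {
    case EC_HEAL_NAME:      printf("name heal\n");      break;
    case EC_HEAL_DATA:      printf("data heal\n");      break;
    case EC_HEAL_METADATA:  printf("metadata heal\n");  break;
    case EC_HEAL_DIRECTORY: printf("directory heal\n"); break;
    }
}

int main(void)
{
    ec_heal_dispatch(EC_HEAL_DATA);
    return 0;
}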

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Anand Avati 2015-03-30 09:56:24 UTC
REVIEW: http://review.gluster.org/10045 (cluster/ec: Entry self-heal fixes) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anand Avati 2015-04-12 07:50:45 UTC
REVIEW: http://review.gluster.org/10045 (cluster/ec: Entry self-heal fixes) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anand Avati 2015-04-13 11:03:29 UTC
COMMIT: http://review.gluster.org/10045 committed in master by Vijay Bellur (vbellur) 
------
commit 0333ac8abf9d5d1cc95fea80fba098c7d2c4c8c3
Author: Pranith Kumar K <pkarampu>
Date:   Mon Mar 30 13:50:11 2015 +0530

    cluster/ec: Entry self-heal fixes
    
    - Directory deletion should always happen with 'rm -rf' flag, otherwise the
      call may fail with ENOTEMPTY.
    - Instead of doing an explicit 'link' call, perform mknod call with
      GLUSTERFS_INTERNAL_FOP_KEY which acts as 'link' if the
      gfid already exists.
    
    Change-Id: I8826f92170421db37efb67dfc00afad4ab695907
    BUG: 1207085
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/10045
    Tested-by: NetBSD Build System
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Xavier Hernandez <xhernandez>
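
For illustration only, a self-contained sketch of the two entry-heal decisions described in the commit above (the structures and helpers are hypothetical, not the real ec or brick code paths): stale directories are removed recursively so the removal cannot fail with ENOTEMPTY, and a missing entry whose gfid already exists is recreated with a flagged mknod that the brick treats as a link rather than via an explicit link call.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical view of one directory entry as seen during entry heal. */
struct entry {
    const char *name;
    bool        is_dir;
    bool        gfid_exists;   /* contents already present under the gfid */
};

/* Entry should not exist: directories go away recursively ('rm -rf'
 * semantics) so that the removal cannot fail with ENOTEMPTY. */
static void heal_stale_entry(const struct entry *e)
{
    if (e->is_dir)
        printf("recursive remove: %s\n", e->name);
    else
        printf("unlink: %s\n", e->name);
}

/* Entry should exist: if the gfid is already there, issue a mknod flagged
 * as an internal FOP so the brick links the existing gfid instead of
 * creating a fresh file; otherwise create it normally. */
static void heal_missing_entry(const struct entry *e)
{
    if (e->gfid_exists)
        printf("mknod-as-link: %s\n", e->name);
    else
        printf("create: %s\n", e->name);
}

int main(void)
{
    struct entry stale   = { "old-dir", true,  true };
    struct entry missing = { "a.txt",   false, true };

    heal_stale_entry(&stale);
    heal_missing_entry(&missing);
    return 0;
}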

Comment 4 Niels de Vos 2015-05-14 17:29:25 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user


