Description of problem:
1) ec_manager_heal heals the xattrs before rebuilding the data. After it has set EC_XATTR_VERSION to the latest value on the bad copies, the data rebuild can still fail, and at that point there is no way to detect that some of the copies were never rebuilt. Healing needs to be re-entrant: data rebuilding must happen first, and only if it succeeds should the xattrs be healed.
2) Implementing the fix in 1) also requires correct locking:
   a) multiple self-heals must not run in parallel.
3) EC can already do 'name' healing: given a file name, it heals the entry correctly, either deleting or creating it, depending on the 'version' of the parent directory. What remains for full directory healing is to readdir each subvolume of EC in turn and initiate this name heal for each entry found. This will be a very big, time-consuming operation for an EC xlator with many subvolumes, e.g. 4+2 (6 subvolumes), 8+3 (11) or 8+4 (12). For this release, this is the best we can do.

To achieve the above, I am planning to break the existing ec_manager_heal into 4 separate heals:
1) name heal
2) data heal
3) metadata heal
4) directory heal
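The ordering problem in 1) can be illustrated with a minimal, self-contained C model. Everything in it is hypothetical: struct fragment, rebuild_data(), needs_heal(), heal_xattr_first() and heal_data_first() are names made up for this sketch, the version field only stands in for the EC_XATTR_VERSION xattr, and locking (point 2) is not modelled at all. It is not the ec_manager_heal code, just a sketch of why "data first, xattr last" keeps a failed heal detectable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of one EC fragment. These are NOT
 * GlusterFS structures; 'version' merely stands in for the
 * EC_XATTR_VERSION extended attribute.
 */
struct fragment {
    unsigned long version;  /* stand-in for EC_XATTR_VERSION */
    bool data_ok;           /* data matches the good fragments */
    bool rebuild_fails;     /* simulation hook: force a rebuild error */
};

/* Highest version present on any fragment. */
static unsigned long latest_version(const struct fragment *f, size_t n)
{
    unsigned long v = 0;
    for (size_t i = 0; i < n; i++)
        if (f[i].version > v)
            v = f[i].version;
    return v;
}

/* The only signal a later heal pass has: a lagging version. */
static bool needs_heal(const struct fragment *f, unsigned long latest)
{
    return f->version < latest;
}

/* Hypothetical data rebuild of one bad fragment. */
static bool rebuild_data(struct fragment *f)
{
    if (f->rebuild_fails)
        return false;
    f->data_ok = true;
    return true;
}

/* Old order: version xattr first, then data. A failed rebuild leaves
 * a fragment that looks current but still holds stale data. */
static bool heal_xattr_first(struct fragment *f, size_t n)
{
    unsigned long latest = latest_version(f, n);
    bool ok = true;
    for (size_t i = 0; i < n; i++) {
        if (!needs_heal(&f[i], latest))
            continue;
        f[i].version = latest;
        if (!rebuild_data(&f[i]))
            ok = false;
    }
    return ok;
}

/* Proposed order: rebuild data on every bad fragment first, and update
 * the version xattr only if all rebuilds succeeded. On failure nothing
 * is marked current, so the next pass finds the same bad fragments. */
static bool heal_data_first(struct fragment *f, size_t n)
{
    assert(f != NULL || n == 0);
    unsigned long latest = latest_version(f, n);
    for (size_t i = 0; i < n; i++)
        if (needs_heal(&f[i], latest) && !rebuild_data(&f[i]))
            return false;
    for (size_t i = 0; i < n; i++)
        if (needs_heal(&f[i], latest))
            f[i].version = latest;
    return true;
}
```

With one fragment whose rebuild fails, heal_xattr_first() leaves it at the latest version with stale data, so needs_heal() no longer flags it. heal_data_first() leaves its version behind, so a later pass retries it and, once the rebuild succeeds, brings it to the latest version.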
REVIEW: http://review.gluster.org/10045 (cluster/ec: Entry self-heal fixes) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/10045 (cluster/ec: Entry self-heal fixes) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/10045 committed in master by Vijay Bellur (vbellur)
------
commit 0333ac8abf9d5d1cc95fea80fba098c7d2c4c8c3
Author: Pranith Kumar K <pkarampu>
Date:   Mon Mar 30 13:50:11 2015 +0530

    cluster/ec: Entry self-heal fixes

    - Directory deletion should always happen with 'rm -rf' flag,
      otherwise the call may fail with ENOTEMPTY.
    - Instead of doing an explicit 'link' call, perform mknod call
      with GLUSTERFS_INTERNAL_FOP_KEY which acts as 'link' if the
      gfid already exists.

    Change-Id: I8826f92170421db37efb67dfc00afad4ab695907
    BUG: 1207085
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/10045
    Tested-by: NetBSD Build System
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Xavier Hernandez <xhernandez>
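The ENOTEMPTY failure from the first bullet of the commit has a direct analogue on a local POSIX file system: rmdir(2) refuses a directory that still has entries, while a depth-first recursive removal (what 'rm -rf' does) succeeds. The sketch below only demonstrates that analogy with nftw(3); it is not the EC xlator's code path, which per the commit gets the 'rm -rf' behaviour through a flag on the deletion. The helper names try_rmdir(), remove_tree() and make_demo_tree() are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* nftw callback: with FTW_DEPTH, children are visited before their
 * parent, so each directory is already empty when remove() reaches it.
 * remove() unlinks files and rmdir()s directories. */
static int remove_entry(const char *path, const struct stat *sb,
                        int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    (void)typeflag;
    (void)ftwbuf;
    return remove(path);
}

/* Plain rmdir: returns 0 on success, otherwise the errno it failed
 * with (POSIX allows ENOTEMPTY or EEXIST for a non-empty directory). */
static int try_rmdir(const char *dir)
{
    return rmdir(dir) == 0 ? 0 : errno;
}

/* 'rm -rf' equivalent: depth-first walk, symlinks are not followed. */
static int remove_tree(const char *dir)
{
    return nftw(dir, remove_entry, 16, FTW_DEPTH | FTW_PHYS);
}

/* Demo helper: create a fresh directory from 'templ' (which must end
 * in "XXXXXX" and is modified in place) containing sub/file. */
static int make_demo_tree(char *templ)
{
    char path[4096];
    FILE *fp;

    if (mkdtemp(templ) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/sub", templ);
    if (mkdir(path, 0700) != 0)
        return -1;
    snprintf(path, sizeof path, "%s/sub/file", templ);
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    return fclose(fp);
}
```

Calling try_rmdir() on the demo tree fails because sub/ is still present, whereas remove_tree() removes the file, then sub/, then the top directory.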
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user