Bug 1255690 - AFR: gluster v restart force or brick process restart doesn't heal the files
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.7.3
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Assigned To: Ravishankar N
Keywords: Triaged
Depends On: 1239021 1253309 1256245
Blocks: 1223636
Reported: 2015-08-21 06:44 EDT by Ravishankar N
Modified: 2015-10-01 07:03 EDT

See Also:
Fixed In Version: glusterfs-3.7.4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1253309
Environment:
Last Closed: 2015-09-09 05:40:21 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Ravishankar N 2015-08-21 06:44:57 EDT
+++ This bug was initially created as a clone of Bug #1253309 +++

Description of problem:

When one brick of a replica pair is down and file operations are performed from the mount point, neither restarting the volume nor restarting the brick process heals the files that need to be healed.

Version-Release number of selected component (if applicable):

glusterfs-3.7.1-7.el6rhs.x86_64


How reproducible:

100%

Steps to Reproduce:

1. Create a 2x2 distributed-replicate volume.
2. FUSE-mount the volume.
3. Create some files on the mount point.
4. Kill one of the replica bricks.
5. Rename a file from the mount point.
6. Check gluster v heal <volname> info.
7. Restart the volume or restart the brick process.
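
The steps above can be written out as a command sequence. What follows is a minimal sketch and not part of the original report: the helper name repro_commands, the file names, and the mount point are hypothetical, and it assumes that "restart" in this report refers to gluster volume start <volname> force, which respawns brick processes that are down. The helper only builds the command strings; it does not run anything. Step 4 (killing a brick) is left manual, since the brick PID has to be read from the volume status output first.

```python
import shlex


def repro_commands(volname, bricks, mountpoint, server):
    """Return the shell commands for the reproduction steps, in order.

    Nothing is executed here; the caller decides how (and where) to run them.
    All arguments are illustrative inputs, not values mandated by the report.
    """
    brick_args = " ".join(shlex.quote(b) for b in bricks)
    return [
        # Steps 1-2: create and start a 2x2 volume, then FUSE-mount it.
        f"gluster volume create {volname} replica 2 {brick_args}",
        f"gluster volume start {volname}",
        f"mount -t glusterfs {server}:/{volname} {mountpoint}",
        # Step 3: create a file on the mount point (hypothetical file name).
        f"touch {mountpoint}/file1",
        # Step 4: list brick PIDs; kill one replica brick by hand afterwards.
        f"gluster volume status {volname}",
        # Step 5: rename the file while that brick is down.
        f"mv {mountpoint}/file1 {mountpoint}/file1.renamed",
        # Step 6: list the entries that are pending heal.
        f"gluster volume heal {volname} info",
        # Step 7: bring the killed brick back, then check the heal info again.
        f"gluster volume start {volname} force",
        f"gluster volume heal {volname} info",
    ]
```

Called with the volume name and brick paths listed under "Additional info" in this report, the first command creates vol0 with four bricks and the last one re-checks the pending heal entries after the brick is back.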


Actual results:

Files are not healed


Expected results:

A volume restart or brick process restart should heal the files that need to be healed.

Additional info:

Volume Name: vol0
Type: Distributed-Replicate
Volume ID: 53c64343-c537-428c-b7b7-a45f198c42a0
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.33.214:/rhs/brick1/b001
Brick2: 10.70.33.219:/rhs/brick1/b002
Brick3: 10.70.33.225:/rhs/brick1/b003
Brick4: 10.70.44.13:/rhs/brick1/b004
Options Reconfigured:
performance.readdir-ahead: on
features.uss: enable
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
server.allow-insecure: on
features.barrier: disable
cluster.enable-shared-storage: enable


--- Additional comment from Ravishankar N on 2015-07-03 05:45:57 EDT ---

Currently in AFR-v2, when a CHILD_UP notification is received, the index heal is triggered only on that particular child. The fix is to trigger the index heal on all local children.
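
The difference between the two trigger policies can be illustrated with a toy model. This is only a sketch under simplifying assumptions and is not GlusterFS code: the Child and Replica classes, the simulate helper, and the heal_all_local flag are all hypothetical names, and each child's "index" is reduced to a set of file names that need healing to a peer. The point it shows is that the pending entries live on the child that stayed up, so crawling only the index of the child that just came up finds nothing to heal.

```python
from dataclasses import dataclass, field


@dataclass
class Child:
    """Toy stand-in for one brick (child) of a replica; not a GlusterFS type."""
    name: str
    local: bool = True   # brick is on the node running this self-heal daemon
    up: bool = True
    files: set = field(default_factory=set)
    index: set = field(default_factory=set)  # files known to need heal to a peer


class Replica:
    def __init__(self, children):
        self.children = children

    def write(self, filename):
        """Apply a write to every up child; index it there if any peer is down."""
        up = [c for c in self.children if c.up]
        for c in up:
            c.files.add(filename)
            if len(up) < len(self.children):
                c.index.add(filename)

    def index_heal(self, source):
        """Crawl one child's index and copy the listed files to its peers."""
        if not (source.local and source.up):
            return  # the heal is not run on non-local (or down) children
        peers = [c for c in self.children if c is not source]
        if not all(p.up for p in peers):
            return  # keep the entries pending while a peer is still down
        for filename in sorted(source.index):
            for p in peers:
                p.files.add(filename)
        source.index.clear()

    def child_up(self, child, heal_all_local):
        """Handle a CHILD_UP event with the old or the fixed trigger policy."""
        child.up = True
        targets = self.children if heal_all_local else [child]
        for c in targets:
            self.index_heal(c)


def simulate(heal_all_local):
    """Write one file with both children up and one with child b down."""
    a, b = Child("a"), Child("b")
    replica = Replica([a, b])
    replica.write("file1")
    b.up = False
    replica.write("file2")  # recorded only in a's index
    replica.child_up(b, heal_all_local)
    return sorted(b.files)
```

In this model, simulate(False) (heal only the child that came up) leaves b with just file1, while simulate(True) (heal all local children) also brings file2 onto b.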

While this is a bug, it is not a blocker because the files will eventually be healed within 10 minutes (the default heal timeout value) or when the heal command is explicitly launched via the gluster CLI.
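
As a sketch of the two workarounds mentioned here (not part of the original comment): the helper below builds the CLI commands for launching a heal explicitly and, optionally, for shortening the periodic heal interval. The function name heal_workaround_commands is hypothetical. It assumes the 10-minute interval corresponds to the cluster.heal-timeout volume option, expressed in seconds; check gluster volume set help on your installation before relying on that.

```python
def heal_workaround_commands(volname, heal_timeout_seconds=None):
    """Return CLI commands that work around the missed heal trigger.

    Nothing is executed; the strings are meant to be run by an administrator.
    """
    # Explicitly launch an index heal instead of waiting for the periodic crawl.
    cmds = [f"gluster volume heal {volname}"]
    if heal_timeout_seconds is not None:
        # Assumption: cluster.heal-timeout is the periodic heal interval (seconds).
        cmds.append(
            f"gluster volume set {volname} cluster.heal-timeout {heal_timeout_seconds}"
        )
    # Check what is still pending afterwards.
    cmds.append(f"gluster volume heal {volname} info")
    return cmds
```

For example, heal_workaround_commands("vol0") yields the explicit heal followed by the heal info check, leaving the timeout untouched.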

--- Additional comment from Ravishankar N on 2015-08-13 09:15:30 EDT ---

http://review.gluster.org/#/c/11912/

--- Additional comment from Anand Avati on 2015-08-14 05:14:59 EDT ---

REVIEW: http://review.gluster.org/11912 (afr: launch index heal on local subvols up on a child-up event) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

--- Additional comment from Anand Avati on 2015-08-14 05:44:03 EDT ---

REVIEW: http://review.gluster.org/11912 (afr: launch index heal on local subvols up on a child-up event) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

--- Additional comment from Anand Avati on 2015-08-21 01:29:18 EDT ---

REVIEW: http://review.gluster.org/11912 (afr: launch index heal on local subvols up on a child-up event) posted (#4) for review on master by Ravishankar N (ravishankar@redhat.com)
Comment 1 Anand Avati 2015-08-21 06:46:35 EDT
REVIEW: http://review.gluster.org/11982 (afr: launch index heal on local subvols up on a child-up event) posted (#1) for review on release-3.7 by Ravishankar N (ravishankar@redhat.com)
Comment 2 Anand Avati 2015-08-24 02:54:25 EDT
COMMIT: http://review.gluster.org/11982 committed in release-3.7 by Raghavendra G (rgowdapp@redhat.com) 
------
commit 246dae5b89770a4642d4fbe4650a44475144c55a
Author: Ravishankar N <ravishankar@redhat.com>
Date:   Thu Aug 13 18:33:08 2015 +0530

    afr: launch index heal on local subvols up on a child-up event
    
    Backport of http://review.gluster.org/#/c/11912/
    Problem:
    When a replica's child goes down and comes up, the index heal is
    triggered only on the child that just came up. This does not serve the
    intended purpose as the list of files that need to be healed
    to this child is actually captured on the other child of the replica.
    
    Fix:
    Launch index-heal on all local children of the replica xlator which just
    received a child up. Note that afr_selfheal_childup() eventually calls
    afr_shd_index_healer() which will not run the heal on non-local
    children.
    
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    
    Change-Id: Ia23e47d197f983c695ec0bcd283e74931119ee55
    BUG: 1255690
    Reviewed-on: http://review.gluster.org/11982
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Comment 3 Kaushal 2015-09-09 05:40:21 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.4, please open a new bug report.

glusterfs-3.7.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/12496
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
