Bug 1239021

Summary: AFR: gluster v restart force or brick process restart doesn't heal the files
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Anil Shah <ashah>
Component: replicate
Assignee: Ravishankar N <ravishankar>
Status: CLOSED ERRATA
QA Contact: Shruti Sampat <ssampat>
Severity: urgent
Priority: unspecified
Version: rhgs-3.1
CC: annair, asrivast, atalur, divya, ravishankar, rhs-bugs, ssampat, storage-qa-internal
Keywords: ZStream
Target Release: RHGS 3.1.1
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.7.1-13
Doc Type: Bug Fix
Doc Text:
Previously, the self-heal daemon performed a crawl only on the brick that had come back up after going down, so pending heals were not triggered immediately after the child came up, but only after the cluster.heal-timeout interval. With this fix, index heal is triggered on all local subvolumes of a replicated volume.
Clones: 1253309 (view as bug list)
Last Closed: 2015-10-05 07:18:15 UTC
Type: Bug
Bug Blocks: 1223636, 1251815, 1253309, 1255690, 1256245    

Description Anil Shah 2015-07-03 09:35:03 UTC
Description of problem:

When one of the replica bricks is down and file operations are performed on the mount, a subsequent gluster volume restart or brick process restart does not heal the files that need to be healed.

Version-Release number of selected component (if applicable):

glusterfs-3.7.1-7.el6rhs.x86_64


How reproducible:

100%

Steps to Reproduce:

1. Create a 2x2 distributed-replicate volume
2. FUSE-mount the volume
3. Create some files on the mount point
4. Kill one of the replica bricks
5. Rename the files from the mount point
6. Check gluster v heal <volname> info
7. Restart the volume or restart the brick process (a shell sketch of these steps follows)
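
A minimal shell sketch of these steps, assuming the vol0 volume from the "Additional info" section, a FUSE mount at /mnt/vol0, and a local brick at /rhs/brick1/b001 (paths, server, and file counts are illustrative):

# Steps 1-3: mount the volume and create files
mount -t glusterfs 10.70.33.214:/vol0 /mnt/vol0
for i in $(seq 1 10); do echo data > /mnt/vol0/file$i; done

# Step 4: kill one replica brick; its PID is shown by volume status
gluster volume status vol0
kill -9 <brick-pid>

# Step 5: rename the files while the brick is down
for i in $(seq 1 10); do mv /mnt/vol0/file$i /mnt/vol0/renamed$i; done

# Step 6: the renamed entries show up as pending heals
gluster volume heal vol0 info

# Step 7: restarting the volume (or the brick process) should trigger
# the heal, but on glusterfs-3.7.1-7 the entries stay pending
gluster volume start vol0 force
gluster volume heal vol0 info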


Actual results:

Files are not healed


Expected results:

A volume restart or a brick process restart should heal the files that need to be healed.
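
Hedged note (not part of the original report): instead of waiting for the cluster.heal-timeout interval, a heal can also be triggered manually from the CLI, assuming the affected volume is vol0:

gluster volume heal vol0                           # trigger an index heal immediately
gluster volume heal vol0 full                      # or force a full crawl of the bricks
gluster volume set vol0 cluster.heal-timeout 120   # or shorten the periodic heal interval (seconds)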

Additional info:

Volume Name: vol0
Type: Distributed-Replicate
Volume ID: 53c64343-c537-428c-b7b7-a45f198c42a0
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.33.214:/rhs/brick1/b001
Brick2: 10.70.33.219:/rhs/brick1/b002
Brick3: 10.70.33.225:/rhs/brick1/b003
Brick4: 10.70.44.13:/rhs/brick1/b004
Options Reconfigured:
performance.readdir-ahead: on
features.uss: enable
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
server.allow-insecure: on
features.barrier: disable
cluster.enable-shared-storage: enable

Comment 6 Anuradha 2015-08-18 06:52:31 UTC
Patch posted upstream - http://review.gluster.org/11912

Comment 7 Ravishankar N 2015-08-24 07:33:58 UTC
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/56024/

Comment 8 Shruti Sampat 2015-08-28 08:49:06 UTC
Verified as fixed in glusterfs-3.7.1-13.el7rhgs.x86_64. Heals now happen as soon as the volume is started with force (see the sketch below).
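
A sketch of the verification flow on the fixed build, assuming the same vol0 setup as in the reproduction steps:

gluster volume start vol0 force   # restarts the killed brick process
gluster volume heal vol0 info     # pending entries should drain shortly after the brick comes up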

Comment 9 Divya 2015-09-22 05:31:59 UTC
Ravishankar,

Made a few minor edits to the doc text. Could you review and sign off?

Comment 10 Ravishankar N 2015-09-22 05:57:29 UTC
Looks okay to me.

Comment 12 errata-xmlrpc 2015-10-05 07:18:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html