Bug 1080946

Summary: AFR-V2 : Self-heal-daemon not completely self-healing all the files
Product: [Community] GlusterFS
Reporter: spandura
Component: replicate
Assignee: bugs <bugs>
Status: CLOSED DEFERRED
QA Contact:
Severity: urgent
Docs Contact:
Priority: medium
Version: mainline
CC: atumball, bugs, ksubrahm, ravishankar, smohan
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-20 04:17:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description spandura 2014-03-26 10:47:18 UTC
Description of problem:
=======================
In a replicate volume with 3 bricks, when 2 bricks go offline and come back online, the self-heal daemon does not completely self-heal all the files onto both the sinks (the two bricks that went offline).

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3git built on Mar 25 2014 05:20:53
Repository revision: git://git.gluster.com/glusterfs.git

How reproducible:
================

Steps to Reproduce:
======================
1. Create a 1 x 3 replicate volume and start it.

2. Create a fuse and nfs mount from the client. 

3. Start dbench from the fuse mount, and start dd on a file from both the nfs and fuse mounts.

4. Crash the disk backing brick2 (xfs_progs/xfstests/src/godown <brick_mount_point>).

5. After some time, reboot node3.

Note: I/O stopped on the fuse mount because the glusterfs process was OOM-killed.

6. Bring brick2 and brick3 back online (service glusterd restart). A shell sketch of these steps follows.
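
For reference, a rough shell sketch of the steps above, assuming the volume and brick names from the outputs further down; the mount points (/mnt/fuse, /mnt/nfs), dbench client count, dd file names/sizes and the godown path are illustrative placeholders, not values from the original report:

# 1. Create and start a 1 x 3 replicate volume.
gluster volume create vol_rep replica 3 \
    rhs-client11:/rhs/bricks/vol_rep_b0 \
    rhs-client12:/rhs/bricks/vol_rep_b1 \
    rhs-client13:/rhs/bricks/vol_rep_b2
gluster volume start vol_rep

# 2. From the client, mount the volume over FUSE and over Gluster NFS (v3).
mount -t glusterfs rhs-client11:/vol_rep /mnt/fuse
mount -t nfs -o vers=3 rhs-client11:/vol_rep /mnt/nfs

# 3. Generate load: dbench on the FUSE mount, dd on a file from each mount.
dbench -D /mnt/fuse 10 &
dd if=/dev/zero of=/mnt/nfs/dd_file_1 bs=1M count=10240 &
dd if=/dev/zero of=/mnt/fuse/dd_file_2 bs=1M count=10240 &

# 4. On the brick2 node, force the brick filesystem down.
xfstests/src/godown /rhs/bricks/vol_rep_b1

# 5. After some time, reboot the brick3 node.
reboot

# 6. Once the brick2 and brick3 nodes are back, restart glusterd so the bricks rejoin.
service glusterd restart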

Actual results:
=================
Self-heal is still pending on brick3, as the xattrop index counts below show.

Brick1:-
=========
root@rhs-client11 [Mar-26-2014- 6:36:08] >ls -l /rhs/bricks/vol_rep_b0/.glusterfs/indices/xattrop/ | wc
    247    2216   18958
root@rhs-client11 [Mar-26-2014- 6:36:12] >

Brick2:-
=========
root@rhs-client12 [Mar-26-2014- 6:36:08] >ls -l /rhs/bricks/vol_rep_b1/.glusterfs/indices/xattrop/ | wc
    247    2216   18958
root@rhs-client12 [Mar-26-2014- 6:36:12] >

Brick3:-
=========
root@rhs-client13 [Mar-26-2014- 6:36:08] >ls -l /rhs/bricks/vol_rep_b2/.glusterfs/indices/xattrop/ | wc
      4      29     241
root@rhs-client13 [Mar-26-2014- 6:36:12] >
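
For reference, pending heals can also be checked through the gluster CLI rather than by listing the xattrop indices directly; a minimal sketch, using the volume name above:

# Entries still pending heal, reported per brick.
gluster volume heal vol_rep info

# Per-brick count of pending entries (on releases that support it).
gluster volume heal vol_rep statistics heal-count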


Expected results:
===================
After a while, the self-heal daemon should completely self-heal all the files from brick1 onto brick2 and brick3.
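
Not part of the original report, but as a hedged cross-check the heals can also be kicked off manually and the indices re-examined; a minimal sketch:

# Trigger an index heal (what the self-heal daemon normally does on its own).
gluster volume heal vol_rep

# If entries remain after the index heal, trigger a full crawl.
gluster volume heal vol_rep full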

Additional info:
===================
root@rhs-client11 [Mar-26-2014- 6:46:33] >gluster v info
 
Volume Name: vol_rep
Type: Replicate
Volume ID: 49001cac-32d9-461e-8432-647b46bb7a5a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/bricks/vol_rep_b0
Brick2: rhs-client12:/rhs/bricks/vol_rep_b1
Brick3: rhs-client13:/rhs/bricks/vol_rep_b2
root@rhs-client11 [Mar-26-2014- 6:46:37] >


root@rhs-client11 [Mar-26-2014- 6:46:44] >gluster v status
Status of volume: vol_rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/bricks/vol_rep_b0		49152	Y	4179
Brick rhs-client12:/rhs/bricks/vol_rep_b1		49152	Y	21007
Brick rhs-client13:/rhs/bricks/vol_rep_b2		49152	Y	4447
NFS Server on localhost					2049	Y	13036
Self-heal Daemon on localhost				N/A	Y	13047
NFS Server on rhs-client12				2049	Y	21014
Self-heal Daemon on rhs-client12			N/A	Y	21021
NFS Server on rhs-client13				2049	Y	4451
Self-heal Daemon on rhs-client13			N/A	Y	4457
 
Task Status of Volume vol_rep
------------------------------------------------------------------------------
There are no active volume tasks
 
root@rhs-client11 [Mar-26-2014- 6:46:46] >

Comment 1 Pranith Kumar K 2014-07-13 06:57:14 UTC
Ravi,
    this looks similar to the bug you are working on. Please reassign it to yourself if that is the case.

Pranith

Comment 2 Ravishankar N 2018-11-20 04:17:45 UTC
Closing this as the BZ is old and numerous fixes for heal-related issues have gone in since. Please feel free to re-open if the issue is still seen in the latest versions.