Bug 1080946

Summary: AFR-V2 : Self-heal-daemon not completely self-healing all the files
Product: [Community] GlusterFS
Reporter: spandura
Component: replicate
Assignee: bugs <bugs>
Status: CLOSED DEFERRED
QA Contact:
Severity: urgent
Docs Contact:
Priority: medium
Version: mainline
CC: atumball, bugs, ksubrahm, ravishankar, smohan
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-20 04:17:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description spandura 2014-03-26 10:47:18 UTC
Description of problem:
=======================
In a replicate volume with 3 bricks, when 2 bricks go offline and come back online, the self-heal daemon does not completely self-heal all the files onto both the sinks (the two bricks that went offline).

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3git built on Mar 25 2014 05:20:53
Repository revision: git://git.gluster.com/glusterfs.git

How reproducible:
================

Steps to Reproduce:
======================
1. Create a 1 x 3 replicate volume and start it.

2. Create a fuse and nfs mount from the client. 

3. Start dbench from the fuse mount, and start dd on a file from both the nfs and fuse mounts.

4. Crash the disk backing brick2 (xfs_progs/xfstests/src/godown <brick_mount_point>).

5. After some time, reboot node3.

Note: I/O stopped on the fuse mount because the glusterfs process was OOM-killed.

6. Bring brick2 and brick3 back online (service glusterd restart). A shell sketch of these steps follows.
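
For reference, a rough shell sketch of the steps above, assuming the volume and brick names from the outputs further down; the mount points (/mnt/fuse, /mnt/nfs), dbench client count, dd file names/sizes and the godown path are illustrative placeholders, not values from the original report:

# 1. Create and start a 1 x 3 replicate volume.
gluster volume create vol_rep replica 3 \
    rhs-client11:/rhs/bricks/vol_rep_b0 \
    rhs-client12:/rhs/bricks/vol_rep_b1 \
    rhs-client13:/rhs/bricks/vol_rep_b2
gluster volume start vol_rep

# 2. From the client, mount the volume over FUSE and over Gluster NFS (v3).
mount -t glusterfs rhs-client11:/vol_rep /mnt/fuse
mount -t nfs -o vers=3 rhs-client11:/vol_rep /mnt/nfs

# 3. Generate load: dbench on the FUSE mount, dd on a file from each mount.
dbench -D /mnt/fuse 10 &
dd if=/dev/zero of=/mnt/nfs/dd_file_1 bs=1M count=10240 &
dd if=/dev/zero of=/mnt/fuse/dd_file_2 bs=1M count=10240 &

# 4. On the brick2 node, force the brick filesystem down.
xfstests/src/godown /rhs/bricks/vol_rep_b1

# 5. After some time, reboot the brick3 node.
reboot

# 6. Once the brick2 and brick3 nodes are back, restart glusterd so the bricks rejoin.
service glusterd restart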

Actual results:
=================
Self-heal is still pending on brick3, as the xattrop index counts below show.

Brick1:-
=========
root@rhs-client11 [Mar-26-2014- 6:36:08] >ls -l /rhs/bricks/vol_rep_b0/.glusterfs/indices/xattrop/ | wc
    247    2216   18958
root@rhs-client11 [Mar-26-2014- 6:36:12] >

Brick2:-
=========
root@rhs-client12 [Mar-26-2014- 6:36:08] >ls -l /rhs/bricks/vol_rep_b1/.glusterfs/indices/xattrop/ | wc
    247    2216   18958
root@rhs-client12 [Mar-26-2014- 6:36:12] >

Brick3:-
=========
root@rhs-client13 [Mar-26-2014- 6:36:08] >ls -l /rhs/bricks/vol_rep_b2/.glusterfs/indices/xattrop/ | wc
      4      29     241
root@rhs-client13 [Mar-26-2014- 6:36:12] >
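
For reference, pending heals can also be checked through the gluster CLI rather than by listing the xattrop indices directly; a minimal sketch, using the volume name above:

# Entries still pending heal, reported per brick.
gluster volume heal vol_rep info

# Per-brick count of pending entries (on releases that support it).
gluster volume heal vol_rep statistics heal-count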


Expected results:
===================
After a while, the self-heal daemon should completely self-heal all the files from brick1 onto brick2 and brick3.
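
Not part of the original report, but as a hedged cross-check the heals can also be kicked off manually and the indices re-examined; a minimal sketch:

# Trigger an index heal (what the self-heal daemon normally does on its own).
gluster volume heal vol_rep

# If entries remain after the index heal, trigger a full crawl.
gluster volume heal vol_rep full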

Additional info:
===================
root@rhs-client11 [Mar-26-2014- 6:46:33] >gluster v info
 
Volume Name: vol_rep
Type: Replicate
Volume ID: 49001cac-32d9-461e-8432-647b46bb7a5a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/bricks/vol_rep_b0
Brick2: rhs-client12:/rhs/bricks/vol_rep_b1
Brick3: rhs-client13:/rhs/bricks/vol_rep_b2
root@rhs-client11 [Mar-26-2014- 6:46:37] >


root@rhs-client11 [Mar-26-2014- 6:46:44] >gluster v status
Status of volume: vol_rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/bricks/vol_rep_b0		49152	Y	4179
Brick rhs-client12:/rhs/bricks/vol_rep_b1		49152	Y	21007
Brick rhs-client13:/rhs/bricks/vol_rep_b2		49152	Y	4447
NFS Server on localhost					2049	Y	13036
Self-heal Daemon on localhost				N/A	Y	13047
NFS Server on rhs-client12				2049	Y	21014
Self-heal Daemon on rhs-client12			N/A	Y	21021
NFS Server on rhs-client13				2049	Y	4451
Self-heal Daemon on rhs-client13			N/A	Y	4457
 
Task Status of Volume vol_rep
------------------------------------------------------------------------------
There are no active volume tasks
 
root@rhs-client11 [Mar-26-2014- 6:46:46] >

Comment 1 Pranith Kumar K 2014-07-13 06:57:14 UTC
Ravi,
    this looks similar to the bug you are working on. Please reassign it to yourself if that is the case.

Pranith

Comment 2 Ravishankar N 2018-11-20 04:17:45 UTC
Closing this as the BZ is old and numerous fixes for heal-related issues have gone in since. Please feel free to re-open if the issue is still seen in the latest versions.