Description of problem:
=======================
In a replicate volume with 3 bricks, when 2 bricks go offline and come back online, the self-heal daemon does not completely self-heal the files onto both the sinks.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3git built on Mar 25 2014 05:20:53
Repository revision: git://git.gluster.com/glusterfs.git

How reproducible:
================

Steps to Reproduce:
======================
1. Create a 1 x 3 replicate volume. Start the volume.
2. Create a fuse mount and an nfs mount from the client.
3. Start dbench from the fuse mount, and start dd on a file from the nfs and fuse mounts.
4. Crash the brick2 disk (xfs_progs/xfstests/src/godown <brick_mount_point>).
5. After some time, reboot node3.
   Note: IO stopped on the fuse mount as the glusterfs process got OOM-killed.
6. Bring back brick2 and brick3 (service glusterd restart).

Actual results:
=================
Self-heal is still pending on brick3.

Brick1:-
=========
root@rhs-client11 [Mar-26-2014- 6:36:08] >ls -l /rhs/bricks/vol_rep_b0/.glusterfs/indices/xattrop/ | wc
    247    2216   18958
root@rhs-client11 [Mar-26-2014- 6:36:12] >

Brick2:-
=========
root@rhs-client12 [Mar-26-2014- 6:36:08] >ls -l /rhs/bricks/vol_rep_b1/.glusterfs/indices/xattrop/ | wc
    247    2216   18958
root@rhs-client12 [Mar-26-2014- 6:36:12] >

Brick3:-
=========
root@rhs-client13 [Mar-26-2014- 6:36:08] >ls -l /rhs/bricks/vol_rep_b2/.glusterfs/indices/xattrop/ | wc
      4      29     241
root@rhs-client13 [Mar-26-2014- 6:36:12] >

Expected results:
===================
After a while, the self-heal daemon should completely self-heal all the files from brick1 to brick2 and brick3.
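The per-brick check above can be wrapped in a small helper for repeated polling. This is only a sketch: `count_pending_heals` is a hypothetical name, and the number of entries under `.glusterfs/indices/xattrop/` is a rough proxy for files awaiting self-heal, not an exact pending-heal count.

```shell
#!/bin/sh
# count_pending_heals BRICK_PATH
# Hypothetical helper: prints the number of entries in the brick's
# xattrop index directory, a rough proxy for pending self-heals.
count_pending_heals() {
    brick_path="$1"
    # -1: one entry per line; 2>/dev/null: tolerate a missing index dir
    ls -1 "$brick_path/.glusterfs/indices/xattrop/" 2>/dev/null | wc -l
}
```

Running it against each brick root (e.g. `count_pending_heals /rhs/bricks/vol_rep_b2`) and comparing the counts across bricks gives the same picture as the `ls | wc` transcripts above.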
Additional info:
===================
root@rhs-client11 [Mar-26-2014- 6:46:33] >gluster v info

Volume Name: vol_rep
Type: Replicate
Volume ID: 49001cac-32d9-461e-8432-647b46bb7a5a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/bricks/vol_rep_b0
Brick2: rhs-client12:/rhs/bricks/vol_rep_b1
Brick3: rhs-client13:/rhs/bricks/vol_rep_b2
root@rhs-client11 [Mar-26-2014- 6:46:37] >

root@rhs-client11 [Mar-26-2014- 6:46:44] >gluster v status
Status of volume: vol_rep
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/bricks/vol_rep_b0               49152   Y       4179
Brick rhs-client12:/rhs/bricks/vol_rep_b1               49152   Y       21007
Brick rhs-client13:/rhs/bricks/vol_rep_b2               49152   Y       4447
NFS Server on localhost                                 2049    Y       13036
Self-heal Daemon on localhost                           N/A     Y       13047
NFS Server on rhs-client12                              2049    Y       21014
Self-heal Daemon on rhs-client12                        N/A     Y       21021
NFS Server on rhs-client13                              2049    Y       4451
Self-heal Daemon on rhs-client13                        N/A     Y       4457

Task Status of Volume vol_rep
------------------------------------------------------------------------------
There are no active volume tasks

root@rhs-client11 [Mar-26-2014- 6:46:46] >
Ravi, this seems similar to the bug you are working on. Please move the bug to your name if it is so.

Pranith
Closing this as the BZ is old and numerous fixes for heal-related issues have gone in since. Please feel free to re-open if the issue is still seen in the latest versions.