Description of problem:
=======================
In a replicate volume with 3 bricks, when 2 bricks go offline and come back online, the self-heal daemon does not completely self-heal the files onto both the sinks.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3git built on Mar 25 2014 05:20:53
Repository revision: git://git.gluster.com/glusterfs.git

How reproducible:
================

Steps to Reproduce:
======================
1. Create a 1 x 3 replicate volume. Start the volume.
2. Create a fuse mount and an nfs mount from the client.
3. Start dbench from the fuse mount, and start dd on a file from the nfs and fuse mounts.
4. Crash the brick2 disk (xfs_progs/xfstests/src/godown <brick_mount_point>).
5. After some time, reboot node3.
   Note: IO stopped on the fuse mount as the glusterfs process got OOM-killed.
6. Bring back brick2 and brick3 (service glusterd restart).

Actual results:
=================
Self-heal is still pending on brick3.

Brick1:-
=========
root@rhs-client11 [Mar-26-2014- 6:36:08] >ls -l /rhs/bricks/vol_rep_b0/.glusterfs/indices/xattrop/ | wc
    247    2216   18958
root@rhs-client11 [Mar-26-2014- 6:36:12] >

Brick2:-
=========
root@rhs-client12 [Mar-26-2014- 6:36:08] >ls -l /rhs/bricks/vol_rep_b1/.glusterfs/indices/xattrop/ | wc
    247    2216   18958
root@rhs-client12 [Mar-26-2014- 6:36:12] >

Brick3:-
=========
root@rhs-client13 [Mar-26-2014- 6:36:08] >ls -l /rhs/bricks/vol_rep_b2/.glusterfs/indices/xattrop/ | wc
      4      29     241
root@rhs-client13 [Mar-26-2014- 6:36:12] >

Expected results:
===================
After a while, the self-heal daemon should completely self-heal all the files from brick1 to brick2 and brick3.
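The per-brick check above can be wrapped in a small helper for repeated polling. This is only a sketch: `count_pending_heals` is a hypothetical name, and the number of entries under `.glusterfs/indices/xattrop/` is a rough proxy for files awaiting self-heal, not an exact pending-heal count.

```shell
#!/bin/sh
# count_pending_heals BRICK_PATH
# Hypothetical helper: prints the number of entries in the brick's
# xattrop index directory, a rough proxy for pending self-heals.
count_pending_heals() {
    brick_path="$1"
    # -1: one entry per line; 2>/dev/null: tolerate a missing index dir
    ls -1 "$brick_path/.glusterfs/indices/xattrop/" 2>/dev/null | wc -l
}
```

Running it against each brick root (e.g. `count_pending_heals /rhs/bricks/vol_rep_b2`) and comparing the counts across bricks gives the same picture as the `ls | wc` transcripts above.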
Additional info:
===================
root@rhs-client11 [Mar-26-2014- 6:46:33] >gluster v info

Volume Name: vol_rep
Type: Replicate
Volume ID: 49001cac-32d9-461e-8432-647b46bb7a5a
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/bricks/vol_rep_b0
Brick2: rhs-client12:/rhs/bricks/vol_rep_b1
Brick3: rhs-client13:/rhs/bricks/vol_rep_b2
root@rhs-client11 [Mar-26-2014- 6:46:37] >

root@rhs-client11 [Mar-26-2014- 6:46:44] >gluster v status
Status of volume: vol_rep
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/bricks/vol_rep_b0               49152   Y       4179
Brick rhs-client12:/rhs/bricks/vol_rep_b1               49152   Y       21007
Brick rhs-client13:/rhs/bricks/vol_rep_b2               49152   Y       4447
NFS Server on localhost                                 2049    Y       13036
Self-heal Daemon on localhost                           N/A     Y       13047
NFS Server on rhs-client12                              2049    Y       21014
Self-heal Daemon on rhs-client12                        N/A     Y       21021
NFS Server on rhs-client13                              2049    Y       4451
Self-heal Daemon on rhs-client13                        N/A     Y       4457

Task Status of Volume vol_rep
------------------------------------------------------------------------------
There are no active volume tasks

root@rhs-client11 [Mar-26-2014- 6:46:46] >
Ravi, this seems similar to the bug you are working on. Please move the bug to your name if it is so.

Pranith
Closing this as the BZ is old and numerous fixes for heal-related issues have gone in since. Please feel free to re-open if the issue is still seen in the latest versions.