Bug 1023395

Summary: healed file information is reported under the sink node in "heal info healed" command output and in the sink node's glustershd.log file
Product: Red Hat Gluster Storage
Component: replicate
Version: 2.1
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Reporter: spandura
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
QA Contact: spandura
CC: rhs-bugs, storage-qa-internal, vbellur
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2014-06-11 05:26:43 UTC
Attachments:
SOS Reports (flags: none)

Description spandura 2013-10-25 10:41:17 UTC
Description of problem:
=========================
In a 1 x 2 replicate volume, one of the brick processes went offline while ping_pong was running on a file. All the mount processes were then killed and the mount points unmounted.

The brick was brought back online and self-heal took place from the source brick to the sink brick. However, the healed file information is reported in the sink node's glustershd.log file and under the sink node when the "heal info healed" command is executed.

Version-Release number of selected component (if applicable):
==============================================================
glusterfs 3.4.0.36rhs built on Oct 22 2013 10:56:18


How reproducible:
================
Often

Steps to Reproduce:
=====================
1. Create a 1 x 2 replicate volume and start it (a command sketch follows the volume info below).

root@king [Sep-02-2013-12:31:44] >gluster v info
 
Volume Name: vol_dis_1_rep_2
Type: Replicate
Volume ID: 15a1734e-8485-4ef2-a82b-ddafff2fc97e
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: hicks.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b0
Brick2: king.lab.eng.blr.redhat.com:/rhs/bricks/vol_dis_1_rep_2_b1
Options Reconfigured:
performance.write-behind: on
cluster.self-heal-daemon: on
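
For reference, a minimal sketch of the commands for this step, using the vol_rep volume and brick names that appear in the later output (hostnames and brick paths are therefore illustrative):

gluster volume create vol_rep replica 2 \
    rhs-client11:/rhs/bricks/b1 rhs-client12:/rhs/bricks/b2
gluster volume start vol_rep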

2. Create 4 fuse mounts. 
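
For example, mounting the volume on four clients (the mount points are illustrative):

mount -t glusterfs rhs-client11:/vol_rep /mnt/client1
mount -t glusterfs rhs-client11:/vol_rep /mnt/client2
mount -t glusterfs rhs-client11:/vol_rep /mnt/client3
mount -t glusterfs rhs-client11:/vol_rep /mnt/client4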

3. From all the mount points, start ping_pong: "ping_pong -rw ping_pong_testfile 6"

4. While ping_pong is in progress, get the brick process PID and kill one brick (brick1): "kill -KILL <brick_pid>"

root@rhs-client11 [Oct-25-2013- 9:33:57] >ps -ef | grep glusterfsd
root      1272     1  2 09:27 ?        00:00:08 /usr/sbin/glusterfsd -s rhs-client11 --volfile-id vol_rep.rhs-client11.rhs-bricks-b1 -p /var/lib/glusterd/vols/vol_rep/run/rhs-client11-rhs-bricks-b1.pid -S /var/run/fa6cf6fce4458a2be5fc60a4dc3bc11d.socket --brick-name /rhs/bricks/b1 -l /var/log/glusterfs/bricks/rhs-bricks-b1.log --xlator-option *-posix.glusterd-uuid=8b2090ab-c382-4c8a-85ea-ebab93df4c24 --brick-port 49153 --xlator-option vol_rep-server.listen-port=49153

kill -KILL 1272
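
Equivalently, the PID can be read from the brick's pid file, whose path appears in the glusterfsd command line above:

kill -KILL $(cat /var/lib/glusterd/vols/vol_rep/run/rhs-client11-rhs-bricks-b1.pid)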

5. After some time, kill all the mount processes and unmount the mount points (one possible sketch is shown below).
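
One way to do this, assuming the illustrative mount points from step 2:

# kill each fuse client process (PIDs from "ps -ef | grep glusterfs"), then unmount
kill -KILL <glusterfs_client_pid>
umount /mnt/client1 /mnt/client2 /mnt/client3 /mnt/client4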

6. Bring the brick back online by starting it from the command line. For example: "/usr/sbin/glusterfsd -s rhs-client11 --volfile-id vol_rep.rhs-client11.rhs-bricks-b1 -p /var/lib/glusterd/vols/vol_rep/run/rhs-client11-rhs-bricks-b1.pid -S /var/run/fa6cf6fce4458a2be5fc60a4dc3bc11d.socket --brick-name /rhs/bricks/b1 -l /var/log/glusterfs/bricks/rhs-bricks-b1.log --xlator-option *-posix.glusterd-uuid=8b2090ab-c382-4c8a-85ea-ebab93df4c24 --brick-port 49153 --xlator-option vol_rep-server.listen-port=49153"
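
Alternatively, glusterd itself can restart the dead brick process:

gluster volume start vol_rep force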

Actual results:
=================
root@rhs-client12 [Oct-25-2013- 9:34:32] >gluster v status
Status of volume: vol_rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/bricks/b1			N/A	N	1272
Brick rhs-client12:/rhs/bricks/b2			49153	Y	29620
NFS Server on localhost					2049	Y	29867
Self-heal Daemon on localhost				N/A	Y	29875
NFS Server on rhs-client13				2049	Y	23263
Self-heal Daemon on rhs-client13			N/A	Y	23267
NFS Server on rhs-client11				2049	Y	1226
Self-heal Daemon on rhs-client11			N/A	Y	1237

root@rhs-client12 [Oct-25-2013- 9:35:03] >gluster v heal vol_rep info
Gathering list of entries to be healed on volume vol_rep has been successful 

Brick rhs-client11:/rhs/bricks/b1
Status: Brick is Not connected
Number of entries: 0

Brick rhs-client12:/rhs/bricks/b2
Number of entries: 1
/ping_pong_testfile

root@rhs-client11 [Oct-25-2013- 9:33:57] >ps -ef | grep glusterfsd
root      1272     1  2 09:27 ?        00:00:08 /usr/sbin/glusterfsd -s rhs-client11 --volfile-id vol_rep.rhs-client11.rhs-bricks-b1 -p /var/lib/glusterd/vols/vol_rep/run/rhs-client11-rhs-bricks-b1.pid -S /var/run/fa6cf6fce4458a2be5fc60a4dc3bc11d.socket --brick-name /rhs/bricks/b1 -l /var/log/glusterfs/bricks/rhs-bricks-b1.log --xlator-option *-posix.glusterd-uuid=8b2090ab-c382-4c8a-85ea-ebab93df4c24 --brick-port 49153 --xlator-option vol_rep-server.listen-port=49153

root@rhs-client12 [Oct-25-2013- 9:35:09] >gluster v heal vol_rep info
Gathering list of entries to be healed on volume vol_rep has been successful 

Brick rhs-client11:/rhs/bricks/b1
Number of entries: 0

Brick rhs-client12:/rhs/bricks/b2
Number of entries: 0

root@rhs-client12 [Oct-25-2013- 9:35:13] >gluster v heal vol_rep info healed
Gathering list of healed entries on volume vol_rep has been successful 

Brick rhs-client11:/rhs/bricks/b1
Number of entries: 1
at                    path on brick
-----------------------------------
2013-10-25 09:35:10 <gfid:e85f328d-423e-4488-9a03-e018ee85db77>

Brick rhs-client12:/rhs/bricks/b2
Number of entries: 0

Expected results:
==================
1. Files that were self-healed should be reported under the source storage node.

2. The healed file information should be logged in the source node's glustershd.log file.
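
A quick way to check which node's self-heal daemon actually logged the heal (the exact log message text varies between gluster versions, so this grep pattern is illustrative):

grep -i 'self.heal' /var/log/glusterfs/glustershd.log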

Comment 1 spandura 2013-10-25 10:52:20 UTC
Created attachment 816078
SOS Reports

Comment 3 spandura 2014-06-11 05:26:43 UTC
The command "gluster volume heal <volume_name> info healed" is no longer supported as of the following gluster build:

"[root@rhs-client11 ~]# gluster --version
glusterfs 3.6.0.15 built on Jun  9 2014 11:03:54"

Refer to bug: https://bugzilla.redhat.com/show_bug.cgi?id=1104486

Hence this bug is no longer valid. Moving it to the CLOSED state.