Bug 830168

Summary: Error message is inconsistent for the command "gluster volume heal <vol_name> full" when executed on multiple nodes
Product: [Community] GlusterFS
Component: glusterd
Version: 3.3-beta
Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Assignee: vsomyaju
CC: amarts, gluster-bugs, nsathyan
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Fixed In Version: glusterfs-3.4.0
Last Closed: 2013-07-24 17:36:41 UTC

Description Shwetha Panduranga 2012-06-08 12:46:31 UTC
Description of problem:
------------------------
On a replicate volume, when the "gluster volume heal <vol_name> full" command is executed on several nodes while some bricks are down, each node reports a different error message.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.3.0qa45


How reproducible:
-----------------
Often

Steps to Reproduce:
1. Create a 1x3 replicate volume (brick1 on node1, brick2 on node2, brick3 on node3) and start the volume.
2. Bring down brick1 and brick2.
3. Create a fuse mount.
4. On the mount point, execute: "dd if=/dev/urandom of=./file bs=1M count=1"
5. On machine1, machine2 and machine3, execute: "gluster v heal <volume_name> full" (a minimal command sketch follows this list).
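
For reference, a minimal command sequence for the steps above might look like the following. The host names (node1/node2/node3), brick path and mount point are placeholders; on this release a single brick is typically taken down by killing the glusterfsd PID reported by "gluster volume status", which is assumed here.

# on any node: create and start the 1x3 replicate volume
gluster volume create vol1 replica 3 node1:/export_b1/dir1 node2:/export_b1/dir1 node3:/export_b1/dir1
gluster volume start vol1

# on node1 and node2: find the brick PID from the status output and kill it
gluster volume status vol1
kill -9 <brick_pid>

# on a client: mount the volume over FUSE and write a file
mount -t glusterfs node1:/vol1 /mnt/vol1
cd /mnt/vol1 && dd if=/dev/urandom of=./file bs=1M count=1

# on each of node1, node2 and node3: trigger a full self-heal
gluster volume heal vol1 full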

Actual results:
----------------
[06/08/12 - 08:07:21 root@AFR-Server1 ~]# gluster v heal vol1 full
Operation failed on 10.16.159.196

[06/08/12 - 08:07:17 root@AFR-Server2 ~]# gluster v heal vol1 full
Operation failed on 10.16.159.196

[06/08/12 - 08:06:10 root@AFR-Server3 ~]# gluster v heal vol1 full
Heal operation on volume vol1 has been unsuccessful

Expected results:
-----------------
The error message should be consistent across all the nodes.

Additional info:
----------------

[06/08/12 - 08:09:04 root@AFR-Server3 ~]# gluster v status
Status of volume: vol1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.16.159.184:/export_b1/dir1			24009	N	8935
Brick 10.16.159.188:/export_b1/dir1			24009	N	12896
Brick 10.16.159.196:/export_b1/dir1			24009	Y	28360
NFS Server on localhost					38467	Y	28725
Self-heal Daemon on localhost				N/A	Y	28731
NFS Server on 10.16.159.184				38467	Y	8867
Self-heal Daemon on 10.16.159.184			N/A	Y	8873
NFS Server on 10.16.159.188				38467	Y	13263
Self-heal Daemon on 10.16.159.188			N/A	Y	13269
 
[06/08/12 - 08:09:08 root@AFR-Server3 ~]# 
[06/08/12 - 08:09:10 root@AFR-Server3 ~]# gluster v info
 
Volume Name: vol1
Type: Replicate
Volume ID: e5ff8b2b-7d44-405e-8266-54e5e68b0241
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.16.159.184:/export_b1/dir1
Brick2: 10.16.159.188:/export_b1/dir1
Brick3: 10.16.159.196:/export_b1/dir1
Options Reconfigured:
cluster.eager-lock: on
performance.write-behind: on

Comment 1 vsomyaju 2013-06-13 11:43:44 UTC
The bug got fixed in the current release as part of a rebase.

Comment 2 vsomyaju 2013-06-13 11:44:41 UTC
Now all three nodes give the same output.