Bug 830168

Summary: Error message is inconsistent for the command "gluster volume heal <vol_name> full" when executed on multiple nodes
Product: [Community] GlusterFS
Component: glusterd
Version: 3.3-beta
Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Assignee: vsomyaju
CC: amarts, gluster-bugs, nsathyan
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Fixed In Version: glusterfs-3.4.0
Last Closed: 2013-07-24 17:36:41 UTC

Description Shwetha Panduranga 2012-06-08 12:46:31 UTC
Description of problem:
------------------------
On a replicate volume, when the "gluster volume heal <vol_name> full" command is executed on several nodes while some bricks are down, each node reports a different error message.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.3.0qa45


How reproducible:
-----------------
Often

Steps to Reproduce:
1. Create a 1x3 replicate volume (brick1 on node1, brick2 on node2, brick3 on node3) and start the volume.
2. Bring down brick1 and brick2.
3. Create a fuse mount.
4. On the mount point, execute: "dd if=/dev/urandom of=./file bs=1M count=1"
5. On machine1, machine2 and machine3, execute: "gluster v heal <volume_name> full" (a minimal command sketch follows this list).
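
For reference, a minimal command sequence for the steps above might look like the following. The host names (node1/node2/node3), brick path and mount point are placeholders; on this release a single brick is typically taken down by killing the glusterfsd PID reported by "gluster volume status", which is assumed here.

# on any node: create and start the 1x3 replicate volume
gluster volume create vol1 replica 3 node1:/export_b1/dir1 node2:/export_b1/dir1 node3:/export_b1/dir1
gluster volume start vol1

# on node1 and node2: find the brick PID from the status output and kill it
gluster volume status vol1
kill -9 <brick_pid>

# on a client: mount the volume over FUSE and write a file
mount -t glusterfs node1:/vol1 /mnt/vol1
cd /mnt/vol1 && dd if=/dev/urandom of=./file bs=1M count=1

# on each of node1, node2 and node3: trigger a full self-heal
gluster volume heal vol1 full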

Actual results:
----------------
[06/08/12 - 08:07:21 root@AFR-Server1 ~]# gluster v heal vol1 full
Operation failed on 10.16.159.196

[06/08/12 - 08:07:17 root@AFR-Server2 ~]# gluster v heal vol1 full
Operation failed on 10.16.159.196

[06/08/12 - 08:06:10 root@AFR-Server3 ~]# gluster v heal vol1 full
Heal operation on volume vol1 has been unsuccessful

Expected results:
-----------------
The error message should be consistent across all the nodes.

Additional info:
----------------

[06/08/12 - 08:09:04 root@AFR-Server3 ~]# gluster v status
Status of volume: vol1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.16.159.184:/export_b1/dir1			24009	N	8935
Brick 10.16.159.188:/export_b1/dir1			24009	N	12896
Brick 10.16.159.196:/export_b1/dir1			24009	Y	28360
NFS Server on localhost					38467	Y	28725
Self-heal Daemon on localhost				N/A	Y	28731
NFS Server on 10.16.159.184				38467	Y	8867
Self-heal Daemon on 10.16.159.184			N/A	Y	8873
NFS Server on 10.16.159.188				38467	Y	13263
Self-heal Daemon on 10.16.159.188			N/A	Y	13269
 
[06/08/12 - 08:09:08 root@AFR-Server3 ~]# 
[06/08/12 - 08:09:10 root@AFR-Server3 ~]# gluster v info
 
Volume Name: vol1
Type: Replicate
Volume ID: e5ff8b2b-7d44-405e-8266-54e5e68b0241
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.16.159.184:/export_b1/dir1
Brick2: 10.16.159.188:/export_b1/dir1
Brick3: 10.16.159.196:/export_b1/dir1
Options Reconfigured:
cluster.eager-lock: on
performance.write-behind: on

Comment 1 vsomyaju 2013-06-13 11:43:44 UTC
The bug got fixed in the current release as part of a rebase.

Comment 2 vsomyaju 2013-06-13 11:44:41 UTC
Now all three nodes give the same output.