Bug 1294612 - Self heal command gives error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful"
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
3.1
x86_64 Linux
unspecified Severity high
: ---
: RHGS 3.1.3
Assigned To: Ravishankar N
nchilaka
: ZStream
Depends On: 1200252
Blocks: 1299184 1302291 1306922
Reported: 2015-12-29 04:35 EST by Ravishankar N
Modified: 2016-09-17 08:19 EDT (History)
9 users

See Also:
Fixed In Version: glusterfs-3.7.9-1
Doc Type: Bug Fix
Doc Text:
Previously, if any bricks were down when an index heal was started, the following message was displayed: "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful". This was confusing to users because the heal could still succeed if at least one source and one sink brick were available. The message has been modified to avoid confusion, and now reads: "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."
Story Points: ---
Clone Of: 1200252
: 1302291
Environment:
Last Closed: 2016-06-23 01:00:41 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Comment 7 nchilaka 2016-05-02 09:16:11 EDT
QATP:
====

    TC#1: the heal command must print a more meaningful message instead of reporting the heal as unsuccessful when healing is actually proceeding

    1. Create an x3 (replica 3) volume and start it.

    2. Mount the volume and write a file of, say, 1GB.

    3. Bring down one brick and keep writing data to the file.

    4. Bring up the brick that was down, and immediately bring down another brick.

    Info: As there are multiple sources in an x3 volume and only one source was brought down in step 4, another source is still available to heal the data of the first brick that was brought down.

    5. Issue a heal command immediately.

    Expected behavior: the heal must complete successfully, but the message printed by the CLI must clearly mention the down bricks.

    Previous behavior: the heal command used to print the error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful".

    Changed behavior with the fix: the heal command prints the more meaningful output "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."
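    The TC#1 sequence can be sketched as a script. This is a dry-run sketch: the hostnames (server1-server3), brick paths, and mount point are hypothetical, and the run() wrapper only records and echoes each command instead of executing it, so nothing here touches a real cluster.

```shell
#!/bin/bash
# Dry-run sketch of the TC#1 steps; on a real RHGS cluster, replace the
# run() wrapper with direct execution. All hosts and paths are placeholders.
CMDS=""
run() { CMDS+="$*"$'\n'; echo "+ $*"; }

# 1. Create and start an x3 (replica 3) volume.
run gluster volume create vol0 replica 3 \
    server1:/bricks/b1 server2:/bricks/b2 server3:/bricks/b3
run gluster volume start vol0

# 2. Mount the volume and write ~1GB to a file.
run mount -t glusterfs server1:/vol0 /mnt/vol0
run dd if=/dev/zero of=/mnt/vol0/bigfile bs=1M count=1024

# 3. Bring down one brick (kill its glusterfsd) while writes continue.
run pkill -f 'glusterfsd.*bricks/b1'

# 4. Restart the downed brick, then immediately bring down another one.
run gluster volume start vol0 force
run pkill -f 'glusterfsd.*bricks/b2'

# 5. Trigger index heal immediately; with the fix, the CLI should warn that
#    the heal was not successful on all nodes rather than flatly failing.
run gluster volume heal vol0
```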

    TC#2: the heal command should report that triggering the heal was unsuccessful because some bricks may be down

    1. Create an x2 (replica 2) volume and start it.

    2. Mount the volume and write a file of, say, 1GB.

    3. Bring down one brick and keep writing data to the file.

    4. Issue a heal on the volume: "gluster volume heal <vname>"

    Expected behavior: the CLI must report that the heal could not be launched on all nodes because some bricks are down.

    Previous behavior: the heal command used to print the error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful".

    Changed behavior with the fix: the heal command prints the more meaningful output "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."
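    A minimal sketch of how the expected CLI output could be checked, assuming the message wording given in the Doc Text; the OUTPUT string is hardcoded here for illustration rather than captured from a live heal run.

```shell
#!/bin/bash
# Illustrative check distinguishing the pre-fix and post-fix CLI messages.
# OUTPUT is a hardcoded sample; on a real setup it would be captured with:
#   OUTPUT=$(gluster volume heal vol0 2>&1)
OUTPUT="Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."

if echo "$OUTPUT" | grep -q "has not been successful on all nodes"; then
    RESULT="new-message"      # post-fix wording: points at down bricks
elif echo "$OUTPUT" | grep -q "has been unsuccessful"; then
    RESULT="old-message"      # pre-fix wording: flatly reports failure
else
    RESULT="unexpected"
fi
echo "$RESULT"
```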


    TC#3: the heal command should report that triggering the heal was unsuccessful because some bricks may be down

    1. Create an x3 (replica 3) volume and start it.

    2. Mount the volume and write a file of, say, 1GB.

    3. Bring down one brick and keep writing data to the file.

    4. Issue a heal on the volume: "gluster volume heal <vname>"

    Expected behavior: the CLI must report that the heal could not be launched on all nodes because some bricks are down.

    Previous behavior: the heal command used to print the error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful".

    Changed behavior with the fix: the heal command prints the more meaningful output "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."

    TC#4: heal info must print proper output when one of multiple source bricks is brought down

    1. Create an x3 (replica 3) volume and start it.

    2. Mount the volume and write a file of, say, 1GB.

    3. Bring down one brick and keep writing data to the file.

    4. Bring up the brick that was down, and after a few seconds bring down another brick.

    Info: As there are multiple sources in an x3 volume and only one source was brought down in step 4, another source is still available to heal the data of the first brick that was brought down.

    5. Issue a heal command immediately.

    6. Also issue a heal info command. --------> FAILS with duplicate entries for the same file

    Expected behavior: the heal must complete successfully, but the message printed by the CLI must clearly mention the down bricks.

    Previous behavior: the heal command used to print the error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful".

    Changed behavior with the fix: the heal command prints the more meaningful output "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."
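    The TC#4 failure mode (duplicate entries for the same file in heal info output) can be detected with a simple pipeline; the entry list below is an illustrative sample, not real `gluster volume heal <vname> info` output.

```shell
#!/bin/bash
# Detect duplicate file entries in heal info output. INFO is an illustrative
# sample; on a real cluster it could be captured with something like:
#   INFO=$(gluster volume heal vol0 info | grep '^/')
INFO=$'/bigfile\n/dir/file2\n/bigfile'

# Any line printed by uniq -d occurs more than once, i.e. is a duplicate.
DUPES=$(printf '%s\n' "$INFO" | sort | uniq -d)

if [ -n "$DUPES" ]; then
    echo "duplicate heal-info entries: $DUPES"
else
    echo "no duplicates"
fi
```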
Comment 8 nchilaka 2016-05-02 09:18:20 EDT
Ran the QATP published above; the results are as follows:
TC#1 Passed ---> main case validating the fix
TC#2 Passed
TC#3 Passed
TC#4 Failed. Raised a separate bug 1332194 - gluster volume heal info throwing duplicate file or gfid entries.
As this failure is not really related to this fix and is not a regression caused by it, moving this bug to verified.

Versions tested:
glusterfs-client-xlators-3.7.9-2.el7rhgs.x86_64
glusterfs-server-3.7.9-2.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
glusterfs-3.7.9-2.el7rhgs.x86_64
glusterfs-api-3.7.9-2.el7rhgs.x86_64
glusterfs-cli-3.7.9-2.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-2.el7rhgs.x86_64
gluster-nagios-common-0.2.3-1.el7rhgs.noarch
glusterfs-libs-3.7.9-2.el7rhgs.x86_64
glusterfs-fuse-3.7.9-2.el7rhgs.x86_64
glusterfs-rdma-3.7.9-2.el7rhgs.x86_64
Comment 9 nchilaka 2016-05-06 03:52:31 EDT
Raised bug 1333705 - gluster volume heal info "healed" and "heal-failed" showing wrong information, which could be caused by this fix. However, given that these commands are to be deprecated, this is not severe.
Comment 13 errata-xmlrpc 2016-06-23 01:00:41 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
