1333705 – gluster volume heal info "healed" and "heal-failed" showing wrong information

Summary: gluster volume heal info "healed" and "heal-failed" showing wrong information

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	replicate
Sub Component:
Version:	rhgs-3.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	low
Target Milestone:	---
Target Release:	RHGS 3.4.0
Assignee:	Mohit Agrawal
QA Contact:	Vijay Avuthu
Docs Contact:
URL:
Whiteboard:	rebase
Duplicates (1):	1538779
Depends On:	1331340
Blocks:	1388509 1452915 1500658 1500660 1500662 1503134

Reported:	2016-05-06 07:51 UTC by Nag Pavan Chilakam
Modified:	2019-04-03 09:28 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1388509 1500658 1500660 1500662
Environment:
Last Closed:	2018-09-04 06:27:31 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2018:2607	0	None	None	None	2018-09-04 06:29:17 UTC

Description Nag Pavan Chilakam 2016-05-06 07:51:24 UTC

Description of problem:
=====================
When the following commands are run on an AFR (replicate) volume to view heal information:
gluster v heal <vname> info heal-failed
gluster v heal <vname> info healed
they should report the right information.
Instead, both outputs are misleading, as shown below:
"Gathering list of heal failed entries on volume olia has been unsuccessful on bricks that are down. Please check if all brick processes are running."

This could be a regression caused by bug 1294612 - Self heal command gives error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful"

However, since the healed and heal-failed commands could be deprecated soon, the above-mentioned bug was moved to verified.

Nevertheless, I am raising this bug for tracking purposes.


Version-Release number of selected component (if applicable):
===============
[root@dhcp35-191 ~]# rpm -qa|grep gluster
glusterfs-fuse-3.7.9-3.el7rhgs.x86_64
glusterfs-rdma-3.7.9-3.el7rhgs.x86_64
glusterfs-3.7.9-3.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
glusterfs-api-3.7.9-3.el7rhgs.x86_64
glusterfs-cli-3.7.9-3.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-3.el7rhgs.x86_64
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
glusterfs-libs-3.7.9-3.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-3.el7rhgs.x86_64
glusterfs-server-3.7.9-3.el7rhgs.x86_64
gluster-nagios-common-0.2.3-1.el7rhgs.noarch



There is already an upstream bug, 1331340 - heal-info heal-failed shows wrong output when all the bricks are online, but I am raising a separate one downstream because we hit it downstream too (I could have cloned it, but for certain reasons thought of raising a new one).
You can change the status accordingly.

Comment 2 Nag Pavan Chilakam 2016-05-06 07:53:07 UTC

Note that all bricks were online.

Comment 4 Ravishankar N 2017-05-21 04:18:09 UTC

Upstream patch by Mohit is pending reviews: https://review.gluster.org/#/c/15724/

Comment 7 Ravishankar N 2018-01-30 03:45:09 UTC

*** Bug 1538779 has been marked as a duplicate of this bug. ***

Comment 8 Vijay Avuthu 2018-02-19 06:52:24 UTC

Verified in the 3.12.2-4 build with a 4 * 3 volume.

> "info healed" and "info heal-failed" have been deprecated in 3.4, and this has been verified

# gluster vol heal 43 info healed

Usage:
volume heal <VOLNAME> [enable | disable | full |statistics [heal-count [replica <HOSTNAME:BRICKNAME>]] |info [summary | split-brain] |split-brain {bigger-file <FILE> | latest-mtime <FILE> |source-brick <HOSTNAME:BRICKNAME> [<FILE>]} |granular-entry-heal {enable | disable}]



# gluster vol heal 43 info heal-failed

Usage:
volume heal <VOLNAME> [enable | disable | full |statistics [heal-count [replica <HOSTNAME:BRICKNAME>]] |info [summary | split-brain] |split-brain {bigger-file <FILE> | latest-mtime <FILE> |source-brick <HOSTNAME:BRICKNAME> [<FILE>]} |granular-entry-heal {enable | disable}]

# gluster vol help | grep -i heal
volume heal <VOLNAME> [enable | disable | full |statistics [heal-count [replica <HOSTNAME:BRICKNAME>]] |info [summary | split-brain] |split-brain {bigger-file <FILE> | latest-mtime <FILE> |source-brick <HOSTNAME:BRICKNAME> [<FILE>]} |granular-entry-heal {enable | disable}] - self-heal commands on volume specified by <VOLNAME>
#
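
The check above can also be scripted. The following is a minimal sketch, not part of the fix itself: it parses the usage text printed by the CLI in this comment and lists the sub-options accepted after "info". The names USAGE, info_suboptions and is_supported are hypothetical helpers introduced only for illustration, and the sketch assumes the usage line keeps the "info [a | b]" layout shown above.

```python
import re

# Usage text copied from the CLI output in this comment (3.12.2-4 build).
USAGE = (
    "volume heal <VOLNAME> [enable | disable | full |"
    "statistics [heal-count [replica <HOSTNAME:BRICKNAME>]] |"
    "info [summary | split-brain] |"
    "split-brain {bigger-file <FILE> | latest-mtime <FILE> |"
    "source-brick <HOSTNAME:BRICKNAME> [<FILE>]} |"
    "granular-entry-heal {enable | disable}]"
)


def info_suboptions(usage):
    """Return the sub-options listed inside `info [...]` in a usage line."""
    # Assumption: the sub-options appear as `info [opt1 | opt2 ...]`.
    match = re.search(r"\binfo \[([^\]]*)\]", usage)
    if match is None:
        return []
    return [opt.strip() for opt in match.group(1).split("|") if opt.strip()]


def is_supported(usage, suboption):
    """True if `suboption` is advertised as a valid `info` sub-option."""
    return suboption in info_suboptions(usage)


if __name__ == "__main__":
    print(info_suboptions(USAGE))
    for name in ("healed", "heal-failed"):
        print(name, "supported:", is_supported(USAGE, name))
```

With the usage text from this build, neither "healed" nor "heal-failed" is reported as supported, which matches the manual verification.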


Changing status to Verified

Comment 9 David Galloway 2018-08-27 13:45:18 UTC

+1

I randomly hit this bug on 08-25-2018 and it has started causing my RHV VMs to pause due to unknown storage errors.

Comment 11 errata-xmlrpc 2018-09-04 06:27:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
