Bug 1311839 - False positives in heal info
Summary: False positives in heal info
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Pranith Kumar K
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks: Gluster-HC-1 1299184
 
Reported: 2016-02-25 07:29 UTC by Sahina Bose
Modified: 2016-09-17 12:10 UTC (History)
6 users (show)

Fixed In Version: glusterfs-3.7.9-3
Doc Type: Bug Fix
Doc Text:
When files were being checked by the self-heal daemon and the 'gluster volume heal info' command was run simultaneously, the command's output listed files that were being checked but not healed as possibly being healed. This could not be corrected in a backward-compatible way, so a new option has been added to implement the fixed behavior. To ensure that false positives are not listed in the heal info output for a volume, use the following command to set a granular locking scheme for that volume:

# gluster volume set <volname> locking-scheme granular
Clone Of:
Environment:
Last Closed: 2016-06-23 05:09:28 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1240 0 normal SHIPPED_LIVE Red Hat Gluster Storage 3.1 Update 3 2016-06-23 08:51:28 UTC

Description Sahina Bose 2016-02-25 07:29:36 UTC
Description of problem:

While monitoring whether a gluster volume has files requiring heal (using the nagios plugins), there are a couple of entries that keep toggling between "Possibly undergoing heal" and not.

The workload is oVirt VMs running on a gluster volume, and there are metadata files that are constantly being updated - this may be the cause of the false positives.

This needs to be fixed: the monitoring is not reliable when the reported heal status constantly flaps.

Comment 5 Pranith Kumar K 2016-04-21 11:35:52 UTC
Please use the command:
"gluster volume set <volname> locking-scheme granular" for things to work properly.
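As a sketch, the workaround above can be applied and checked as follows. This assumes a live gluster cluster; `myvol` is a hypothetical volume name, and `cluster.locking-scheme` is the fully qualified name that the unqualified `locking-scheme` in the command above resolves to:

```shell
# Switch the volume to the granular locking scheme so that heal info
# can distinguish in-flight I/O from an actual pending heal.
gluster volume set myvol cluster.locking-scheme granular

# Confirm the option took effect (volume get is available in this
# glusterfs 3.7.x line); expect the value "granular" in the output.
gluster volume get myvol cluster.locking-scheme
```

The option only changes the locking scheme used by self-heal and heal info; it does not otherwise alter replication behavior.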

Comment 7 Nag Pavan Chilakam 2016-05-26 12:19:37 UTC
QATp:
    TC#1: False positives must not be seen in heal info saying "possibly undergoing heal"

    1. Create a dist-rep volume and start it

    2. Mount the volume on two different clients

    3. Now create a directory and, inside it, create files, say {1..10}

    4. Now, from each of the mounts, use dd to append about 10GB of data to a file, one file per client (make sure the two files are in different subvols)

    5. While the dd is running, trigger "gluster v heal info" on all the cluster nodes, and either monitor the output through the watch command or redirect it to a log file

    6. Once the dd commands are done, look at the log files where the heal info output was recorded and grep for "Possibly undergoing heal"

    It can be seen that there are false positives saying that the files being written by dd are possibly under heal, even though they are not (check the xattrs to confirm the same)

    This happens because, with the AFR v1 locking scheme, two different nodes try to take the same lock, which can lead to this spurious message

    7. Now, for the same volume or a different volume, set the option below for things to work properly:

    "gluster volume set <volname> locking-scheme granular"

    8. Now rerun steps 4 and 5 (the dd and the parallel monitoring)

    The spurious message "Possibly undergoing heal" will not be seen
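The write-and-monitor portion of the steps above (4, 5, and 6) can be sketched as shell commands. This is illustrative only and needs a live cluster; `testvol`, the mount path `/mnt/testvol`, the brick path, and the log file name are all hypothetical:

```shell
# Step 4: on each client, append ~10GB to its own file
# (e.g. file1 from client A, file2 from client B, in different subvols).
dd if=/dev/zero of=/mnt/testvol/dir/file1 bs=1M count=10240 \
   oflag=append conv=notrunc

# Step 5: on each cluster node, record heal info while dd runs.
while pgrep -x dd >/dev/null; do
    gluster volume heal testvol info >> /tmp/healinfo.log
    sleep 2
done

# Step 6: after dd completes, count false positives in the captured log.
grep -c "Possibly undergoing heal" /tmp/healinfo.log

# To confirm no heal is actually pending, inspect the AFR changelog
# xattrs on the bricks: all trusted.afr.* counts should be zero.
getfattr -d -m . -e hex /bricks/brick1/dir/file1
```

Without the granular locking scheme, the grep count is expected to be non-zero even though the xattrs show no pending heal; after setting it (step 7) and rerunning, the count should be zero.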

Comment 8 Nag Pavan Chilakam 2016-05-26 12:20:39 UTC
Ran the above QATP on the following build:
glusterfs-cli-3.7.9-6.el7rhgs.x86_64
glusterfs-libs-3.7.9-6.el7rhgs.x86_64
glusterfs-fuse-3.7.9-6.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-6.el7rhgs.x86_64
glusterfs-server-3.7.9-6.el7rhgs.x86_64
python-gluster-3.7.9-5.el7rhgs.noarch
glusterfs-3.7.9-6.el7rhgs.x86_64
glusterfs-api-3.7.9-6.el7rhgs.x86_64



The false positives are not seen anymore, hence moving to verified.

Comment 10 Pranith Kumar K 2016-06-06 16:46:15 UTC
When files, such as virtual machine metadata files, were being checked by the self-heal daemon while the heal info command was executed in parallel, the command reported that the files were possibly undergoing heal even when they were not. The files being updated frequently is not the root cause.

Comment 14 errata-xmlrpc 2016-06-23 05:09:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

