Bug 1311839 - False positives in heal info
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.1.3
Assigned To: Pranith Kumar K
Keywords: ZStream
Depends On:
Blocks: Gluster-HC-1 1299184
Reported: 2016-02-25 02:29 EST by Sahina Bose
Modified: 2016-09-17 08:10 EDT (History)
6 users

See Also:
Fixed In Version: glusterfs-3.7.9-3
Doc Type: Bug Fix
Doc Text:
When files were being checked by the self-heal daemon and the 'gluster volume heal info' command was run simultaneously, the command's output listed files that were being checked but not healed as possibly being healed. This could not be corrected in a backward-compatible way, so a new option has been added to enable the fixed behavior. To ensure that false positives are not listed in the heal info output for a volume, use the following command to set a granular locking scheme for that volume:

# gluster volume set <volname> locking-scheme granular
Story Points: ---
Clone Of:
Last Closed: 2016-06-23 01:09:28 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1240 normal SHIPPED_LIVE Red Hat Gluster Storage 3.1 Update 3 2016-06-23 04:51:28 EDT

Description Sahina Bose 2016-02-25 02:29:36 EST
Description of problem:

While monitoring whether a gluster volume has files requiring heal (using Nagios plugins), a couple of entries keep toggling between "Possibly undergoing heal" and healthy.

The workload is oVirt VMs running on a gluster volume, and there are metadata files that are constantly being updated; this may be the cause of the false positives.

This needs to be fixed; monitoring is not reliable if the reported status constantly flaps.
Comment 5 Pranith Kumar K 2016-04-21 07:35:52 EDT
Please use the command:
"gluster volume set <volname> locking-scheme granular" for things to work properly.
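For reference, a hedged sketch of applying and then confirming the option (the volume name `myvol` is a placeholder; `gluster volume get` is available in glusterfs 3.7 and later):

```shell
# Switch the volume to the granular locking scheme
# ("myvol" is a placeholder volume name).
gluster volume set myvol locking-scheme granular

# Confirm the option took effect.
gluster volume get myvol locking-scheme
```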
Comment 7 nchilaka 2016-05-26 08:19:37 EDT
    TC#1: False positives saying "possibly undergoing heal" must not be seen in heal info output

    1. Create a dist-rep volume and start it

    2. Mount the volume on two different clients

    3. Now create a directory and, inside it, create files {1..10}

    4. Now from each client's mount, use dd to append about 10GB of data to one file per client (make sure the two files are in different subvolumes)

    5. While the dd is running, trigger "gluster v heal info" on all the cluster nodes and either monitor it through the watch command or redirect the output to a log file

    6. Once the dd command is done, look into the log files where heal info was recorded and grep for "Possibly undergoing heal"

    False positives can be seen claiming that the files being written by dd are possibly under heal, even though they are not (check the xattrs to confirm).

    This is because of the AFR v1 locking scheme, in which two different nodes trying to take the same lock can lead to this spurious report.

    7. Now on the same volume, or on a different volume, set the option below:

    "gluster volume set <volname> locking-scheme granular"

    8. Now rerun steps 4 and 5 (the dd and the parallel monitoring)

    The spurious message "Possibly undergoing heal" will not be seen.
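The steps above can be sketched roughly as follows. This is a hedged outline, not the exact verification run: the volume name `testvol`, mount points `/mnt/c1` and `/mnt/c2`, the brick path, and the log file location are all placeholders.

```shell
# Assumes a dist-rep volume "testvol" mounted at /mnt/c1 and /mnt/c2
# on two different clients.

# Step 3: create a directory and ten files from one client.
mkdir /mnt/c1/dir
touch /mnt/c1/dir/file{1..10}

# Step 4: append ~10GB to one file from each client
# (pick files that land on different subvolumes).
dd if=/dev/urandom of=/mnt/c1/dir/file1 bs=1M count=10240 \
   oflag=append conv=notrunc &
# ...and on the second client:
dd if=/dev/urandom of=/mnt/c2/dir/file6 bs=1M count=10240 \
   oflag=append conv=notrunc &

# Step 5: on each cluster node, log heal info while dd runs.
while pgrep -x dd >/dev/null; do
    gluster volume heal testvol info >> /var/log/healinfo.log
    sleep 5
done

# Step 6: after dd completes, look for false positives.
grep "Possibly undergoing heal" /var/log/healinfo.log

# Confirm via the AFR xattrs on a brick that no heal is actually pending.
getfattr -d -m . -e hex /bricks/brick1/dir/file1
```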
Comment 8 nchilaka 2016-05-26 08:20:39 EDT
Ran the above QA test plan on the build below:

The false positives are not seen anymore, hence moving to verified.
Comment 10 Pranith Kumar K 2016-06-06 12:46:15 EDT
When files such as virtual machine metadata files were being checked by the self-heal daemon and the heal info command was executed in parallel, the command reported the files as possibly undergoing heal even when they were not. Frequent updates to a file are not the root cause.
Comment 14 errata-xmlrpc 2016-06-23 01:09:28 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

