Bug 1322850

Summary: Healing queue rarely empty
Product: [Community] GlusterFS
Component: replicate
Reporter: Pranith Kumar K <pkarampu>
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: mainline
CC: bugs, hgowtham, nicolas
Keywords: Triaged
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Clone Of: 1294675
Bug Depends On: 1294675
Last Closed: 2016-06-16 14:02:34 UTC
Type: Bug

Description Pranith Kumar K 2016-03-31 12:38:28 UTC
+++ This bug was initially created as a clone of Bug #1294675 +++

Description of problem:
From the command line of each host, and now through constant monitoring by our Nagios/Centreon setup, we see that our 3-node replica-3 Gluster storage volume is healing files very frequently, if not constantly.

Version-Release number of selected component (if applicable):
Our setup: 3 CentOS 7.2 nodes running Gluster 3.7.6 in replica-3, used as storage+compute for an oVirt 3.5.6 DC.

How reproducible:
Install an oVirt setup on 3 nodes with GlusterFS as direct gluster storage.
We have only 3 VMs running on it, so approximately no more than 8 files (yes: only 8 files - the VM qemu files).

Steps to Reproduce:
1. Just run it and watch: everything looks fine
2. Run "gluster volume heal some_vol info" on random nodes (example output below)
3. Read that more than zero files are getting healed
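
For illustration only, running the check on one node looks roughly like this; the volume name, host name, brick path and exact output layout are placeholders and vary by release:

  # Illustrative only -- names and layout are placeholders.
  gluster volume heal some_vol info

  # Expected on a healthy, idle volume: every brick reports zero entries, e.g.
  #   Brick node1:/bricks/some_vol/brick
  #   Number of entries: 0
  #
  # Observed here instead: non-zero counts, sometimes with entries annotated
  # "Possibly undergoing heal", even though the volume is barely used.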

Actual results:
More than zero files are getting healed

Expected results:
I expected the "Number of entries" of every node to appear in the graph as a flat zero line most of the time, except for the rare cases of a node reboot, after which healing is launched and takes some minutes (sometimes hours) but completes correctly.

Additional info:
At first, I found out that I had forgotten to bump up the cluster.op-version, but this has since been done and everything was rebooted and brought back up.
But this DC is very lightly used, and I'm sure the gluster clients (which are the gluster nodes themselves) should be reading and writing in a synchronous and proper way, not leading to any need for healing.

Please see :
https://www.mail-archive.com/gluster-users@gluster.org/msg22890.html

--- Additional comment from Pranith Kumar K on 2016-01-11 04:45:59 EST ---

hi Nicolas Ecarnot,
      Thanks for raising the bug. "gluster volume heal <volname> info" is designed to be run as one instance per volume. If we run multiple processes, it may lead to "Possibly undergoing heal" messages, as the two try to take the same locks and fail.

Pranith
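
A minimal sketch of one way to enforce this on the monitoring side, assuming a hypothetical wrapper script around the check; flock(1) serializes the runs so two probes never race for the same self-heal locks:

  #!/bin/sh
  # Hypothetical wrapper for monitoring checks: take a per-volume lock file so
  # that at most one "heal info" runs against a given volume at any time.
  VOL="$1"
  exec flock -n "/var/lock/gluster-heal-info-${VOL}.lock" \
      gluster volume heal "${VOL}" info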

--- Additional comment from Nicolas Ecarnot on 2016-01-11 04:48:11 EST ---

(In reply to Pranith Kumar K from comment #1)
> hi Nicolas Ecarnot,
>       Thanks for raising the bug. "gluster volume heal <volname> info" is
> designed to be run as one instance per volume. If we run multiple processes,
> it may lead to "Possibly undergoing heal" messages, as the two try to take
> the same locks and fail.
> 
> Pranith

Thank you Pranith for your answer.

Do you advise us to set up our Nagios/Centreon to run only *ONE* check per volume?
If so, please don't close this bug, let us change the setup, wait one week and I'll report the result here.

Tell me.

--- Additional comment from Pranith Kumar K on 2016-01-18 05:56:32 EST ---

hi Nicolas Ecarnot,
      Sorry for the delay. Sure, doing that will definitely help. There could still be one corner case of the self-heal daemon and heal info conflicting over the same locks. But I would like to hear more from you.

Pranith

--- Additional comment from Nicolas Ecarnot on 2016-01-18 08:35:28 EST ---

(In reply to Pranith Kumar K from comment #3)
> hi Nicolas Ecarnot,
>       Sorry for the delay. Sure, doing that will definitely help. There
> could still be one corner case of the self-heal daemon and heal info
> conflicting over the same locks. But I would like to hear more from you.
> 
> Pranith

On January 12, 2016, we modified our Nagios/Centreon setup to offset the checks of our 3 nodes' healing status.

Two weeks later, the graphs show a great decrease in healing cases, though not down to zero.
This sounds encouraging.
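
For reference, a minimal sketch of such an offset, assuming a simple cron-driven wrapper (the script name and schedule below are illustrative placeholders, not our actual Centreon configuration):

  # Hypothetical crontab entries, one per node, staggered by 5 minutes so that
  # only one node runs "heal info" against the volume at any given time.
  # node1:
  */15 * * * *     /usr/local/bin/check_gluster_heal.sh some_vol
  # node2:
  5-59/15 * * * *  /usr/local/bin/check_gluster_heal.sh some_vol
  # node3:
  10-59/15 * * * * /usr/local/bin/check_gluster_heal.sh some_vol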

Having recently been told about sharding, that is the next feature to try, to see whether it could improve the healing situation.
I let you decide whether this is enough to close this bug - my opinion is that I'm still surprised that the healing count is *not* constantly zero, but the choice is yours.

Comment 1 Vijay Bellur 2016-03-31 12:40:37 UTC
REVIEW: http://review.gluster.org/13873 (cluster/afr: Fix spurious entries in heal info) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 2 Vijay Bellur 2016-04-05 02:27:27 UTC
REVIEW: http://review.gluster.org/13873 (cluster/afr: Fix spurious entries in heal info) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Vijay Bellur 2016-04-15 11:29:44 UTC
REVIEW: http://review.gluster.org/13873 (cluster/afr: Fix spurious entries in heal info) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 4 Vijay Bellur 2016-04-20 11:51:31 UTC
COMMIT: http://review.gluster.org/13873 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit b6a0780d86e7c6afe7ae0d9a87e6fe5c62b4d792
Author: Pranith Kumar K <pkarampu>
Date:   Thu Mar 31 14:40:09 2016 +0530

    cluster/afr: Fix spurious entries in heal info
    
    Problem:
    Locking schemes in afr-v1 locked the directory/file completely during
    self-heal. Newer locking schemes don't require full directory or file
    locking, but afr-v2 still carries compatibility code to work well with
    older clients: in entry self-heal it takes a lock on a special
    256-character name which can't be created on the fs, and for data
    self-heal there used to be a lock on (LLONG_MAX-2, 1). The old locking
    scheme requires heal info to take sh-domain locks before examining heal
    state; if it doesn't take sh-domain locks, heal info may hang until
    self-heal completes because of the compatibility locks. But if heal info
    does take sh-domain locks, then when two heal-info processes, or shd and
    heal-info, inspect heal state in parallel using trylocks on the sh-domain,
    both of them may assume a heal is in progress. This was leading to
    spurious entries being shown in heal info.

    Fix:
    As long as the afr-v1 way of locking exists, we can't fix this problem
    with simple solutions. But if we know that the cluster is running the
    newer locking schemes, we can give accurate information in heal info. So
    introduce a new option called 'locking-scheme': if it is set to
    'granular', heal info will give correct information. In addition, the
    extra network hops for taking compatibility locks and sh-domain locks in
    heal info are no longer necessary, which improves performance.
    
    BUG: 1322850
    Change-Id: Ia563c5f096b5922009ff0ec1c42d969d55d827a3
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/13873
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Ashish Pandey <aspandey>
    Reviewed-by: Anuradha Talur <atalur>
    Reviewed-by: Krutika Dhananjay <kdhananj>
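
For reference, an illustrative way to use the new option on a volume, assuming it is exposed through "volume set" as cluster.locking-scheme (the volume name is a placeholder):

  # Illustrative only: switch to the granular locking scheme introduced by
  # this patch, on a cluster where all nodes run the newer locking scheme.
  gluster volume set some_vol cluster.locking-scheme granular

  # heal info should then report pending heals without taking the compatibility
  # or sh-domain locks, avoiding the spurious "Possibly undergoing heal" entries.
  gluster volume heal some_vol info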

Comment 5 Niels de Vos 2016-06-16 14:02:34 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user