1335429 – Self heal shows different information for the same volume from each node

Summary: Self heal shows different information for the same volume from each node

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	replicate
Sub Component:
Version:	mainline
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Pranith Kumar K
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1334566 1335433
Blocks:

Reported:	2016-05-12 08:39 UTC by Pranith Kumar K
Modified:	2017-03-27 18:21 UTC
CC List:	8 users
Fixed In Version:	glusterfs-3.9.0
Clone Of:	1334566
Environment:
Last Closed:	2017-03-27 18:21:34 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Pranith Kumar K 2016-05-12 08:39:33 UTC

+++ This bug was initially created as a clone of Bug #1334566 +++

Description of problem:
I noticed that Nagios was showing heal required in the BAGL test environment, but when I checked on node gprfc085, the self-heal count was 0.

However, I ran the following:

for node in gprfc085 gprfc086 gprfc087; do pssh -P -t 60 -H $node 'date; gluster vol heal engine info ; sleep 1'; done

and could see that on node 85, self heal was 0 but the other two nodes show shards listed.

Trying to understand why...I did note that for some reason cluster.data-self-heal/entry-self-heal and meta

To date, this issue has been seen ONLY on the 'engine' volume, which is a sharded volume and has the hosted_engine VM running on node 86.


Version-Release number of selected component (if applicable):


How reproducible:
Each time.

Steps to Reproduce:
1. Run vol heal commands on each node at around the same time

Actual results:
One node shows the volume as clean; the other two invariably report shards in the heal list.

Expected results:
I would expect all nodes to have the same view of the heal state.

Additional info:
output attached
glusterfs-3.7.9-3 build

--- Additional comment from Sahina Bose on 2016-05-11 10:57:22 EDT ---

Krutika, can you take a look?

Comment 1 Vijay Bellur 2016-05-12 08:40:48 UTC

REVIEW: http://review.gluster.org/14302 (cluster/afr: Handle non-zero source in heal-info decision) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anuradha 2016-05-12 19:00:17 UTC

Moving this to post as patch was sent.

Comment 3 Nicolas Ecarnot 2016-05-12 20:02:07 UTC

Hi,

Just to understand basically : is this bug harmful to our data?

Comment 4 Vijay Bellur 2016-05-12 22:29:37 UTC

COMMIT: http://review.gluster.org/14302 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 7dc5d73410f0e9f846c593887637001ca43bc4a0
Author: Pranith Kumar K <pkarampu>
Date:   Thu May 12 13:55:44 2016 +0530

    cluster/afr: Handle non-zero source in heal-info decision
    
    Problem:
    Spurious entries are reported in heal info when the mount is on second/third
    brick of the replica pair because local-child is given preference in selecting
    source. The code is supposed to suggest the file needs heal if the (source < 0)
    (failure code path), but instead it is written as if any non-zero value
    is considered failure.
    
    Fix:
    Treat +ve source as success case
    
    BUG: 1335429
    Change-Id: I1be7f9defef2ae03be7eec8d7d49bf34adeca82c
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/14302
    Reviewed-by: Krutika Dhananjay <kdhananj>
    Reviewed-by: Anuradha Talur <atalur>
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Ravishankar N <ravishankar>
    CentOS-regression: Gluster Build System <jenkins.com>
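
The commit message above describes a check that treated any non-zero source as a failure, when only a negative source should count as one. The following is a minimal sketch of that difference, not the actual AFR code: the function names `needs_heal_buggy` and `needs_heal_fixed` are hypothetical, and `source` is assumed to be the index of the brick selected as heal source (0, 1, 2, ...) or a negative value when selection fails.

```c
#include <stdbool.h>

/*
 * Hypothetical illustration only; these helpers do not exist in GlusterFS.
 * "source" is assumed to be the index of the brick chosen as heal source,
 * or a negative value when source selection failed.
 */

/* Buggy decision: any non-zero source is treated as failure, so a valid
 * source on the second or third brick (index 1 or 2) is reported as
 * needing heal. */
static bool needs_heal_buggy(int source)
{
    return source != 0;
}

/* Fixed decision: only a negative source is the failure code path;
 * a positive source is a success case. */
static bool needs_heal_fixed(int source)
{
    return source < 0;
}
```

For source values -1, 0, 1 and 2, the buggy check flags -1, 1 and 2, while the fixed check flags only -1. Under these assumptions, that would be consistent with the report: the node on the first brick showed a clean volume while the other two listed entries.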

Comment 5 Pranith Kumar K 2016-05-12 22:43:14 UTC

(In reply to Nicolas Ecarnot from comment #3)
> Hi,
> 
> Just to understand basically : is this bug harmful to our data?

Not at all. It is just wrong reporting.

Comment 6 Shyamsundar 2017-03-27 18:21:34 UTC

This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-3.9.0, please open a new bug report.

glusterfs-3.9.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2016-November/029281.html
[2] https://www.gluster.org/pipermail/gluster-users/
