Bug 1334566 - Self heal shows different information for the same volume from each node
Summary: Self heal shows different information for the same volume from each node
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: sharding
Version: 3.7.9
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks: 1335429 1335433 1335437
Reported: 2016-05-10 03:23 UTC by Paul Cuzner
Modified: 2016-06-28 12:17 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.7.12
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1335429 1335433 1335437
Environment:
Last Closed: 2016-06-28 12:17:24 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
vol heal output from each node (2.41 KB, text/plain)
2016-05-10 03:23 UTC, Paul Cuzner

Description Paul Cuzner 2016-05-10 03:23:39 UTC
Created attachment 1155512
vol heal output from each node

Description of problem:
I noticed that Nagios was reporting that a heal was required in the BAGL test environment, but when I checked on node gprfc085, self-heal showed 0 entries.

I then ran the following:

for node in gprfc085 gprfc086 gprfc087; do pssh -P -t 60 -H $node 'date; gluster vol heal engine info ; sleep 1'; done

and saw that on node 85 self-heal showed 0 entries, while the other two nodes listed shards needing heal.

While trying to understand why, I did note that for some reason cluster.data-self-heal/entry-self-heal and meta

To date, this issue affects ONLY the 'engine' volume, which is a sharded volume and has the hosted_engine VM running on node 86.


Version-Release number of selected component (if applicable):


How reproducible:
Each time.

Steps to Reproduce:
1. Run 'gluster vol heal engine info' on each node at around the same time.

Actual results:
One node shows the volume as clean; the other two invariably report shards in the heal list.

Expected results:
I would expect all nodes to have the same view of the heal state.

Additional info:
Output attached.
Build: glusterfs-3.7.9-3

Comment 1 Sahina Bose 2016-05-11 14:57:22 UTC
Krutika, can you take a look?

Comment 2 Vijay Bellur 2016-05-12 08:52:00 UTC
REVIEW: http://review.gluster.org/14304 (cluster/afr: Handle non-zero source in heal-info decision) posted (#1) for review on release-3.7 by Pranith Kumar Karampuri (pkarampu)

Comment 3 Krutika Dhananjay 2016-05-12 08:54:18 UTC
Nice catch, Paul! :)

Comment 4 Anuradha 2016-05-12 19:04:27 UTC
Moving to POST, as Pranith has sent the fix.

Comment 5 Vijay Bellur 2016-05-12 22:29:55 UTC
COMMIT: http://review.gluster.org/14304 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu) 
------
commit 1d6852f2f892792b4bb54c24be2681f57f77d7fc
Author: Pranith Kumar K <pkarampu>
Date:   Thu May 12 13:55:44 2016 +0530

    cluster/afr: Handle non-zero source in heal-info decision
    
            Backport of http://review.gluster.org/14302
    
    Problem:
    Spurious entries are reported in heal info when the mount is on second/third
    brick of the replica pair because local-child is given preference in selecting
    source. The code is supposed to suggest the file needs heal if the (source < 0)
    (failure code path), but instead it is written as if any non-zero value
    is considered failure.
    
    Fix:
    Treat +ve source as success case
    
    BUG: 1334566
    Change-Id: Iac6d68cc429496756a9d8f6e21e71aa5f6b932ee
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/14304
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Ravishankar N <ravishankar>
    Reviewed-by: Anuradha Talur <atalur>
    CentOS-regression: Gluster Build System <jenkins.com>

Comment 6 Kaushal 2016-06-28 12:17:24 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.

glusterfs-3.7.12 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

