+++ This bug was initially created as a clone of Bug #1334566 +++

Description of problem:
I noticed that nagios was showing heal required in the BAGL test environment, but when I checked on node gprfc085, self heal was 0. However, I ran the following:

for node in gprfc085 gprfc086 gprfc087; do pssh -P -t 60 -H $node 'date; gluster vol heal engine info ; sleep 1'; done

and could see that on node 85 self heal was 0, but the other two nodes showed shards listed. Trying to understand why... I did note that for some reason cluster.data-self-heal/entry-self-heal and meta

To date, this issue is ONLY against the 'engine' volume, which is a sharded volume and has the hosted_engine VM running on node '86.

Version-Release number of selected component (if applicable):

How reproducible:
Each time.

Steps to Reproduce:
1. Run vol heal commands on each node at around the same time
2.
3.

Actual results:
One node shows the volume is clean; the other two invariably report shards in the heal list.

Expected results:
I would expect all nodes to have the same view of the heal state.

Additional info:
Output attached. glusterfs-3.7.9-3 build.

--- Additional comment from Sahina Bose on 2016-05-11 10:57:22 EDT ---

Krutika, can you take a look?
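A minimal sketch of the reproduction step above, for reference. It assumes passwordless SSH to the three nodes named in the report (plain ssh is used here instead of pssh) and that this glusterfs build prints a "Number of entries: N" line per brick in the heal info output; both are assumptions, not confirmed in the report.

#!/bin/bash
# Hypothetical helper: query heal info for the 'engine' volume from each node
# at roughly the same moment and print the per-node totals, so disagreement
# between nodes is easy to spot.
NODES="gprfc085 gprfc086 gprfc087"
VOL="engine"

for node in $NODES; do
    (
        # Sum the "Number of entries" counts reported across all bricks.
        count=$(ssh "$node" "gluster vol heal $VOL info" \
                | awk -F': ' '/Number of entries/ {sum += $2} END {print sum+0}')
        echo "$(date '+%H:%M:%S') $node: $count entries pending heal"
    ) &   # run the queries in parallel so the snapshots line up in time
done
wait

Running the queries in parallel keeps the per-node snapshots within a second or two of each other, which matters here since the heal queue can drain between serial runs.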
Verified and works fine with build glusterfs-3.7.9-6.el7rhgs.x86_64. Brought down one of the bricks in the data volume while fio was running, and brought it back up after some time so that self-heal kicked in. Ran the script:

for node in <node1> <node2> <node3>; do pssh -P -t 60 -H $node 'date; gluster vol heal data info ; sleep 1'; done

Verified that the undergoing and unsynced entries reported for "Volume heal info - data" in nagios and by the script return the same values. When nagios shows '0' for "Volume heal info - data", heal info from all the nodes also returns '0'. Will reopen the bug if I hit the issue again.
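A rough sketch of the brick down/up cycle used in the verification above, assuming the brick is taken offline by killing its glusterfsd process and brought back with a forced volume start (a common test approach, not necessarily the exact steps used here); the brick path is a hypothetical placeholder.

#!/bin/bash
# Hypothetical verification helper: take one brick of the 'data' volume offline,
# wait while fio keeps writing, bring the brick back, and watch heal info drain.
VOL="data"
BRICK_PATH="/rhgs/brick1/data"   # placeholder local brick path on this node

# Kill the glusterfsd process serving this brick so it goes offline while I/O
# continues against the volume.
pkill -f "glusterfsd.*${BRICK_PATH}"

sleep 300   # let writes accumulate pending heals against the down brick

# 'start force' respawns any brick processes that are not running.
gluster volume start "$VOL" force

# Poll heal info until all bricks report zero pending entries.
while : ; do
    pending=$(gluster vol heal "$VOL" info \
              | awk -F': ' '/Number of entries/ {s += $2} END {print s+0}')
    echo "$(date '+%H:%M:%S') pending heal entries: $pending"
    [ "$pending" -eq 0 ] && break
    sleep 10
done
echo "heal complete on $VOL"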
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240