+++ This bug was initially created as a clone of Bug #1334566 +++

Description of problem:
I noticed that nagios was showing heal required in the BAGL test environment, but when I checked on node gprfc085, self heal was 0. However, I ran the following:

for node in gprfc085 gprfc086 gprfc087; do pssh -P -t 60 -H $node 'date; gluster vol heal engine info ; sleep 1'; done

and could see that on node 85 self heal was 0, but the other two nodes showed shards listed. Trying to understand why... I did note that for some reason cluster.data-self-heal/entry-self-heal and meta

To date, this issue is ONLY against the 'engine' volume, which is a sharded volume and has the hosted_engine VM running on node '86.

Version-Release number of selected component (if applicable):

How reproducible:
Each time.

Steps to Reproduce:
1. Run vol heal commands on each node at around the same time
2.
3.

Actual results:
One node shows the volume is clean; the other two invariably report shards in the heal list.

Expected results:
I would expect all nodes to have the same view of the heal state.

Additional info:
Output attached. glusterfs-3.7.9-3 build.

--- Additional comment from Sahina Bose on 2016-05-11 10:57:22 EDT ---

Krutika, can you take a look?
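A minimal sketch of the reproduction step above, for reference. It assumes passwordless SSH to the three nodes named in the report (plain ssh is used here instead of pssh) and that this glusterfs build prints a "Number of entries: N" line per brick in the heal info output; both are assumptions, not confirmed in the report.

#!/bin/bash
# Hypothetical helper: query heal info for the 'engine' volume from each node
# at roughly the same moment and print the per-node totals, so disagreement
# between nodes is easy to spot.
NODES="gprfc085 gprfc086 gprfc087"
VOL="engine"

for node in $NODES; do
    (
        # Sum the "Number of entries" counts reported across all bricks.
        count=$(ssh "$node" "gluster vol heal $VOL info" \
                | awk -F': ' '/Number of entries/ {sum += $2} END {print sum+0}')
        echo "$(date '+%H:%M:%S') $node: $count entries pending heal"
    ) &   # run the queries in parallel so the snapshots line up in time
done
wait

Running the queries in parallel keeps the per-node snapshots within a second or two of each other, which matters here since the heal queue can drain between serial runs.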
Verified and works fine with build glusterfs-3.7.9-6.el7rhgs.x86_64. Brought down one of the bricks in the data volume while fio was running, and brought it back up after some time so that self-heal kicked in. Ran the script:

for node in <node1> <node2> <node3>; do pssh -P -t 60 -H $node 'date; gluster vol heal data info ; sleep 1'; done

Verified that the undergoing and unsynced entries reported for "Volume heal info - data" in nagios and by the script return the same values. When nagios shows '0' for "Volume heal info - data", heal info from all the nodes also returns '0'. Will reopen the bug if I hit the issue again.
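A rough sketch of the brick down/up cycle used in the verification above, assuming the brick is taken offline by killing its glusterfsd process and brought back with a forced volume start (a common test approach, not necessarily the exact steps used here); the brick path is a hypothetical placeholder.

#!/bin/bash
# Hypothetical verification helper: take one brick of the 'data' volume offline,
# wait while fio keeps writing, bring the brick back, and watch heal info drain.
VOL="data"
BRICK_PATH="/rhgs/brick1/data"   # placeholder local brick path on this node

# Kill the glusterfsd process serving this brick so it goes offline while I/O
# continues against the volume.
pkill -f "glusterfsd.*${BRICK_PATH}"

sleep 300   # let writes accumulate pending heals against the down brick

# 'start force' respawns any brick processes that are not running.
gluster volume start "$VOL" force

# Poll heal info until all bricks report zero pending entries.
while : ; do
    pending=$(gluster vol heal "$VOL" info \
              | awk -F': ' '/Number of entries/ {s += $2} END {print s+0}')
    echo "$(date '+%H:%M:%S') pending heal entries: $pending"
    [ "$pending" -eq 0 ] && break
    sleep 10
done
echo "heal complete on $VOL"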
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240