Bug 1306922 - Self heal command gives error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful"
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.7.7
Hardware: x86_64 Linux
Priority: unspecified  Severity: high
Assigned To: Ravishankar N
: Triaged
Depends On: 1200252 1294612 1302291
Blocks:
Reported: 2016-02-12 03:26 EST by Ravishankar N
Modified: 2016-04-19 03:24 EDT
1 user

See Also:
Fixed In Version: glusterfs-3.7.9
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1302291
Environment:
Last Closed: 2016-04-19 03:24:46 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Ravishankar N 2016-02-12 03:26:32 EST
Description of problem:
1. Create a 1x3 replica volume using a 3-node cluster.
2. Kill one brick and run `gluster volume heal <volname>`.
The heal is launched on the bricks that are up, but the CLI reports "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful".

RCA:
If any of the bricks is down, the glustershd of that node sends a -1 op_ret to glusterd, which eventually propagates it to the CLI. If op_ret is non-zero, the CLI prints "Launching heal...unsuccessful". For the bricks that are up and need heal, healing happens without any issue.

A reasonable fix seems to be to print a more meaningful message on the CLI, such as "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."
Comment 1 Vijay Bellur 2016-02-12 03:30:14 EST
REVIEW: http://review.gluster.org/13435 (cli/ afr: op_ret for index heal launch) posted (#1) for review on release-3.7 by Ravishankar N (ravishankar@redhat.com)
Comment 2 Vijay Bellur 2016-02-17 00:04:48 EST
REVIEW: http://review.gluster.org/13435 (cli/ afr: op_ret for index heal launch) posted (#2) for review on release-3.7 by Ravishankar N (ravishankar@redhat.com)
Comment 3 Vijay Bellur 2016-02-17 04:53:55 EST
COMMIT: http://review.gluster.org/13435 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu@redhat.com) 
------
commit 45301bcd97825206f7f19b25a4ad722e7dc13cc6
Author: Ravishankar N <ravishankar@redhat.com>
Date:   Mon Jan 18 12:16:31 2016 +0000

    cli/ afr: op_ret for index heal launch
    
    Backport of http://review.gluster.org/#/c/13303/
    
    Problem:
    If index heal is launched when some of the bricks are down, glustershd of that
    node sends a -1 op_ret to glusterd which eventually propagates it to the CLI.
    Also, glusterd sometimes sends an err_str and sometimes not (depending on the
    failure happening in the brick-op phase or commit-op phase). So the message that
    gets displayed varies in each case:
    
    "Launching heal operation to perform index self heal on volume testvol has been
    unsuccessful"
                    (OR)
    "Commit failed on <host>. Please check log file for details."
    
    Fix:
    1. Modify afr_xl_op() to return -1 if index healing of at least one brick
    fails.
    2. Ignore glusterd's error string in gf_cli_heal_volume_cbk and print a more
    meaningful message.
    
    The patch also fixes a bug in glusterfs_handle_translator_op() where if we
    encounter an error in notify of one xlator, we break out of the loop instead of
    sending the notify to other xlators.
    
    Change-Id: I957f6c4b4d0a45453ffd5488e425cab5a3e0acca
    BUG: 1306922
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reviewed-on: http://review.gluster.org/13435
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
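The glusterfs_handle_translator_op() part of the fix replaces an early break with continued iteration, so one failing notify no longer stops delivery to the remaining xlators. A minimal sketch of that pattern, using a hypothetical notify callback rather than the real libglusterfs types:

```c
/* Hypothetical stand-in for an xlator's notify entry point:
 * returns 0 on success, -1 on error. */
typedef int (*notify_fn)(int event);

static int notified = 0;  /* how many xlators have seen the event */

static int notify_ok(int event)   { (void)event; notified++; return 0; }
static int notify_fail(int event) { (void)event; notified++; return -1; }

/* Fixed behavior: deliver the event to every xlator and only
 * remember that some notify failed.  The pre-fix code effectively
 * did `break;` on the first error, skipping the remaining xlators. */
static int notify_all_xlators(notify_fn *xlators, int count, int event)
{
    int ret = 0;
    for (int i = 0; i < count; i++) {
        if (xlators[i](event) != 0)
            ret = -1;   /* record the error, but do NOT break */
    }
    return ret;
}
```

With a failing xlator in the middle of the list, all xlators still receive the notify, and the aggregated return value reflects the failure.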
Comment 4 Kaushal 2016-04-19 03:24:46 EDT
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.7.9, please open a new bug report.

glusterfs-3.7.9 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-March/025922.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
