Red Hat Bugzilla – Bug 1306922
Self heal command gives error "Launching heal operation to perform index self heal on volume vol0 has been unsuccessful"
Last modified: 2016-04-19 03:24:46 EDT
Description of problem:
1. create a 1x3 replica using a 3 node cluster
2. Kill one brick, run 'gluster vol heal <volname>'
If any of the bricks is down, glustershd of that node sends a -1 op_ret to glusterd which eventually propagates it to the CLI. If op_ret is non zero, CLI prints "Launching heal...unsuccessful". For the bricks that are up and need heal, the healing happens without any issues.
A reasonable fix seems to be to print a more meaningful message on the CLI, such as "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."
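The proposed behaviour can be sketched in C roughly as follows. This is only an illustration, not the glusterfs CLI code: heal_launch_message() and its parameters are hypothetical names, the failure text follows the wording proposed above, and the success text is an assumption.

```c
#include <stdio.h>

/* Hypothetical helper (not part of glusterfs): format the message the CLI
 * would print after launching an index heal, based on the op_ret it
 * received. Returns the value of snprintf(), i.e. the length of the full
 * message (excluding the terminating NUL). */
static int
heal_launch_message(int op_ret, const char *volname, char *buf, size_t len)
{
    const char *prefix = "Launching heal operation to perform index self heal";

    if (op_ret == 0) {
        /* Assumed wording for the success case. */
        return snprintf(buf, len, "%s on volume %s has been successful",
                        prefix, volname);
    }

    /* Non-zero op_ret: the launch failed on at least one node, for
     * example because a brick process is down. Heal may still proceed
     * on the bricks that are up. */
    return snprintf(buf, len,
                    "%s on volume %s has not been successful on all nodes. "
                    "Please check if all brick processes are running.",
                    prefix, volname);
}
```

Calling heal_launch_message(-1, "vol0", buf, sizeof buf) fills buf with the "not been successful on all nodes" text instead of the bare "unsuccessful" message.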
REVIEW: http://review.gluster.org/13435 (cli/ afr: op_ret for index heal launch) posted (#1) for review on release-3.7 by Ravishankar N (email@example.com)
REVIEW: http://review.gluster.org/13435 (cli/ afr: op_ret for index heal launch) posted (#2) for review on release-3.7 by Ravishankar N (firstname.lastname@example.org)
COMMIT: http://review.gluster.org/13435 committed in release-3.7 by Pranith Kumar Karampuri (email@example.com)
Author: Ravishankar N <firstname.lastname@example.org>
Date: Mon Jan 18 12:16:31 2016 +0000
cli/ afr: op_ret for index heal launch
Backport of http://review.gluster.org/#/c/13303/
If index heal is launched when some of the bricks are down, glustershd of that
node sends a -1 op_ret to glusterd which eventually propagates it to the CLI.
Also, glusterd sometimes sends an err_str and sometimes not (depending on the
failure happening in the brick-op phase or commit-op phase). So the message that
gets displayed varies in each case:
"Launching heal operation to perform index self heal on volume testvol has been
"Commit failed on <host>. Please check log file for details."
1. Modify afr_xl_op() to return -1 even if index healing of at least one brick
2. Ignore glusterd's error string in gf_cli_heal_volume_cbk and print a more
The patch also fixes a bug in glusterfs_handle_translator_op() where if we
encounter an error in notify of one xlator, we break out of the loop instead of
sending the notify to other xlators.
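The loop fix described in the paragraph above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual patch: struct sketch_xlator, notify_ok(), notify_fail() and notify_all() are hypothetical stand-ins and do not reflect the real xlator_t or glusterfs_handle_translator_op() code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a translator; not the glusterfs xlator_t. */
struct sketch_xlator {
    const char *name;
    int (*notify)(struct sketch_xlator *xl);
    int notified; /* set to 1 once notify has been delivered */
};

/* Example notify callbacks, used only to exercise the loop below. */
static int
notify_ok(struct sketch_xlator *xl)
{
    xl->notified = 1;
    return 0;
}

static int
notify_fail(struct sketch_xlator *xl)
{
    xl->notified = 1;
    return -1;
}

/* Deliver notify to every translator. A failure is recorded in ret, but
 * the loop does not break, so the remaining translators are still
 * notified. Returns 0 only if every notify succeeded, -1 otherwise. */
static int
notify_all(struct sketch_xlator **xls, size_t count)
{
    int ret = 0;

    for (size_t i = 0; i < count; i++) {
        if (xls[i]->notify(xls[i]) != 0)
            ret = -1; /* remember the error and keep going */
    }
    return ret;
}
```

With three translators where the second one fails, notify_all() returns -1 but all three have still been notified, which is the behaviour the commit message describes.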
Signed-off-by: Ravishankar N <email@example.com>
Smoke: Gluster Build System <firstname.lastname@example.org>
NetBSD-regression: NetBSD Build System <email@example.com>
CentOS-regression: Gluster Build System <firstname.lastname@example.org>
Reviewed-by: Pranith Kumar Karampuri <email@example.com>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.9, please open a new bug report.
glusterfs-3.7.9 has been announced on the Gluster mailing lists; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list and the update infrastructure for your distribution.