Description of problem:

Steps to reproduce:
1. Create a 1x3 replica volume on a 3-node cluster.
2. Kill one brick, then run `gluster vol heal <volname>`.

RCA: If any of the bricks is down, the glustershd of that node sends a -1 op_ret to glusterd, which eventually propagates it to the CLI. If op_ret is non-zero, the CLI prints "Launching heal...unsuccessful". For the bricks that are up and need heal, the healing happens without any issues.

A reasonable fix seems to be to print a more meaningful message on the CLI, e.g. "Launching heal operation to perform index self heal on volume vol0 has not been successful on all nodes. Please check if all brick processes are running."
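For illustration only, here is a minimal, self-contained C sketch of the proposed CLI behaviour: decide the message from the aggregated op_ret alone and ignore glusterd's error string. This is not the actual gluster source; the function name print_heal_launch_result and the exact wording are assumptions.

/* Minimal sketch, not the real gf_cli_heal_volume_cbk: the CLI picks one
 * consistent message based only on the aggregated op_ret. */
#include <stdio.h>

static void
print_heal_launch_result(const char *volname, int op_ret)
{
        if (op_ret == 0) {
                printf("Launching heal operation to perform index self heal "
                       "on volume %s has been successful\n", volname);
                return;
        }
        /* Proposed wording: point the user at brick processes instead of a
         * bare "unsuccessful" or "Commit failed" message. */
        printf("Launching heal operation to perform index self heal on "
               "volume %s has not been successful on all nodes. Please "
               "check if all brick processes are running.\n", volname);
}

int
main(void)
{
        print_heal_launch_result("vol0", -1);   /* one brick down -> -1 */
        return 0;
}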
REVIEW: http://review.gluster.org/13435 (cli/ afr: op_ret for index heal launch) posted (#1) for review on release-3.7 by Ravishankar N (ravishankar)
REVIEW: http://review.gluster.org/13435 (cli/ afr: op_ret for index heal launch) posted (#2) for review on release-3.7 by Ravishankar N (ravishankar)
COMMIT: http://review.gluster.org/13435 committed in release-3.7 by Pranith Kumar Karampuri (pkarampu)
------
commit 45301bcd97825206f7f19b25a4ad722e7dc13cc6
Author: Ravishankar N <ravishankar>
Date: Mon Jan 18 12:16:31 2016 +0000

cli/ afr: op_ret for index heal launch

Backport of http://review.gluster.org/#/c/13303/

Problem:
If index heal is launched when some of the bricks are down, glustershd of that node sends a -1 op_ret to glusterd, which eventually propagates it to the CLI. Also, glusterd sometimes sends an err_str and sometimes not (depending on whether the failure happens in the brick-op phase or the commit-op phase). So the message that gets displayed varies in each case:
"Launching heal operation to perform index self heal on volume testvol has been unsuccessful" (OR)
"Commit failed on <host>. Please check log file for details."

Fix:
1. Modify afr_xl_op() to return -1 if index healing of at least one brick fails.
2. Ignore glusterd's error string in gf_cli_heal_volume_cbk and print a more meaningful message.

The patch also fixes a bug in glusterfs_handle_translator_op() where, if we encounter an error in the notify of one xlator, we break out of the loop instead of sending the notify to the other xlators.

Change-Id: I957f6c4b4d0a45453ffd5488e425cab5a3e0acca
BUG: 1306922
Signed-off-by: Ravishankar N <ravishankar>
Reviewed-on: http://review.gluster.org/13435
Smoke: Gluster Build System <jenkins.com>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
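The glusterfs_handle_translator_op() fix mentioned in the commit boils down to not aborting the notify loop on the first failing xlator. Below is a simplified, hypothetical C sketch of that pattern; the types and names (fake_xlator, handle_translator_op, etc.) are invented for illustration and are not the real glusterfs structures.

/* Simplified sketch of the loop fix: record the failure and keep notifying
 * the remaining xlators instead of breaking out of the loop. */
#include <stdio.h>

struct fake_xlator {
        const char *name;
        int (*notify)(const char *name, int event);
};

static int
notify_ok(const char *name, int event)
{
        (void)event;
        printf("notified %s\n", name);
        return 0;
}

static int
notify_fail(const char *name, int event)
{
        (void)event;
        fprintf(stderr, "notify failed on %s\n", name);
        return -1;
}

static int
handle_translator_op(struct fake_xlator *xl, int count, int event)
{
        int op_ret = 0;

        for (int i = 0; i < count; i++) {
                if (xl[i].notify(xl[i].name, event) < 0) {
                        op_ret = -1;
                        continue;   /* before the fix this was a break,
                                     * skipping the remaining xlators */
                }
        }
        return op_ret;
}

int
main(void)
{
        struct fake_xlator xls[] = {
                { "brick-0", notify_ok },
                { "brick-1", notify_fail },   /* this brick is down */
                { "brick-2", notify_ok },     /* still gets notified */
        };

        return handle_translator_op(xls, 3, /* event */ 0) == 0 ? 0 : 1;
}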
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.9, please open a new bug report. glusterfs-3.7.9 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution. [1] https://www.gluster.org/pipermail/gluster-users/2016-March/025922.html [2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user