Description of problem:
On a 5-node CNS setup, volume creation fails when glusterd is stopped on 2 nodes.
sh-4.2# heketi-cli volume create --size=2
Error: Unable to execute command on glusterfs-storage-djxwn: volume create: vol_83f171a561341a55c5ac087510ae0aa2: failed: Host 10.70.46.45 not connected
Version-Release number of selected component (if applicable):
rpm -qa | grep 'heketi'
How reproducible:
1/1. This issue should be consistently reproducible.
Steps to Reproduce:
1. Create a 5-node CNS cluster.
2. Stop the glusterd service on any 2 random nodes: oc rsh <gluster pod>, then systemctl stop glusterd.
3. Create a volume using heketi.
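Steps 2 and 3 can be sketched as a small shell script. This is only an illustration: the pod names passed at the bottom are hypothetical placeholders, and the `run` helper and `DRY_RUN` variable are not part of any tool. With `DRY_RUN=1` (the default here) the commands are only printed. In the report, `heketi-cli` was run from a shell inside the heketi pod, so the last command assumes a similar context.

```shell
#!/bin/sh
# Sketch of steps 2 and 3. Pod names are hypothetical placeholders.
# With DRY_RUN=1 (default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # Step 2: stop glusterd inside each gluster pod given as an argument.
    for pod in "$@"; do
        run oc rsh "$pod" systemctl stop glusterd
    done
    # Step 3: request a volume with the same size as in the report.
    run heketi-cli volume create --size=2
}

reproduce glusterfs-storage-aaaaa glusterfs-storage-bbbbb
```

Setting `DRY_RUN=0` would execute the commands against a real cluster, which should only be done on a test setup.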
Actual results:
Volume creation fails.
Expected results:
Volume creation should succeed.
Additional info:
Heketi logs will be attached.
Created attachment 1420000
The fix is to keep node monitoring always on. The latest container should therefore work without any addition or change to the ENV.
Heketi node monitoring is enabled by default with rhgs-volmanager-container-3.3.1-8.4:
[heketi] INFO 2018/04/16 09:27:29 Loaded kubernetes executor
[heketi] INFO 2018/04/16 09:27:29 Block: Auto Create Block Hosting Volume set to true
[heketi] INFO 2018/04/16 09:27:29 Block: New Block Hosting Volume size 100 GB
[heketi] INFO 2018/04/16 09:27:29 GlusterFS Application Loaded
[heketi] INFO 2018/04/16 09:27:29 Started Node Health Cache Monitor
Listening on port 8080
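The verification above relies on the "Started Node Health Cache Monitor" line appearing in the heketi startup log. A minimal sketch of that check follows, assuming the log text is piped in on stdin. The function name `monitor_started` is made up for illustration, and the sample input is an excerpt of the log lines quoted above.

```shell
# Succeeds (exit status 0) if the log on stdin contains the monitor line.
monitor_started() {
    grep -q 'Started Node Health Cache Monitor'
}

# Example using two lines from the startup log quoted in this report.
printf '%s\n' \
    '[heketi] INFO 2018/04/16 09:27:29 GlusterFS Application Loaded' \
    '[heketi] INFO 2018/04/16 09:27:29 Started Node Health Cache Monitor' \
    | monitor_started && echo "node monitoring is running"
```

On a live setup the input could come from something like `oc logs <heketi pod>`, with the pod name filled in for the cluster at hand.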
Moving the bug to verified.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.