Description of problem:
On a 5 node CNS setup, volume creation fails when glusterd is stopped on 2 nodes.

sh-4.2# heketi-cli volume create --size=2
Error: Unable to execute command on glusterfs-storage-djxwn: volume create: vol_83f171a561341a55c5ac087510ae0aa2: failed: Host 10.70.46.45 not connected

Version-Release number of selected component (if applicable):
rpm -qa | grep 'heketi'
heketi-6.0.0-7.2.el7rhgs.x86_64
python-heketi-6.0.0-7.2.el7rhgs.x86_64
heketi-client-6.0.0-7.2.el7rhgs.x86_64

How reproducible:
1/1 - This issue should be consistently reproducible

Steps to Reproduce:
1. Create a 5 node CNS cluster
2. Stop the glusterd service on any 2 random nodes - oc rsh <gluster pod> and systemctl stop glusterd
3. Create a volume using heketi

Actual results:
Volume creation fails

Expected results:
Volume creation should succeed

Additional info:
heketi logs shall be attached
Created attachment 1420000: heketi_logs
The fix is to keep node monitoring always on. Hence the latest container should work without any addition or change to the ENV.
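
To illustrate why always-on node monitoring helps in this scenario, here is a rough sketch of the idea, not heketi's actual code. It assumes a cache of each node's last known up/down state and a placement step that only considers nodes the cache reports as up. The names NodeHealthCache, pick_brick_nodes, the probe callable, the node1..node5 host names and the replica count of 3 are all made up for the example.

```python
from typing import Callable, Dict, Iterable, List


class NodeHealthCache:
    """Hypothetical cache of the last known up/down state of each node."""

    def __init__(self, nodes: Iterable[str], probe: Callable[[str], bool]) -> None:
        self._nodes = list(nodes)
        self._probe = probe  # assumed health check, e.g. "is glusterd reachable?"
        self._status: Dict[str, bool] = {}

    def refresh(self) -> None:
        """Probe every node once; a failing probe marks the node as down."""
        for node in self._nodes:
            try:
                self._status[node] = bool(self._probe(node))
            except Exception:
                self._status[node] = False

    def healthy(self) -> List[str]:
        """Return nodes whose last probe succeeded, in their original order."""
        return [n for n in self._nodes if self._status.get(n, False)]


def pick_brick_nodes(cache: NodeHealthCache, replica: int = 3) -> List[str]:
    """Choose nodes for a new volume from the healthy set only."""
    candidates = cache.healthy()
    if len(candidates) < replica:
        raise RuntimeError(
            f"need {replica} healthy nodes, only {len(candidates)} available"
        )
    return candidates[:replica]


# 5 node cluster with glusterd stopped on 2 nodes, as in the reproduction steps.
stopped = {"node2", "node4"}
cache = NodeHealthCache(
    [f"node{i}" for i in range(1, 6)],
    probe=lambda node: node not in stopped,
)
cache.refresh()
chosen = pick_brick_nodes(cache)
print(chosen)
```

With the cache refreshed, the two stopped nodes are never offered as candidates, so a replica 3 placement still has enough nodes to choose from. Without the cache, a stopped node could be picked and the create would fail with a "not connected" error like the one in the description.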
heketi monitoring is enabled by default with rhgs-volmanager-container-3.3.1-8.4

[heketi] INFO 2018/04/16 09:27:29 Loaded kubernetes executor
[heketi] INFO 2018/04/16 09:27:29 Block: Auto Create Block Hosting Volume set to true
[heketi] INFO 2018/04/16 09:27:29 Block: New Block Hosting Volume size 100 GB
[heketi] INFO 2018/04/16 09:27:29 GlusterFS Application Loaded
[heketi] INFO 2018/04/16 09:27:29 Started Node Health Cache Monitor
Listening on port 8080

Moving the bug to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1178