Description of problem:
When one node is down in a 3-node CRS configuration and a PVC request is made with HA set to 2, block-device creation should succeed. However, it fails and the PVC remains in Pending state.

# oc get pvc/test-03
NAME      STATUS    VOLUME    CAPACITY   ACCESSMODES   STORAGECLASS     AGE
test-03   Pending                                      glusterblockip   5m

Snippet from heketi logs:

[sshexec] ERROR 2017/09/01 13:19:28 /src/github.com/heketi/heketi/pkg/utils/ssh/ssh.go:173: Failed to run command [/bin/bash -c 'gluster-block create vol_314b6b07b1e82a528a7bd1e2d2d00d20/blockvol_22433e831468f01ec243fb0f06ac4897 ha 2 auth enable 10.70.46.1,10.70.47.105 1G --json'] on dhcp47-25.lab.eng.blr.redhat.com:22: Err[Process exited with status 255]: Stdout [{ "RESULT": "FAIL", "errCode": 255, "errMsg": "failed to configure on 10.70.47.105 : No route to host" }

This issue was seen in a CRS configuration, but should be reproducible in CNS as well.

Version-Release number of selected component (if applicable):
cns-deploy-5.0.0-25.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 3-node CRS setup with heketi in OpenShift.
2. Fail one of the nodes.
3. Try to create a block device with ha:2.

Actual results:
Block-volume creation fails.

Expected results:
Block-volume creation should succeed, as 2 nodes are available to complete this request.

Additional info:
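To illustrate the expected behaviour: the host list passed to gluster-block should contain only nodes that are currently reachable. The following is a minimal sketch under stated assumptions, not heketi's actual code. The helper name pick_block_hosts, the is_reachable callback, and the third node's address are all hypothetical; only 10.70.46.1 and 10.70.47.105 come from the log above.

```python
def pick_block_hosts(nodes, ha_count, is_reachable):
    """Pick ha_count hosts for a block volume, skipping unreachable nodes.

    Hypothetical helper for illustration only; not heketi's implementation.
    """
    reachable = [n for n in nodes if is_reachable(n)]
    if len(reachable) < ha_count:
        raise RuntimeError(
            f"need {ha_count} reachable hosts, only {len(reachable)} available"
        )
    # Keep the original node order and take the first ha_count reachable hosts.
    return reachable[:ha_count]


# 10.70.46.1 and 10.70.47.105 appear in the log; the third entry is a
# made-up placeholder for the remaining node in the 3-node setup.
nodes = ["10.70.46.1", "10.70.47.105", "node-3.example.com"]
down = {"10.70.47.105"}  # the node reported as "No route to host"

hosts = pick_block_hosts(nodes, 2, lambda n: n not in down)
print(",".join(hosts))
```

With one node down and ha 2, the request can still be satisfied by the two remaining nodes. A request with ha 3 would raise an error in this sketch, since only 2 hosts are reachable.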
The next heketi build will have the fix. As soon as the build is available, we will move this to ON_QA.
Although the patch is in the build, it will not work as expected with Kubernetes/OpenShift because of how labels are provided. I will send another patch on top to eliminate the problem.
This is fixed in cns-deploy v45.
Verified in build cns-deploy-5.0.0-46.el7rhgs.x86_64.

When one node is brought down on a 3-node TSP and a block device is created with an HA count of 2, the block device gets created. However, creation takes ~3 minutes when one node is down. I'll raise a separate bug to reduce this delay.

For now, this bug is being moved to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2879