Description krishnaram Karthick
2017-09-01 13:26:30 UTC
Description of problem:
When one node is down in a 3-node CRS configuration and a PVC request is made with HA set to 2, block-device creation should succeed. However, it fails and the PVC remains in the Pending state.
# oc get pvc/test-03
NAME      STATUS    VOLUME    CAPACITY   ACCESSMODES   STORAGECLASS     AGE
test-03   Pending                                      glusterblockip   5m
Snippet from heketi logs:
[sshexec] ERROR 2017/09/01 13:19:28 /src/github.com/heketi/heketi/pkg/utils/ssh/ssh.go:173: Failed to run command [/bin/bash -c 'gluster-block create vol_314b6b07b1e82a528a7bd1e2d2d00d20/blockvol_22433e831468f01ec243fb0f06ac4897 ha 2 auth enable 10.70.46.1,10.70.47.105 1G --json'] on dhcp47-25.lab.eng.blr.redhat.com:22: Err[Process exited with status 255]: Stdout [{ "RESULT": "FAIL", "errCode": 255, "errMsg": "failed to configure on 10.70.47.105 : No route to host" }
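To make the failure easier to read, here is a minimal sketch of how the unreachable host could be pulled out of the JSON that gluster-block wrote to stdout. The function name `failed_host` is hypothetical and the parsing relies only on the message format visible in the log line above; it is not part of heketi or gluster-block.

```python
import json
import re


def failed_host(stdout: str):
    """Return the host named in a gluster-block failure reply, or None.

    Assumes the reply looks like the one in the log above:
    {"RESULT": "FAIL", "errCode": ..., "errMsg": "failed to configure on <host> : ..."}
    """
    reply = json.loads(stdout)
    if reply.get("RESULT") != "FAIL":
        return None
    match = re.search(r"failed to configure on (\S+)", reply.get("errMsg", ""))
    return match.group(1) if match else None


# The JSON reply copied from the heketi log line above.
sample = (
    '{ "RESULT": "FAIL", "errCode": 255, '
    '"errMsg": "failed to configure on 10.70.47.105 : No route to host" }'
)
print(failed_host(sample))
```

Here the reported host is 10.70.47.105, one of the two addresses passed to the create command.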
This issue was seen in a CRS configuration, but is expected to occur in CNS as well.
Version-Release number of selected component (if applicable):
cns-deploy-5.0.0-25.el7rhgs.x86_64
How reproducible:
Always
Steps to Reproduce:
1. Create a 3-node CRS setup with heketi in OpenShift.
2. Bring down one of the nodes.
3. Try to create a block device with ha:2.
Actual results:
Block-volume creation fails.
Expected results:
Block-volume creation should succeed, as there are 2 nodes available for this request to complete.
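The expected behaviour can be illustrated with a short sketch: filter out unreachable nodes first, then take as many hosts as the HA count requires. This is only an illustration under stated assumptions, not heketi's actual fix. The names `pick_block_hosts` and `is_reachable` are hypothetical, and 192.0.2.10 is a placeholder address standing in for the third node, whose address does not appear in this report.

```python
def pick_block_hosts(hosts, is_reachable, ha):
    """Pick `ha` hosts for a block volume, skipping nodes that are down.

    hosts        -- candidate node addresses, in preference order
    is_reachable -- callable returning True if a node is up (hypothetical probe)
    ha           -- requested HA count
    """
    online = [h for h in hosts if is_reachable(h)]
    if len(online) < ha:
        raise RuntimeError(
            f"need {ha} reachable hosts, only {len(online)} available"
        )
    return online[:ha]


# Two addresses come from the log above; 192.0.2.10 is a placeholder.
nodes = ["10.70.46.1", "10.70.47.105", "192.0.2.10"]
down = {"10.70.47.105"}  # the node reported as "No route to host"

chosen = pick_block_hosts(nodes, lambda h: h not in down, ha=2)
print(chosen)
```

With one of three nodes down, two hosts remain, so a request with ha 2 can still be satisfied. The request should only fail when fewer reachable nodes remain than the HA count.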
Additional info:
The next heketi build will have the fix; as soon as the build is available, we will move this to ON_QA.
Comment 4 Raghavendra Talur
2017-09-17 20:56:18 UTC
Although the patch is in the build, this is not going to work as expected with Kubernetes/OpenShift because of how labels are provided. I will send another patch on top to eliminate the problem.
Comment 6 krishnaram Karthick
2017-09-19 11:38:55 UTC
Verified in build cns-deploy-5.0.0-46.el7rhgs.x86_64.
When one node is brought down in a 3-node TSP and a block device is created with an HA count of 2, the block device gets created.
However, it takes ~3 minutes to create the block device when one node is down. I'll raise a separate bug to reduce this delay.
For now, this bug shall be moved to verified.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2017:2879