Bug 1487645 - block-volume creation fails when one of the node is down in a 3 node RHGS cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.6
Assignee: Raghavendra Talur
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 1445448
 
Reported: 2017-09-01 13:26 UTC by krishnaram Karthick
Modified: 2019-01-12 13:46 UTC (History)

Fixed In Version: heketi-5.0.0-12 rhgs-volmanager-docker-5.0.0-16
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-11 07:09:46 UTC
Embargoed:




Links
System: Red Hat Product Errata
ID: RHEA-2017:2879
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: heketi bug fix and enhancement update
Last Updated: 2017-10-11 11:07:06 UTC

Description krishnaram Karthick 2017-09-01 13:26:30 UTC
Description of problem:
When one node is down in a 3-node CRS configuration and a PVC request is made with HA set to 2, block-device creation should succeed because two nodes are still available. However, it fails and the PVC remains in the Pending state.

# oc get pvc/test-03
NAME      STATUS    VOLUME    CAPACITY   ACCESSMODES   STORAGECLASS     AGE
test-03   Pending                                      glusterblockip   5m

snippet from heketi logs:

[sshexec] ERROR 2017/09/01 13:19:28 /src/github.com/heketi/heketi/pkg/utils/ssh/ssh.go:173: Failed to run command [/bin/bash -c 'gluster-block create vol_314b6b07b1e82a528a7bd1e2d2d00d20/blockvol_22433e831468f01ec243fb0f06ac4897  ha 2 auth enable  10.70.46.1,10.70.47.105 1G --json'] on dhcp47-25.lab.eng.blr.redhat.com:22: Err[Process exited with status 255]: Stdout [{ "RESULT": "FAIL", "errCode": 255, "errMsg": "failed to configure on 10.70.47.105 : No route to host" }
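For anyone triaging similar failures, the JSON payload in the Stdout part of that log line can be parsed to find which host gluster-block could not reach. This is only a minimal sketch: the helper name parse_gluster_block_result is hypothetical, and the errMsg pattern is assumed from the single failure message shown above, not from any documented gluster-block format.

```python
import json
import re


def parse_gluster_block_result(stdout: str) -> dict:
    """Extract the failed host and reason from gluster-block --json output.

    Assumption: errMsg has the shape "failed to configure on <host> : <reason>",
    as in the log line above. Other message shapes leave the fields as None.
    """
    result = json.loads(stdout)
    info = {
        # Assumption: anything other than "FAIL" is treated as success.
        "ok": result.get("RESULT") != "FAIL",
        "failed_host": None,
        "reason": None,
    }
    match = re.match(r"failed to configure on (\S+)\s*:\s*(.+)", result.get("errMsg", ""))
    if match:
        info["failed_host"] = match.group(1)
        info["reason"] = match.group(2)
    return info


# JSON payload copied from the heketi log line above.
sample = (
    '{ "RESULT": "FAIL", "errCode": 255, '
    '"errMsg": "failed to configure on 10.70.47.105 : No route to host" }'
)
print(parse_gluster_block_result(sample))
```

Here the unreachable host is 10.70.47.105, one of the two hosts passed to the create command with ha 2.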

This issue was seen in a CRS configuration, but it is expected to occur in CNS as well.

Version-Release number of selected component (if applicable):
cns-deploy-5.0.0-25.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 3-node CRS setup with heketi in OpenShift.
2. Bring down one of the nodes.
3. Try to create a block device with ha:2.

Actual results:
Block-volume creation fails.

Expected results:
Block-volume creation should succeed, as there are 2 nodes available to complete this request.
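
The expected behaviour can be illustrated with a small sketch: filter out unreachable nodes first, then take as many hosts as the HA count asks for. This is not heketi's actual implementation; pick_ha_hosts, the is_reachable callback, and the third host name node-c.example are all hypothetical and used only for illustration.

```python
from typing import Callable, List


def pick_ha_hosts(hosts: List[str], ha: int, is_reachable: Callable[[str], bool]) -> List[str]:
    """Return `ha` reachable hosts, or raise if too few nodes are up.

    Hypothetical helper: it only illustrates the expected selection,
    not how heketi chooses block-hosting nodes.
    """
    reachable = [host for host in hosts if is_reachable(host)]
    if len(reachable) < ha:
        raise RuntimeError(
            f"need {ha} reachable hosts, only {len(reachable)} available"
        )
    return reachable[:ha]


# Two addresses from the log above plus a placeholder third node.
hosts = ["10.70.46.1", "10.70.47.105", "node-c.example"]
down = {"10.70.47.105"}  # the node reported as "No route to host"

selected = pick_ha_hosts(hosts, 2, lambda host: host not in down)
print(selected)
```

With one of three nodes down, two reachable hosts remain, so a request with ha:2 can still be satisfied. Only if a second node failed would the request have to be rejected.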

Additional info:

Comment 3 Humble Chirammal 2017-09-14 11:40:36 UTC
The next heketi build will have the fix, and as soon as the build is available, we will move this to ON_QA.

Comment 4 Raghavendra Talur 2017-09-17 20:56:18 UTC
Although the patch is in the build, this is not going to work as expected with kubernetes/openshift because of how labels are provided. I will send another patch on top to eliminate the problem.

Comment 5 Humble Chirammal 2017-09-19 06:26:18 UTC
This is fixed in cns-deploy v45.

Comment 6 krishnaram Karthick 2017-09-19 11:38:55 UTC
Verified in build - cns-deploy-5.0.0-46.el7rhgs.x86_64

When one node is brought down on a 3-node TSP and a block device is created with ha count 2, the block device gets created.

However, it takes ~3 minutes to create the block device when one node is down. I'll raise a separate bug to reduce this delay.

For now, this bug shall be moved to verified.

Comment 7 errata-xmlrpc 2017-10-11 07:09:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2879

