Bug 1434673

Summary: cns-deploy throws a fake error after it has setup heketi successfully
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: krishnaram Karthick <kramdoss>
Component: cns-deploy-tool
Assignee: Jose A. Rivera <jarrpa>
Status: CLOSED ERRATA
QA Contact: Apeksha <akhakhar>
Severity: high
Docs Contact:
Priority: unspecified
Version: cns-3.5
CC: akhakhar, annair, hchiramm, jarrpa, madam, mliyazud, mzywusko, pprakash, rreddy, rtalur, sselvan
Target Milestone: ---
Keywords: Regression
Target Release: CNS 3.5
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: cns-deploy-4.0.0-13.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-20 18:28:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1415600
Attachments:
Description Flags
log file of cns-deploy none

Description krishnaram Karthick 2017-03-22 05:56:19 UTC
Created attachment 1265308
log file of cns-deploy

Description of problem:

cns-deploy throws the following error after setting up gluster pods and heketi successfully. 

Snippet of the error message:

Determining heketi service URL ... OK
Failed to communicate with heketi service.
Please verify that a router has been properly configured.

The complete log file is attached to the bug.

[root@dhcp46-202 ~]# oc get pods -o wide
NAME                             READY     STATUS    RESTARTS   AGE       IP             NODE
glusterfs-hcp7j                  1/1       Running   0          13h       10.70.46.165   dhcp46-165.lab.eng.blr.redhat.com
glusterfs-jg4kw                  1/1       Running   0          13h       10.70.47.21    dhcp47-21.lab.eng.blr.redhat.com
glusterfs-vx1s0                  1/1       Running   0          13h       10.70.47.51    dhcp47-51.lab.eng.blr.redhat.com
heketi-1-wgdc8                   1/1       Running   0          13h       10.131.0.4     dhcp47-180.lab.eng.blr.redhat.com
mongodb-1-1-kzkrt                1/1       Running   0          8m        10.130.0.4     dhcp47-78.lab.eng.blr.redhat.com
storage-project-router-1-2l40s   1/1       Running   0          13h       10.70.47.65    dhcp47-65.lab.eng.blr.redhat.com



Version-Release number of selected component (if applicable):
heketi-client-4.0.0-3.el7rhgs.x86_64
cns-deploy-4.0.0-6.el7rhgs.x86_64


How reproducible:
This is seen quite consistently.

Steps to Reproduce:
1. On a fresh OpenShift cluster, run cns-deploy

Actual results:
The following error message is printed towards the end of cns-deploy even though the heketi configuration succeeds:

Failed to communicate with heketi service.
Please verify that a router has been properly configured.

Expected results:
These messages should not be seen

Additional info:
This is a regression: this message was never seen in 3.4 when cns-deploy completed successfully.

Comment 5 Apeksha 2017-03-22 10:44:54 UTC
Added these two debug statements to the cns-deploy script:
echo "service endpoint ${heketi_service}"
${CLI} describe routes/heketi

Output of the cns-deploy command: http://pastebin.test.redhat.com/467105

So the Endpoints field is <none>:
Name:			heketi
Namespace:		aplo
Created:		3 seconds ago
Labels:			glusterfs=heketi-route
			template=heketi
Annotations:		openshift.io/host.generated=true
Requested Host:		heketi-aplo.cloudapps.myaplo.com
			  exposed on router aplo-router 4 seconds ago
Path:			<none>
TLS Termination:	<none>
Insecure Policy:	<none>
Endpoint Port:		<all endpoint ports>

Service:	heketi
Weight:		100 (100%)
Endpoints:	<none>


After a minute, I see that the endpoints get updated:

[root@rhsauto049 ~]# oc describe routes/heketi
Name:			heketi
Namespace:		aplo
Created:		2 minutes ago
Labels:			glusterfs=heketi-route
			template=heketi
Annotations:		openshift.io/host.generated=true
Requested Host:		heketi-aplo.cloudapps.myaplo.com
			  exposed on router aplo-router 2 minutes ago
Path:			<none>
TLS Termination:	<none>
Insecure Policy:	<none>
Endpoint Port:		<all endpoint ports>

Service:	heketi
Weight:		100 (100%)
Endpoints:	10.131.0.3:8080
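
The two outputs above suggest a timing issue: the route exists immediately, but the service reports Endpoints: <none> until the heketi pod is ready. A minimal sketch of waiting for a real endpoint, assuming describe output shaped like the above. The function names parse_endpoints and wait_for_endpoints, the timeout argument, and passing the describe command as arguments are hypothetical and not taken from cns-deploy:

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of cns-deploy.

# Print the value of the "Endpoints:" field from describe output on stdin.
parse_endpoints() {
  awk '$1 == "Endpoints:" { print $2; exit }'
}

# Poll a describe command until it reports a real endpoint, or give up.
# $1: timeout in seconds; remaining arguments: command printing describe output.
wait_for_endpoints() {
  local timeout=$1
  shift
  local endpoints=""
  local waited=0
  while [ "${waited}" -lt "${timeout}" ]; do
    endpoints=$("$@" 2>/dev/null | parse_endpoints)
    # Both an empty value and the literal "<none>" mean "not ready yet".
    if [ -n "${endpoints}" ] && [ "${endpoints}" != "<none>" ]; then
      echo "${endpoints}"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

It could be called as, for example, wait_for_endpoints 120 oc describe svc/heketi. Bounding the wait with a timeout is a choice made for this sketch so that a service that never gets endpoints does not block the caller indefinitely.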

Comment 6 Mohamed Ashiq 2017-03-22 11:16:54 UTC
Patch upstream for the same:

https://github.com/gluster/gluster-kubernetes/pull/208

Comment 11 Ramakrishna Reddy Yekulla 2017-03-24 16:22:15 UTC
 git show 7dd195f8
commit 7dd195f8ad1396e933daa2957bba5159fa0d29f5
Author: Mohamed Ashiq Liyazudeen <mliyazud>
Date:   Wed Mar 22 16:41:52 2017 +0530

    Should not proceed if endpoint is none for heketi service
    
    Signed-off-by: Mohamed Ashiq Liyazudeen <mliyazud>

diff --git a/deploy/gk-deploy b/deploy/gk-deploy
index 3e09cf0..0f138e5 100755
--- a/deploy/gk-deploy
+++ b/deploy/gk-deploy
@@ -465,11 +465,10 @@ output "OK"
 
 heketi_service=""
 debug -n "Determining heketi service URL ... "
-while [[ "x${heketi_service}" == "x" ]]; do
-  if [[ "${CLI}" == *oc\ * ]]; then
-    heketi_service=$(${CLI} describe routes/deploy-heketi | grep "Requested Host:" | awk '{print $3}')
-  else
-    heketi_service=$(${CLI} describe svc/deploy-heketi | grep "Endpoints:" | awk '{print $2}')
+while [[ "x${heketi_service}" == "x" ]] || [[ "${heketi_service}" == "<none>" ]]; do
+  heketi_service=$(${CLI} describe svc/deploy-heketi | grep "Endpoints:" | awk '{print $2}')
+  if [[ "${heketi_service}" != "<none>"]] && [[ "${CLI}" == *oc\ * ]]; then
+      heketi_service=$(${CLI} describe routes/deploy-heketi | grep "Requested Host:" | awk '{print $3}')
   fi
   sleep 1
 done
@@ -528,11 +527,10 @@ output "OK"
 
 heketi_service=""
 debug -n "Determining heketi service URL ... "
-while [[ "x${heketi_service}" == "x" ]]; do
-  if [[ "${CLI}" == *oc\ * ]]; then
-    heketi_service=$(${CLI} describe routes/heketi | grep "Requested Host:" | awk '{print $3}')
-  else
-    heketi_service=$(${CLI} describe svc/heketi | grep "Endpoints:" | awk '{print $2}')
+while [[ "x${heketi_service}" == "x" ]] || [[ "${heketi_service}" == "<none>" ]]; do
+  heketi_service=$(${CLI} describe svc/heketi | grep "Endpoints:" | awk '{print $2}')
+  if [[ "${heketi_service}" != "<none>"]] && [[ "${CLI}" == *oc\ * ]]; then
+      heketi_service=$(${CLI} describe routes/heketi | grep "Requested Host:" | awk '{print $3}')
   fi
   sleep 1
 done


========

RPMDIFF is failing with the following message:
---------------------------------------------
 	

usr/bin/cns-deploy is no longer a valid /bin/bash script on x86_64:
/usr/bin/cns-deploy: line 492: conditional binary operator expected
/usr/bin/cns-deploy: line 492: syntax error near `"${CLI}"'
/usr/bin/cns-deploy: line 492: `  if [[ "${heketi_service}" != "<none>"]] && [[ "${CLI}" == *oc\ * ]]; then'


=========
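
The errors above point at "<none>"]] on line 492: in bash, the closing ]] must be a separate word, so whitespace is required before it. A minimal sketch that reproduces the parse failure and checks the spaced form with bash -n, which parses a script without executing it. The function name syntax_ok and the two sample snippets are assumptions made for illustration; this is not the upstream patch:

```shell
#!/bin/bash
# Illustrative only: syntax_ok and both snippets are assumptions for this sketch.

# Return 0 if bash can parse the script text given as $1 (nothing is executed).
syntax_ok() {
  local tmp status
  tmp=$(mktemp) || return 2
  printf '%s\n' "$1" > "${tmp}"
  bash -n "${tmp}" 2>/dev/null
  status=$?
  rm -f "${tmp}"
  return "${status}"
}

# Missing space before "]]", as in the failing line reported by RPMDIFF.
broken='if [[ "${heketi_service}" != "<none>"]] && [[ "${CLI}" == *oc\ * ]]; then :; fi'

# The same condition with "]]" as its own word.
spaced='if [[ "${heketi_service}" != "<none>" ]] && [[ "${CLI}" == *oc\ * ]]; then :; fi'

syntax_ok "${broken}" && echo "broken: parses" || echo "broken: syntax error"
syntax_ok "${spaced}" && echo "spaced: parses" || echo "spaced: syntax error"
```

Running bash -n against the generated script before building the package would catch this class of error without needing a cluster.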

Comment 12 Mohamed Ashiq 2017-03-26 19:07:29 UTC
My Bad. Sorry for the trouble. Addressed the issue Upstream.

https://github.com/gluster/gluster-kubernetes/pull/212

Comment 13 Mohamed Ashiq 2017-03-26 22:37:39 UTC
(In reply to Mohamed Ashiq from comment #12)
> My Bad. Sorry for the trouble. Addressed the issue Upstream.
> 
> https://github.com/gluster/gluster-kubernetes/pull/212

Merged upstream. Thanks, Obnox. Ramky, can you trigger a build with the PR?

Comment 15 Raghavendra Talur 2017-04-03 15:27:49 UTC
patch at https://github.com/gluster/gluster-kubernetes/pull/230

Comment 16 Prasanth 2017-04-04 08:03:17 UTC
Verified as fixed in cns-deploy-4.0.0-12.el7rhgs

Comment 19 Apeksha 2017-04-05 06:55:07 UTC
cns-deploy fails when run from a client on build cns-deploy-4.0.0-12.el7rhgs.

Comment 20 Raghavendra Talur 2017-04-05 12:13:43 UTC
patch upstream at https://github.com/gluster/gluster-kubernetes/pull/233

Comment 21 Apeksha 2017-04-10 12:21:54 UTC
cns-deploy works from a client on build cns-deploy-4.0.0-13.el7rhgs.x86_64.

Output of the cns-deploy command:

Using OpenShift CLI.
NAME              STATUS    AGE
storage-project   Active    1h
Using namespace "storage-project".
Checking that heketi pod is not running ... 
Checking status of pods matching 'glusterfs=heketi-pod':
No resources found.
Timed out waiting for pods matching 'glusterfs=heketi-pod'.
OK
template "deploy-heketi" created
serviceaccount "heketi-service-account" created
template "heketi" created
template "glusterfs" created
role "edit" added: "system:serviceaccount:storage-project:heketi-service-account"
Marking 'dhcp46-205.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-205.lab.eng.blr.redhat.com" labeled
Marking 'dhcp46-127.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-127.lab.eng.blr.redhat.com" labeled
Marking 'dhcp46-108.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-108.lab.eng.blr.redhat.com" labeled
Deploying GlusterFS pods.
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ... 
Checking status of pods matching 'glusterfs-node=pod':
glusterfs-04l4t   1/1       Running   0         1m
glusterfs-9hd1t   1/1       Running   0         1m
glusterfs-tdzpg   1/1       Running   0         1m
OK
service "deploy-heketi" created
route "deploy-heketi" created
deploymentconfig "deploy-heketi" created
Waiting for deploy-heketi pod to start ... 
Checking status of pods matching 'glusterfs=heketi-pod':
deploy-heketi-1-vsbcp   1/1       Running   0         52s
OK
Determining heketi service URL ... OK
Creating cluster ... ID: 4ede397bbe0563850bf37e14d77e726e
Creating node dhcp46-205.lab.eng.blr.redhat.com ... ID: da847efddcdbcadb93bb343772ecb090
Adding device /dev/sdd ... OK
Adding device /dev/sde ... OK
Adding device /dev/sdf ... OK
Creating node dhcp46-127.lab.eng.blr.redhat.com ... ID: 09e92fc0ae6693f834de2e8d485dfb87
Adding device /dev/sdd ... OK
Adding device /dev/sde ... OK
Adding device /dev/sdf ... OK
Creating node dhcp46-108.lab.eng.blr.redhat.com ... ID: d159d21af107268cfe5bd90fd7226879
Adding device /dev/sdd ... OK
Adding device /dev/sde ... OK
Adding device /dev/sdf ... OK
heketi topology loaded.
Saving heketi-storage.json
secret "heketi-storage-secret" created
endpoints "heketi-storage-endpoints" created
service "heketi-storage-endpoints" created
job "heketi-storage-copy-job" created

Checking status of pods matching 'job-name=heketi-storage-copy-job':
heketi-storage-copy-job-gcxkl   0/1       Completed   0         5s
deploymentconfig "deploy-heketi" deleted
route "deploy-heketi" deleted
service "deploy-heketi" deleted
job "heketi-storage-copy-job" deleted
pod "deploy-heketi-1-vsbcp" deleted
secret "heketi-storage-secret" deleted
service "heketi" created
route "heketi" created
deploymentconfig "heketi" created
Waiting for heketi pod to start ... 
Checking status of pods matching 'glusterfs=heketi-pod':
deploy-heketi-1-vsbcp   1/1       Terminating   0         1m
OK
Determining heketi service URL ... OK
heketi is now running.
Ready to create and provide GlusterFS volumes.

Comment 22 errata-xmlrpc 2017-04-20 18:28:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1112