Created attachment 1265308 [details]
log file of cns-deploy

Description of problem:
cns-deploy throws the following error after setting up the gluster pods and heketi successfully.

Snippet of the error message:

Determining heketi service URL ... OK
Failed to communicate with heketi service.
Please verify that a router has been properly configured.

The complete log file is attached to the bug.

[root@dhcp46-202 ~]# oc get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP             NODE
glusterfs-hcp7j                  1/1     Running   0          13h   10.70.46.165   dhcp46-165.lab.eng.blr.redhat.com
glusterfs-jg4kw                  1/1     Running   0          13h   10.70.47.21    dhcp47-21.lab.eng.blr.redhat.com
glusterfs-vx1s0                  1/1     Running   0          13h   10.70.47.51    dhcp47-51.lab.eng.blr.redhat.com
heketi-1-wgdc8                   1/1     Running   0          13h   10.131.0.4     dhcp47-180.lab.eng.blr.redhat.com
mongodb-1-1-kzkrt                1/1     Running   0          8m    10.130.0.4     dhcp47-78.lab.eng.blr.redhat.com
storage-project-router-1-2l40s   1/1     Running   0          13h   10.70.47.65    dhcp47-65.lab.eng.blr.redhat.com

Version-Release number of selected component (if applicable):
heketi-client-4.0.0-3.el7rhgs.x86_64
cns-deploy-4.0.0-6.el7rhgs.x86_64

How reproducible:
This is seen quite consistently.

Steps to Reproduce:
1. On a fresh OpenShift cluster, run cns-deploy.

Actual results:
The following error message is thrown towards the end of cns-deploy, although the heketi configuration is successful:

Failed to communicate with heketi service.
Please verify that a router has been properly configured.

Expected results:
These messages should not be seen.

Additional info:
The issue seen here is a regression: we never saw this message in 3.4 on successful completion of cns-deploy.
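For anyone reproducing this by hand: the error text suggests the script probes the heketi route and gets no usable answer. A minimal sketch of such a probe follows, assuming heketi's /hello endpoint (which replies "Hello from Heketi") and a route hostname that you substitute yourself. The function name heketi_reachable and the HTTP_GET override are hypothetical helpers for illustration and are not taken from cns-deploy.

```shell
#!/bin/sh
# heketi_reachable URL
# Succeeds only if a GET on URL/hello returns heketi's greeting.
# HTTP_GET is a hypothetical hook: set it to another command to test offline.
heketi_reachable() {
  local url reply
  url="$1"
  # Unquoted on purpose so the default expands to "curl" plus its flags.
  reply=$(${HTTP_GET:-curl -s --max-time 5} "${url}/hello" 2>/dev/null)
  [ "${reply}" = "Hello from Heketi" ]
}

# Example (needs a reachable route; the hostname is a placeholder):
#   heketi_reachable "http://<heketi-route-host>" && echo reachable
```

If the probe fails right after the route is created but succeeds a minute later, that points at timing rather than at the router configuration.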
Added these two debug statements to the cns_deploy script:

echo "service endpoint ${heketi_service}"
${CLI} describe routes/heketi

Output of the cns_deploy command: http://pastebin.test.redhat.com/467105

So, the endpoint is none:

Name:             heketi
Namespace:        aplo
Created:          3 seconds ago
Labels:           glusterfs=heketi-route
                  template=heketi
Annotations:      openshift.io/host.generated=true
Requested Host:   heketi-aplo.cloudapps.myaplo.com
                    exposed on router aplo-router 4 seconds ago
Path:             <none>
TLS Termination:  <none>
Insecure Policy:  <none>
Endpoint Port:    <all endpoint ports>

Service:          heketi
Weight:           100 (100%)
Endpoints:        <none>

After a minute, I see that the endpoints get updated:

[root@rhsauto049 ~]# oc describe routes/heketi
Name:             heketi
Namespace:        aplo
Created:          2 minutes ago
Labels:           glusterfs=heketi-route
                  template=heketi
Annotations:      openshift.io/host.generated=true
Requested Host:   heketi-aplo.cloudapps.myaplo.com
                    exposed on router aplo-router 2 minutes ago
Path:             <none>
TLS Termination:  <none>
Insecure Policy:  <none>
Endpoint Port:    <all endpoint ports>

Service:          heketi
Weight:           100 (100%)
Endpoints:        10.131.0.3:8080
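The two describe outputs above differ only in the Endpoints field, so the script needs to wait until that field holds a real address before it uses the route. A minimal sketch of that wait follows, assuming the `Endpoints:` line format shown above. The helper names parse_endpoints, endpoints_ready and wait_for_endpoints are hypothetical, and the attempt limit is an addition for illustration, not something the script is known to have.

```shell
#!/bin/sh
# parse_endpoints: read `describe` output on stdin and print the value of the
# first "Endpoints:" line (for example "10.131.0.3:8080" or "<none>").
parse_endpoints() {
  awk '$1 == "Endpoints:" {print $2; exit}'
}

# endpoints_ready VALUE: succeed only for a non-empty value other than "<none>".
endpoints_ready() {
  [ -n "$1" ] && [ "$1" != "<none>" ]
}

# wait_for_endpoints ATTEMPTS COMMAND...
# Runs COMMAND once per second, up to ATTEMPTS times, until its output shows
# real endpoints. Prints the endpoints on success, returns 1 on timeout.
wait_for_endpoints() {
  local attempts i value
  attempts="$1"
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    value=$("$@" 2>/dev/null | parse_endpoints)
    if endpoints_ready "$value"; then
      printf '%s\n' "$value"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example (needs a cluster): wait_for_endpoints 120 oc describe svc/heketi
```

Bounding the loop means a service that never gets endpoints produces a clear timeout instead of a hang.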
Patch upstream for the same: https://github.com/gluster/gluster-kubernetes/pull/208
git show 7dd195f8

commit 7dd195f8ad1396e933daa2957bba5159fa0d29f5
Author: Mohamed Ashiq Liyazudeen <mliyazud>
Date:   Wed Mar 22 16:41:52 2017 +0530

    Should not proceed if endpoint is none for heketi service

    Signed-off-by: Mohamed Ashiq Liyazudeen <mliyazud>

diff --git a/deploy/gk-deploy b/deploy/gk-deploy
index 3e09cf0..0f138e5 100755
--- a/deploy/gk-deploy
+++ b/deploy/gk-deploy
@@ -465,11 +465,10 @@ output "OK"
 heketi_service=""
 debug -n "Determining heketi service URL ... "
-while [[ "x${heketi_service}" == "x" ]]; do
-  if [[ "${CLI}" == *oc\ * ]]; then
-    heketi_service=$(${CLI} describe routes/deploy-heketi | grep "Requested Host:" | awk '{print $3}')
-  else
-    heketi_service=$(${CLI} describe svc/deploy-heketi | grep "Endpoints:" | awk '{print $2}')
+while [[ "x${heketi_service}" == "x" ]] || [[ "${heketi_service}" == "<none>" ]]; do
+  heketi_service=$(${CLI} describe svc/deploy-heketi | grep "Endpoints:" | awk '{print $2}')
+  if [[ "${heketi_service}" != "<none>"]] && [[ "${CLI}" == *oc\ * ]]; then
+    heketi_service=$(${CLI} describe routes/deploy-heketi | grep "Requested Host:" | awk '{print $3}')
   fi
   sleep 1
 done
@@ -528,11 +527,10 @@ output "OK"
 heketi_service=""
 debug -n "Determining heketi service URL ... "
-while [[ "x${heketi_service}" == "x" ]]; do
-  if [[ "${CLI}" == *oc\ * ]]; then
-    heketi_service=$(${CLI} describe routes/heketi | grep "Requested Host:" | awk '{print $3}')
-  else
-    heketi_service=$(${CLI} describe svc/heketi | grep "Endpoints:" | awk '{print $2}')
+while [[ "x${heketi_service}" == "x" ]] || [[ "${heketi_service}" == "<none>" ]]; do
+  heketi_service=$(${CLI} describe svc/heketi | grep "Endpoints:" | awk '{print $2}')
+  if [[ "${heketi_service}" != "<none>"]] && [[ "${CLI}" == *oc\ * ]]; then
+    heketi_service=$(${CLI} describe routes/heketi | grep "Requested Host:" | awk '{print $3}')
   fi
   sleep 1
 done

========
RPMDIFF is failing with the following message:
---------------------------------------------
usr/bin/cns-deploy is no longer a valid /bin/bash script on x86_64:
/usr/bin/cns-deploy: line 492: conditional binary operator expected
/usr/bin/cns-deploy: line 492: syntax error near `"${CLI}"'
/usr/bin/cns-deploy: line 492: `  if [[ "${heketi_service}" != "<none>"]] && [[ "${CLI}" == *oc\ * ]]; then'
=========
My Bad. Sorry for the trouble. Addressed the issue Upstream. https://github.com/gluster/gluster-kubernetes/pull/212
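For reference, the RPMDIFF failure comes down to a missing space. In bash, `]]` closes a `[[` conditional only when it is a separate word, so `"<none>"]]` is read as one operand and the parser fails further along the line. The sketch below reproduces this with `bash -n`, which parses a script without running it. The helper check_syntax is hypothetical, and the two test strings are condensed from the line quoted in the RPMDIFF message; this is an illustration and is not copied from the upstream fix.

```shell
#!/usr/bin/env bash
# check_syntax SNIPPET
# Writes SNIPPET to a temporary file and asks bash to parse it without running
# it. Returns 0 if the snippet parses, non-zero otherwise.
check_syntax() {
  local tmp status
  tmp=$(mktemp)
  printf '%s\n' "$1" > "${tmp}"
  bash -n "${tmp}" 2>/dev/null
  status=$?
  rm -f "${tmp}"
  return "${status}"
}

# As shipped in the failing build: no space between "<none>" and ]].
broken='if [[ "${heketi_service}" != "<none>"]] && [[ "${CLI}" == *oc\ * ]]; then echo route; fi'

# With the space restored, ]] is its own word and closes the conditional.
fixed='if [[ "${heketi_service}" != "<none>" ]] && [[ "${CLI}" == *oc\ * ]]; then echo route; fi'

if check_syntax "${fixed}"; then echo "fixed: parses"; fi
if ! check_syntax "${broken}"; then echo "broken: syntax error"; fi
```

Running `bash -n` on the script before building a package would catch this class of error early.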
(In reply to Mohamed Ashiq from comment #12)
> My Bad. Sorry for the trouble. Addressed the issue Upstream.
>
> https://github.com/gluster/gluster-kubernetes/pull/212

Merged upstream. Thanks, Obnox.

Ramky, can you trigger a build with the PR?
patch at https://github.com/gluster/gluster-kubernetes/pull/230
Verified as fixed in cns-deploy-4.0.0-12.el7rhgs
cns-deploy fails when run from a client on build cns-deploy-4.0.0-12.el7rhgs.
patch upstream at https://github.com/gluster/gluster-kubernetes/pull/233
cns-deploy works from the client on build cns-deploy-4.0.0-13.el7rhgs.x86_64.

Output of the cns-deploy command:

Using OpenShift CLI.
NAME              STATUS    AGE
storage-project   Active    1h
Using namespace "storage-project".
Checking that heketi pod is not running ...
Checking status of pods matching 'glusterfs=heketi-pod':
No resources found.
Timed out waiting for pods matching 'glusterfs=heketi-pod'.
OK
template "deploy-heketi" created
serviceaccount "heketi-service-account" created
template "heketi" created
template "glusterfs" created
role "edit" added: "system:serviceaccount:storage-project:heketi-service-account"
Marking 'dhcp46-205.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-205.lab.eng.blr.redhat.com" labeled
Marking 'dhcp46-127.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-127.lab.eng.blr.redhat.com" labeled
Marking 'dhcp46-108.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-108.lab.eng.blr.redhat.com" labeled
Deploying GlusterFS pods.
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ...
Checking status of pods matching 'glusterfs-node=pod':
glusterfs-04l4t   1/1   Running   0   1m
glusterfs-9hd1t   1/1   Running   0   1m
glusterfs-tdzpg   1/1   Running   0   1m
OK
service "deploy-heketi" created
route "deploy-heketi" created
deploymentconfig "deploy-heketi" created
Waiting for deploy-heketi pod to start ...
Checking status of pods matching 'glusterfs=heketi-pod':
deploy-heketi-1-vsbcp   1/1   Running   0   52s
OK
Determining heketi service URL ... OK
Creating cluster ... ID: 4ede397bbe0563850bf37e14d77e726e
Creating node dhcp46-205.lab.eng.blr.redhat.com ... ID: da847efddcdbcadb93bb343772ecb090
Adding device /dev/sdd ... OK
Adding device /dev/sde ... OK
Adding device /dev/sdf ... OK
Creating node dhcp46-127.lab.eng.blr.redhat.com ... ID: 09e92fc0ae6693f834de2e8d485dfb87
Adding device /dev/sdd ... OK
Adding device /dev/sde ... OK
Adding device /dev/sdf ... OK
Creating node dhcp46-108.lab.eng.blr.redhat.com ... ID: d159d21af107268cfe5bd90fd7226879
Adding device /dev/sdd ... OK
Adding device /dev/sde ... OK
Adding device /dev/sdf ... OK
heketi topology loaded.
Saving heketi-storage.json
secret "heketi-storage-secret" created
endpoints "heketi-storage-endpoints" created
service "heketi-storage-endpoints" created
job "heketi-storage-copy-job" created
Checking status of pods matching 'job-name=heketi-storage-copy-job':
heketi-storage-copy-job-gcxkl   0/1   Completed   0   5s
deploymentconfig "deploy-heketi" deleted
route "deploy-heketi" deleted
service "deploy-heketi" deleted
job "heketi-storage-copy-job" deleted
pod "deploy-heketi-1-vsbcp" deleted
secret "heketi-storage-secret" deleted
service "heketi" created
route "heketi" created
deploymentconfig "heketi" created
Waiting for heketi pod to start ...
Checking status of pods matching 'glusterfs=heketi-pod':
deploy-heketi-1-vsbcp   1/1   Terminating   0   1m
OK
Determining heketi service URL ... OK
heketi is now running.
Ready to create and provide GlusterFS volumes.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1112