Bug 1434055

Field | Value | Field | Value
---|---|---|---
Summary: | cns-deploy tool failed to setup: failed to communicate with heketi service | |
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Apeksha <akhakhar> |
Component: | cns-deploy-tool | Assignee: | Mohamed Ashiq <mliyazud> |
Status: | CLOSED ERRATA | QA Contact: | Apeksha <akhakhar> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | cns-3.5 | CC: | akhakhar, hchiramm, jarrpa, mliyazud, pprakash, rcyriac, rtalur, vinug |
Target Milestone: | --- | Keywords: | TestBlocker |
Target Release: | CNS 3.5 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | cns-deploy-4.0.0-6.el7rhgs | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-04-20 18:28:07 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1415600 |
Description: Apeksha, 2017-03-20 15:59:14 UTC
Additional info:

```
[root@rhsauto045 ~]# oc get pods
NAME                  READY     STATUS    RESTARTS   AGE
aplo-router-1-q9d38   1/1       Running   0          1h
glusterfs-5k3vj       1/1       Running   0          37m
glusterfs-g42qk       1/1       Running   0          37m
glusterfs-h5smd       1/1       Running   0          37m
heketi-1-deploy       0/1       Error     0          32m

[root@rhsauto045 ~]# oc get dc
NAME          REVISION   DESIRED   CURRENT   TRIGGERED BY
aplo-router   1          1         1         config
heketi        1          1         0         config

[root@rhsauto045 ~]# oc describe dc heketi
Name:           heketi
Namespace:      aplo
Created:        33 minutes ago
Labels:         glusterfs=heketi-dc
                template=heketi
Description:    Defines how to deploy Heketi
Annotations:    <none>
Latest Version: 1
Selector:       glusterfs=heketi-pod
Replicas:       1
Triggers:       Config
Strategy:       Recreate
Template:
  Labels:           glusterfs=heketi-pod
  Service Account:  heketi-service-account
  Containers:
   heketi:
    Image:      rhgs3/rhgs-volmanager-rhel7:3.2.0-2
    Port:       8080/TCP
    Liveness:   http-get http://:8080/hello delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness:  http-get http://:8080/hello delay=3s timeout=3s period=10s #success=1 #failure=3
    Volume Mounts:
      /var/lib/heketi from db (rw)
    Environment Variables:
      HEKETI_USER_KEY:
      HEKETI_ADMIN_KEY:
      HEKETI_EXECUTOR:               kubernetes
      HEKETI_FSTAB:                  /var/lib/heketi/fstab
      HEKETI_SNAPSHOT_LIMIT:         14
      HEKETI_KUBE_GLUSTER_DAEMONSET: y
  Volumes:
   db:
    Type:          Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName: heketi-storage-endpoints
    Path:          heketidbstorage
    ReadOnly:      false

Deployment #1 (latest):
    Name:        heketi-1
    Created:     33 minutes ago
    Status:      Failed
    Replicas:    0 current / 0 desired
    Selector:    deployment=heketi-1,deploymentconfig=heketi,glusterfs=heketi-pod
    Labels:      glusterfs=heketi-dc,openshift.io/deployment-config.name=heketi,template=heketi
    Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed

Events:
  FirstSeen LastSeen Count From                         Type    Reason                      Message
  --------- -------- ----- ----                         ----    ------                      -------
  33m       33m      1     {deploymentconfig-controller } Normal  DeploymentCreated           Created new replication controller "heketi-1" for version 1
  22m       22m      1     {deploymentconfig-controller } Normal  ReplicationControllerScaled Scaled replication controller "heketi-1" from 1 to 0
  22m       22m      1     {heketi-1-deploy }             Warning FailedCreate                Error creating: pods "heketi-1-" is forbidden: unable to validate against any security context constraint: [spec.containers[0].securityContext.volumes[0]: Invalid value: "glusterfs": glusterfs volumes are not allowed to be used]
```

---

I am facing the same issue in my setup.

```
<invalid> <invalid> 1 {heketi-1-deploy } Warning FailedCreate Error creating: pods "heketi-1-" is forbidden: unable to validate against any security context constraint: [spec.containers[0].securityContext.volumes[0]: Invalid value: "glusterfs": glusterfs volumes are not allowed to be used]
```

---

I am trying to isolate the issue. I mounted the heketidbstorage volume on all the nodes and did a bind mount of the db volume mountpoint into the heketi container's /var/lib/heketi, and hit a similar failure:

```
<invalid> <invalid> 1 {heketi-4-deploy } Warning FailedCreate Error creating: pods "heketi-4-" is forbidden: unable to validate against any security context constraint: [spec.containers[0].securityContext.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used]
```

What I did:

```
# mount -t glusterfs <ip>:/heketidbstorage /mnt      (on all the nodes)
```

then in the heketi template I edited the volume definition:

```
- glusterfs:
-   endpoints: heketi-storage-endpoints
-   path: heketidbstorage
+ hostPath:
+   path: "/mnt"
```

This caused the above error, which proves there is nothing wrong with the gluster volume plugin in kube. Still debugging.

---

```
# oadm policy add-scc-to-user privileged -z heketi-service-account
```

The above worked for me. Please do the above and try again:

```
# oc delete dc,routes,svc,ep,rc heketi
# oc process heketi | oc create -f -
```

For the sake of completeness of what I tried:

```
# oadm policy add-scc-to-user hostmount-anyuid system:serviceaccount:aplo:heketi-service-account
```

This worked for the hostPath issue.
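The fix above (grant the SCC, then delete and re-create the heketi objects) can be collected into a small script. This is a sketch assuming the cns-deploy default names (the `heketi` template and `heketi-service-account`); the `DRY_RUN` guard is an illustrative addition so the sequence can be reviewed before anything is run against a live cluster.

```shell
# Sketch of the SCC workaround sequence from this report. With DRY_RUN=1
# (the default) each command is only recorded and printed, not executed;
# set DRY_RUN=0 on a live cluster to actually run it.
DRY_RUN=${DRY_RUN:-1}
CMDS=""

run() {
    CMDS="$CMDS$*
"                                 # record the command for review
    echo "+ $*"                   # print the command being (or to be) run
    [ "$DRY_RUN" = "1" ] || "$@"  # execute only when DRY_RUN=0
}

# 1. Grant the privileged SCC to heketi's service account.
run oadm policy add-scc-to-user privileged -z heketi-service-account
# 2. Remove the failed heketi objects.
run oc delete dc,routes,svc,ep,rc heketi
# 3. Re-create heketi from the template.
run sh -c 'oc process heketi | oc create -f -'
```

Keeping the commands behind a single `run` wrapper also makes it easy to log exactly what was done when attaching output to a bug like this one.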
It is a permission issue for the heketi-service-account user; granting it the permission allows the pod to run. Got the fix from [1].

[1] https://lists.openshift.redhat.com/openshift-archives/users/2016-May/msg00069.html

@Jose We had the privileged requirement upstream, so is it downstream too now? Can you confirm this?

---

(In reply to Mohamed Ashiq from comment #5)
> Can you try the above?

I tried the workaround; I am able to set up heketi now.

Steps:
1. oc delete dc,routes,svc,ep,rc heketi
2. oadm policy add-scc-to-user privileged -z heketi-service-account
3. oc process heketi | oc create -f -

```
[root@rhsauto045 ~]# oc get pods
NAME                  READY     STATUS    RESTARTS   AGE
aplo-router-1-q9d38   1/1       Running   0          15h
glusterfs-5k3vj       1/1       Running   0          15h
glusterfs-g42qk       1/1       Running   0          15h
glusterfs-h5smd       1/1       Running   0          15h
heketi-1-cwr4g        1/1       Running   0          1m
```

---

(In reply to Apeksha from comment #7)
> I tried the workaround, i am able to setup heketi now.

I think this will be the proper way to deploy in future (not a workaround).

---

Putting back the needinfo; Jose to confirm?

---

I have sent a patch upstream for the same.
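Once the pod shows Running, heketi's readiness can be confirmed by polling the same `/hello` endpoint the liveness and readiness probes use. A minimal retry helper, sketched in plain POSIX sh; the URL in the usage comment is the route host generated in this report and would differ on another cluster.

```shell
# Poll a command until it succeeds or the retry budget is exhausted.
# Intended use: checking heketi's /hello endpoint after a redeploy.
wait_for() {
    # wait_for <retries> <delay_seconds> <command...>
    tries=$1; delay=$2; shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@"; then
            return 0          # command succeeded: endpoint is up
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1                  # gave up after <retries> attempts
}

# On a live cluster (host name taken from this report's route):
#   wait_for 10 3 curl -sf http://heketi-aplo.cloudapps.myaplo.com/hello
```

A helper like this avoids the "pod is Running but the service is not answering yet" race when scripting the deployment check.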
https://github.com/gluster/gluster-kubernetes/pull/204

---

Got the same error message while running cns_deploy on this build: cns-deploy-4.0.0-6.el7rhgs.x86_64, heketi-client-4.0.0-3.el7rhgs.x86_64.

Output of cns_deploy and other oc commands: http://pastebin.test.redhat.com/466976

---

(In reply to Apeksha from comment #14)
> Output of cns_deploy and other oc commands:
> http://pastebin.test.redhat.com/466976

Can you please tell me clearly what the error in the pastebin is? Everything in the setup looks fine apart from "Failed to communicate with heketi service.\nPlease verify that a router has been properly configured." The heketi pod seems to be running perfectly well. Can you curl /hello and give the output?

---

Thanks for giving access to the machine.

```
# oc describe route heketi
Name:            heketi
Namespace:       aplo
Created:         9 hours ago
Labels:          glusterfs=heketi-route
                 template=heketi
Annotations:     openshift.io/host.generated=true
Requested Host:  heketi-aplo.cloudapps.myaplo.com
                 exposed on router aplo-router 9 hours ago
Path:            <none>
TLS Termination: <none>
Insecure Policy: <none>
Endpoint Port:   <all endpoint ports>

Service:   heketi
Weight:    100 (100%)
Endpoints: 10.129.0.4:8080

# oc get svc
NAME                       CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
aplo-router                172.30.43.194    <none>        80/TCP,443/TCP,1936/TCP   9h
heketi                     172.30.66.222    <none>        8080/TCP                  9h
heketi-storage-endpoints   172.30.106.187   <none>        1/TCP                     9h
```

The route points to the wrong endpoint. This is the issue now, as the old one is fixed.

---

(In reply to Mohamed Ashiq from comment #16)
> route points to wrong endpoint.

It should point to the svc ip:port, but instead it is pointing to the pod directly.

I did:

```
# oc delete svc,route,dc heketi
# oc process heketi | oc create -f -
```

Now it seems to be working. Can you try this again on a new setup, so that we know whether this issue is hit for sure?

---

(In reply to Mohamed Ashiq from comment #17)
> Can you try this again on a new setup, so that we know whether this issue
> is hit for sure?

Please retry this. I have not faced it and am not able to reproduce it in my setup. I verified the IP update on the route: every time the pod is restarted or newly spawned, the IP is updated. This works for me; there is nothing to fix in this issue. I think it is spurious, and if it is consistent then it is a bug in the OpenShift route module, since the route is not updated at the right time.

Moving it back to ON_QA as the issue filed is not seen and heketi is running perfectly OK.

---

Heketi is up and running after running cns_deploy on build cns-deploy-4.0.0-9.el7rhgs.x86_64, heketi-client-4.0.0-4.el7rhgs.x86_64, hence marking it as verified.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1112

---

Marking qe-test-coverage as "-" since the preferred mode of deployment is using ansible.
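Following up on the route diagnosis in the comments above (that the route resolved to the pod IP rather than the heketi service address): the mismatch can be spotted mechanically by comparing the service's ClusterIP:port against what the route reports. A sketch using sample text copied from this report; on a live cluster you would substitute the real `oc get svc` and `oc describe route heketi` output, and the field positions assume the OpenShift 3.x output format shown in the comments.

```shell
# Sample lines taken from the `oc get svc` / `oc describe route heketi`
# output in this report (assumption: this output format).
svc_output='heketi                     172.30.66.222    <none>        8080/TCP   9h'
route_endpoint='10.129.0.4:8080'

# Build "ClusterIP:port" for the heketi service: field 2 is the ClusterIP,
# field 4 is "port/protocol", so split off the protocol suffix.
svc_addr=$(echo "$svc_output" | awk '$1 == "heketi" {split($4, p, "/"); print $2 ":" p[1]}')

echo "service address: $svc_addr"
echo "route endpoint:  $route_endpoint"
if [ "$svc_addr" != "$route_endpoint" ]; then
    echo "MISMATCH: route resolves to a pod IP, not the service address"
fi
```

With the sample values from this report the check reports a mismatch, matching the state described in comment 17 before heketi was deleted and re-created.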