Bug 1427846
| Summary: | cns-deploy tool failed to setup: failed to communicate with heketi service | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Apeksha <akhakhar> |
| Component: | cns-deploy-tool | Assignee: | Mohamed Ashiq <mliyazud> |
| Status: | CLOSED ERRATA | QA Contact: | Apeksha <akhakhar> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | cns-3.5 | CC: | akhakhar, hchiramm, jarrpa, madam, mliyazud, pprakash, vinug |
| Target Milestone: | --- | | |
| Target Release: | CNS 3.5 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | rhgs-volmanager-rhel7:3.2.0-2 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-04-20 18:26:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1415600 | | |
Description
Apeksha
2017-03-01 11:46:28 UTC
Can you run the command below and post the result?

    # oadm policy add-scc-to-user anyuid -n aplo -z heketi-service-account

The command didn't give any output:

    [root@dhcp47-79 ~]# oadm policy add-scc-to-user anyuid -n aplo -z heketi-service-account
    [root@dhcp47-79 ~]#

(In reply to Apeksha from comment #3)
I didn't expect any output from the command above :). Can you please rerun your tests and report the result now?

I found the following in the pastebin contents:

    [root@dhcp46-216 ~]# oc logs heketi-1-xzf4w -c heketi
    /bin/sh: /usr/sbin/heketi-start.sh: Permission denied

Ashiq, do we have a permissions issue in the container?

(In reply to Jose A. Rivera from comment #5)
There is a permission setting of 500 on the startup script. However, he is able to run a plain Docker container from this image without issues. IIUC, he is checking it in an OCP environment.

(In reply to Humble Chirammal from comment #6)
@Jose yeah, as Humble mentioned, I set the start-script permission to 500. It worked for me with docker run, but I am also hitting the same "Permission denied" error in my own OCP setup.
If you can figure out what differs between running in plain Docker and in an OCP pod, we can work out what is going wrong. JFYI, we also set 500 permissions on the scripts in the gluster images, and we never faced a problem there.

I reran the cns-deploy tool after running `oadm policy add-scc-to-user anyuid -n aplo -z heketi-service-account`, and it worked fine:

    cns-deploy -n aplo -g topology.json -c oc -y
    Cluster Id: 7918d44cd584a71f7ff52a65a1dbd7dd

        Volumes:

        Nodes:

        Node Id: c14ddbc8043959bd1160ac2ad6850e02
        State: online
        Cluster Id: 7918d44cd584a71f7ff52a65a1dbd7dd
        Zone: 1
        Management Hostname: dhcp47-122.lab.eng.blr.redhat.com
        Storage Hostname: 10.70.47.122
        Devices:
            Id:8034e12f438b14f8dd23f3b671190a28 Name:/dev/sdd State:online Size (GiB):199 Used (GiB):0 Free (GiB):199
                Bricks:

        Node Id: cc8e53fd95aa9e0994aaa1428254fc8a
        State: online
        Cluster Id: 7918d44cd584a71f7ff52a65a1dbd7dd
        Zone: 1
        Management Hostname: dhcp47-94.lab.eng.blr.redhat.com
        Storage Hostname: 10.70.47.94
        Devices:
            Id:122a68fd55a8feff87193254a63d8784 Name:/dev/sdd State:online Size (GiB):199 Used (GiB):0 Free (GiB):199
                Bricks:

        Node Id: da3182148f764d2a2c98ac265c904daa
        State: online
        Cluster Id: 7918d44cd584a71f7ff52a65a1dbd7dd
        Zone: 1
        Management Hostname: dhcp47-87.lab.eng.blr.redhat.com
        Storage Hostname: 10.70.47.87
        Devices:
            Id:684cb1ab8c3bf1c57584f31106b378ef Name:/dev/sdd State:online Size (GiB):199 Used (GiB):0 Free (GiB):199
                Bricks:

    oc get pods
    NAME                  READY     STATUS    RESTARTS   AGE
    aplo-router-1-xvvkl   1/1       Running   1          17h
    glusterfs-2bsn7       1/1       Running   0          2m
    glusterfs-3q880       1/1       Running   0          2m
    glusterfs-fl1lk       1/1       Running   0          2m
    heketi-1-vhl8z        1/1       Running   0          1m

Ashiq, we probably never had issues with the GlusterFS pods because they run as privileged, whereas the heketi pod does not. Given that anyuid worked, my guess is that the script has a bad owner or group. Is there anything we can do about this?
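The mode 500 mentioned in comment #6 and Jose's owner/group guess can be made concrete by decoding the permission bits. The sketch below is not from this bug; it assumes that a non-privileged OpenShift pod runs with an arbitrary UID that does not own the start script, so only the group or other execute bit could apply. The helper name `exec_bit_for` is hypothetical.

```shell
#!/bin/sh
# Sketch (not from the bug): decode which permission classes may execute a
# file with a given octal mode. Assumption: in a non-privileged OpenShift pod
# the process runs with an arbitrary UID that does not own the start script,
# so only the group or other execute bit can apply to it.

# exec_bit_for MODE CLASS   (hypothetical helper)
#   MODE  : octal permission digits, e.g. 500 or 755
#   CLASS : owner, group or other
# Prints 1 if CLASS has the execute bit set, otherwise 0.
exec_bit_for() {
  mode=$1
  case $2 in
    owner) shift_by=6 ;;
    group) shift_by=3 ;;
    other) shift_by=0 ;;
    *) echo "unknown class: $2" >&2; return 2 ;;
  esac
  # Prefixing a 0 makes shell arithmetic read MODE as an octal constant.
  echo $(( (0$mode >> shift_by) & 1 ))
}

for m in 500 550 755; do
  printf '%s: owner=%s group=%s other=%s\n' "$m" \
    "$(exec_bit_for "$m" owner)" \
    "$(exec_bit_for "$m" group)" \
    "$(exec_bit_for "$m" other)"
done
```

With 500 only the owner can execute the script, so a process whose UID is not the owner gets "Permission denied", which is consistent with the heketi log above. For context, OpenShift's image guidelines for arbitrary user IDs recommend that files a container needs be owned by the root group (GID 0) with matching group permissions; this bug does not say whether the fix in rhgs-volmanager-rhel7:3.2.0-2 took that approach.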
The fix will be in the next build of the heketi Docker image.

This container build should have the fix: rhgs3/rhgs-volmanager-rhel7:3.2.0-2. I am moving the BZ to ON_QA.

I hit this issue with builds heketi-client-4.0.0-2.el7rhgs.x86_64 and cns-deploy-4.0.0-4.el7rhgs.x86_64.
Output of the cns-deploy command and other oc commands: http://pastebin.test.redhat.com/466117

(In reply to Apeksha from comment #18)
I have gone through the pastebin, and it looks like we might be facing a new issue here. If this is not a "Permission denied" error from the volmanager container, then it is something new, and the permission-denied problem that the fix was meant to handle is resolved. If you are not seeing the permission denied error (https://bugzilla.redhat.com/show_bug.cgi?id=1427846#c5), can you please mark this one as verified and create a new BZ? The RCA for this issue is completely different, and the old issue is fixed. Can you also make the script quit when a failure happens? If you are really sure about this issue, you can add `exit 1` in the script after heketi is started, so that the state is preserved (rather than letting the script abort), which will help with debugging. Please let me know if I am wrong. Moving it back to ON_QA to verify the permission-denied fix.

Ashiq, as suggested in comment #19 and in our discussion, I created a new setup, put an `exit 1` where it fails in the cns-deploy script, and ran it.
Output of the cns-deploy command and other oc commands: http://pastebin.test.redhat.com/466318
Since I don't see any permission issue, I have created a new bug: https://bugzilla.redhat.com/show_bug.cgi?id=1434055

I don't see the permission issue with builds cns-deploy-4.0.0-9.el7rhgs.x86_64 and rhgs3/rhgs-volmanager-rhel7:3.2.0-4, hence marking it as verified.
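Ashiq's suggestion in comment #19, adding `exit 1` after heketi starts so the deployment is left in place for debugging, can be generalized into an opt-in switch. This is a hypothetical sketch, not cns-deploy's actual code: it assumes a deploy script that normally tears down what it created when a step fails. The names `run_step`, `cleanup` and `KEEP_STATE_ON_FAILURE` are invented for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a deploy-step wrapper; it is NOT cns-deploy's code.
# Assumption: the real script tears down what it created when a step fails,
# which also removes the pods and logs needed for debugging.

KEEP_STATE_ON_FAILURE=${KEEP_STATE_ON_FAILURE:-0}

# cleanup: placeholder for the teardown a deploy script would normally run.
cleanup() {
  echo "cleanup: removing deployed resources"
}

# run_step DESCRIPTION COMMAND [ARGS...]
# Runs COMMAND. On failure it either cleans up (default) or, when
# KEEP_STATE_ON_FAILURE=1, skips cleanup so pods and logs stay inspectable.
run_step() {
  desc=$1
  shift
  if "$@"; then
    return 0
  fi
  echo "step failed: $desc" >&2
  if [ "$KEEP_STATE_ON_FAILURE" = 1 ]; then
    echo "keeping state for debugging; skipping cleanup" >&2
  else
    cleanup
  fi
  return 1
}

run_step "wait for heketi" true && echo "heketi step ok"
```

Compared with hand-editing an `exit 1` into the script, an environment switch leaves the default behavior unchanged for normal runs while letting QA preserve a failed deployment on demand.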
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1112