Bug 1623777
| Summary: | Fail to deploy CNS with both glusterfs and glusterfs_registry group | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Wenkai Shi <weshi> |
| Component: | Installer | Assignee: | Jose A. Rivera <jarrpa> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Johnny Liu <jialiu> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.11.0 | CC: | aos-bugs, bleanhar, crmarquesjc, jokerman, madam, mmccomas, pprakash, sarumuga, wmeng, wsun, xxia |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | 3.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1627454 (view as bug list) | Environment: | |
| Last Closed: | 2018-12-21 15:23:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1626751, 1627454 | | |
Where is the name "heketi-storage-1-pjm59" coming from if it's not in the cluster?

(In reply to Jose A. Rivera from comment #4)
> Where is the name "heketi-storage-1-pjm59" coming from if it's not in the cluster?

I have no idea. I reproduced this in another deployment; the pod is still not in the cluster, but the name still appears.

There should be two heketi pods, one for glusterfs and one for glusterfs_registry. What is the output of "oc get po" in their respective namespaces?

I'm facing the same problem. In my case, Ansible creates two namespaces: app-storage (for the CNS storage cluster) and infra-storage (for CNS storage for the OpenShift infrastructure). The first CNS deployment seems to execute OK, but for the second one I get the same error as above.

One thing I noticed is that the pod the heketi-cli command is executed inside belongs to the app-storage namespace, even though the command explicitly specifies infra-storage. I think the previous steps read the wrong pod. In other words, using the example from this report: heketi-storage-1-pjm59 would be in the "app-storage" namespace, but the steps for the "infra-storage" CNS retrieve this pod instead of the equivalent one from the correct namespace.

Another thing I noticed is that my "deploy-heketi-registry-xxxx" pod shows the same error as above and does not create the "heketi-storage-1-xxxx" pod (which would have been selected by the previous heketi-cli command if the selector were not wrong, but is never created).

```
oc logs -f deploy-heketi-registry-xxxx
stat: cannot stat '/var/lib/heketi/heketi.db': No such file or directory
Heketi 6.0.0
[heketi] ERROR 2018/09/04 xx:yy:zz /src/github.com/heketi/heketi/apps/glusterfs/app.go:100: invalid log level:
```

(In reply to Jose A. Rivera from comment #6)
> There should be two heketi pods, one for glusterfs and one for glusterfs_registry. What is the output of "oc get po" in their respective namespaces?

It's the default namespace, for the glusterfs_registry group.

That does not answer my question. :)

(In reply to Jose A. Rivera from comment #9)
> That does not answer my question. :)

Sorry...

```
# oc get po -n default
NAME                             READY     STATUS    RESTARTS   AGE
deploy-heketi-registry-1-ltxnl   1/1       Running   0          12m
glusterfs-registry-4zjl5         1/1       Running   0          13m
glusterfs-registry-7tztw         1/1       Running   0          13m
glusterfs-registry-c8t54         1/1       Running   0          13m

# oc get po -n glusterfs
NAME                                          READY     STATUS    RESTARTS   AGE
glusterblock-storage-provisioner-dc-1-9xv7n   1/1       Running   0          14m
glusterfs-storage-8ddtb                       1/1       Running   0          18m
glusterfs-storage-p5r6f                       1/1       Running   0          18m
glusterfs-storage-zz2bw                       1/1       Running   0          18m
heketi-storage-1-9s2qp                        1/1       Running   0          15m
```

PR submitted for master: https://github.com/openshift/openshift-ansible/pull/9971

PR merged.

The 3.11 PR 9980 has been merged into openshift-ansible-3.11.2-1; please check the bug.

Verified with version openshift-ansible-3.11.3-1.git.0.42aeb49.el7_5.noarch; installation succeeded.

Moving to VERIFIED per comment #16.

Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.
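The wrong-pod theory discussed in the comments above can be sketched abstractly: if the playbook resolves the heketi pod by label alone, the first match across all namespaces wins, so the infra-storage verification ends up targeting the app-storage pod. A minimal Python sketch of that failure mode (pod names, labels, and function names here are purely illustrative, not the actual openshift-ansible code):

```python
# Hypothetical pod inventory across the two storage namespaces.
pods = [
    {"namespace": "app-storage",   "name": "heketi-storage-1-xxxxx",
     "labels": {"glusterfs": "heketi-storage-pod"}},
    {"namespace": "infra-storage", "name": "heketi-registry-1-yyyyy",
     "labels": {"glusterfs": "heketi-registry-pod"}},
]

def find_pod_buggy(pods, label_key):
    # Buggy lookup: matches on the label key alone, ignoring the namespace,
    # so the app-storage pod is returned even for the infra-storage cluster.
    return next(p["name"] for p in pods if label_key in p["labels"])

def find_pod_fixed(pods, label_key, namespace):
    # Fixed lookup: scopes the search to the requested namespace first.
    return next(p["name"] for p in pods
                if p["namespace"] == namespace and label_key in p["labels"])

print(find_pod_buggy(pods, "glusterfs"))                   # heketi-storage-1-xxxxx (wrong pod)
print(find_pod_fixed(pods, "glusterfs", "infra-storage"))  # heketi-registry-1-yyyyy
```

The buggy variant explains the observed symptom: a pod name from the wrong namespace is handed to `oc rsh --namespace=<other-ns>`, which then fails with "pods not found".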
Description of problem:
Fail to deploy CNS with both the glusterfs and glusterfs_registry groups; the install always fails in the "Verify heketi service" task.

Version-Release number of the following components:
openshift-ansible-3.11.0-0.25.0.git.0.7497e69.el7
ansible-2.6.2-1.el7ae.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy CNS with both the glusterfs and glusterfs_registry groups

Actual results:
Installer failed in the "Verify heketi service" task:

```
TASK [openshift_storage_glusterfs : Verify heketi service] *********************
Thursday 30 August 2018  03:09:37 -0400 (0:00:00.115)       0:26:41.155 *******
fatal: [qe-weshi-cnsb-master-etcd-1.0830-lhj.qe.rhcloud.com]: FAILED! => {"changed": false, "cmd": ["oc", "--config=/tmp/openshift-glusterfs-ansible-OtyRlI/admin.kubeconfig", "rsh", "--namespace=default", "heketi-storage-1-pjm59", "heketi-cli", "-s", "http://localhost:8080", "--user", "admin", "--secret", "y3rSHd1iifC/3iOAp/ETuQ4WXTNepegv091mESqaA00=", "cluster", "list"], "delta": "0:00:00.243109", "end": "2018-08-30 03:11:33.915053", "msg": "non-zero return code", "rc": 1, "start": "2018-08-30 03:11:33.671944", "stderr": "Error from server (NotFound): pods \"heketi-storage-1-pjm59\" not found", "stderr_lines": ["Error from server (NotFound): pods \"heketi-storage-1-pjm59\" not found"], "stdout": "", "stdout_lines": []}
```
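For readability, the command the failing task runs can be reassembled from the "cmd" list in the failure message above. The helper function below is ours (not part of openshift-ansible), and the admin secret is passed in as a parameter rather than reproduced:

```python
def verify_heketi_cmd(kubeconfig, namespace, pod, admin_key):
    """Rebuild the 'Verify heketi service' command line from its parts,
    mirroring the argument list reported in the Ansible failure."""
    return [
        "oc", "--config=" + kubeconfig,
        "rsh", "--namespace=" + namespace, pod,
        "heketi-cli", "-s", "http://localhost:8080",
        "--user", "admin", "--secret", admin_key,
        "cluster", "list",
    ]

# The failing invocation from the log, with the secret elided:
cmd = verify_heketi_cmd(
    "/tmp/openshift-glusterfs-ansible-OtyRlI/admin.kubeconfig",
    "default", "heketi-storage-1-pjm59", "<admin-secret>")
```

Written this way, it is easy to see that the namespace (`default`) and the pod name must agree: if the pod name was resolved from a different namespace, `oc rsh` returns exactly the "pods not found" error seen here.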
Expected results:
Installer should pass here.

Additional info:
After logging in to the master, the deploy-heketi pod keeps pending:

```
# oc get po
NAME                             READY     STATUS    RESTARTS   AGE
deploy-heketi-registry-1-wlxs9   1/1       Running   0          16m
glusterfs-registry-pgdt7         1/1       Running   0          17m
glusterfs-registry-skd7g         1/1       Running   0          17m
glusterfs-registry-tbm2p         1/1       Running   0          17m
# oc delete po deploy-heketi-registry-1-wlxs9
pod "deploy-heketi-registry-1-wlxs9" deleted
# oc get po
NAME                             READY     STATUS    RESTARTS   AGE
deploy-heketi-registry-1-667r8   1/1       Running   0          37s
glusterfs-registry-pgdt7         1/1       Running   0          18m
glusterfs-registry-skd7g         1/1       Running   0          18m
glusterfs-registry-tbm2p         1/1       Running   0          18m
```

```
# oc describe po deploy-heketi-registry-1-667r8
Name:               deploy-heketi-registry-1-667r8
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               qe-weshi-cnsb-node-registry-router-1/10.240.0.29
Start Time:         Thu, 30 Aug 2018 03:27:25 -0400
Labels:             deploy-heketi=support
                    deployment=deploy-heketi-registry-1
                    deploymentconfig=deploy-heketi-registry
                    glusterfs=deploy-heketi-registry-pod
Annotations:        openshift.io/deployment-config.latest-version=1
                    openshift.io/deployment-config.name=deploy-heketi-registry
                    openshift.io/deployment.name=deploy-heketi-registry-1
                    openshift.io/scc=restricted
Status:             Running
IP:                 10.128.4.3
Controlled By:      ReplicationController/deploy-heketi-registry-1
Containers:
  heketi:
    Container ID:   docker://d737fd472c78612c4c34f68b8e00c813286d93056d78f929ba7cec61fa44471e
    Image:          registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7
    Image ID:       docker-pullable://registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7@sha256:5d93c20bce1d76e508254d589ffd8d0b324a404bbab5a20deff6916dd27a1f39
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 30 Aug 2018 03:27:42 -0400
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:8080/hello delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness:      http-get http://:8080/hello delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment:
      HEKETI_USER_KEY:                 oTfeiMoV1X1U/XwAdqnMj9eSFZxXw3rVnWrF2IQq3TQ=
      HEKETI_ADMIN_KEY:                y3rSHd1iifC/3iOAp/ETuQ4WXTNepegv091mESqaA00=
      HEKETI_EXECUTOR:                 kubernetes
      HEKETI_FSTAB:                    /var/lib/heketi/fstab
      HEKETI_SNAPSHOT_LIMIT:           14
      HEKETI_KUBE_GLUSTER_DAEMONSET:   1
      HEKETI_IGNORE_STALE_OPERATIONS:  true
    Mounts:
      /etc/heketi from config (rw)
      /var/lib/heketi from db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from heketi-registry-service-account-token-87f9l (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  db:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-registry-config-secret
    Optional:    false
  heketi-registry-service-account-token-87f9l:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-registry-service-account-token-87f9l
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type    Reason     Age   From                                           Message
  ----    ------     ----  ----                                           -------
  Normal  Scheduled  47s   default-scheduler                              Successfully assigned default/deploy-heketi-registry-1-667r8 to qe-weshi-cnsb-node-registry-router-1
  Normal  Pulling    45s   kubelet, qe-weshi-cnsb-node-registry-router-1  pulling image "registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7"
  Normal  Pulled     30s   kubelet, qe-weshi-cnsb-node-registry-router-1  Successfully pulled image "registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7"
  Normal  Created    30s   kubelet, qe-weshi-cnsb-node-registry-router-1  Created container
  Normal  Started    30s   kubelet, qe-weshi-cnsb-node-registry-router-1  Started container
```

```
# oc logs -f deploy-heketi-registry-1-667r8
stat: cannot stat '/var/lib/heketi/heketi.db': No such file or directory
Heketi 6.0.0
[heketi] ERROR 2018/08/30 07:27:42 /src/github.com/heketi/heketi/apps/glusterfs/app.go:100: invalid log level:
[heketi] INFO 2018/08/30 07:27:42 Loaded kubernetes executor
[heketi] INFO 2018/08/30 07:27:42 Block: Auto Create Block Hosting Volume set to true
[heketi] INFO 2018/08/30 07:27:42 Block: New Block Hosting Volume size 100 GB
[heketi] INFO 2018/08/30 07:27:42 GlusterFS Application Loaded
[heketi] INFO 2018/08/30 07:27:42 Started Node Health Cache Monitor
Authorization loaded
Listening on port 8080
[heketi] INFO 2018/08/30 07:27:52 Starting Node Health Status refresh
[heketi] INFO 2018/08/30 07:27:52 Cleaned 0 nodes from health cache
[heketi] INFO 2018/08/30 07:29:42 Starting Node Health Status refresh
[heketi] INFO 2018/08/30 07:29:42 Cleaned 0 nodes from health cache
[heketi] INFO 2018/08/30 07:31:42 Starting Node Health Status refresh
[heketi] INFO 2018/08/30 07:31:42 Cleaned 0 nodes from health cache
[heketi] INFO 2018/08/30 07:33:42 Starting Node Health Status refresh
[heketi] INFO 2018/08/30 07:33:42 Cleaned 0 nodes from health cache
[heketi] INFO 2018/08/30 07:35:42 Starting Node Health Status refresh
[heketi] INFO 2018/08/30 07:35:42 Cleaned 0 nodes from health cache
^C
```