Description of problem:
OCP+CNS configuration via ansible fails at the S3 deployment stage. OCP, Gluster, and gluster-block were configured successfully.

Snippet of the playbook logs:

FAILED - RETRYING: Wait for gluster-s3 PVCs (5 retries left).
FAILED - RETRYING: Wait for gluster-s3 PVCs (4 retries left).
FAILED - RETRYING: Wait for gluster-s3 PVCs (3 retries left).
FAILED - RETRYING: Wait for gluster-s3 PVCs (2 retries left).
FAILED - RETRYING: Wait for gluster-s3 PVCs (1 retries left).
fatal: [10.70.46.136]: FAILED! => {"attempts": 30, "changed": false, "results": {"cmd": "/usr/bin/oc get pvc --selector=glusterfs=s3-storage-testvolume-storage -o json -n app-storage", "results": [{"apiVersion": "v1", "items": [{"apiVersion": "v1", "kind": "PersistentVolumeClaim", "metadata": {"annotations": {"volume.beta.kubernetes.io/storage-class": "glusterfs-storage", "volume.beta.kubernetes.io/storage-provisioner": "kubernetes.io/glusterfs"}, "creationTimestamp": "2018-03-02T06:46:40Z", "labels": {"gluster-s3": "storage-testvolume-pvc", "glusterfs": "s3-storage-testvolume-storage"}, "name": "gluster-s3-storage-testvolume-claim", "namespace": "app-storage", "resourceVersion": "5236", "selfLink": "/api/v1/namespaces/app-storage/persistentvolumeclaims/gluster-s3-storage-testvolume-claim", "uid": "762e1a7f-1de5-11e8-a48d-005056a509f7"}, "spec": {"accessModes": ["ReadWriteMany"], "resources": {"requests": {"storage": "2Gi"}}}, "status": {"phase": "Pending"}}, {"apiVersion": "v1", "kind": "PersistentVolumeClaim", "metadata": {"annotations": {"volume.beta.kubernetes.io/storage-class": "glusterfs-storage", "volume.beta.kubernetes.io/storage-provisioner": "kubernetes.io/glusterfs"}, "creationTimestamp": "2018-03-02T06:46:40Z", "labels": {"gluster-s3": "storage-testvolume-meta-pvc", "glusterfs": "s3-storage-testvolume-storage"}, "name": "gluster-s3-storage-testvolume-meta-claim", "namespace": "app-storage", "resourceVersion": "5239", "selfLink": "/api/v1/namespaces/app-storage/persistentvolumeclaims/gluster-s3-storage-testvolume-meta-claim", "uid": "76784745-1de5-11e8-a48d-005056a509f7"}, "spec": {"accessModes": ["ReadWriteMany"], "resources": {"requests": {"storage": "1Gi"}}}, "status": {"phase": "Pending"}}], "kind": "List", "metadata": {"resourceVersion": "", "selfLink": ""}}], "returncode": 0}, "state": "list"}

Both gluster-s3 PVCs stay Pending:

[root@dhcp46-136 ~]# oc get pvc
NAME                                       STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS        AGE
gluster-s3-storage-testvolume-claim        Pending                                       glusterfs-storage   16m
gluster-s3-storage-testvolume-meta-claim   Pending                                       glusterfs-storage   16m

[root@dhcp46-136 ~]# oc describe pvc/gluster-s3-storage-testvolume-claim
Name:          gluster-s3-storage-testvolume-claim
Namespace:     app-storage
StorageClass:  glusterfs-storage
Status:        Pending
Volume:
Labels:        gluster-s3=storage-testvolume-pvc
               glusterfs=s3-storage-testvolume-storage
Annotations:   volume.beta.kubernetes.io/storage-class=glusterfs-storage
               volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/glusterfs
Finalizers:    []
Capacity:
Access Modes:
Events:
  Type     Reason              Age                 From                         Message
  ----     ------              ----                ----                         -------
  Warning  ProvisioningFailed  1m (x228 over 57m)  persistentvolume-controller  Failed to provision volume with StorageClass "glusterfs-storage": create volume error: error creating volume Post http://heketi-storage-app-storage.router.default.svc.cluster.local/volumes: dial tcp: lookup heketi-storage-app-storage.router.default.svc.cluster.local: no such host

Version-Release number of selected component (if applicable):

oc version
oc v3.9.0-0.53.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp46-136.lab.eng.blr.redhat.com:8443
openshift v3.9.0-0.53.0
kubernetes v1.9.1+a0ce1bc657

How reproducible:
1/1

Steps to Reproduce:
1. Deploy OCP + CNS via the ansible playbook.
2. Wait for the configuration to complete.

Actual results:
S3 deployment fails.

Expected results:
S3 deployment should succeed.

Additional info:
Ansible logs & inventory file will be attached shortly.
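The provisioner's view can be reproduced from the master before digging into the playbook. A quick sketch (hostname copied from the PVC event above; /hello is heketi's health-check endpoint):

  getent hosts heketi-storage-app-storage.router.default.svc.cluster.local
  curl http://heketi-storage-app-storage.router.default.svc.cluster.local/hello

If the lookup returns nothing, it matches the "no such host" error in the PVC events and points at routing/DNS rather than at heketi itself.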
Created attachment 1402956: ansible log
I don't see a router being configured. That could be the reason for the failure: the StorageClass resturl points at the heketi route hostname, and without a router (and a matching wildcard DNS entry) that name cannot resolve.
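A minimal sketch of how to check and remedy this, assuming a stock 3.9 install (verify the exact commands against your environment):

  oc get dc,pods -n default    # a router dc/pod would normally show up here
  oc adm router router --replicas=1 --service-account=router

The suffix in the failing lookup, router.default.svc.cluster.local, is the master's built-in default routing subdomain, which also suggests openshift_master_default_subdomain was not set in the ansible inventory.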
[root@dhcp46-136 ~]# oc project app-storage
Now using project "app-storage" on server "https://dhcp46-136.lab.eng.blr.redhat.com:8443".

[root@dhcp46-136 ~]# oc get pods -o wide
NAME                                          READY     STATUS    RESTARTS   AGE       IP             NODE
glusterblock-storage-provisioner-dc-1-5wrfp   1/1       Running   0          2h        10.128.0.3     dhcp46-23.lab.eng.blr.redhat.com
glusterfs-storage-kznzb                       1/1       Running   0          2h        10.70.47.29    dhcp47-29.lab.eng.blr.redhat.com
glusterfs-storage-nnz2q                       1/1       Running   0          2h        10.70.47.176   dhcp47-176.lab.eng.blr.redhat.com
glusterfs-storage-zfvjl                       1/1       Running   0          2h        10.70.46.23    dhcp46-23.lab.eng.blr.redhat.com
heketi-storage-1-7p5b8                        1/1       Running   0          2h        10.131.0.3     dhcp47-29.lab.eng.blr.redhat.com

[root@dhcp46-136 ~]# oc projects
You have access to the following projects and can switch between them with 'oc project <projectname>':

  * app-storage
    default
    kube-public
    kube-system
    management-infra
    openshift
    openshift-infra
    openshift-node

Using project "app-storage" on server "https://dhcp46-136.lab.eng.blr.redhat.com:8443".

[root@dhcp46-136 ~]# oc get all
NAME                   DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
ds/glusterfs-storage   3         3         3         3            3           glusterfs=storage-host   2h

NAME                                                    REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfigs/glusterblock-storage-provisioner-dc   1          1         1         config
deploymentconfigs/heketi-storage                        1          1         1         config

NAME                    HOST/PORT                                                      PATH      SERVICES         PORT      TERMINATION   WILDCARD
routes/heketi-storage   heketi-storage-app-storage.router.default.svc.cluster.local              heketi-storage   <all>                   None

NAME                                             READY     STATUS    RESTARTS   AGE
po/glusterblock-storage-provisioner-dc-1-5wrfp   1/1       Running   0          2h
po/glusterfs-storage-kznzb                       1/1       Running   0          2h
po/glusterfs-storage-nnz2q                       1/1       Running   0          2h
po/glusterfs-storage-zfvjl                       1/1       Running   0          2h
po/heketi-storage-1-7p5b8                        1/1       Running   0          2h

NAME                                       DESIRED   CURRENT   READY     AGE
rc/glusterblock-storage-provisioner-dc-1   1         1         1         2h
rc/heketi-storage-1                        1         1         1         2h

NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
svc/heketi-db-storage-endpoints   ClusterIP   172.30.191.16   <none>        1/TCP      2h
svc/heketi-storage                ClusterIP   172.30.72.74    <none>        8080/TCP   2h
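Note that svc/heketi-storage is reachable inside the cluster on 8080, so one possible interim workaround (a sketch, not the proper fix; it assumes the controller can reach cluster service DNS, and StorageClass parameters are immutable, so the object has to be recreated) is to point the StorageClass resturl at the service instead of the unresolvable route:

  oc get sc glusterfs-storage -o yaml > sc.yaml
  # edit resturl in sc.yaml to http://heketi-storage.app-storage.svc.cluster.local:8080
  oc delete sc glusterfs-storage
  oc create -f sc.yaml

The proper fix is still a configured router, which is what the deployed heketi route expects.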
Moving this to the right project/component. Also, it should be fixed in the latest 3.9/3.10 openshift-ansible.
S3 installation completed successfully with the following OCP build:

oc version
oc v3.9.30
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp46-227.lab.eng.blr.redhat.com:8443
openshift v3.9.30
kubernetes v1.9.1+a0ce1bc657

oc get pods -o wide
NAME                                          READY     STATUS    RESTARTS   AGE       IP             NODE
gluster-s3-storage-testvolume-dc-1-wv4cd      1/1       Running   0          6m        10.129.0.4     dhcp46-79.lab.eng.blr.redhat.com
glusterblock-storage-provisioner-dc-1-wcl86   1/1       Running   0          6m        10.130.0.4     dhcp47-4.lab.eng.blr.redhat.com
glusterfs-storage-c572f                       1/1       Running   0          19m       10.70.47.4     dhcp47-4.lab.eng.blr.redhat.com
glusterfs-storage-lb4tw                       1/1       Running   0          19m       10.70.46.79    dhcp46-79.lab.eng.blr.redhat.com
glusterfs-storage-mc4dj                       1/1       Running   0          19m       10.70.47.130   dhcp47-130.lab.eng.blr.redhat.com
heketi-storage-1-2rxg8                        1/1       Running   0          9m        10.129.0.3     dhcp46-79.lab.eng.blr.redhat.com

Moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1796