Description of problem:
The Docker registry is not created during installation.

Version-Release number of selected component (if applicable):
openshift-ansible-3.5.17-1.git.0.561702e

How reproducible:
100%

Steps to Reproduce:
1. ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

Actual results:
The Docker registry is not created.

Expected results:
The Docker registry is created during installation.

Additional info:
# oc get all
NAME                  DOCKER REPO                                   TAGS      UPDATED
is/registry-console   172.30.131.16:5000/default/registry-console   3.5       2 minutes ago

NAME                  REVISION   DESIRED   CURRENT   TRIGGERED BY
dc/registry-console   1          1         1         config
dc/router             1          1         1         config

NAME                    DESIRED   CURRENT   READY     AGE
rc/registry-console-1   1         1         1         2m
rc/router-1             1         1         1         4m

NAME                      HOST/PORT                                          PATH      SERVICES           PORT      TERMINATION   WILDCARD
routes/docker-registry    docker-registry-default.0228-nqx.qe.rhcloud.com              docker-registry    <all>     passthrough   None
routes/registry-console   registry-console-default.0228-nqx.qe.rhcloud.com             registry-console   <all>     passthrough   None

NAME                   CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
svc/docker-registry    172.30.131.16   <none>        5000/TCP                  3m
svc/kubernetes         172.30.0.1      <none>        443/TCP,53/UDP,53/TCP     15m
svc/registry-console   172.30.91.102   <none>        9000/TCP                  2m
svc/router             172.30.125.64   <none>        80/TCP,443/TCP,1936/TCP   4m

NAME                          READY     STATUS    RESTARTS   AGE
po/registry-console-1-bg32h   1/1       Running   0          2m
po/router-1-1tflp             1/1       Running   0          3m
Should be resolved by: https://github.com/openshift/openshift-ansible/pull/3493
We are not all the way fixed yet. I installed a cluster off of HEAD/master and docker-registry is now started after installation which is good. However, all s2i pushes to the registry are still failing with:

Pushing image 172.27.150.13:5000/default/cakephp-mysql-example:latest ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: Get https://172.27.150.13:5000/v1/_ping: dial tcp 172.27.150.13:5000: getsockopt: connection refused

oc get svc shows the builder pod is trying to use the right address:

root@ip-172-31-18-225: ~ # oc get svc
NAME                    CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
cakephp-mysql-example   172.25.121.91    <none>        8080/TCP                  9m
docker-registry         172.27.150.13    <none>        5000/TCP                  14m
kubernetes              172.24.0.1       <none>        443/TCP,53/UDP,53/TCP     27m
mysql                   172.25.245.13    <none>        3306/TCP                  9m
registry-console        172.24.157.170   <none>        9000/TCP                  14m
router                  172.25.112.5     <none>        80/TCP,443/TCP,1936/TCP   15m

But, curl to that svc is failing, just like the push. Something is still not plumbed through all the way. Let me know if you prefer a new bz for this.

Reproducer: 3.5.0.37 with openshift-ansible HEAD/master as of 9:30PM today

1. Install cluster with

openshift_registry_selector="region=infra,zone=default"
openshift_hosted_registry_storage_kind=object
openshift_hosted_registry_storage_provider=s3
openshift_hosted_registry_storage_s3_accesskey=key
openshift_hosted_registry_storage_s3_secretkey=secretkey
openshift_hosted_registry_storage_s3_bucket=aoe-svt-test
openshift_hosted_registry_storage_s3_region=us-west-2
openshift_hosted_registry_replicas=1

2. Verify docker-registry up and running

root@ip-172-31-18-225: ~ # oc get pods
NAME                            READY     STATUS    RESTARTS   AGE
cakephp-mysql-example-1-build   0/1       Error     0          15m
docker-registry-1-xjqk9         1/1       Running   0          20m
mysql-1-s34gz                   1/1       Running   0          15m
registry-console-1-5808t        1/1       Running   0          19m
router-1-1jjll                  1/1       Running   0          21m

3. oc new-app --template cakephp-mysql-example

Build ends with:

OK (1 test, 1 assertion)
Pushing image 172.27.150.13:5000/default/cakephp-mysql-example:latest ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: Get https://172.27.150.13:5000/v1/_ping: dial tcp 172.27.150.13:5000: getsockopt: connection refused
Retested this bug with openshift-ansible-3.5.20-1.git.0.5a5fcd5.el7.noarch; it still fails just as in comment 5. I dug further to find the root cause:

1. The inventory host file has the following line for docker-registry:

openshift_hosted_registry_selector="role=node,registry=enabled"

2. After installation, docker-registry is created, but its service endpoint is not available.

3. Find the docker-registry svc endpoint:

# oc get svc
NAME              CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
docker-registry   172.30.136.182   <none>        5000/TCP   1h

4. Accessing it fails:

# curl 172.30.136.182:5000
curl: (7) Failed connect to 172.30.136.182:5000; Connection refused

5. Accessing the docker-registry pod's endpoint succeeds:

# oc describe po docker-registry-1-vct81|grep IP
IP: 10.129.0.4
# curl 10.129.0.4:5000

6. Dump the docker-registry svc yaml:

# oc get svc docker-registry -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2017-03-02T04:24:28Z
  name: docker-registry
  namespace: default
  resourceVersion: "1044"
  selfLink: /api/v1/namespaces/default/services/docker-registry
  uid: 203b1970-ff00-11e6-b00c-0e38ccef18ea
spec:
  clusterIP: 172.30.136.182
  ports:
  - name: 5000-tcp
    port: 5000
    protocol: TCP
    targetPort: 5000
  selector:
    registry: enabled
    role: node
  sessionAffinity: ClientIP
  type: ClusterIP
status:
  loadBalancer: {}

7. Note that the docker-registry service spec uses "registry=enabled,role=node" as its selector. That setting is meant for the docker-registry nodeSelector, not for the svc selector. The registry pod does not carry those labels (see step 8), so traffic sent to the docker-registry svc endpoint cannot be redirected to the correct pod. That is why "Connection refused" is seen.

8. Workaround:

# oc get po docker-registry-1-vct81 --show-labels
NAME                      READY     STATUS    RESTARTS   AGE       LABELS
docker-registry-1-vct81   1/1       Running   0          2h        deployment=docker-registry-1,deploymentconfig=docker-registry,docker-registry=default

Take the pod's labels from the above output and modify the svc selector to use the correct one.

# oc edit svc docker-registry -n default

Change the following lines:

  selector:
    registry: enabled
    role: node

to:

  selector:
    docker-registry: default

# curl 172.30.72.45:5000

Now the docker-registry svc endpoint is available.
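To illustrate why the service could not reach the registry pod, here is a minimal Python sketch of equality-based label selection, using the selector and pod labels quoted above. The function name selector_matches is made up for this illustration and is not part of any OpenShift or Kubernetes API; it only models the rule that every key/value pair in a selector must appear in the pod's labels.

```python
def selector_matches(selector, labels):
    """Return True if every key/value pair in selector is present in labels.

    Hypothetical helper for illustration only; it models equality-based
    label matching, not the actual OpenShift implementation.
    """
    return all(labels.get(key) == value for key, value in selector.items())


# Labels reported by "oc get po docker-registry-1-vct81 --show-labels".
pod_labels = {
    "deployment": "docker-registry-1",
    "deploymentconfig": "docker-registry",
    "docker-registry": "default",
}

# Selector the installer wrote into the service (the node selector values).
broken_selector = {"registry": "enabled", "role": "node"}

# Selector after the workaround.
fixed_selector = {"docker-registry": "default"}

print(selector_matches(broken_selector, pod_labels))  # False
print(selector_matches(fixed_selector, pod_labels))   # True
```

With the installer's selector no label on the registry pod matches, so the service has nothing to forward to; with the workaround selector the pod is picked up.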
Following up on the workaround mentioned in comment 6: after correcting the svc selector, I triggered an sti build, which failed with the following message.

<--snip-->
OK (1 test, 1 assertion)
Pushing image 172.30.136.182:5000/install-test/cakephp-mysql-example:latest ...
Pushed 0/5 layers, 2% complete
Pushed 1/5 layers, 23% complete
Pushed 2/5 layers, 62% complete
Pushed 3/5 layers, 68% complete
Pushed 3/5 layers, 86% complete
Pushed 4/5 layers, 96% complete
Pushed 5/5 layers, 100% complete
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: received unexpected HTTP status: 500 Internal Server Error
<--snip-->

The docker-registry logs show the following:

<--snip-->
time="2017-03-02T07:21:03.054265315Z" level=error msg="error creating ImageStreamMapping: User \"system:serviceaccount:default:registry\" cannot create imagestreammappings in project \"install-test\"" go.version=go1.7.4 http.request.contenttype="application/vnd.docker.distribution.manifest.v1+prettyjws" http.request.host="172.30.136.182:5000" http.request.id=5ca78429-4ef7-4cd2-afb2-0ad34019ffdc http.request.method=PUT http.request.remoteaddr="10.129.0.1:35258" http.request.uri="/v2/install-test/cakephp-mysql-example/manifests/latest" http.request.useragent="docker/1.12.5 go/go1.7.4 kernel/3.10.0-514.el7.x86_64 os/linux arch/amd64 UpstreamClient(go-dockerclient)" instance.id=f5280643-d67f-46ff-82a4-a68a47930188 openshift.auth.user="system:serviceaccount:install-test:builder" openshift.logger=registry vars.name="install-test/cakephp-mysql-example" vars.reference=latest
time="2017-03-02T07:21:03.054461586Z" level=error msg="response completed with error" err.code=unknown err.detail="User \"system:serviceaccount:default:registry\" cannot create imagestreammappings in project \"install-test\"" err.message="unknown error" go.version=go1.7.4 http.request.contenttype="application/vnd.docker.distribution.manifest.v1+prettyjws" http.request.host="172.30.136.182:5000" http.request.id=5ca78429-4ef7-4cd2-afb2-0ad34019ffdc http.request.method=PUT http.request.remoteaddr="10.129.0.1:35258" http.request.uri="/v2/install-test/cakephp-mysql-example/manifests/latest" http.request.useragent="docker/1.12.5 go/go1.7.4 kernel/3.10.0-514.el7.x86_64 os/linux arch/amd64 UpstreamClient(go-dockerclient)" http.response.contenttype="application/json; charset=utf-8" http.response.duration=154.65229ms http.response.status=500 http.response.written=311 instance.id=f5280643-d67f-46ff-82a4-a68a47930188 openshift.auth.user="system:serviceaccount:install-test:builder" openshift.logger=registry vars.name="install-test/cakephp-mysql-example" vars.reference=latest
<--snip-->

Digging further, I found that the "system:registry" role has not been added to "system:serviceaccount:default:registry".

# oc get clusterrolebinding|grep registry

Add one more workaround step:

# oadm policy add-cluster-role-to-user system:registry system:serviceaccount:default:registry
cluster role "system:registry" added: "system:serviceaccount:default:registry"

After re-triggering a new sti build, it succeeds this time.

After checking the installer log, I found that docker-registry is created from a json file instead of by the "oadm registry" command.

TASK [openshift_hosted : Create OpenShift registry] ****************************
Thursday 02 March 2017 08:18:33 +0000 (0:00:00.156) 0:19:19.471 ********
changed: [ec2-54-159-73-196.compute-1.amazonaws.com] => {
    "changed": true,
    "results": {
        "results": [
            {
                "cmd": "/usr/bin/oc create -f /tmp/deploymentconfigDEhV0Y -n default",
                "results": "",
                "returncode": 0
            }
        ],
        "returncode": 0
    },
    "state": "present"
}

Personally I think it is better to continue using the "oadm registry" command to create docker-registry, because the command automatically creates all the necessary resources, including the docker-registry dc, rc, pod, and svc, as well as the serviceaccounts and clusterrolebinding. If a json file is used to create docker-registry, that file will need more maintenance work in the future to keep it aligned whenever the resources required by docker-registry change.
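The registry log lines quoted in this comment are key=value pairs with quoted values, so the denied identity can be pulled out mechanically rather than by eye. Below is a rough Python sketch of that, assuming the lines can be tokenized with shell-style quoting rules (shlex). The helper names parse_logfmt and denied_action are invented for this example, and the sample line is an abridged copy of the second log entry above.

```python
import re
import shlex


def parse_logfmt(line):
    """Split a key=value log line into a dict (hypothetical helper)."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that have no "=" at all
            fields[key] = value
    return fields


# Pattern for the authorization message seen in err.detail.
DENIED = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) (?P<resource>\w+) '
    r'in project "(?P<project>[^"]+)"'
)


def denied_action(line):
    """Return who was denied what, or None if the line has no such detail."""
    detail = parse_logfmt(line).get("err.detail", "")
    match = DENIED.search(detail)
    return match.groupdict() if match else None


# Abridged from the registry log quoted in this comment.
sample = (
    r'time="2017-03-02T07:21:03.054461586Z" level=error '
    r'msg="response completed with error" err.code=unknown '
    r'err.detail="User \"system:serviceaccount:default:registry\" '
    r'cannot create imagestreammappings in project \"install-test\"" '
    r'err.message="unknown error" http.response.status=500'
)

info = denied_action(sample)
print(info["user"], info["verb"], info["resource"], info["project"])
```

The extracted user is the registry's own service account rather than the builder, which is what points at the missing "system:registry" cluster role binding.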
Verified that the workarounds in comment 6 and comment 7 work. Thanks. An end-to-end test for the fix would be to ensure that an s2i build with a registry push succeeds after install.
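One possible shape for such a check, sketched in Python: scan the build log for the "Failed to push image" error line seen in the earlier comments. The function name push_failure_reason is hypothetical, and treating a log without that line as a successful push is an assumption of this sketch; a real test would likely also check the build's final status. The sample logs are shortened from the build output in comment 5 and comment 7.

```python
import re

# Error line format taken from the build logs quoted in this bug.
PUSH_ERROR = re.compile(
    r"^error: build error: Failed to push image: (?P<reason>.+)$",
    re.MULTILINE,
)


def push_failure_reason(build_log):
    """Return the push failure reason, or None if no failure line is found.

    Hypothetical helper: absence of the error line is only assumed to
    mean the push succeeded.
    """
    match = PUSH_ERROR.search(build_log)
    return match.group("reason").strip() if match else None


# Shortened from comment 5: service address not reachable.
refused_log = """Pushing image 172.27.150.13:5000/default/cakephp-mysql-example:latest ...
Warning: Push failed, retrying in 5s ...
error: build error: Failed to push image: Get https://172.27.150.13:5000/v1/_ping: dial tcp 172.27.150.13:5000: getsockopt: connection refused
"""

# Shortened from comment 7: all layers pushed, then rejected by the registry.
denied_log = """Pushed 5/5 layers, 100% complete
error: build error: Failed to push image: received unexpected HTTP status: 500 Internal Server Error
"""

print(push_failure_reason(refused_log))
print(push_failure_reason(denied_log))
```

Note that the second log reaches "Pushed 5/5 layers, 100% complete" and still fails, so a check based only on layer progress would miss the problem from comment 7.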
Verified on 3.5.0.37 that the latest openshift-ansible with https://github.com/openshift/openshift-ansible/pull/3538 and https://github.com/openshift/openshift-ansible/pull/3547 has fixed the issue. The router is started after install and s2i builds work with no workarounds. Passing QA back to jialiu; I must have taken it when updating the bz last night. If you want me to be QA for this, you can give it back to me.
It seems the fix PR is already merged into openshift-ansible-3.5.22-1.git.0.8ef4cff.el7.noarch, and everything is working well. I will verify this bug once it is moved to ON_QA.
Verified with version openshift-ansible-3.5.23-1.git.0.1cd0089: installation succeeds and the STI build succeeds.