Description of problem:

When deploying OCP with Kuryr, it is not possible to push images to the local registry because the app nodes cannot resolve docker-registry.default.svc:

error: build error: Failed to push image: Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp: lookup docker-registry.default.svc on 192.168.99.4:53: no such host

How reproducible:

Steps to Reproduce:

1. Deploy with kuryr enabled:

$ oc get pods --all-namespaces -o wide
NAMESPACE         NAME                                                READY     STATUS    RESTARTS   AGE       IP             NODE
default           router-1-bq5rw                                      1/1       Running   0          30m       192.168.99.6   infra-node-0.openshift.example.com
kube-system       master-api-master-0.openshift.example.com           1/1       Running   0          40m       192.168.99.8   master-0.openshift.example.com
kube-system       master-controllers-master-0.openshift.example.com   1/1       Running   0          40m       192.168.99.8   master-0.openshift.example.com
kube-system       master-etcd-master-0.openshift.example.com          1/1       Running   0          40m       192.168.99.8   master-0.openshift.example.com
openshift-infra   kuryr-cni-ds-7qvnk                                  1/1       Running   0          42m       192.168.99.6   infra-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-dk229                                  1/1       Running   0          42m       192.168.99.4   app-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-lp8t7                                  1/1       Running   0          41m       192.168.99.8   master-0.openshift.example.com
openshift-infra   kuryr-controller-5fb8cdcc8c-ppr7w                   1/1       Running   0          44m       192.168.99.6   infra-node-0.openshift.example.com
openshift-node    sync-j8g7r                                          1/1       Running   0          42m       192.168.99.6   infra-node-0.openshift.example.com
openshift-node    sync-lgtcc                                          1/1       Running   0          43m       192.168.99.8   master-0.openshift.example.com
openshift-node    sync-stcdn                                          1/1       Running   0          42m       192.168.99.4   app-node-0.openshift.example.com

2. Deploy the registry:

$ sudo oadm registry --config=/etc/origin/master/admin.kubeconfig --service-account=registry
DEPRECATED: The 'oadm' command is deprecated, please use 'oc adm' instead.
--> Creating registry registry ...
    serviceaccount "registry" created
    clusterrolebinding "registry-registry-role" created
    deploymentconfig "docker-registry" created
    service "docker-registry" created
--> Success

3. Wait until the registry is deployed:
$ oc get pods -o wide --watch
NAME                       READY     STATUS              RESTARTS   AGE       IP             NODE
docker-registry-1-deploy   0/1       ContainerCreating   0          24s       <none>         app-node-0.openshift.example.com
router-1-bq5rw             1/1       Running             0          49m       192.168.99.6   infra-node-0.openshift.example.com
docker-registry-1-deploy   1/1       Running             0          1m        10.11.1.5      app-node-0.openshift.example.com
docker-registry-1-hldsd    0/1       Pending             0          0s        <none>         <none>
docker-registry-1-hldsd    0/1       Pending             0          0s        <none>         app-node-0.openshift.example.com
docker-registry-1-hldsd    0/1       ContainerCreating   0          0s        <none>         app-node-0.openshift.example.com
docker-registry-1-hldsd    0/1       ContainerCreating   0          1s        <none>         app-node-0.openshift.example.com
docker-registry-1-hldsd    0/1       ContainerCreating   0          2s        <none>         app-node-0.openshift.example.com
docker-registry-1-hldsd    0/1       Running             0          14s       10.11.1.12     app-node-0.openshift.example.com
docker-registry-1-hldsd    1/1       Running             0          16s       10.11.1.12     app-node-0.openshift.example.com
docker-registry-1-deploy   0/1       Completed           0          1m        10.11.1.5      app-node-0.openshift.example.com
docker-registry-1-deploy   0/1       Terminating         0          1m        10.11.1.5      app-node-0.openshift.example.com
docker-registry-1-deploy   0/1       Terminating         0          1m        10.11.1.5      app-node-0.openshift.example.com

$ oc get all
NAME                          READY     STATUS    RESTARTS   AGE
pod/docker-registry-1-hldsd   1/1       Running   0          21m
pod/router-1-bq5rw            1/1       Running   0          1h

NAME                                      DESIRED   CURRENT   READY     AGE
replicationcontroller/docker-registry-1   1         1         1         22m
replicationcontroller/router-1            1         1         1         1h

NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
service/docker-registry   ClusterIP   172.30.248.24   <none>        5000/TCP                  22m
service/kubernetes        ClusterIP   172.30.0.1      <none>        443/TCP,53/UDP,53/TCP     1h
service/router            ClusterIP   172.30.169.37   <none>        80/TCP,443/TCP,1936/TCP   1h

NAME                                                 REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/docker-registry   1          1         1         config
deploymentconfig.apps.openshift.io/router            1          1         1         config

4. Create a new project and deploy the sample application that will try to push a new image to the registry:

$ oc new-project test
$ oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

5. Check the process:

$ oc logs -f bc/ruby-ex
Cloning "https://github.com/openshift/ruby-ex.git" ...
	Commit:	bbb670185b6ce67294cc461ae9c18710e6f26089 (Merge pull request #18 from durandom/master)
	Author:	Ben Parees <bparees.github.com>
	Date:	Thu Dec 7 14:53:36 2017 -0500
---> Installing application source ...
---> Building your Ruby application from source ...
---> Running 'bundle install --retry 2 --deployment --without development:test' ...
Fetching gem metadata from https://rubygems.org/...............
Installing puma 3.10.0
Installing rack 2.0.3
Using bundler 1.7.8
Your bundle is complete!
Gems in the groups development and test were not installed.
It was installed into ./bundle
---> Cleaning up unused ruby gems ...
Pushing image docker-registry.default.svc:5000/test/ruby-ex:latest ...
Registry server Address:
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp: lookup docker-registry.default.svc on 192.168.99.4:53: no such host

Actual results:

Failed to push the image to the registry, and thus the build fails:

$ oc get pods
NAME              READY     STATUS    RESTARTS   AGE
ruby-ex-1-build   0/1       Error     0          7m

Expected results:

Successful push of the new image to the registry and deployment of the app.
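Additional info:

A minimal way to confirm that only name resolution is broken (the docker-registry service itself exists with ClusterIP 172.30.248.24 above), assuming shell access to app-node-0 (192.168.99.4) and the default cluster domain cluster.local:

# Query the resolver the failed build used (the node's own IP) and the local
# SkyDNS address for the service record; with the bug present neither query
# returns the ClusterIP:
$ dig +short docker-registry.default.svc.cluster.local @192.168.99.4
$ dig +short docker-registry.default.svc.cluster.local @127.0.0.1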
After some investigation (thanks to Antoni) we saw that on Kuryr deployments there is no process listening on 127.0.0.1:53 on the VM nodes, while on openshift-sdn based deployments there is:

tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      17516/openshift

root     17516 17500  0 103481 61720   0 10:21 ?        00:00:25 openshift start network --config=/etc/origin/node/node-config.yaml --kubeconfig=/tmp/kubeconfig --loglevel=2

It seems that after the change that removed the proxy and DNS from the node process by default (on the assumption that SDN plugins run them containerized), the Kuryr role no longer has to disable the proxy, as it is already disabled, but we forgot to enable the DNS.
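A quick way to compare a Kuryr node against an openshift-sdn node (assuming root shell access on the nodes; exact output differs per host):

# On an openshift-sdn node the containerized "openshift start network"
# process binds SkyDNS to 127.0.0.1:53; on a Kuryr node both checks
# come back empty:
$ sudo ss -lntup | grep ':53 '
$ ps -ef | grep 'openshift start network' | grep -v grep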
The solution seems to be to do something similar to what is done for the openshift_sdn role:

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_sdn/files/sdn.yaml#L105

but enabling just SkyDNS:

exec openshift start network --enable=dns --config=/etc/origin/node/node-config.yaml --kubeconfig=/tmp/kubeconfig --loglevel=${DEBUG_LOGLEVEL:-2}
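A minimal sketch of an entrypoint for such a DNS-only container, mirroring the script in the sdn.yaml linked above; the kubeconfig handling and file paths here are assumptions for illustration, not the actual openshift-ansible change:

#!/bin/bash
set -euo pipefail

# Assumption: reuse the node credentials the same way sdn.yaml builds
# /tmp/kubeconfig from the node kubeconfig; the shipped kuryr template
# may do this differently.
cp /etc/origin/node/node.kubeconfig /tmp/kubeconfig

# Start only SkyDNS; the proxy and SDN plugins stay disabled because
# Kuryr provides pod networking and load balancing itself.
exec openshift start network \
  --enable=dns \
  --config=/etc/origin/node/node-config.yaml \
  --kubeconfig=/tmp/kubeconfig \
  --loglevel=${DEBUG_LOGLEVEL:-2}

(The kuryr-cni pods in the verification below report 2/2 containers, which is consistent with DNS having been added as a second container in that DaemonSet.)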
Should be in openshift-ansible-3.10.28-1
Verified in openshift-ansible-3.10.50-1.git.0.96a93c5.el7.noarch.

Verification steps:

1. Deploy OCP 3.10 on OSP 13, with kuryr enabled:

$ oc get pods --all-namespaces -o wide
NAMESPACE         NAME                                                READY     STATUS    RESTARTS   AGE       IP              NODE
default           docker-registry-1-j9q8p                             1/1       Running   0          2h        10.11.0.11      infra-node-0.openshift.example.com
default           registry-console-1-hqrx4                            1/1       Running   0          2h        10.11.0.3       master-0.openshift.example.com
default           router-1-rpjg7                                      1/1       Running   0          2h        192.168.99.5    infra-node-0.openshift.example.com
kube-system       master-api-master-0.openshift.example.com           1/1       Running   0          2h        192.168.99.15   master-0.openshift.example.com
kube-system       master-controllers-master-0.openshift.example.com   1/1       Running   1          2h        192.168.99.15   master-0.openshift.example.com
kube-system       master-etcd-master-0.openshift.example.com          1/1       Running   1          2h        192.168.99.15   master-0.openshift.example.com
openshift-infra   kuryr-cni-ds-9xs42                                  2/2       Running   0          2h        192.168.99.5    infra-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-k9b6c                                  2/2       Running   0          2h        192.168.99.10   app-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-nw82s                                  2/2       Running   0          2h        192.168.99.15   master-0.openshift.example.com
openshift-infra   kuryr-cni-ds-znwrt                                  2/2       Running   0          2h        192.168.99.4    app-node-1.openshift.example.com
openshift-infra   kuryr-controller-59fc7f478b-dvvvm                   1/1       Running   0          2h        192.168.99.4    app-node-1.openshift.example.com
openshift-node    sync-fpmst                                          1/1       Running   0          2h        192.168.99.15   master-0.openshift.example.com
openshift-node    sync-qzzvp                                          1/1       Running   0          2h        192.168.99.5    infra-node-0.openshift.example.com
openshift-node    sync-s7xzt                                          1/1       Running   0          2h        192.168.99.4    app-node-1.openshift.example.com
openshift-node    sync-zmqbh                                          1/1       Running   0          2h        192.168.99.10   app-node-0.openshift.example.com

$ oc get all
NAME                           READY     STATUS    RESTARTS   AGE
pod/docker-registry-1-j9q8p    1/1       Running   0          2h
pod/registry-console-1-hqrx4   1/1       Running   0          2h
pod/router-1-rpjg7             1/1       Running   0          2h

NAME                                       DESIRED   CURRENT   READY     AGE
replicationcontroller/docker-registry-1    1         1         1         2h
replicationcontroller/registry-console-1   1         1         1         2h
replicationcontroller/router-1             1         1         1         2h

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
service/docker-registry    ClusterIP   172.30.155.8    <none>        5000/TCP                  2h
service/kubernetes         ClusterIP   172.30.0.1      <none>        443/TCP,53/UDP,53/TCP     2h
service/registry-console   ClusterIP   172.30.217.70   <none>        9000/TCP                  2h
service/router             ClusterIP   172.30.152.34   <none>        80/TCP,443/TCP,1936/TCP   2h

NAME                                                   REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/docker-registry     1          1         1         config
deploymentconfig.apps.openshift.io/registry-console    1          1         1         config
deploymentconfig.apps.openshift.io/router              1          1         1         config

NAME                                        HOST/PORT                                              PATH      SERVICES           PORT      TERMINATION   WILDCARD
route.route.openshift.io/docker-registry    docker-registry-default.apps.openshift.example.com               docker-registry    <all>     passthrough   None
route.route.openshift.io/registry-console   registry-console-default.apps.openshift.example.com              registry-console   <all>     passthrough   None

2. Create a new project and deploy the sample application that will try to push a new image to the registry:

$ oc new-project test
$ oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

3. Check the process:

$ oc logs -f bc/ruby-ex
Cloning "https://github.com/openshift/ruby-ex.git" ...
	Commit:	fa07571e8bbaa408126c4a197980076d90c1bc47 (Merge pull request #22 from jankleinert/readme-updates)
	Author:	Ben Parees <bparees.github.com>
	Date:	Fri Sep 7 15:23:15 2018 -0400
---> Installing application source ...
---> Building your Ruby application from source ...
...
Installing puma 3.10.0
Installing rack 2.0.3
Using bundler 1.7.8
Your bundle is complete!
Gems in the groups development and test were not installed.
It was installed into ./bundle
---> Cleaning up unused ruby gems ...
Pushing image docker-registry.default.svc:5000/test/ruby-ex:latest ...
Pushed 0/10 layers, 13% complete
Pushed 1/10 layers, 19% complete
Pushed 2/10 layers, 36% complete
Pushed 3/10 layers, 41% complete
Pushed 4/10 layers, 46% complete
Pushed 5/10 layers, 55% complete
Pushed 6/10 layers, 66% complete
Pushed 7/10 layers, 74% complete
Pushed 8/10 layers, 82% complete
Pushed 9/10 layers, 100% complete
Pushed 10/10 layers, 100% complete
Push successful

The image is pushed successfully.

$ oc get all -o wide
NAME                  READY     STATUS      RESTARTS   AGE       IP           NODE
pod/ruby-ex-1-6g2qs   1/1       Running     0          24m       10.11.0.28   app-node-1.openshift.example.com
pod/ruby-ex-1-build   0/1       Completed   0          29m       10.11.0.7    app-node-0.openshift.example.com

NAME                              DESIRED   CURRENT   READY     AGE       CONTAINERS   IMAGES                                                                                                                SELECTOR
replicationcontroller/ruby-ex-1   1         1         1         24m       ruby-ex      docker-registry.default.svc:5000/test/ruby-ex@sha256:2e3ac075e9975fbc9128fe16975da030653f05e05650ffa6f3b93fea03975145   app=ruby-ex,deployment=ruby-ex-1,deploymentconfig=ruby-ex

NAME              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE       SELECTOR
service/ruby-ex   ClusterIP   172.30.172.155   <none>        8080/TCP   29m       app=ruby-ex,deploymentconfig=ruby-ex

NAME                                         REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/ruby-ex   1          1         1         config,image(ruby-ex:latest)

NAME                                     TYPE      FROM      LATEST
buildconfig.build.openshift.io/ruby-ex   Source    Git       1

NAME                                 TYPE      FROM          STATUS     STARTED          DURATION
build.build.openshift.io/ruby-ex-1   Source    Git@fa07571   Complete   29 minutes ago   4m48s

NAME                                             DOCKER REPO                                             TAGS      UPDATED
imagestream.image.openshift.io/ruby-22-centos7   docker-registry.default.svc:5000/test/ruby-22-centos7   latest    29 minutes ago
imagestream.image.openshift.io/ruby-ex           docker-registry.default.svc:5000/test/ruby-ex           latest    24 minutes ago

4. Check the deployed app and image:

$ oc get pods -o wide
NAME              READY     STATUS      RESTARTS   AGE       IP           NODE
ruby-ex-1-6g2qs   1/1       Running     0          35m       10.11.0.28   app-node-1.openshift.example.com
ruby-ex-1-build   0/1       Completed   0          40m       10.11.0.7    app-node-0.openshift.example.com

In app-node-1.openshift.example.com:

$ sudo docker images
REPOSITORY                                                                   TAG        IMAGE ID       CREATED          SIZE
docker-registry.default.svc:5000/test/ruby-ex                                <none>     8a417147c48a   19 minutes ago   568 MB
registry.reg-aws.openshift.com:443/openshift3/ose-node                       v3.10      ccaabbeb169b   3 days ago       1.27 GB
registry.reg-aws.openshift.com:443/openshift3/ose-pod                        v3.10      ac24c586c79b   3 days ago       214 MB
registry.reg-aws.openshift.com:443/openshift3/ose-pod                        v3.10.50   ac24c586c79b   3 days ago       214 MB
docker-registry.engineering.redhat.com/rhosp13/openstack-kuryr-cni           latest     200e053f01d8   3 weeks ago      388 MB
docker-registry.engineering.redhat.com/rhosp13/openstack-kuryr-controller    latest     95371e0317f5   3 weeks ago      354 MB

5. Delete the project:

$ oc delete project test
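Besides the application-level checks above, the node-level symptom from the original report can be spot-checked after the fix; a small sketch, assuming shell access to an app node and the default cluster domain cluster.local:

# SkyDNS is listening locally on the node again:
$ sudo ss -lntup | grep '127.0.0.1:53'

# and the registry service name resolves to its ClusterIP (172.30.155.8 above):
$ dig +short docker-registry.default.svc.cluster.local @127.0.0.1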
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0026