Bug 1594171 - Registry not working when deploying OCP with Kuryr
Summary: Registry not working when deploying OCP with Kuryr
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.z
Assignee: Luis Tomas Bolivar
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-06-22 09:59 UTC by Luis Tomas Bolivar
Modified: 2019-01-10 09:27 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
After the change that removed the proxy and DNS configuration by default, in favor of expecting SDN plugins to run containerized, SkyDNS was not enabled when running Kuryr. As a result, it was not possible to push images to the registry because host names could not be resolved. The solution uses an approach similar to the one in the openshift-sdn role, but enables only SkyDNS:
exec openshift start network --enable=dns --config=/etc/origin/node/node-config.yaml --kubeconfig=/tmp/kubeconfig --loglevel=${DEBUG_LOGLEVEL:-2}
This enables successful pushes of new images to the registry.
Clone Of:
Environment:
Last Closed: 2019-01-10 09:27:09 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0026 None None None 2019-01-10 09:27:16 UTC
Github openshift openshift-ansible pull 8919 None None None 2018-06-22 15:01:48 UTC
Github openshift openshift-ansible pull 8995 None None None 2018-06-27 11:18:59 UTC

Description Luis Tomas Bolivar 2018-06-22 09:59:21 UTC
Description of problem:
When deploying OCP with Kuryr, it is not possible to push images to the local registry because the app nodes cannot resolve docker-registry.default.svc:

error: build error: Failed to push image: Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp: lookup docker-registry.default.svc on 192.168.99.4:53: no such host
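
The error message names both the host that failed to resolve and the resolver that was asked. As a minimal sketch, the two fields can be pulled out of a build log as follows. The function name parse_lookup_failure is hypothetical, and the pattern assumes the Go-style "lookup <host> on <server>:<port>: no such host" wording shown in the error above.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or openshift-ansible):
# read log lines on stdin and print "<host> <resolver>" for every
# DNS failure of the form
#   lookup <host> on <ip>:<port>: no such host
parse_lookup_failure() {
    sed -n 's/.*lookup \([^ ]*\) on \([^ ]*\): no such host.*/\1 \2/p'
}

# Possible usage against the failing build:
#   oc logs bc/ruby-ex | parse_lookup_failure
```

For the error above, the resolver field is the app node's own address (192.168.99.4), which suggests the node-local DNS service is the place to look.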

How reproducible:

Steps to Reproduce:
1. Deploy with kuryr enabled
$ oc get pods --all-namespaces -o wide
NAMESPACE         NAME                                                READY     STATUS    RESTARTS   AGE       IP             NODE
default           router-1-bq5rw                                      1/1       Running   0          30m       192.168.99.6   infra-node-0.openshift.example.com
kube-system       master-api-master-0.openshift.example.com           1/1       Running   0          40m       192.168.99.8   master-0.openshift.example.com
kube-system       master-controllers-master-0.openshift.example.com   1/1       Running   0          40m       192.168.99.8   master-0.openshift.example.com
kube-system       master-etcd-master-0.openshift.example.com          1/1       Running   0          40m       192.168.99.8   master-0.openshift.example.com
openshift-infra   kuryr-cni-ds-7qvnk                                  1/1       Running   0          42m       192.168.99.6   infra-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-dk229                                  1/1       Running   0          42m       192.168.99.4   app-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-lp8t7                                  1/1       Running   0          41m       192.168.99.8   master-0.openshift.example.com
openshift-infra   kuryr-controller-5fb8cdcc8c-ppr7w                   1/1       Running   0          44m       192.168.99.6   infra-node-0.openshift.example.com
openshift-node    sync-j8g7r                                          1/1       Running   0          42m       192.168.99.6   infra-node-0.openshift.example.com
openshift-node    sync-lgtcc                                          1/1       Running   0          43m       192.168.99.8   master-0.openshift.example.com
openshift-node    sync-stcdn                                          1/1       Running   0          42m       192.168.99.4   app-node-0.openshift.example.com

2. Deploy the registry:
$ sudo oadm registry --config=/etc/origin/master/admin.kubeconfig --service-account=registry
DEPRECATED: The 'oadm' command is deprecated, please use 'oc adm' instead.
--> Creating registry registry ...
    serviceaccount "registry" created
    clusterrolebinding "registry-registry-role" created
    deploymentconfig "docker-registry" created
    service "docker-registry" created
--> Success

3. Wait until registry is deployed:
$ oc get pods -o wide --watch
NAME                       READY     STATUS              RESTARTS   AGE       IP             NODE
docker-registry-1-deploy   0/1       ContainerCreating   0          24s       <none>         app-node-0.openshift.example.com
router-1-bq5rw             1/1       Running             0          49m       192.168.99.6   infra-node-0.openshift.example.com
docker-registry-1-deploy   1/1       Running   0         1m        10.11.1.5   app-node-0.openshift.example.com
docker-registry-1-hldsd   0/1       Pending   0         0s        <none>    <none>
docker-registry-1-hldsd   0/1       Pending   0         0s        <none>    app-node-0.openshift.example.com
docker-registry-1-hldsd   0/1       ContainerCreating   0         0s        <none>    app-node-0.openshift.example.com
docker-registry-1-hldsd   0/1       ContainerCreating   0         1s        <none>    app-node-0.openshift.example.com
docker-registry-1-hldsd   0/1       ContainerCreating   0         2s        <none>    app-node-0.openshift.example.com
docker-registry-1-hldsd   0/1       Running   0         14s       10.11.1.12   app-node-0.openshift.example.com
docker-registry-1-hldsd   1/1       Running   0         16s       10.11.1.12   app-node-0.openshift.example.com
docker-registry-1-deploy   0/1       Completed   0         1m        10.11.1.5   app-node-0.openshift.example.com
docker-registry-1-deploy   0/1       Terminating   0         1m        10.11.1.5   app-node-0.openshift.example.com
docker-registry-1-deploy   0/1       Terminating   0         1m        10.11.1.5   app-node-0.openshift.example.com

$ oc get all
NAME                          READY     STATUS    RESTARTS   AGE
pod/docker-registry-1-hldsd   1/1       Running   0          21m
pod/router-1-bq5rw            1/1       Running   0          1h

NAME                                      DESIRED   CURRENT   READY     AGE
replicationcontroller/docker-registry-1   1         1         1         22m
replicationcontroller/router-1            1         1         1         1h

NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
service/docker-registry   ClusterIP   172.30.248.24   <none>        5000/TCP                  22m
service/kubernetes        ClusterIP   172.30.0.1      <none>        443/TCP,53/UDP,53/TCP     1h
service/router            ClusterIP   172.30.169.37   <none>        80/TCP,443/TCP,1936/TCP   1h

NAME                                                 REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/docker-registry   1          1         1         config
deploymentconfig.apps.openshift.io/router            1          1         1         config


4. Create a new project and deploy the sample application that will try to push a new image to the registry:
$ oc new-project test
$ oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

5. Check the process:
$ oc logs -f bc/ruby-ex
Cloning "https://github.com/openshift/ruby-ex.git" ...
        Commit: bbb670185b6ce67294cc461ae9c18710e6f26089 (Merge pull request #18 from durandom/master)
        Author: Ben Parees <bparees@users.noreply.github.com>
        Date:   Thu Dec 7 14:53:36 2017 -0500
---> Installing application source ...
---> Building your Ruby application from source ...
---> Running 'bundle install --retry 2 --deployment --without development:test' ...
Fetching gem metadata from https://rubygems.org/...............
Installing puma 3.10.0
Installing rack 2.0.3
Using bundler 1.7.8
Your bundle is complete!
Gems in the groups development and test were not installed.
It was installed into ./bundle
---> Cleaning up unused ruby gems ...

Pushing image docker-registry.default.svc:5000/test/ruby-ex:latest ...
Registry server Address: 
Registry server User Name: serviceaccount
Registry server Email: serviceaccount@example.org
Registry server Password: <<non-empty>>
error: build error: Failed to push image: Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp: lookup docker-registry.default.svc on 192.168.99.4:53: no such host



Actual results:
The image push to the registry fails, and therefore the build fails:
$ oc get pods
NAME              READY     STATUS    RESTARTS   AGE
ruby-ex-1-build   0/1       Error     0          7m

Expected results:
Successful push of the new image to the registry and deployment of the app.

Comment 1 Luis Tomas Bolivar 2018-06-22 14:26:22 UTC
After some investigation (thanks to Antoni), we saw that in the Kuryr deployment no process is listening on 127.0.0.1:53 on the VM nodes, whereas one is present in an openshift-sdn based deployment:

tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      17516/openshift 

root     17516 17500  0 103481 61720  0 10:21 ?        00:00:25 openshift start network --config=/etc/origin/node/node-config.yaml --kubeconfig=/tmp/kubeconfig --loglevel=2

It seems that after the change that removed the proxy and DNS by default, expecting SDN plugins to run containerized, we no longer had to disable the proxy because it was already disabled, but we forgot to enable DNS.
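
The listener check described above can be sketched as a small shell filter. This is only an illustration: the function name has_local_dns_listener is hypothetical, and it assumes the listing comes from netstat -tln or ss -ltn, where the local address appears as its own whitespace-separated field.

```shell
#!/bin/sh
# Hypothetical helper: read a socket listing on stdin and succeed
# (exit 0) only if some LISTEN line has 127.0.0.1:53 as a field.
has_local_dns_listener() {
    awk '/LISTEN/ { for (i = 1; i <= NF; i++) if ($i == "127.0.0.1:53") found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Possible usage on a node:
#   netstat -tln | has_local_dns_listener && echo "DNS listener present"
```

On the Kuryr nodes described in this comment the check would be expected to fail, while on an openshift-sdn node it would succeed because of the openshift process shown above.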

Comment 2 Luis Tomas Bolivar 2018-06-22 14:48:19 UTC
The solution seems to be to do something similar to what is done for the openshift-sdn role:

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_sdn/files/sdn.yaml#L105

but enabling just the SkyDNS:
exec openshift start network --enable=dns --config=/etc/origin/node/node-config.yaml --kubeconfig=/tmp/kubeconfig --loglevel=${DEBUG_LOGLEVEL:-2}
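
Whether a node is actually running the network process with DNS enabled can be checked from its process list. The following is a rough sketch under that assumption; the function name dns_enabled_in_cmdline is hypothetical, and it only looks for the --enable=dns flag from the command above on an "openshift start network" command line.

```shell
#!/bin/sh
# Hypothetical helper: read process command lines on stdin and succeed
# (exit 0) only if an "openshift start network" process was started
# with --enable=dns.
dns_enabled_in_cmdline() {
    grep -- 'openshift start network' | grep -q -- '--enable=dns'
}

# Possible usage on a node:
#   ps -eo args | dns_enabled_in_cmdline && echo "SkyDNS enabled"
```

In the command above, ${DEBUG_LOGLEVEL:-2} is standard shell parameter expansion: the log level falls back to 2 when DEBUG_LOGLEVEL is unset or empty.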

Comment 3 Scott Dodson 2018-08-14 21:40:00 UTC
Should be in openshift-ansible-3.10.28-1

Comment 4 Jon Uriarte 2018-09-24 15:39:16 UTC
Verified in openshift-ansible-3.10.50-1.git.0.96a93c5.el7.noarch.

Verification steps:

1. Deploy OCP 3.10 on OSP 3.10, with kuryr enabled

$ oc get pods --all-namespaces -o wide
NAMESPACE         NAME                                                READY     STATUS    RESTARTS   AGE       IP              NODE
default           docker-registry-1-j9q8p                             1/1       Running   0          2h        10.11.0.11      infra-node-0.openshift.example.com
default           registry-console-1-hqrx4                            1/1       Running   0          2h        10.11.0.3       master-0.openshift.example.com
default           router-1-rpjg7                                      1/1       Running   0          2h        192.168.99.5    infra-node-0.openshift.example.com
kube-system       master-api-master-0.openshift.example.com           1/1       Running   0          2h        192.168.99.15   master-0.openshift.example.com
kube-system       master-controllers-master-0.openshift.example.com   1/1       Running   1          2h        192.168.99.15   master-0.openshift.example.com
kube-system       master-etcd-master-0.openshift.example.com          1/1       Running   1          2h        192.168.99.15   master-0.openshift.example.com
openshift-infra   kuryr-cni-ds-9xs42                                  2/2       Running   0          2h        192.168.99.5    infra-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-k9b6c                                  2/2       Running   0          2h        192.168.99.10   app-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-nw82s                                  2/2       Running   0          2h        192.168.99.15   master-0.openshift.example.com
openshift-infra   kuryr-cni-ds-znwrt                                  2/2       Running   0          2h        192.168.99.4    app-node-1.openshift.example.com
openshift-infra   kuryr-controller-59fc7f478b-dvvvm                   1/1       Running   0          2h        192.168.99.4    app-node-1.openshift.example.com
openshift-node    sync-fpmst                                          1/1       Running   0          2h        192.168.99.15   master-0.openshift.example.com
openshift-node    sync-qzzvp                                          1/1       Running   0          2h        192.168.99.5    infra-node-0.openshift.example.com
openshift-node    sync-s7xzt                                          1/1       Running   0          2h        192.168.99.4    app-node-1.openshift.example.com
openshift-node    sync-zmqbh                                          1/1       Running   0          2h        192.168.99.10   app-node-0.openshift.example.com

$ oc get all
NAME                           READY     STATUS    RESTARTS   AGE
pod/docker-registry-1-j9q8p    1/1       Running   0          2h
pod/registry-console-1-hqrx4   1/1       Running   0          2h
pod/router-1-rpjg7             1/1       Running   0          2h

NAME                                       DESIRED   CURRENT   READY     AGE
replicationcontroller/docker-registry-1    1         1         1         2h
replicationcontroller/registry-console-1   1         1         1         2h
replicationcontroller/router-1             1         1         1         2h

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
service/docker-registry    ClusterIP   172.30.155.8    <none>        5000/TCP                  2h
service/kubernetes         ClusterIP   172.30.0.1      <none>        443/TCP,53/UDP,53/TCP     2h
service/registry-console   ClusterIP   172.30.217.70   <none>        9000/TCP                  2h
service/router             ClusterIP   172.30.152.34   <none>        80/TCP,443/TCP,1936/TCP   2h

NAME                                                  REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/docker-registry    1          1         1         config
deploymentconfig.apps.openshift.io/registry-console   1          1         1         config
deploymentconfig.apps.openshift.io/router             1          1         1         config

NAME                                        HOST/PORT                                             PATH      SERVICES           PORT      TERMINATION   WILDCARD
route.route.openshift.io/docker-registry    docker-registry-default.apps.openshift.example.com              docker-registry    <all>     passthrough   None
route.route.openshift.io/registry-console   registry-console-default.apps.openshift.example.com             registry-console   <all>     passthrough   None


2. Create a new project and deploy the sample application that will try to push a new image to the registry:
$ oc new-project test
$ oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git


3. Check the process:
$ oc logs -f bc/ruby-ex
Cloning "https://github.com/openshift/ruby-ex.git" ...
        Commit: fa07571e8bbaa408126c4a197980076d90c1bc47 (Merge pull request #22 from jankleinert/readme-updates)
        Author: Ben Parees <bparees@users.noreply.github.com>
        Date:   Fri Sep 7 15:23:15 2018 -0400
---> Installing application source ...
---> Building your Ruby application from source ...
...
Installing puma 3.10.0
Installing rack 2.0.3
Using bundler 1.7.8
Your bundle is complete!
Gems in the groups development and test were not installed.
It was installed into ./bundle
---> Cleaning up unused ruby gems ...

Pushing image docker-registry.default.svc:5000/test/ruby-ex:latest ...
Pushed 0/10 layers, 13% complete
Pushed 1/10 layers, 19% complete
Pushed 2/10 layers, 36% complete
Pushed 3/10 layers, 41% complete
Pushed 4/10 layers, 46% complete
Pushed 5/10 layers, 55% complete
Pushed 6/10 layers, 66% complete
Pushed 7/10 layers, 74% complete
Pushed 8/10 layers, 82% complete
Pushed 9/10 layers, 100% complete
Pushed 10/10 layers, 100% complete
Push successful

The image is pushed successfully.

$ oc get all -o wide
NAME                  READY     STATUS      RESTARTS   AGE       IP           NODE
pod/ruby-ex-1-6g2qs   1/1       Running     0          24m       10.11.0.28   app-node-1.openshift.example.com
pod/ruby-ex-1-build   0/1       Completed   0          29m       10.11.0.7    app-node-0.openshift.example.com

NAME                              DESIRED   CURRENT   READY     AGE       CONTAINERS   IMAGES                                                                                                                  SELECTOR
replicationcontroller/ruby-ex-1   1         1         1         24m       ruby-ex      docker-registry.default.svc:5000/test/ruby-ex@sha256:2e3ac075e9975fbc9128fe16975da030653f05e05650ffa6f3b93fea03975145   app=ruby-ex,deployment=ruby-ex-1,deploymentconfig=ruby-ex

NAME              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE       SELECTOR
service/ruby-ex   ClusterIP   172.30.172.155   <none>        8080/TCP   29m       app=ruby-ex,deploymentconfig=ruby-ex

NAME                                         REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/ruby-ex   1          1         1         config,image(ruby-ex:latest)

NAME                                     TYPE      FROM      LATEST
buildconfig.build.openshift.io/ruby-ex   Source    Git       1

NAME                                 TYPE      FROM          STATUS     STARTED          DURATION
build.build.openshift.io/ruby-ex-1   Source    Git@fa07571   Complete   29 minutes ago   4m48s

NAME                                             DOCKER REPO                                             TAGS      UPDATED
imagestream.image.openshift.io/ruby-22-centos7   docker-registry.default.svc:5000/test/ruby-22-centos7   latest    29 minutes ago
imagestream.image.openshift.io/ruby-ex           docker-registry.default.svc:5000/test/ruby-ex           latest    24 minutes ago

4. Check the deployed app and image:

$ oc get pods -o wide
NAME              READY     STATUS      RESTARTS   AGE       IP           NODE
ruby-ex-1-6g2qs   1/1       Running     0          35m       10.11.0.28   app-node-1.openshift.example.com
ruby-ex-1-build   0/1       Completed   0          40m       10.11.0.7    app-node-0.openshift.example.com

On app-node-1.openshift.example.com:
$ sudo docker images
REPOSITORY                                                                  TAG                 IMAGE ID            CREATED             SIZE
docker-registry.default.svc:5000/test/ruby-ex                               <none>              8a417147c48a        19 minutes ago      568 MB
registry.reg-aws.openshift.com:443/openshift3/ose-node                      v3.10               ccaabbeb169b        3 days ago          1.27 GB
registry.reg-aws.openshift.com:443/openshift3/ose-pod                       v3.10               ac24c586c79b        3 days ago          214 MB
registry.reg-aws.openshift.com:443/openshift3/ose-pod                       v3.10.50            ac24c586c79b        3 days ago          214 MB
docker-registry.engineering.redhat.com/rhosp13/openstack-kuryr-cni          latest              200e053f01d8        3 weeks ago         388 MB
docker-registry.engineering.redhat.com/rhosp13/openstack-kuryr-controller   latest              95371e0317f5        3 weeks ago         354 MB


5. Delete the project:
$ oc delete project test

Comment 6 errata-xmlrpc 2019-01-10 09:27:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0026

