Bug 1427378 - Docker registry was not created during installation
Summary: Docker registry was not created during installation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Russell Teague
QA Contact: Johnny Liu
URL:
Whiteboard: aos-scalability-35
Depends On:
Blocks: 1395168 1399388 1427040
Reported: 2017-02-28 03:23 UTC by Wenkai Shi
Modified: 2017-07-24 14:11 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-08 13:46:57 UTC
Target Upstream Version:
Embargoed:



Links
System                   ID               Private   Priority   Status         Summary                                                                       Last Updated
Red Hat Product Errata   RHBA-2017:0903   0         normal     SHIPPED_LIVE   OpenShift Container Platform atomic-openshift-utils bug fix and enhancement   2017-04-12 22:45:42 UTC

Description Wenkai Shi 2017-02-28 03:23:11 UTC
Description of problem:
The Docker registry was not created during installation.

Version-Release number of selected component (if applicable):
openshift-ansible-3.5.17-1.git.0.561702e

How reproducible:
100%

Steps to Reproduce:
1. ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml

Actual results:
The Docker registry was not created.

Expected results:
The Docker registry is created during installation.

Additional info:
# oc get all
NAME                  DOCKER REPO                                   TAGS      UPDATED
is/registry-console   172.30.131.16:5000/default/registry-console   3.5       2 minutes ago

NAME                  REVISION   DESIRED   CURRENT   TRIGGERED BY
dc/registry-console   1          1         1         config
dc/router             1          1         1         config

NAME                    DESIRED   CURRENT   READY     AGE
rc/registry-console-1   1         1         1         2m
rc/router-1             1         1         1         4m

NAME                      HOST/PORT                                          PATH      SERVICES           PORT      TERMINATION   WILDCARD
routes/docker-registry    docker-registry-default.0228-nqx.qe.rhcloud.com              docker-registry    <all>     passthrough   None
routes/registry-console   registry-console-default.0228-nqx.qe.rhcloud.com             registry-console   <all>     passthrough   None

NAME                   CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
svc/docker-registry    172.30.131.16   <none>        5000/TCP                  3m
svc/kubernetes         172.30.0.1      <none>        443/TCP,53/UDP,53/TCP     15m
svc/registry-console   172.30.91.102   <none>        9000/TCP                  2m
svc/router             172.30.125.64   <none>        80/TCP,443/TCP,1936/TCP   4m

NAME                          READY     STATUS    RESTARTS   AGE
po/registry-console-1-bg32h   1/1       Running   0          2m
po/router-1-1tflp             1/1       Running   0          3m

Comment 3 Russell Teague 2017-03-01 21:44:09 UTC
Should be resolved by: https://github.com/openshift/openshift-ansible/pull/3493

Comment 5 Mike Fiedler 2017-03-02 03:22:19 UTC
We are not all the way fixed yet.  I installed a cluster off of HEAD/master and docker-registry is now started after installation which is good.

However, all s2i pushes to the registry are still failing with:

Pushing image 172.27.150.13:5000/default/cakephp-mysql-example:latest ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Registry server Address: 
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: Get https://172.27.150.13:5000/v1/_ping: dial tcp 172.27.150.13:5000: getsockopt: connection refused

oc get svc shows the builder pod is trying to use the right address:

root@ip-172-31-18-225: ~ # oc get svc
NAME                    CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
cakephp-mysql-example   172.25.121.91    <none>        8080/TCP                  9m
docker-registry         172.27.150.13    <none>        5000/TCP                  14m
kubernetes              172.24.0.1       <none>        443/TCP,53/UDP,53/TCP     27m
mysql                   172.25.245.13    <none>        3306/TCP                  9m
registry-console        172.24.157.170   <none>        9000/TCP                  14m
router                  172.25.112.5     <none>        80/TCP,443/TCP,1936/TCP   15m

But, curl to that svc is failing, just like the push.

Something is still not plumbed through all the way.  Let me know if you prefer a new bz for this.

Reproducer:

3.5.0.37 with openshift-ansible HEAD/master as of 9:30PM today

1. Install cluster with 

openshift_registry_selector="region=infra,zone=default"
openshift_hosted_registry_storage_kind=object
openshift_hosted_registry_storage_provider=s3
openshift_hosted_registry_storage_s3_accesskey=key
openshift_hosted_registry_storage_s3_secretkey=secretkey
openshift_hosted_registry_storage_s3_bucket=aoe-svt-test
openshift_hosted_registry_storage_s3_region=us-west-2
openshift_hosted_registry_replicas=1

2.  Verify docker-registry up and running

root@ip-172-31-18-225: ~ # oc get pods
NAME                            READY     STATUS    RESTARTS   AGE
cakephp-mysql-example-1-build   0/1       Error     0          15m
docker-registry-1-xjqk9         1/1       Running   0          20m
mysql-1-s34gz                   1/1       Running   0          15m
registry-console-1-5808t        1/1       Running   0          19m
router-1-1jjll                  1/1       Running   0          21m

3.  oc new-app --template cakephp-mysql-example

Build ends with:

OK (1 test, 1 assertion)
Pushing image 172.27.150.13:5000/default/cakephp-mysql-example:latest ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Warning: Push failed, retrying in 5s ...
Registry server Address: 
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: Get https://172.27.150.13:5000/v1/_ping: dial tcp 172.27.150.13:5000: getsockopt: connection refused

Comment 6 Johnny Liu 2017-03-02 07:05:44 UTC
Retested this bug with openshift-ansible-3.5.20-1.git.0.5a5fcd5.el7.noarch; it still fails, just as in comment 5.

Digging further to find the root cause:
1. The inventory host file has the following line for docker-registry:
openshift_hosted_registry_selector="role=node,registry=enabled"

2. After installation, docker-registry is created, but its service endpoint is not available.

3. Find the docker-registry svc endpoint:
# oc get svc
NAME               CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
docker-registry    172.30.136.182   <none>        5000/TCP                  1h

4. Accessing it fails:
# curl 172.30.136.182:5000
curl: (7) Failed connect to 172.30.136.182:5000; Connection refused

5. Accessing the docker-registry pod's endpoint directly succeeds:
# oc describe po docker-registry-1-vct81|grep IP
IP:			10.129.0.4
# curl 10.129.0.4:5000


6. Dump docker-registry svc yaml.
# oc get svc docker-registry -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2017-03-02T04:24:28Z
  name: docker-registry
  namespace: default
  resourceVersion: "1044"
  selfLink: /api/v1/namespaces/default/services/docker-registry
  uid: 203b1970-ff00-11e6-b00c-0e38ccef18ea
spec:
  clusterIP: 172.30.136.182
  ports:
  - name: 5000-tcp
    port: 5000
    protocol: TCP
    targetPort: 5000
  selector:
    registry: enabled
    role: node
  sessionAffinity: ClientIP
  type: ClusterIP
status:
  loadBalancer: {}

7. Note that the docker-registry service spec uses "registry=enabled,role=node" as its selector, but that setting is meant to be the docker-registry nodeSelector, not the svc selector. The registry pod does not carry those labels, so traffic to the docker-registry svc endpoint cannot be routed to the pod. That is why "Connection refused" is seen.

8. Workaround:
# oc get po docker-registry-1-vct81 --show-labels
NAME                      READY     STATUS    RESTARTS   AGE       LABELS
docker-registry-1-vct81   1/1       Running   0          2h        deployment=docker-registry-1,deploymentconfig=docker-registry,docker-registry=default

Take the pod's labels from the above output and modify the svc selector to use the correct one.

# oc edit svc docker-registry -n default
Change the following lines:
  selector:
    registry: enabled
    role: node
to
  selector:
    docker-registry: default


# curl 172.30.72.45:5000


Now docker-registry's svc endpoint is available.
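
The selector mismatch described in steps 7 and 8 can be illustrated with a minimal sketch. This is not installer or Kubernetes code: selector_matches is a hypothetical helper that assumes a svc selector behaves as a plain equality match on pod labels, and the label values are copied from the output above.

```python
# Minimal sketch of equality-based label selection, roughly how a svc
# selector picks its backing pods. selector_matches is a hypothetical
# helper written for this illustration, not a Kubernetes or OpenShift API.

def selector_matches(selector: dict, labels: dict) -> bool:
    """Return True if every key/value pair in the selector appears in labels."""
    return all(labels.get(key) == value for key, value in selector.items())

# Pod labels copied from the `oc get po --show-labels` output in step 8.
pod_labels = {
    "deployment": "docker-registry-1",
    "deploymentconfig": "docker-registry",
    "docker-registry": "default",
}

# Selector found on the svc after installation (really the node selector).
broken_selector = {"registry": "enabled", "role": "node"}
# Selector after the `oc edit svc` workaround.
fixed_selector = {"docker-registry": "default"}

print(selector_matches(broken_selector, pod_labels))  # False
print(selector_matches(fixed_selector, pod_labels))   # True
```

With the node-selector labels in place the registry pod is never selected, which is consistent with the svc endpoint refusing connections while the pod IP answers.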

Comment 7 Johnny Liu 2017-03-02 08:57:15 UTC
Following up on the workaround mentioned in comment 6:
After correcting the svc selector, I triggered an sti build, which failed with the following message.
<--snip-->
OK (1 test, 1 assertion)
Pushing image 172.30.136.182:5000/install-test/cakephp-mysql-example:latest ...
Pushed 0/5 layers, 2% complete
Pushed 1/5 layers, 23% complete
Pushed 2/5 layers, 62% complete
Pushed 3/5 layers, 68% complete
Pushed 3/5 layers, 86% complete
Pushed 4/5 layers, 96% complete
Pushed 5/5 layers, 100% complete
Registry server Address: 
Registry server User Name: serviceaccount
Registry server Email: serviceaccount
Registry server Password: <<non-empty>>
error: build error: Failed to push image: received unexpected HTTP status: 500 Internal Server Error
<--snip-->

Checking the docker-registry logs shows the following:
<--snip-->
time="2017-03-02T07:21:03.054265315Z" level=error msg="error creating ImageStreamMapping: User \"system:serviceaccount:default:registry\" cannot create imagestreammappings in project \"install-test\"" go.version=go1.7.4 http.request.contenttype="application/vnd.docker.distribution.manifest.v1+prettyjws" http.request.host="172.30.136.182:5000" http.request.id=5ca78429-4ef7-4cd2-afb2-0ad34019ffdc http.request.method=PUT http.request.remoteaddr="10.129.0.1:35258" http.request.uri="/v2/install-test/cakephp-mysql-example/manifests/latest" http.request.useragent="docker/1.12.5 go/go1.7.4 kernel/3.10.0-514.el7.x86_64 os/linux arch/amd64 UpstreamClient(go-dockerclient)" instance.id=f5280643-d67f-46ff-82a4-a68a47930188 openshift.auth.user="system:serviceaccount:install-test:builder" openshift.logger=registry vars.name="install-test/cakephp-mysql-example" vars.reference=latest 
time="2017-03-02T07:21:03.054461586Z" level=error msg="response completed with error" err.code=unknown err.detail="User \"system:serviceaccount:default:registry\" cannot create imagestreammappings in project \"install-test\"" err.message="unknown error" go.version=go1.7.4 http.request.contenttype="application/vnd.docker.distribution.manifest.v1+prettyjws" http.request.host="172.30.136.182:5000" http.request.id=5ca78429-4ef7-4cd2-afb2-0ad34019ffdc http.request.method=PUT http.request.remoteaddr="10.129.0.1:35258" http.request.uri="/v2/install-test/cakephp-mysql-example/manifests/latest" http.request.useragent="docker/1.12.5 go/go1.7.4 kernel/3.10.0-514.el7.x86_64 os/linux arch/amd64 UpstreamClient(go-dockerclient)" http.response.contenttype="application/json; charset=utf-8" http.response.duration=154.65229ms http.response.status=500 http.response.written=311 instance.id=f5280643-d67f-46ff-82a4-a68a47930188 openshift.auth.user="system:serviceaccount:install-test:builder" openshift.logger=registry vars.name="install-test/cakephp-mysql-example" vars.reference=latest
<--snip-->

Digging further, I found that the "system:registry" role has not been added to "system:serviceaccount:default:registry".

# oc get clusterrolebinding|grep registry

Add one more workaround step:
# oadm policy add-cluster-role-to-user system:registry system:serviceaccount:default:registry
cluster role "system:registry" added: "system:serviceaccount:default:registry"

After re-triggering a new sti build, it succeeds this time.
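
The missing-role check can also be expressed as a small sketch. The helper names are hypothetical, and the binding dictionaries are an assumed, simplified shape (a roleRef name plus a list of ServiceAccount subjects), not verified output of any oc command.

```python
# Sketch: does any cluster role binding grant a role to a service account?
# The binding layout below is an assumption made for illustration only.

def split_service_account(user: str) -> tuple:
    """Split 'system:serviceaccount:<namespace>:<name>' into (namespace, name)."""
    prefix, kind, namespace, name = user.split(":", 3)
    if (prefix, kind) != ("system", "serviceaccount"):
        raise ValueError("not a service account user: " + user)
    return namespace, name

def has_cluster_role(bindings: list, role: str, user: str) -> bool:
    """Return True if some binding for `role` lists the service account."""
    namespace, name = split_service_account(user)
    for binding in bindings:
        if binding.get("roleRef", {}).get("name") != role:
            continue
        for subject in binding.get("subjects") or []:
            if (subject.get("kind") == "ServiceAccount"
                    and subject.get("namespace") == namespace
                    and subject.get("name") == name):
                return True
    return False

registry_user = "system:serviceaccount:default:registry"

# Before the workaround: no registry-related binding was found.
bindings_before = []
# After the workaround: assumed shape of the binding that was added.
bindings_after = [{
    "roleRef": {"name": "system:registry"},
    "subjects": [{"kind": "ServiceAccount", "namespace": "default", "name": "registry"}],
}]

print(has_cluster_role(bindings_before, "system:registry", registry_user))  # False
print(has_cluster_role(bindings_after, "system:registry", registry_user))   # True
```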



After checking the installer log, I found that docker-registry is created from a json file instead of by the "oadm registry" command.
TASK [openshift_hosted : Create OpenShift registry] ****************************
Thursday 02 March 2017  08:18:33 +0000 (0:00:00.156)       0:19:19.471 ******** 

changed: [ec2-54-159-73-196.compute-1.amazonaws.com] => {
    "changed": true, 
    "results": {
        "results": [
            {
                "cmd": "/usr/bin/oc create -f /tmp/deploymentconfigDEhV0Y -n default", 
                "results": "", 
                "returncode": 0
            }
        ], 
        "returncode": 0
    }, 
    "state": "present"
}

Personally, I think it is better to keep using the "oadm registry" command to create docker-registry, because the command automatically creates all the necessary resources, including the docker-registry dc, rc, pod, and svc, as well as the serviceaccounts and clusterrolebinding. If a json file is used to create docker-registry, that file will need more maintenance in the future to stay aligned whenever the resources required by docker-registry change.

Comment 8 Mike Fiedler 2017-03-02 13:37:29 UTC
Verified that the workarounds in comment 6 and comment 7 are good.   Thanks.

An end-to-end test for the fix for this would be to ensure that an s2i build with a registry push is successful after install.
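
A rough sketch of the log check such an end-to-end test might perform, assuming the build log is already available as a string. push_succeeded is a hypothetical helper; the failing log lines are taken from comment 5, while the passing log is a made-up example with the error line omitted.

```python
import re

# Matches the push failure line seen in the build logs in this bug.
PUSH_ERROR = re.compile(r"^error: build error: Failed to push image", re.MULTILINE)

def push_succeeded(build_log: str) -> bool:
    """Return True if the log shows a push attempt and no push failure."""
    attempted = "Pushing image " in build_log
    return attempted and PUSH_ERROR.search(build_log) is None

# Lines copied from the failing build in comment 5.
failing_log = """\
Pushing image 172.27.150.13:5000/default/cakephp-mysql-example:latest ...
Warning: Push failed, retrying in 5s ...
error: build error: Failed to push image: Get https://172.27.150.13:5000/v1/_ping: dial tcp 172.27.150.13:5000: getsockopt: connection refused
"""

# Hypothetical passing log: same push, no error line (not taken from this bug).
passing_log = """\
Pushing image 172.27.150.13:5000/default/cakephp-mysql-example:latest ...
Pushed 5/5 layers, 100% complete
"""

print(push_succeeded(failing_log))  # False
print(push_succeeded(passing_log))  # True
```

Checking for the error line rather than for "100% complete" matters here, because the build in comment 7 pushed every layer and still failed with a 500 error.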

Comment 9 Mike Fiedler 2017-03-03 00:47:11 UTC
Verified on 3.5.0.37 that the latest openshift-ansible with https://github.com/openshift/openshift-ansible/pull/3538 and https://github.com/openshift/openshift-ansible/pull/3547 have fixed the issue.   The router is started after install and s2i builds work with no workarounds.

Passing QA back to jialiu - I must have taken it when updating the bz last night.   If you want me to be QA for this you can give it back to me.

Comment 10 Johnny Liu 2017-03-03 07:56:54 UTC
It seems the fix PR is already merged into openshift-ansible-3.5.22-1.git.0.8ef4cff.el7.noarch, and everything is working well. Once this bug is moved to ON_QA, I will verify it.

Comment 11 Wenkai Shi 2017-03-06 02:17:27 UTC
Verified with version openshift-ansible-3.5.23-1.git.0.1cd0089: installation succeeds and the STI build succeeds.

