Description of problem:
Found this issue when testing https://trello.com/c/TK6FwQ6k/314-2-push-to-registry-by-dns. One goal of this user story is that the user does not have to restart the master service once the registry svc IP is changed. In my testing, however, the app's dc tries to pull the image from the registry svc original IP instead of the registry svc DNS name.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.121-1.git.0.ed0b72c.el7.noarch
openshift v3.6.121
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

How reproducible:
Always

Steps to Reproduce:
1. Trigger a fresh installation with a single master.
2. After installation, trigger an sti build.
# oc new-app nodejs-mongodb-example -n install-test
3. Check the build log; the sti build is pushed to the registry by DNS successfully.
4. Delete the docker-registry svc and re-expose it so that its svc IP changes.
# oc get svc docker-registry
NAME              CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
docker-registry   172.31.9.190   <none>        5000/TCP   1h
# oc delete svc docker-registry
service "docker-registry" deleted
# oc expose dc docker-registry
service "docker-registry" exposed
# oc get svc docker-registry
NAME              CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
docker-registry   172.31.129.56   <none>        5000/TCP   5s
5. Scale the app down to 0 replicas.
# oc scale --replicas=0 dc nodejs-mongodb-example -n install-test
6. Go to the node and delete the pulled sti build image locally.
# docker images
REPOSITORY                                                             TAG      IMAGE ID       CREATED             SIZE
docker-registry.default.svc:5000/install-test/nodejs-mongodb-example   latest   5d3c16cde11d   About an hour ago   460.6 MB
172.31.9.190:5000/install-test/nodejs-mongodb-example                  <none>   5d3c16cde11d   About an hour ago   460.6 MB
# docker rmi 5d3c16cde11d -f
Untagged: 172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67
Untagged: docker-registry.default.svc:5000/install-test/nodejs-mongodb-example:latest
Untagged: docker-registry.default.svc:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67
Deleted: sha256:5d3c16cde11d6c48e65e19c59481a3c2d2d60336088db7309cd4665ca4c75f78
7. Scale the app up to 1 replica again.

Actual results:
The app fails to scale up.
# oc get po -n install-test
NAME                             READY     STATUS         RESTARTS   AGE
mongodb-1-2kk0n                  1/1       Running        0          1h
nodejs-mongodb-example-1-build   0/1       Completed      0          1h
nodejs-mongodb-example-1-vg5qf   0/1       ErrImagePull   0          16s

# oc describe po nodejs-mongodb-example-1-vg5qf -n install-test
<--snip-->
Events:
  FirstSeen  LastSeen  Count  From  SubObjectPath  Type  Reason  Message
  ---------  --------  -----  ----  -------------  ----  ------  -------
  33s  33s  1  default-scheduler    Normal  Scheduled  Successfully assigned nodejs-mongodb-example-1-vg5qf to qe-36-smoke-master-registry-router-1
  27s  27s  1  kubelet, qe-36-smoke-master-registry-router-1  spec.containers{nodejs-mongodb-example}  Normal  BackOff  Back-off pulling image "172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67"
  33s  16s  4  kubelet, qe-36-smoke-master-registry-router-1    Warning  DNSSearchForming  Found and omitted duplicated dns domain in host search line: 'cluster.local' during merging with cluster dns domains
  30s  16s  2  kubelet, qe-36-smoke-master-registry-router-1  spec.containers{nodejs-mongodb-example}  Normal  Pulling  pulling image "172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67"
  27s  13s  2  kubelet, qe-36-smoke-master-registry-router-1  spec.containers{nodejs-mongodb-example}  Warning  Failed  Failed to pull image "172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67": rpc error: code = 2 desc = Get https://172.31.9.190:5000/v2/: dial tcp 172.31.9.190:5000: getsockopt: no route to host
  27s  13s  3  kubelet, qe-36-smoke-master-registry-router-1    Warning  FailedSync  Error syncing pod

# oc get dc nodejs-mongodb-example -n install-test -o yaml
apiVersion: v1
kind: DeploymentConfig
metadata:
  annotations:
    description: Defines how to deploy the application server
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2017-06-21T04:07:59Z
  generation: 5
  labels:
    app: nodejs-mongodb-example
    template: nodejs-mongodb-example
  name: nodejs-mongodb-example
  namespace: install-test
  resourceVersion: "4246"
  selfLink: /oapi/v1/namespaces/install-test/deploymentconfigs/nodejs-mongodb-example
  uid: 36be4ea4-5637-11e7-8b48-42010af00013
spec:
  replicas: 1
  selector:
    name: nodejs-mongodb-example
  strategy:
    activeDeadlineSeconds: 21600
    recreateParams:
      timeoutSeconds: 600
    resources: {}
    type: Recreate
  template:
    metadata:
      annotations:
        openshift.io/generated-by: OpenShiftNewApp
      creationTimestamp: null
      labels:
        app: nodejs-mongodb-example
        name: nodejs-mongodb-example
      name: nodejs-mongodb-example
    spec:
      containers:
      - env:
        - name: DATABASE_SERVICE_NAME
          value: mongodb
        - name: MONGODB_USER
          valueFrom:
            secretKeyRef:
              key: database-user
              name: nodejs-mongodb-example
        - name: MONGODB_PASSWORD
          valueFrom:
            secretKeyRef:
              key: database-password
              name: nodejs-mongodb-example
        - name: MONGODB_DATABASE
          value: sampledb
        - name: MONGODB_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              key: database-admin-password
              name: nodejs-mongodb-example
        image: 172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /pagecount
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: nodejs-mongodb-example
        ports:
        - containerPort: 8080
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /pagecount
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        resources:
          limits:
            memory: 512Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  test: false
  triggers:
  - imageChangeParams:
      automatic: true
      containerNames:
      - nodejs-mongodb-example
      from:
        kind: ImageStreamTag
        name: nodejs-mongodb-example:latest
        namespace: install-test
      lastTriggeredImage: 172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67
    type: ImageChange
  - type: ConfigChange
status:
  availableReplicas: 0
  conditions:
  - lastTransitionTime: 2017-06-21T04:11:08Z
    lastUpdateTime: 2017-06-21T04:11:28Z
    message: replication controller "nodejs-mongodb-example-1" successfully rolled out
    reason: NewReplicationControllerAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: 2017-06-21T06:01:44Z
    lastUpdateTime: 2017-06-21T06:01:44Z
    message: Deployment config does not have minimum availability.
    status: "False"
    type: Available
  details:
    causes:
    - imageTrigger:
        from:
          kind: ImageStreamTag
          name: nodejs-mongodb-example:latest
          namespace: install-test
      type: ImageChange
    message: image change
  latestVersion: 1
  observedGeneration: 5
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1

In the above app dc yaml, the app's image points to the registry svc IP, not the registry svc DNS name.

Expected results:
In the app's dc, the image should point to the registry svc DNS name, not the IP.

Additional info:
Once the above issue has happened, restarting the master service and re-triggering the sti build makes the image in the app's dc point to the registry svc DNS name. That means the app's dc image points to the registry svc IP only at the initial moment, when the app's dc is created for the first time.
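To spot the symptom without reading the whole dc yaml, the check can be scripted. The following is only a minimal sketch: the function names pull_spec_uses_ip and dc_images are hypothetical helpers made up for illustration, and the check only recognizes IPv4 literals.

```shell
#!/bin/sh
# Return 0 if the registry host of an image pull spec is an IPv4 literal,
# and non-zero if it is anything else (for example a DNS name).
pull_spec_uses_ip() {
    host=${1%%/*}      # keep only the part before the first "/"
    host=${host%%:*}   # drop the optional ":port"
    printf '%s\n' "$host" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Hypothetical helper: print the container images of a dc.
# Needs a logged-in oc client; arguments are <dc name> <project>.
dc_images() {
    oc get dc "$1" -n "$2" -o jsonpath='{.spec.template.spec.containers[*].image}'
}

# Example (not executed here):
#   for img in $(dc_images nodejs-mongodb-example install-test); do
#       pull_spec_uses_ip "$img" && echo "IP-based pull spec: $img"
#   done
```

With the dc shown above, the loop would flag the 172.31.9.190:5000 pull spec, while a docker-registry.default.svc:5000 pull spec would pass silently.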
If this is an HA environment maybe this is a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1463498 ? I'll investigate today.
(In reply to Scott Dodson from comment #2)
> If this is an HA environment maybe this is a dupe of
> https://bugzilla.redhat.com/show_bug.cgi?id=1463498 ? I'll investigate today.

No, this is a single-master env, and it is a totally different bug from BZ#1463498. Please refer to the reproduce steps: in a multiple-master HA env, step 3 fails, but in this single-master env, step 3 works well.
Any idea what's going on here?
I guess the image stream still had the IP address in the pull spec, and restarting the master and rebuilding updated the image stream pull spec to point to the registry by DNS. Was this cluster live before the DEFAULT_REGISTRY variable was set, or does this happen on a clean cluster with the registry set?
Can we see the image stream when this breaks? If the image stream has recorded the IP address in the pull spec... The image stream is the source of truth for deployment configs.
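One possible way to capture what the image stream has recorded is sketched below. The function names is_repository and is_tag_refs are hypothetical; the jsonpath expressions assume the ImageStream status fields dockerImageRepository and tags[].items[].dockerImageReference.

```shell
#!/bin/sh
# Print the repository address the image stream advertises.
# Arguments: <image stream name> <project>.
is_repository() {
    oc get is "$1" -n "$2" -o jsonpath='{.status.dockerImageRepository}'
}

# Print "<tag><TAB><newest pull spec>" for each tag of the image stream.
is_tag_refs() {
    oc get is "$1" -n "$2" \
        -o jsonpath='{range .status.tags[*]}{.tag}{"\t"}{.items[0].dockerImageReference}{"\n"}{end}'
}

# Example (not executed here, needs a logged-in oc client):
#   is_repository nodejs-mongodb-example install-test
#   is_tag_refs   nodejs-mongodb-example install-test
```

If either command prints an address starting with the old svc IP, the image stream itself carries the IP and the dc is only inheriting it.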
PR: https://github.com/openshift/origin/pull/14882
After this, you have to set the OPENSHIFT_DEFAULT_REGISTRY variable for the docker-registry DC in order for the integrated registry to use the DNS name instead of the IP address (which is the default). Currently this is hard-coded into the registry image entrypoint, and it should not be. I guess the cause is that every time the registry updates the image stream, it replaces the DNS name with the IP address. However, with OPENSHIFT_DEFAULT_REGISTRY set for the master, when the master API updates the image stream (oc tag or build?), it reverts the IP address back to the DNS name. This should be tested more to make sure we don't break image streams (otherwise we will have to run ugly scripts replacing the broken IPs).
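Setting the variable on the registry DC could look roughly like the sketch below. It assumes the registry DC is named docker-registry in the default project and that docker-registry.default.svc:5000 is the desired value, as seen elsewhere in this report; the function names and the REGISTRY_DNS override are hypothetical.

```shell
#!/bin/sh
# Assumed DNS address of the integrated registry; override via the environment.
REGISTRY_DNS=${REGISTRY_DNS:-docker-registry.default.svc:5000}

# Set OPENSHIFT_DEFAULT_REGISTRY on the registry DC (assumed to be
# dc/docker-registry in the "default" project).
set_registry_default() {
    oc set env dc/docker-registry -n default "OPENSHIFT_DEFAULT_REGISTRY=${REGISTRY_DNS}"
}

# Show the value currently configured on the registry DC, if any.
show_registry_default() {
    oc set env dc/docker-registry -n default --list | grep '^OPENSHIFT_DEFAULT_REGISTRY='
}

# Example (not executed here, needs a logged-in oc client):
#   set_registry_default && show_registry_default
```

Changing the environment of the DC normally triggers a new registry deployment through its config change trigger, so the check is best run after that rollout finishes.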
The master should have OPENSHIFT_DEFAULT_REGISTRY=docker-registry.default.svc:5000 set before that image stream is even created. I'll watch that PR.
Since I don't think there's any additional work required of the installer here I'm going to move this to Image Registry component.
*** Bug 1466583 has been marked as a duplicate of this bug. ***
Yes the variable should be set in the registry DC now. https://github.com/openshift/openshift-ansible/pull/4681
According to comment 15, move this bug to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188