Bug 1463499

Summary: app's dc is pulling image from registry by IP but not by DNS.
Product: OpenShift Container Platform Reporter: Johnny Liu <jialiu>
Component: Image Registry Assignee: Michal Fojtik <mfojtik>
Status: CLOSED ERRATA QA Contact: ge liu <geliu>
Severity: high Docs Contact:
Priority: high    
Version: 3.6.0 CC: aos-bugs, jokerman, mfojtik, mmccomas, pweil, sdodson, yinzhou
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1466784 Environment:
Last Closed: 2017-11-28 21:58:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1466784    

Description Johnny Liu 2017-06-21 06:59:20 UTC
Description of problem:
Found this issue while testing https://trello.com/c/TK6FwQ6k/314-2-push-to-registry-by-dns. One goal of this user story is that the user does not have to restart the master service when the registry svc IP changes. In my testing, however, the app's dc tries to pull the image from the registry svc's original IP rather than from the registry svc DNS name.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.121-1.git.0.ed0b72c.el7.noarch
openshift v3.6.121
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

How reproducible:
Always

Steps to Reproduce:
1. Trigger a fresh installation with a single master.
2. After installation, trigger an sti build.
# oc new-app nodejs-mongodb-example -n install-test
3. Check the build log; the sti build is pushed to the registry by DNS successfully.
4. Delete the docker-registry svc and re-expose it so that its svc IP changes.
# oc get svc docker-registry
NAME              CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
docker-registry   172.31.9.190   <none>        5000/TCP   1h

# oc delete svc docker-registry
service "docker-registry" deleted

# oc expose dc docker-registry
service "docker-registry" exposed

# oc get svc docker-registry
NAME              CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
docker-registry   172.31.129.56   <none>        5000/TCP   5s

5. Scale the app down to 0 replicas.
# oc scale --replicas=0 dc nodejs-mongodb-example -n install-test
6. On the node, delete the locally pulled sti build image.
# docker images
REPOSITORY                                                             TAG                 IMAGE ID            CREATED             SIZE
docker-registry.default.svc:5000/install-test/nodejs-mongodb-example   latest              5d3c16cde11d        About an hour ago   460.6 MB
172.31.9.190:5000/install-test/nodejs-mongodb-example                  <none>              5d3c16cde11d        About an hour ago   460.6 MB

# docker rmi 5d3c16cde11d -f
Untagged: 172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67
Untagged: docker-registry.default.svc:5000/install-test/nodejs-mongodb-example:latest
Untagged: docker-registry.default.svc:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67
Deleted: sha256:5d3c16cde11d6c48e65e19c59481a3c2d2d60336088db7309cd4665ca4c75f78

7. Scale the app back up to 1 replica.

Actual results:
The app fails to scale up.
# oc get po -n install-test
NAME                             READY     STATUS         RESTARTS   AGE
mongodb-1-2kk0n                  1/1       Running        0          1h
nodejs-mongodb-example-1-build   0/1       Completed      0          1h
nodejs-mongodb-example-1-vg5qf   0/1       ErrImagePull   0          16s

# oc describe po nodejs-mongodb-example-1-vg5qf -n install-test
<--snip-->
Events:
  FirstSeen	LastSeen	Count	From						SubObjectPath				Type		Reason			Message
  ---------	--------	-----	----						-------------				--------	------			-------
  33s		33s		1	default-scheduler									Normal		Scheduled		Successfully assigned nodejs-mongodb-example-1-vg5qf to qe-36-smoke-master-registry-router-1
  27s		27s		1	kubelet, qe-36-smoke-master-registry-router-1	spec.containers{nodejs-mongodb-example}	Normal		BackOff			Back-off pulling image "172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67"
  33s		16s		4	kubelet, qe-36-smoke-master-registry-router-1						Warning		DNSSearchForming	Found and omitted duplicated dns domain in host search line: 'cluster.local' during merging with cluster dns domains
  30s		16s		2	kubelet, qe-36-smoke-master-registry-router-1	spec.containers{nodejs-mongodb-example}	Normal		Pulling			pulling image "172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67"
  27s		13s		2	kubelet, qe-36-smoke-master-registry-router-1	spec.containers{nodejs-mongodb-example}	Warning		Failed			Failed to pull image "172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67": rpc error: code = 2 desc = Get https://172.31.9.190:5000/v2/: dial tcp 172.31.9.190:5000: getsockopt: no route to host
  27s		13s		3	kubelet, qe-36-smoke-master-registry-router-1						Warning		FailedSync		Error syncing pod


# oc get dc nodejs-mongodb-example -n install-test -o yaml
apiVersion: v1
kind: DeploymentConfig
metadata:
  annotations:
    description: Defines how to deploy the application server
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2017-06-21T04:07:59Z
  generation: 5
  labels:
    app: nodejs-mongodb-example
    template: nodejs-mongodb-example
  name: nodejs-mongodb-example
  namespace: install-test
  resourceVersion: "4246"
  selfLink: /oapi/v1/namespaces/install-test/deploymentconfigs/nodejs-mongodb-example
  uid: 36be4ea4-5637-11e7-8b48-42010af00013
spec:
  replicas: 1
  selector:
    name: nodejs-mongodb-example
  strategy:
    activeDeadlineSeconds: 21600
    recreateParams:
      timeoutSeconds: 600
    resources: {}
    type: Recreate
  template:
    metadata:
      annotations:
        openshift.io/generated-by: OpenShiftNewApp
      creationTimestamp: null
      labels:
        app: nodejs-mongodb-example
        name: nodejs-mongodb-example
      name: nodejs-mongodb-example
    spec:
      containers:
      - env:
        - name: DATABASE_SERVICE_NAME
          value: mongodb
        - name: MONGODB_USER
          valueFrom:
            secretKeyRef:
              key: database-user
              name: nodejs-mongodb-example
        - name: MONGODB_PASSWORD
          valueFrom:
            secretKeyRef:
              key: database-password
              name: nodejs-mongodb-example
        - name: MONGODB_DATABASE
          value: sampledb
        - name: MONGODB_ADMIN_PASSWORD
          valueFrom:
            secretKeyRef:
              key: database-admin-password
              name: nodejs-mongodb-example
        image: 172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /pagecount
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: nodejs-mongodb-example
        ports:
        - containerPort: 8080
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /pagecount
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        resources:
          limits:
            memory: 512Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  test: false
  triggers:
  - imageChangeParams:
      automatic: true
      containerNames:
      - nodejs-mongodb-example
      from:
        kind: ImageStreamTag
        name: nodejs-mongodb-example:latest
        namespace: install-test
      lastTriggeredImage: 172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67
    type: ImageChange
  - type: ConfigChange
status:
  availableReplicas: 0
  conditions:
  - lastTransitionTime: 2017-06-21T04:11:08Z
    lastUpdateTime: 2017-06-21T04:11:28Z
    message: replication controller "nodejs-mongodb-example-1" successfully rolled
      out
    reason: NewReplicationControllerAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: 2017-06-21T06:01:44Z
    lastUpdateTime: 2017-06-21T06:01:44Z
    message: Deployment config does not have minimum availability.
    status: "False"
    type: Available
  details:
    causes:
    - imageTrigger:
        from:
          kind: ImageStreamTag
          name: nodejs-mongodb-example:latest
          namespace: install-test
      type: ImageChange
    message: image change
  latestVersion: 1
  observedGeneration: 5
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1

In the app's dc YAML above, the image points to the registry svc IP rather than the registry svc DNS name.
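The difference between the two pull spec forms can be checked mechanically. The following is a minimal sketch, assuming a pull spec always starts with a `host[:port]` registry part followed by `/`; the function names `registry_host` and `uses_ip_registry` are hypothetical helpers for illustration and are not part of OpenShift.

```python
import ipaddress


def registry_host(pull_spec: str) -> str:
    """Return the registry host (without port) of a pull spec such as
    'host:5000/namespace/name@sha256:...'."""
    # The registry part is everything before the first '/'.
    host_port = pull_spec.split("/", 1)[0]
    # Drop the port if present. Bracketed IPv6 literals are not handled here.
    return host_port.rsplit(":", 1)[0]


def uses_ip_registry(pull_spec: str) -> bool:
    """True if the registry part of the pull spec is an IP literal."""
    try:
        ipaddress.ip_address(registry_host(pull_spec))
        return True
    except ValueError:
        return False
```

Applied to the values in this report, the `image` field of the dc would be flagged as IP-based, while the `docker-registry.default.svc:5000/...` name seen in `docker images` would not.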

Expected results:
In the app's dc, the image should point to the registry svc DNS name, not the IP.

Additional info:

Comment 1 Johnny Liu 2017-06-21 07:47:13 UTC
Once the above issue has happened, restarting the master service and re-triggering the sti build makes the image in the app's dc point to the registry svc DNS name. That means the app's dc image points to the registry svc IP only at the initial moment, when the app's dc is created for the first time.

Comment 2 Scott Dodson 2017-06-21 13:46:30 UTC
If this is an HA environment maybe this is a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1463498 ? I'll investigate today.

Comment 3 Johnny Liu 2017-06-22 02:27:45 UTC
(In reply to Scott Dodson from comment #2)
> If this is an HA environment maybe this is a dupe of
> https://bugzilla.redhat.com/show_bug.cgi?id=1463498 ? I'll investigate today.

No, this is a single-master env, and it is a totally different bug from BZ#1463498. Please refer to the reproduction steps: in a multi-master HA env step 3 fails, but in this single-master env step 3 works fine.

Comment 4 Scott Dodson 2017-06-26 13:42:21 UTC
Any idea what's going on here?

Comment 5 Michal Fojtik 2017-06-26 13:58:09 UTC
I guess the image stream still had the IP address in the pull spec, and that restarting and rebuilding updated the image stream pull spec to point to the registry by DNS. Was this cluster live before the DEFAULT_REGISTRY variable was set, or does this happen on a clean cluster with the registry set?

Comment 6 Michal Fojtik 2017-06-26 14:03:52 UTC
Can we see the image stream when this breaks? If the image stream has recorded the IP address in the pull spec... The image stream is the source of truth for deployment configs.
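One way to answer that question from a saved image stream (for example the parsed output of `oc get is <name> -o json`) is to scan its status for pull specs whose registry part is an IP literal. This is a rough sketch only: the function names are hypothetical, and `SAMPLE_STREAM` is illustrative data modeled on the values in this report, not output captured from the affected cluster.

```python
import ipaddress


def _host_is_ip(reference: str) -> bool:
    """True if the registry part of a pull spec is an IP literal."""
    host = reference.split("/", 1)[0].rsplit(":", 1)[0]
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def ip_based_references(image_stream: dict) -> list:
    """Collect (where, pull_spec) pairs that point at the registry by IP."""
    found = []
    status = image_stream.get("status") or {}
    repo = status.get("dockerImageRepository", "")
    if repo and _host_is_ip(repo):
        found.append(("dockerImageRepository", repo))
    for tag in status.get("tags") or []:
        for item in tag.get("items") or []:
            ref = item.get("dockerImageReference", "")
            if ref and _host_is_ip(ref):
                found.append((tag.get("tag", ""), ref))
    return found


# Illustrative data only (assumption), shaped like an ImageStream status.
SAMPLE_STREAM = {
    "status": {
        "dockerImageRepository": "172.31.9.190:5000/install-test/nodejs-mongodb-example",
        "tags": [
            {
                "tag": "latest",
                "items": [
                    {
                        "dockerImageReference": "172.31.9.190:5000/install-test/nodejs-mongodb-example@sha256:933471ee23af23a8f2890880b1be2e26e0d9c3d2373f9e1c7eac1c958d28ea67"
                    }
                ],
            }
        ],
    }
}
```

An empty result from `ip_based_references` would suggest the image stream itself records DNS-based pull specs, so the IP would have to come from somewhere else.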

Comment 7 Michal Fojtik 2017-06-26 14:50:16 UTC
PR: https://github.com/openshift/origin/pull/14882

After this, you have to set the OPENSHIFT_DEFAULT_REGISTRY variable for the docker-registry DC in order for the integrated registry to use the DNS name instead of the IP address (which is the default).
Currently this is hard-coded into the registry image entrypoint, and it should not be.
I guess what this causes is that every time the registry updates the image stream, it replaces the DNS name with the IP address. However, with OPENSHIFT_DEFAULT_REGISTRY set for the master, when the master API updates the image stream (oc tag or build?) it will revert the IP address back to the DNS name.

This should be tested more to make sure we don't break image streams (otherwise we will have to run ugly scripts replacing the broken IPs).
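For illustration, the string transformation such a cleanup script would need is small. The sketch below is not the change made in the PR and does not touch a cluster; `rewrite_registry` is a hypothetical helper that swaps only the registry part of a pull spec and leaves everything else alone.

```python
def rewrite_registry(pull_spec: str, old_registry: str, new_registry: str) -> str:
    """Replace the registry part of a pull spec if it equals old_registry.

    Only the text before the first '/' is compared, so a namespace or image
    name that happens to contain the old address is never rewritten.
    """
    registry, sep, rest = pull_spec.partition("/")
    if sep and registry == old_registry:
        return f"{new_registry}/{rest}"
    return pull_spec
```

With the values from this report, rewriting `172.31.9.190:5000` to `docker-registry.default.svc:5000` keeps the namespace, image name, and digest unchanged, and a pull spec that already uses the DNS name is returned as-is.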

Comment 8 Scott Dodson 2017-06-26 15:28:59 UTC
The master should have OPENSHIFT_DEFAULT_REGISTRY=docker-registry.default.svc:5000 set before that image stream is even created.

I'll watch that PR.

Comment 9 Scott Dodson 2017-06-28 18:29:31 UTC
Since I don't think there's any additional work required of the installer here, I'm going to move this to the Image Registry component.

Comment 11 Michal Fojtik 2017-07-03 08:34:10 UTC
*** Bug 1466583 has been marked as a duplicate of this bug. ***

Comment 14 Scott Dodson 2017-07-12 12:46:02 UTC
Yes, the variable should be set in the registry DC now.

https://github.com/openshift/openshift-ansible/pull/4681

Comment 17 Johnny Liu 2017-07-17 08:46:37 UTC
According to comment 15, moving this bug to the verified state.

Comment 21 errata-xmlrpc 2017-11-28 21:58:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188