Description of problem:
Created a template and triggered the deployment once, but it triggered 2 deployments and created two pods.

oc v3.8.26
kubernetes v1.8.1+0d5291c
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://api.free-int.openshift.com:443
openshift v3.8.18
kubernetes v1.8.1+0d5291c

How reproducible:
Always

Steps to Reproduce:
1. # oc process -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/deployment/OCP-11384/application-template-stibuild.json | oc create -f -
2. # oc get pods
NAME                        READY     STATUS      RESTARTS   AGE
database-1-deploy           0/1       Error       0          1m
frontend-2-bs7n9            1/1       Running     0          25s
frontend-2-qkfbk            1/1       Running     0          25s
ruby-sample-build-1-build   0/1       Completed   0          1m
3. # oc get dc
NAME       REVISION   DESIRED   CURRENT   TRIGGERED BY
database   1          1         0         config
frontend   2          2         2         config,image(origin-ruby-sample:latest)

# oc get dc frontend -o yaml
apiVersion: v1
kind: DeploymentConfig
metadata:
  annotations:
    template.alpha.openshift.io/wait-for-ready: "true"
  creationTimestamp: 2018-01-05T10:34:37Z
  generation: 3
  labels:
    template: application-template-stibuild
  name: frontend
  namespace: lgp3
  resourceVersion: "147125443"
  selfLink: /oapi/v1/namespaces/lgp3/deploymentconfigs/frontend
  uid: 071b5fa2-f204-11e7-98b7-0ac586c2eb16
spec:
  replicas: 2
  selector:
    name: frontend
  strategy:
    activeDeadlineSeconds: 21600
    resources: {}
    rollingParams:
      intervalSeconds: 1
      maxSurge: 25%
      maxUnavailable: 25%
      post:
        execNewPod:
          command:
          - /bin/true
          containerName: ruby-helloworld
          env:
          - name: CUSTOM_VAR2
            value: custom_value2
        failurePolicy: Ignore
      pre:
        execNewPod:
          command:
          - /bin/true
          containerName: ruby-helloworld
          env:
          - name: CUSTOM_VAR1
            value: custom_value1
        failurePolicy: Abort
      timeoutSeconds: 120
      updatePeriodSeconds: 1
    type: Rolling
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: frontend
    spec:
      containers:
      - env:
        - name: MYSQL_USER
          valueFrom:
            secretKeyRef:
              key: mysql-user
              name: dbsecret
        - name: MYSQL_PASSWORD
          valueFrom:
            secretKeyRef:
              key: mysql-password
              name: dbsecret
        - name: MYSQL_DATABASE
          value: root
        image: docker-registry.default.svc:5000/lgp3/origin-ruby-sample@sha256:e4a3e3b47961386374696299edcccd406b5920cd1b9a1663b757c44ab5d2b233
        imagePullPolicy: IfNotPresent
        name: ruby-helloworld
        ports:
        - containerPort: 8080
          protocol: TCP
        resources: {}
        securityContext:
          capabilities: {}
          privileged: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  test: false
  triggers:
  - imageChangeParams:
      automatic: true
      containerNames:
      - ruby-helloworld
      from:
        kind: ImageStreamTag
        name: origin-ruby-sample:latest
        namespace: lgp3
      lastTriggeredImage: docker-registry.default.svc:5000/lgp3/origin-ruby-sample@sha256:e4a3e3b47961386374696299edcccd406b5920cd1b9a1663b757c44ab5d2b233
    type: ImageChange
  - type: ConfigChange
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: 2018-01-05T10:35:54Z
    lastUpdateTime: 2018-01-05T10:35:54Z
    message: Deployment config has minimum availability.
    status: "True"
    type: Available
  - lastTransitionTime: 2018-01-05T10:35:51Z
    lastUpdateTime: 2018-01-05T10:36:02Z
    message: replication controller "frontend-2" successfully rolled out
    reason: NewReplicationControllerAvailable
    status: "True"
    type: Progressing
  details:
    causes:
    - imageTrigger:
        from:
          kind: DockerImage
          name: docker-registry.default.svc:5000/lgp3/origin-ruby-sample@sha256:e4a3e3b47961386374696299edcccd406b5920cd1b9a1663b757c44ab5d2b233
      type: ImageChange
    message: image change
  latestVersion: 2
  observedGeneration: 3
  readyReplicas: 2
  replicas: 2
  unavailableReplicas: 0
  updatedReplicas: 2

Actual results:
A redundant deployment appears even though the deployment was triggered only once.

Expected results:
Triggering the deployment once should not produce a redundant deployment.
Doesn't "replicas: 2" in you dc indicate that there should be two pods?
(In reply to Justin Pierce from comment #1)
> Doesn't "replicas: 2" in your dc indicate that there should be two pods?

Yes, there should be two pods, but the image change should only trigger one deployment, which creates a new RC with replicas=2; it should not trigger two deployments, which creates two deployer pods.
(In reply to Wang Haoran from comment #2)
> (In reply to Justin Pierce from comment #1)
> > Doesn't "replicas: 2" in your dc indicate that there should be two pods?
>
> Yes, there should be two pods, but the image change should only trigger one
> deployment, which creates a new RC with replicas=2; it should not trigger
> two deployments, which creates two deployer pods.

I tried to reproduce based on the steps. The first time I created the template, the database errored (quota) and the build was started. When the build finished, I got:

~ → oc get dc
NAME       REVISION   DESIRED   CURRENT   TRIGGERED BY
database   1          1         0         config
frontend   1          2         2         config,image(origin-ruby-sample:latest)

meaning just 1 deployment was triggered. I retried this three times and was not able to hit the double deployment.

How did you trigger the second deployment? Was it triggered manually via 'oc rollout latest', or did it trigger automatically for an unknown reason?
Lowering priority, as QA can't reproduce this at present either. When they successfully reproduce it, logs will be provided and we can raise the priority again.
Closing this bug since it could not be reproduced.
The problem is in the docker-registry configuration. When you use DNS for the docker-registry, the registry has to know what hostname should be used when new images are created. By default, the registry will use the docker-registry IP address. This causes the initial "create" call to create the image with the wrong DockerImageReference (172.30.88.55:5000/haowang). However, in OpenShift we have a decorator field for DockerImageReference, so when the informer cache is updated (1s later), the image field value is changed to the DNS-based format. As a result, we can see 2 deployments (if the cache resync is fast, you see just one).

To fix this, you have to set the 'REGISTRY_OPENSHIFT_SERVER_ADDR' environment variable on the ds/docker-registry to 'docker-registry.default.svc:5000'. This is likely an installer bug; the installer should do this automatically.

Ben: I guess your team owns the ansible part of the registry; we need to make sure that environment variable is set when DNS is configured for the docker-registry.
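A minimal sketch of that fix with oc, assuming the registry runs as dc/docker-registry in the default namespace (the resource type, name, and namespace are assumptions; adjust them to your environment):

# oc set env dc/docker-registry -n default REGISTRY_OPENSHIFT_SERVER_ADDR=docker-registry.default.svc:5000

The env change should roll out a new registry deployment, and images pushed afterwards should be recorded with the DNS-based DockerImageReference.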
The docker registries on the free/starter clusters are very old, so the installer has not touched them in some time. We could manually apply this change if necessary.
Ansible appears to be setting OPENSHIFT_DEFAULT_REGISTRY, which as best I can tell was later deprecated in favor of REGISTRY_OPENSHIFT_SERVER_ADDR but should accomplish the same goal.
Can't recreate this issue in an OCP env (3.9.0-0.22.0); I think it appears randomly, per the analysis in comment 9.
> @Ben, as you said, OPENSHIFT_DEFAULT_REGISTRY is deprecated, would you mind
> updating to use REGISTRY_OPENSHIFT_SERVER_ADDR?

We can do that, but setting either value should currently work, so it's not the problem here. Can you open a separate bug to track making that change?

Was either value set on the registry in this case? I suspect neither value was set.
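One way to check which (if either) variable is set on the registry, assuming it runs as dc/docker-registry in the default namespace (object name and namespace are assumptions):

# oc set env dc/docker-registry -n default --list | grep -E 'REGISTRY_OPENSHIFT_SERVER_ADDR|OPENSHIFT_DEFAULT_REGISTRY'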
@Ben, OPENSHIFT_DEFAULT_REGISTRY was set in the DC of the registry.
@Ben bug opened https://bugzilla.redhat.com/show_bug.cgi?id=1537593
If this cannot be recreated anymore because the environment was taken down, I suggest closing it. If we have an environment where it can be recreated, we need to see:

- the registry pod yaml
- the master config
- the imagestream yaml the deploymentconfig is referencing
- the deploymentconfig yaml

(A sketch of commands for collecting these follows at the end of this comment.)

Note that in my environment, where I was not able to recreate it, the only variable I set was OPENSHIFT_DEFAULT_REGISTRY=docker-registry.default.svc.local, and I set it for the master process; I did not explicitly set it on the registry (the registry will default to this value anyway).

Regarding comments 9 and 13, I do not think this is random. Getting two deployments might be random (it's a bit of a race), but it's fundamentally caused by the imagestream being defined in terms of the registry IP address (imagestream.status.dockerImageRepository points to an IP address) while the master has an OPENSHIFT_DEFAULT_REGISTRY value set to something else. (Michal can clarify the behavior, but it sounds like there is a controller his team owns that, upon seeing an image reference to an IP that corresponds to the registry, rewrites that image reference to match the OPENSHIFT_DEFAULT_REGISTRY value.)

How you get an imagestream that's defined in terms of an IP address when OPENSHIFT_DEFAULT_REGISTRY is defined is not obvious to me (again, I could not recreate it, even when I did not explicitly configure the registry deploymentconfig).
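For whoever still has an environment where this reproduces, a rough sketch of commands for collecting the artifacts requested above (the pod, project, and imagestream names are placeholders, and the master config path assumes a standard OCP 3.x install):

# oc get pod <registry-pod-name> -n default -o yaml
# cat /etc/origin/master/master-config.yaml
# oc get is <imagestream-name> -n <project> -o yaml
# oc get dc frontend -n <project> -o yaml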
Discussed with Michal. This definitely appears to be what he initially suspected: if you set OPENSHIFT_DEFAULT_REGISTRY on the master, you must also set the registry address env variable on the registry DC (there are several variables that will accomplish this, but REGISTRY_OPENSHIFT_SERVER_ADDR is the current preferred variable name).

I've also confirmed that the ansible installer does set this on the registry for new installs, and I've updated the ansible installer to set REGISTRY_OPENSHIFT_SERVER_ADDR (previously it was setting OPENSHIFT_DEFAULT_REGISTRY, which also works but is deprecated for the registry). So basically this is also working as expected.

However, I'm going to update the documentation to make it clear to users that they must keep the registry URL setting on the master in sync with the registry URL setting on the registry, so I'm leaving this bug open to do that.
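One hedged way to compare the two settings (the sysconfig path assumes an RPM-based OCP 3.x master, and the registry object name is an assumption):

# grep OPENSHIFT_DEFAULT_REGISTRY /etc/sysconfig/atomic-openshift-master*
# oc set env dc/docker-registry -n default --list | grep REGISTRY_OPENSHIFT_SERVER_ADDR

Both should point at the same registry address (for example docker-registry.default.svc:5000).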
doc fixes here: https://github.com/openshift/openshift-docs/pull/7296
Commit pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/c019b45d2030f95f85d0a1404b80ee8d5f3e1bd3
document keeping the registry addr configuration in sync
bug 1531500 https://bugzilla.redhat.com/show_bug.cgi?id=1531500
Moved to verify
I'm guessing you ran into this: https://github.com/openshift/image-registry/issues/58
Registry logs would allow us to confirm. (Since that issue is now fixed, a new install might also resolve it for you.)
I tried it on a newly installed env (3.9.0-0.34.0) and this problem is fixed. Regarding comment 37, I tried that as well and there is no problem now.

# oc logs docker-registry-1-4p66z -n default | grep URL
time="2018-01-31T05:55:54.960130259Z" level=info msg="Using \"docker-registry.default.svc:5000\" as Docker Registry URL" go.version=go1.9.2 instance.id=e505ee2c-5dc9-46b3-aeed-36263726f317
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3748