Bug 1673993 - router deployment is not upgraded on cluster upgrade
Summary: router deployment is not upgraded on cluster upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.1.0
Assignee: Miciah Dashiel Butler Masters
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-08 16:41 UTC by Seth Jennings
Modified: 2019-06-04 10:43 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:42:43 UTC
Target Upstream Version:


Attachments
deployment of operator (2.64 KB, text/plain), 2019-03-15 03:37 UTC, Hongan Li
deployment of router (4.76 KB, text/plain), 2019-03-15 03:38 UTC, Hongan Li
ingresscontroller (632 bytes, text/plain), 2019-03-15 03:38 UTC, Hongan Li
clusterversion (2.28 KB, text/plain), 2019-03-15 03:38 UTC, Hongan Li
logs of operator (216.58 KB, text/plain), 2019-03-15 03:42 UTC, Hongan Li


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:0758 | None | None | None | 2019-06-04 10:43:51 UTC
Github | openshift cluster-ingress-operator pull 159 | None | None | None | 2019-03-08 15:59:18 UTC

Description Seth Jennings 2019-02-08 16:41:26 UTC
After a cluster upgrade, the router deployment is not upgraded.

$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.alpha-2019-02-08-113402   True        False         44m     Cluster version is 4.0.0-0.alpha-2019-02-08-113402

$ oc get pod -oyaml ingress-operator-6c7d78f9ff-6b78n 
...
spec:
  containers:
...
    - name: IMAGE
      value: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-113402@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b
    image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-113402@sha256:05b39eda84a9e22db9c0173b415c49296ec16b9d8e3585a96fea9a0bf08b4c0b


$ oc get deployment -oyaml router-default 
...
spec:
...
  template:
...
    spec:
      containers:
...
        image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-055616@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b <-- does not match cluster version

Comment 3 Hongan Li 2019-03-14 12:04:35 UTC
Upgraded from 4.0.0-0.nightly-2019-03-13-233958 to 4.0.0-0.nightly-2019-03-14-040908; neither the operator deployment nor the router deployment was updated.

$ oc get deployment -o yaml -n openshift-ingress-operator | grep image
          image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:736bb8db8e15b80fd6a51525364c5b86f90d4757e6930b20a80abfdab03f5a42


$ oc get deployment -o yaml -n openshift-ingress | grep image
          image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9

Comment 4 Miciah Dashiel Butler Masters 2019-03-15 03:08:58 UTC
> Upgraded from 4.0.0-0.nightly-2019-03-13-233958 to 4.0.0-0.nightly-2019-03-14-040908; neither the operator deployment nor the router deployment was updated.

The CVO is responsible for updating the operator, so I do not understand why the upgrade would fail to update the operator image.

I have verified that changing the IMAGE environment variable in the operator deployment causes the operator to roll out a new router deployment with the updated image, so I do not understand why the upgrade is failing to update the router deployment.

Can you provide the following?

1. yaml of the deployment for the operator
2. yaml of the deployment for the router
3. logs for the operator
4. yaml of the "ingress" clusteroperator
5. yaml of the "version" clusterversion
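
The five items above could be gathered with a small script along these lines. This is only a sketch: the function name `gather_ingress_debug`, the output file names, and the overridable `OC` variable are made up for illustration, the deployment name `ingress-operator` is inferred from the pod name in the description, and `oc logs deployment/...` assumes the operator pod has a single container.

```shell
#!/bin/sh
# Sketch: collect the requested artifacts into one directory.
# OC may be overridden (for example OC=echo for a dry run); that variable
# and the file names below are illustrative assumptions, not part of oc.
gather_ingress_debug() {
  out=${1:-ingress-debug}
  oc_cmd=${OC:-oc}
  mkdir -p "$out" || return 1
  # 1. yaml of the operator deployment (name inferred from the pod name)
  $oc_cmd get deployment ingress-operator -n openshift-ingress-operator -o yaml > "$out/operator-deployment.yaml"
  # 2. yaml of the router deployment
  $oc_cmd get deployment router-default -n openshift-ingress -o yaml > "$out/router-deployment.yaml"
  # 3. logs for the operator
  $oc_cmd logs deployment/ingress-operator -n openshift-ingress-operator > "$out/operator.log"
  # 4. yaml of the "ingress" clusteroperator
  $oc_cmd get clusteroperator ingress -o yaml > "$out/clusteroperator-ingress.yaml"
  # 5. yaml of the "version" clusterversion
  $oc_cmd get clusterversion version -o yaml > "$out/clusterversion.yaml"
}
```

Calling `gather_ingress_debug ./ingress-debug` while logged in to the cluster would leave one file per requested item in that directory.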

Comment 5 Hongan Li 2019-03-15 03:37:13 UTC
Created attachment 1544255 [details]
deployment of operator

Comment 6 Hongan Li 2019-03-15 03:38:01 UTC
Created attachment 1544256 [details]
deployment of router

Comment 7 Hongan Li 2019-03-15 03:38:32 UTC
Created attachment 1544257 [details]
ingresscontroller

Comment 8 Hongan Li 2019-03-15 03:38:56 UTC
Created attachment 1544258 [details]
clusterversion

Comment 9 Hongan Li 2019-03-15 03:42:07 UTC
Created attachment 1544259 [details]
logs of operator

Comment 10 Hongan Li 2019-03-15 03:51:33 UTC
Another issue: the operator's .status.conditions are not updated after the upgrade.

$ oc get clusteroperator ingress 
NAME      VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
ingress   4.0.0-0.nightly-2019-03-14-040908   True        False         False     20h

$ oc get clusteroperator ingress -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: 2019-03-14T03:49:35Z
  generation: 1
  name: ingress
  resourceVersion: "215168"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/ingress
  uid: 2f275d66-460c-11e9-aa58-067ff0e0256a
spec: {}
status:
  conditions:
  - lastTransitionTime: 2019-03-14T03:49:37Z
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-03-14T03:49:37Z
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-03-14T07:33:30Z
    status: "True"
    type: Available
  extension: null
  relatedObjects:
  - group: ""
    name: openshift-ingress-operator
    resource: namespaces
  - group: ""
    name: openshift-ingress
    resource: namespaces
  versions:
  - name: operator
    version: 4.0.0-0.nightly-2019-03-14-040908
  - name: ingress-controller
    version: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9

Comment 11 Miciah Dashiel Butler Masters 2019-03-15 03:54:57 UTC
I compared the release payloads for 4.0.0-0.nightly-2019-03-13-233958 and 4.0.0-0.nightly-2019-03-14-040908:

1. mkdir 4.0.0-0.nightly-2019-03-14-040908
2. cd 4.0.0-0.nightly-2019-03-14-040908
3. oc image extract registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-14-040908 --only-files
4. cd ..
5. mkdir 4.0.0-0.nightly-2019-03-13-233958
6. cd 4.0.0-0.nightly-2019-03-13-233958
7. oc image extract registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-13-233958 --only-files
8. cd ..
9. diff -u */release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml
10. grep -A1 -e IMAGE -- */release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml

Step 9 has the following output:

    --- 4.0.0-0.nightly-2019-03-13-233958/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml     2019-03-13 23:41:34.000000000 +0000
    +++ 4.0.0-0.nightly-2019-03-14-040908/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml     2019-03-14 04:11:08.000000000 +0000
    @@ -32,7 +32,7 @@
               - cluster-ingress-operator
               env:
                 - name: RELEASE_VERSION
    -              value: "4.0.0-0.nightly-2019-03-13-233958"
    +              value: "4.0.0-0.nightly-2019-03-14-040908"
                 - name: WATCH_NAMESPACE
                   valueFrom:
                     fieldRef:

Step 10 has the following output:

    4.0.0-0.nightly-2019-03-13-233958/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml:            - name: IMAGE
    4.0.0-0.nightly-2019-03-13-233958/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml-              value: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9
    --
    4.0.0-0.nightly-2019-03-14-040908/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml:            - name: IMAGE
    4.0.0-0.nightly-2019-03-14-040908/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml-              value: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9

The upgrade's release payload made no change to the operator image version, so not updating the operator image is the correct behavior.
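
The check in step 10 can also be scripted so that it reports directly whether the router image changed between two extracted payloads. The following is a rough sketch, not part of any OpenShift tooling: the helper names `image_from_manifest` and `same_router_image` are hypothetical, and the awk pattern assumes the `value:` line immediately follows the `- name: IMAGE` line, as it does in the output above.

```shell
#!/bin/sh
# Sketch: print the value of the IMAGE env entry from a deployment manifest.
# Assumes "value:" is on the line directly after "- name: IMAGE".
image_from_manifest() {
  awk '/- name: IMAGE$/ { getline; sub(/^[ \t]*value:[ \t]*/, ""); print; exit }' "$1"
}

# Report whether two manifests carry the same IMAGE value.
# $1 is the manifest from the old payload, $2 the one from the new payload.
same_router_image() {
  old=$(image_from_manifest "$1")
  new=$(image_from_manifest "$2")
  if [ "$old" = "$new" ]; then
    echo "unchanged: $new"
  else
    echo "changed: $old -> $new"
  fi
}
```

It would be invoked with the two `release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml` files extracted in steps 3 and 7 as its arguments.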

Comment 12 Miciah Dashiel Butler Masters 2019-03-15 03:58:42 UTC
You updated to 4.0.0-0.nightly-2019-03-14-040908, right? So this looks correct:

      versions:
      - name: operator
        version: 4.0.0-0.nightly-2019-03-14-040908

Comment 13 Hongan Li 2019-03-15 04:26:53 UTC
Thanks for your explanation, Miciah. I will check and try another available target version.

Another question, about the case in the original report: is it possible for the operator image to change while the router image does not, or for the router image to change while the operator image does not?

Comment 14 Miciah Dashiel Butler Masters 2019-03-15 04:58:49 UTC
The operator deployment's container image and the router deployment's container image can be different and can change independently.  The operator deployment's container image should not matter for verifying this bug.

What matters are the "IMAGE" environment variable (which specifies the image to use for the router) and the router deployment's container image.  The problem that Seth reported was that the operator deployment's "IMAGE" variable specified the following router image:

    $ oc get pod -oyaml ingress-operator-6c7d78f9ff-6b78n 
    ...
    spec:
      containers:
    ...
        - name: IMAGE
          value: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-113402@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b

and the router deployment had a different image:

    $ oc get deployment -oyaml router-default 
    ...
    spec:
    ...
      template:
    ...
        spec:
          containers:
    ...
            image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-055616@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b <-- does not match cluster version

Note "2019-02-08-113402" versus "2019-02-08-055616".

If an upgrade changes the router image, then CVO should update the "IMAGE" variable in the operator deployment, and then cluster-ingress-operator should update the container image in the router deployment to match the "IMAGE" variable.  If cluster-ingress-operator updates the router deployment to have the image specified in the "IMAGE" variable, then I believe this bug is fixed.
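
The verification described above amounts to comparing two strings. A minimal sketch of that check follows; the function names `compare_images` and `check_router_image` are hypothetical, the deployment name `ingress-operator` is inferred from the pod name, and the jsonpath expressions assume the first container in each deployment is the relevant one.

```shell
#!/bin/sh
# Sketch: compare the image requested through the operator's IMAGE variable
# ($1) with the image the router deployment actually uses ($2).
compare_images() {
  if [ "$1" = "$2" ]; then
    echo "match"
  else
    echo "mismatch: IMAGE=$1 router=$2"
    return 1
  fi
}

# Hypothetical wrapper that reads both values from a live cluster.
# Assumes containers[0] is the operator and router container respectively.
check_router_image() {
  want=$(oc get deployment ingress-operator -n openshift-ingress-operator \
    -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="IMAGE")].value}')
  have=$(oc get deployment router-default -n openshift-ingress \
    -o jsonpath='{.spec.template.spec.containers[0].image}')
  compare_images "$want" "$have"
}
```

Running `check_router_image` after an upgrade has settled should print `match` if the operator has rolled the router deployment to the image named in "IMAGE"; a non-zero exit status indicates the mismatch reported in this bug.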

Comment 15 Hongan Li 2019-03-15 09:27:58 UTC
Verified by upgrading from 4.0.0-0.nightly-2019-03-13-233958 to 4.0.0-0.nightly-2019-03-15-043409; the issue has been fixed.

After the upgrade, the router image has been updated:
 
$ oc get deployment -n openshift-ingress-operator -o yaml | grep quay -C 1
          - name: IMAGE
            value: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:930f94f53c1062d94d226afa25a7e85f321ba2bfd63b5c481cc2c1d8121a2a49
          image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:aca829201f14e91f1d8be8f46f8e71556a56aefd12600ad2ce44972e3e622e99
          imagePullPolicy: IfNotPresent

$ oc get deployment -n openshift-ingress -o yaml | grep quay 
          image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:930f94f53c1062d94d226afa25a7e85f321ba2bfd63b5c481cc2c1d8121a2a49

Comment 17 errata-xmlrpc 2019-06-04 10:42:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

