Bug 1673993

Summary: router deployment is not upgraded on cluster upgrade
Product: OpenShift Container Platform Reporter: Seth Jennings <sjenning>
Component: Networking Assignee: Miciah Dashiel Butler Masters <mmasters>
Networking sub component: router QA Contact: Hongan Li <hongli>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: aos-bugs, dmace, hongli
Version: 4.1.0   
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-04 10:42:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                Flags
deployment of operator     none
deployment of router       none
ingresscontroller          none
clusterversion             none
logs of operator           none

Description Seth Jennings 2019-02-08 16:41:26 UTC
After a cluster upgrade, the router deployment is not upgraded.

$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.alpha-2019-02-08-113402   True        False         44m     Cluster version is 4.0.0-0.alpha-2019-02-08-113402

$ oc get pod -oyaml ingress-operator-6c7d78f9ff-6b78n 
...
spec:
  containers:
...
    - name: IMAGE
      value: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-113402@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b
    image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-113402@sha256:05b39eda84a9e22db9c0173b415c49296ec16b9d8e3585a96fea9a0bf08b4c0b


$ oc get deployment -oyaml router-default 
...
spec:
...
  template:
...
    spec:
      containers:
...
        image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-055616@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b <-- does not match cluster version

Comment 3 Hongan Li 2019-03-14 12:04:35 UTC
After upgrading from 4.0.0-0.nightly-2019-03-13-233958 to 4.0.0-0.nightly-2019-03-14-040908, neither the operator deployment nor the router deployment was updated.

$ oc get deployment -o yaml -n openshift-ingress-operator | grep image
          image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:736bb8db8e15b80fd6a51525364c5b86f90d4757e6930b20a80abfdab03f5a42


$ oc get deployment -o yaml -n openshift-ingress | grep image
          image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9

Comment 4 Miciah Dashiel Butler Masters 2019-03-15 03:08:58 UTC
> After upgrading from 4.0.0-0.nightly-2019-03-13-233958 to 4.0.0-0.nightly-2019-03-14-040908, neither the operator deployment nor the router deployment was updated.

The CVO is responsible for updating the operator, so I do not understand why the upgrade would fail to update the operator image.

I have verified that changing the IMAGE environment variable in the operator deployment causes the operator to roll out a new router deployment with the updated image, so I do not understand why the upgrade is failing to update the router deployment.

Can you provide the following?

1. yaml of the deployment for the operator
2. yaml of the deployment for the router
3. logs for the operator
4. yaml of the "ingress" clusteroperator
5. yaml of the "version" clusterversion

Comment 5 Hongan Li 2019-03-15 03:37:13 UTC
Created attachment 1544255 [details]
deployment of operator

Comment 6 Hongan Li 2019-03-15 03:38:01 UTC
Created attachment 1544256 [details]
deployment of router

Comment 7 Hongan Li 2019-03-15 03:38:32 UTC
Created attachment 1544257 [details]
ingresscontroller

Comment 8 Hongan Li 2019-03-15 03:38:56 UTC
Created attachment 1544258 [details]
clusterversion

Comment 9 Hongan Li 2019-03-15 03:42:07 UTC
Created attachment 1544259 [details]
logs of operator

Comment 10 Hongan Li 2019-03-15 03:51:33 UTC
Another issue is that the operator's .status.conditions is not updated after the upgrade.

$ oc get clusteroperator ingress 
NAME      VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
ingress   4.0.0-0.nightly-2019-03-14-040908   True        False         False     20h

$ oc get clusteroperator ingress -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: 2019-03-14T03:49:35Z
  generation: 1
  name: ingress
  resourceVersion: "215168"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/ingress
  uid: 2f275d66-460c-11e9-aa58-067ff0e0256a
spec: {}
status:
  conditions:
  - lastTransitionTime: 2019-03-14T03:49:37Z
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-03-14T03:49:37Z
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-03-14T07:33:30Z
    status: "True"
    type: Available
  extension: null
  relatedObjects:
  - group: ""
    name: openshift-ingress-operator
    resource: namespaces
  - group: ""
    name: openshift-ingress
    resource: namespaces
  versions:
  - name: operator
    version: 4.0.0-0.nightly-2019-03-14-040908
  - name: ingress-controller
    version: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9

Comment 11 Miciah Dashiel Butler Masters 2019-03-15 03:54:57 UTC
I compared the release payloads for 4.0.0-0.nightly-2019-03-13-233958 and 4.0.0-0.nightly-2019-03-14-040908:

1. mkdir 4.0.0-0.nightly-2019-03-14-040908
2. cd 4.0.0-0.nightly-2019-03-14-040908
3. oc image extract registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-14-040908 --only-files
4. cd ..
5. mkdir 4.0.0-0.nightly-2019-03-13-233958
6. cd 4.0.0-0.nightly-2019-03-13-233958
7. oc image extract registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-13-233958 --only-files
8. cd ..
9. diff -u */release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml
10. grep -A1 -e IMAGE -- */release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml

Step 9 has the following output:

    --- 4.0.0-0.nightly-2019-03-13-233958/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml     2019-03-13 23:41:34.000000000 +0000
    +++ 4.0.0-0.nightly-2019-03-14-040908/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml     2019-03-14 04:11:08.000000000 +0000
    @@ -32,7 +32,7 @@
               - cluster-ingress-operator
               env:
                 - name: RELEASE_VERSION
    -              value: "4.0.0-0.nightly-2019-03-13-233958"
    +              value: "4.0.0-0.nightly-2019-03-14-040908"
                 - name: WATCH_NAMESPACE
                   valueFrom:
                     fieldRef:

Step 10 has the following output:

    4.0.0-0.nightly-2019-03-13-233958/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml:            - name: IMAGE
    4.0.0-0.nightly-2019-03-13-233958/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml-              value: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9
    --
    4.0.0-0.nightly-2019-03-14-040908/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml:            - name: IMAGE
    4.0.0-0.nightly-2019-03-14-040908/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml-              value: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9

The upgrade's release payload made no change to the operator image version, so not updating the operator image is the correct behavior.
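
The comparison in steps 9 and 10 can also be scripted. The following is a minimal sketch, not part of the original investigation: env_value, changed_env, and sample_env are hypothetical helper names, and the line-based scan assumes each "value:" line directly follows its "- name:" line, as in the manifests quoted above. The sample text is built from the values shown in the step 9 and step 10 output rather than read from an extracted payload.

```python
import re
from typing import Dict, Iterable, Optional, Tuple


def env_value(manifest_text: str, name: str) -> Optional[str]:
    """Return the value of the env entry `name`, or None if it has no literal value.

    Line-based scan, comparable to `grep -A1`: it only looks at the line
    immediately after `- name: <name>`.
    """
    lines = manifest_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.strip() == f"- name: {name}":
            match = re.match(r"\s*value:\s*(\S+)", lines[i + 1])
            if match:
                return match.group(1).strip('"')
    return None


def changed_env(
    old_manifest: str, new_manifest: str, names: Iterable[str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (old, new)} for env entries whose value differs."""
    changes = {}
    for name in names:
        old = env_value(old_manifest, name)
        new = env_value(new_manifest, name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Router image reference quoted in the step 10 output (same in both payloads).
ROUTER_IMAGE = (
    "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:"
    "7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9"
)


def sample_env(release_version: str) -> str:
    """Build a reduced env section for illustration (hypothetical fixture)."""
    return (
        "          env:\n"
        "            - name: RELEASE_VERSION\n"
        f'              value: "{release_version}"\n'
        "            - name: IMAGE\n"
        f"              value: {ROUTER_IMAGE}\n"
    )


OLD = sample_env("4.0.0-0.nightly-2019-03-13-233958")
NEW = sample_env("4.0.0-0.nightly-2019-03-14-040908")

print(changed_env(OLD, NEW, ["RELEASE_VERSION", "IMAGE"]))
```

For these inputs only RELEASE_VERSION is reported as changed; IMAGE is identical in both payloads.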

Comment 12 Miciah Dashiel Butler Masters 2019-03-15 03:58:42 UTC
You updated to 4.0.0-0.nightly-2019-03-14-040908, right? So this looks correct:

      versions:
      - name: operator
        version: 4.0.0-0.nightly-2019-03-14-040908
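
The same check can be expressed programmatically. This is a minimal sketch, assuming the ClusterOperator has been loaded into a dict (for example from the output of "oc get clusteroperator ingress -o json"); the function names are hypothetical, and the sample dict is reduced to the fields quoted in comment 10.

```python
from typing import Dict


def reported_versions(clusteroperator: dict) -> Dict[str, str]:
    """Map each entry in .status.versions to its version string."""
    versions = clusteroperator.get("status", {}).get("versions", [])
    return {v["name"]: v["version"] for v in versions}


def condition_statuses(clusteroperator: dict) -> Dict[str, str]:
    """Map each condition type in .status.conditions to its status."""
    conditions = clusteroperator.get("status", {}).get("conditions", [])
    return {c["type"]: c["status"] for c in conditions}


def operator_at_version(clusteroperator: dict, target: str) -> bool:
    """True if the "operator" entry in .status.versions equals `target`."""
    return reported_versions(clusteroperator).get("operator") == target


# Reduced copy of the ClusterOperator quoted in comment 10.
INGRESS = {
    "status": {
        "conditions": [
            {"type": "Failing", "status": "False"},
            {"type": "Progressing", "status": "False"},
            {"type": "Available", "status": "True"},
        ],
        "versions": [
            {"name": "operator", "version": "4.0.0-0.nightly-2019-03-14-040908"},
        ],
    }
}

print(operator_at_version(INGRESS, "4.0.0-0.nightly-2019-03-14-040908"))
print(condition_statuses(INGRESS))
```

The first call compares the reported operator version with the upgrade target; the second shows the condition statuses that would be expected once an upgrade has settled.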

Comment 13 Hongan Li 2019-03-15 04:26:53 UTC
Thanks for your explanation, Miciah. I will check and try another available target version.

Another question about the case in the original report: is it possible for the operator image to change while the router image does not, or for the router image to change while the operator image does not?

Comment 14 Miciah Dashiel Butler Masters 2019-03-15 04:58:49 UTC
The operator deployment's container image and the router deployment's container image can be different and can change independently.  The operator deployment's container image should not matter for verifying this bug.

What matters are the "IMAGE" environment variable (which specifies the image to use for the router) and the router deployment's container image.  The problem that Seth reported was that the operator deployment's "IMAGE" variable specified the following router image:

    $ oc get pod -oyaml ingress-operator-6c7d78f9ff-6b78n 
    ...
    spec:
      containers:
    ...
        - name: IMAGE
          value: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-113402@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b

and the router deployment had a different image:

    $ oc get deployment -oyaml router-default 
    ...
    spec:
    ...
      template:
    ...
        spec:
          containers:
    ...
            image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-055616@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b <-- does not match cluster version

Note "2019-02-08-113402" versus "2019-02-08-055616".

If an upgrade changes the router image, then CVO should update the "IMAGE" variable in the operator deployment, and then cluster-ingress-operator should update the container image in the router deployment to match the "IMAGE" variable.  If cluster-ingress-operator updates the router deployment to have the image specified in the "IMAGE" variable, then I believe this bug is fixed.
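
The verification criterion above can be sketched as a small check. This is an illustration only, assuming each Deployment has been loaded into a dict (for example from "oc get deployment <name> -o json"); the function names, the make_deployment fixture, and the container names in the sample are hypothetical. The sample values are the two image references from the original report.

```python
from typing import List, Optional


def pod_containers(deployment: dict) -> List[dict]:
    """Return the containers of a Deployment's pod template."""
    return deployment["spec"]["template"]["spec"]["containers"]


def desired_router_image(operator_deployment: dict, env_name: str = "IMAGE") -> Optional[str]:
    """Return the router image named by the operator's IMAGE variable, if set."""
    for container in pod_containers(operator_deployment):
        for env in container.get("env", []):
            if env.get("name") == env_name:
                return env.get("value")
    return None


def router_is_current(operator_deployment: dict, router_deployment: dict) -> bool:
    """True if some router container runs exactly the image named by IMAGE."""
    wanted = desired_router_image(operator_deployment)
    if wanted is None:
        return False
    return any(c.get("image") == wanted for c in pod_containers(router_deployment))


def make_deployment(containers: List[dict]) -> dict:
    """Wrap containers in the minimal Deployment structure (test fixture)."""
    return {"spec": {"template": {"spec": {"containers": containers}}}}


# Image references from the original report; only the repository name differs.
REGISTRY = "registry.svc.ci.openshift.org/openshift/"
DIGEST = "@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b"

operator = make_deployment([
    {
        "name": "ingress-operator",  # hypothetical container name
        "env": [{"name": "IMAGE", "value": REGISTRY + "origin-v4.0-2019-02-08-113402" + DIGEST}],
    }
])
router = make_deployment([
    {"name": "router", "image": REGISTRY + "origin-v4.0-2019-02-08-055616" + DIGEST}  # hypothetical name
])

print(router_is_current(operator, router))  # prints False
```

The comparison is on the full image reference, so the stale repository name in the router deployment is reported as a mismatch.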

Comment 15 Hongan Li 2019-03-15 09:27:58 UTC
Verified by upgrading from 4.0.0-0.nightly-2019-03-13-233958 to 4.0.0-0.nightly-2019-03-15-043409; the issue has been fixed.

After the upgrade, the router image has been updated:
 
$ oc get deployment -n openshift-ingress-operator -o yaml | grep quay -C 1
          - name: IMAGE
            value: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:930f94f53c1062d94d226afa25a7e85f321ba2bfd63b5c481cc2c1d8121a2a49
          image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:aca829201f14e91f1d8be8f46f8e71556a56aefd12600ad2ce44972e3e622e99
          imagePullPolicy: IfNotPresent

$ oc get deployment -n openshift-ingress -o yaml | grep quay 
          image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:930f94f53c1062d94d226afa25a7e85f321ba2bfd63b5c481cc2c1d8121a2a49

Comment 17 errata-xmlrpc 2019-06-04 10:42:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758