Bug 1673993
| Summary: | router deployment is not upgraded on cluster upgrade | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Seth Jennings <sjenning> |
| Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED ERRATA | Severity: | urgent |
| Priority: | urgent | CC: | aos-bugs, dmace, hongli |
| Version: | 4.1.0 | Target Release: | 4.1.0 |
| Hardware: | Unspecified | OS: | Unspecified |
| Doc Type: | No Doc Update | Type: | Bug |
| Last Closed: | 2019-06-04 10:42:43 UTC | | |
Upgraded from 4.0.0-0.nightly-2019-03-13-233958 to 4.0.0-0.nightly-2019-03-14-040908; neither the operator deployment nor the router deployment was updated.
$ oc get deployment -o yaml -n openshift-ingress-operator | grep image
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:736bb8db8e15b80fd6a51525364c5b86f90d4757e6930b20a80abfdab03f5a42
$ oc get deployment -o yaml -n openshift-ingress | grep image
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9
> upgrade from 4.0.0-0.nightly-2019-03-13-233958 to 4.0.0-0.nightly-2019-03-14-040908, both operator and router deployment are not updated.
The CVO is responsible for updating the operator, so I do not understand why the upgrade would fail to update the operator image.
I have verified that changing the IMAGE environment variable in the operator deployment causes the operator to roll out a new router deployment with the updated image, so I do not understand why the upgrade is failing to update the router deployment.
Can you provide the following?
1. yaml of the deployment for the operator
2. yaml of the deployment for the router
3. logs for the operator
4. yaml of the "ingress" clusteroperator
5. yaml of the "version" clusterversion
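The five items above can be gathered in one pass. This is a minimal sketch, assuming a bash shell with `oc` logged in to the affected cluster. The operator deployment name `ingress-operator` is an assumption inferred from the pod name `ingress-operator-6c7d78f9ff-6b78n` quoted later in this report, and the function name `collect_ingress_debug` and the output file names are hypothetical.

```shell
# Hypothetical helper: save the five requested items into one directory.
# Assumes the operator deployment is named "ingress-operator" (inferred, not confirmed).
collect_ingress_debug() {
  local out=${1:-ingress-debug}
  mkdir -p "$out" || return 1
  # 1. YAML of the operator deployment
  oc get deployment -n openshift-ingress-operator -o yaml >"$out/operator-deployment.yaml" || return 1
  # 2. YAML of the router deployment
  oc get deployment -n openshift-ingress -o yaml >"$out/router-deployment.yaml" || return 1
  # 3. Operator logs (if the pod runs several containers, oc logs may need -c NAME)
  oc logs -n openshift-ingress-operator deployment/ingress-operator >"$out/operator.log" || return 1
  # 4. YAML of the "ingress" clusteroperator
  oc get clusteroperator ingress -o yaml >"$out/clusteroperator-ingress.yaml" || return 1
  # 5. YAML of the "version" clusterversion
  oc get clusterversion version -o yaml >"$out/clusterversion.yaml" || return 1
}
```

Calling `collect_ingress_debug /tmp/bug-1673993` would leave five files that can be attached to the bug as-is.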
Created attachment 1544255: deployment of operator
Created attachment 1544256: deployment of router
Created attachment 1544257: ingresscontroller
Created attachment 1544258: clusterversion
Created attachment 1544259: logs of operator
Another issue: the operator's .status.conditions are not updated after the upgrade.
$ oc get clusteroperator ingress
NAME      VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
ingress   4.0.0-0.nightly-2019-03-14-040908   True        False         False     20h
$ oc get clusteroperator ingress -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: 2019-03-14T03:49:35Z
  generation: 1
  name: ingress
  resourceVersion: "215168"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/ingress
  uid: 2f275d66-460c-11e9-aa58-067ff0e0256a
spec: {}
status:
  conditions:
  - lastTransitionTime: 2019-03-14T03:49:37Z
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-03-14T03:49:37Z
    status: "False"
    type: Progressing
  - lastTransitionTime: 2019-03-14T07:33:30Z
    status: "True"
    type: Available
  extension: null
  relatedObjects:
  - group: ""
    name: openshift-ingress-operator
    resource: namespaces
  - group: ""
    name: openshift-ingress
    resource: namespaces
  versions:
  - name: operator
    version: 4.0.0-0.nightly-2019-03-14-040908
  - name: ingress-controller
    version: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9
I compared the release payloads for 4.0.0-0.nightly-2019-03-13-233958 and 4.0.0-0.nightly-2019-03-14-040908:
1. mkdir 4.0.0-0.nightly-2019-03-14-040908
2. cd 4.0.0-0.nightly-2019-03-14-040908
3. oc image extract registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-14-040908 --only-files
4. cd ..
5. mkdir 4.0.0-0.nightly-2019-03-13-233958
6. cd 4.0.0-0.nightly-2019-03-13-233958
7. oc image extract registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-13-233958 --only-files
8. cd ..
9. diff -u */release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml
10. grep -A1 -e IMAGE -- */release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml
Step 9 has the following output:
--- 4.0.0-0.nightly-2019-03-13-233958/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml 2019-03-13 23:41:34.000000000 +0000
+++ 4.0.0-0.nightly-2019-03-14-040908/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml 2019-03-14 04:11:08.000000000 +0000
@@ -32,7 +32,7 @@
- cluster-ingress-operator
env:
- name: RELEASE_VERSION
- value: "4.0.0-0.nightly-2019-03-13-233958"
+ value: "4.0.0-0.nightly-2019-03-14-040908"
- name: WATCH_NAMESPACE
valueFrom:
fieldRef:
Step 10 has the following output:
4.0.0-0.nightly-2019-03-13-233958/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml: - name: IMAGE
4.0.0-0.nightly-2019-03-13-233958/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml- value: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9
--
4.0.0-0.nightly-2019-03-14-040908/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml: - name: IMAGE
4.0.0-0.nightly-2019-03-14-040908/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml- value: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7f9343c326e3e92e5e4b1b23f9626e9299539a0737fc386121a05585871bcbc9
The upgrade's release payload changed neither the operator image nor the router image specified by the "IMAGE" variable (only RELEASE_VERSION changed), so not updating either image is the correct behavior.
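Steps 9 and 10 can be reduced to a single answer about the router image. This is a minimal sketch, assuming bash and awk and the manifest layout shown in the step 10 output (the `value:` line directly follows `- name: IMAGE`); the function names are hypothetical, and a YAML-aware tool would be more robust than line matching.

```shell
# Print the value on the line after "- name: IMAGE" in a deployment manifest,
# with any surrounding double quotes removed.
manifest_image_value() {
  awk '
    /- name: IMAGE[ \t]*$/ { want = 1; next }
    want {
      sub(/^[ \t]*value:[ \t]*/, "")
      gsub(/"/, "")
      print
      exit
    }
  ' "$1"
}

# Compare the router image (IMAGE) between an old and a new payload manifest.
compare_router_image() {
  local old new
  old=$(manifest_image_value "$1")
  new=$(manifest_image_value "$2")
  if [ -z "$old" ] || [ -z "$new" ]; then
    echo "IMAGE not found"
    return 2
  fi
  if [ "$old" = "$new" ]; then
    echo "unchanged"
  else
    echo "changed: $old -> $new"
  fi
}
```

Run from the directory holding both extracted payloads, `compare_router_image 4.0.0-0.nightly-2019-03-13-233958/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml 4.0.0-0.nightly-2019-03-14-040908/release-manifests/0000_70_cluster-ingress-operator_02-deployment.yaml` should print `unchanged`, consistent with the step 10 output.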
You updated to 4.0.0-0.nightly-2019-03-14-040908, right? So this looks correct:
versions:
- name: operator
  version: 4.0.0-0.nightly-2019-03-14-040908
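The check above (the clusteroperator's `operator` version should equal the release the cluster was updated to) can be scripted. This is a minimal sketch, assuming bash and `oc` logged in to the cluster, and assuming the target release can be read from the ClusterVersion's `status.desired.version` field; the function name is hypothetical.

```shell
# Hypothetical helper: compare the "operator" entry in the ingress clusteroperator's
# status.versions with the cluster's desired release version.
ingress_operator_version_check() {
  local reported desired
  reported=$(oc get clusteroperator ingress \
    -o jsonpath='{.status.versions[?(@.name=="operator")].version}') || return 2
  desired=$(oc get clusterversion version \
    -o jsonpath='{.status.desired.version}') || return 2
  if [ "$reported" = "$desired" ]; then
    echo "ok: operator reports $reported"
  else
    echo "mismatch: operator reports '$reported', cluster desires '$desired'"
    return 1
  fi
}
```

A non-zero exit status makes the function usable as a gate in a scripted upgrade test.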
Thanks for your explanation, Miciah. I will check and try another available target version. Another question about the case reported in this bug: is it possible that the operator image changed but the router image did not, or that the router image changed but the operator image did not?
The operator deployment's container image and the router deployment's container image can be different and can change independently. The operator deployment's container image should not matter for verifying this bug.
What matters are the "IMAGE" environment variable (which specifies the image to use for the router) and the router deployment's container image. The problem that Seth reported was that the operator deployment's "IMAGE" variable specified the following router image:
$ oc get pod -oyaml ingress-operator-6c7d78f9ff-6b78n
...
spec:
  containers:
  ...
    - name: IMAGE
      value: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-113402@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b
and the router deployment had a different image:
$ oc get deployment -oyaml router-default
...
spec:
  ...
  template:
    ...
    spec:
      containers:
      ...
        image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-055616@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b <-- does not match cluster version
Note "2019-02-08-113402" versus "2019-02-08-055616".
If an upgrade changes the router image, then the CVO should update the "IMAGE" variable in the operator deployment, and cluster-ingress-operator should then update the container image in the router deployment to match the "IMAGE" variable. If cluster-ingress-operator updates the router deployment to use the image specified in the "IMAGE" variable, then I believe this bug is fixed.
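The condition described above (the router deployment's image should equal the operator's "IMAGE" value) can be checked with a short script. This is a minimal sketch, assuming bash and `oc`; the deployment names `ingress-operator` (inferred from the pod name) and `router-default`, the single-container router pod, and all function names are assumptions. It also distinguishes the case in the original report, where both references carry the same sha256 digest and differ only in the repository name ("2019-02-08-113402" versus "2019-02-08-055616").

```shell
# Print the part after "@" in REPOSITORY@sha256:DIGEST, or an empty line if absent.
image_digest() {
  case "$1" in
    *@*) printf '%s\n' "${1##*@}" ;;
    *) printf '\n' ;;
  esac
}

# Classify the router image ($2) against the operator's IMAGE value ($1).
router_image_status() {
  local want=$1 have=$2
  if [ "$want" = "$have" ]; then
    echo "in sync"
  elif [ -n "$(image_digest "$want")" ] &&
       [ "$(image_digest "$want")" = "$(image_digest "$have")" ]; then
    echo "same digest, different repository"
  else
    echo "out of sync"
  fi
}

# Hypothetical live check (requires oc and access to the cluster).
cluster_router_image_status() {
  local want have
  want=$(oc get deployment ingress-operator -n openshift-ingress-operator \
    -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="IMAGE")].value}') || return 2
  have=$(oc get deployment router-default -n openshift-ingress \
    -o jsonpath='{.spec.template.spec.containers[0].image}') || return 2
  router_image_status "$want" "$have"
}
```

Against the data in this report, `router_image_status` classifies the originally reported pair as `same digest, different repository`, and the post-upgrade pair from the verification below (both ending in `sha256:930f94f5...`) as `in sync`.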
Verified by upgrading from 4.0.0-0.nightly-2019-03-13-233958 to 4.0.0-0.nightly-2019-03-15-043409; the issue is fixed.
After the upgrade, the router image has been updated:
$ oc get deployment -n openshift-ingress-operator -o yaml | grep quay -C 1
- name: IMAGE
value: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:930f94f53c1062d94d226afa25a7e85f321ba2bfd63b5c481cc2c1d8121a2a49
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:aca829201f14e91f1d8be8f46f8e71556a56aefd12600ad2ce44972e3e622e99
imagePullPolicy: IfNotPresent
$ oc get deployment -n openshift-ingress -o yaml | grep quay
image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:930f94f53c1062d94d226afa25a7e85f321ba2bfd63b5c481cc2c1d8121a2a49
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:0758
After an upgrade, the router deployment is not upgraded:
$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.alpha-2019-02-08-113402   True        False         44m     Cluster version is 4.0.0-0.alpha-2019-02-08-113402
$ oc get pod -oyaml ingress-operator-6c7d78f9ff-6b78n
...
spec:
  containers:
  ...
    - name: IMAGE
      value: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-113402@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b
    image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-113402@sha256:05b39eda84a9e22db9c0173b415c49296ec16b9d8e3585a96fea9a0bf08b4c0b
$ oc get deployment -oyaml router-default
...
spec:
  ...
  template:
    ...
    spec:
      containers:
      ...
        image: registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-02-08-055616@sha256:6991fb24697317cb8a1b8a4cfd129d77d05a199f382a4c5ba7eae7ad55bb386b <-- does not match cluster version