Bug 1792005

Summary: Cannot abort an upgrade from 4.2 to 4.3 and rollback to 4.2 - probe changes are not correctly applied (console-operator cannot be reverted)
Product: OpenShift Container Platform
Component: Cluster Version Operator
Reporter: W. Trevor King <wking>
Assignee: W. Trevor King <wking>
QA Contact: liujia <jiajliu>
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Version: 4.2.0
CC: aos-bugs, bgilbert, bpeterse, ccoleman, jiajliu, jokerman, spadgett, wking
Target Milestone: ---
Target Release: 4.2.z
Hardware: Unspecified
OS: Unspecified
Clone Of: 1792004
Last Closed: 2020-03-04 04:51:02 UTC
Bug Depends On: 1792004
Bug Blocks: 1792015

Description W. Trevor King 2020-01-16 20:49:29 UTC
+++ This bug was initially created as a clone of Bug #1792004 +++

+++ This bug was initially created as a clone of Bug #1791863 +++

--- Additional comment from Samuel Padgett on 2020-01-16 19:53:02 UTC ---

I believe this is happening because the CVO is not removing the console-operator readiness probe added in 4.3 when downgrading to 4.2.

The 4.2 operator deployment manifest does not have a readiness probe:
https://github.com/openshift/console-operator/blob/release-4.2/manifests/07-operator.yaml

The 4.3 operator deployment does:
https://github.com/openshift/console-operator/blob/release-4.3/manifests/07-operator.yaml#L66-L75

The failing 4.2 console-operator pod has the probe when it shouldn't:

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.2-to-4.3/235/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-57eb844d3650921acacb016df97664a30e55a45a554fe71a4fda297015321d0e/namespaces/openshift-console-operator/pods/console-operator-7bb76df6d6-m4qh7/console-operator-7bb76df6d6-m4qh7.yaml

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.2-to-4.3/235/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-57eb844d3650921acacb016df97664a30e55a45a554fe71a4fda297015321d0e/namespaces/openshift-console-operator/apps/deployments.yaml

Since the 4.2 console operator has no `/readyz` endpoint, the readiness probe fails:
> Jan 16 03:34:39.681 W ns/openshift-console-operator pod/console-operator-7bb76df6d6-m4qh7 Readiness probe failed: HTTP probe failed with statuscode: 404 (390 times)
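
To illustrate the suspected mechanism, here is a minimal sketch of how a merge that only copies fields set in the desired manifest leaves a stale probe behind. This is not the CVO's actual code: the function names, image strings, and port are hypothetical placeholders, and only the `/readyz` path and the probe field names come from this report and the Kubernetes container spec.

```python
import copy

# Probe fields that a downgrade may need to clear (Kubernetes container spec names).
PROBE_FIELDS = ("livenessProbe", "readinessProbe")


def merge_set_fields_only(existing, required):
    """Copy every field set in `required` onto `existing`; never delete anything."""
    merged = copy.deepcopy(existing)
    for key, value in required.items():
        merged[key] = copy.deepcopy(value)
    return merged


def merge_clearing_probes(existing, required):
    """Same merge, but drop probe fields that the required manifest omits."""
    merged = merge_set_fields_only(existing, required)
    for field in PROBE_FIELDS:
        if field not in required:
            merged.pop(field, None)
    return merged


# Hypothetical in-cluster state after the 4.3 manifest was applied.
container_4_3 = {
    "name": "console-operator",
    "image": "example/console-operator:4.3",  # placeholder image
    "readinessProbe": {"httpGet": {"path": "/readyz", "port": 8443}},  # placeholder port
}
# Hypothetical 4.2 manifest: no readiness probe at all.
container_4_2 = {
    "name": "console-operator",
    "image": "example/console-operator:4.2",  # placeholder image
}

stale = merge_set_fields_only(container_4_3, container_4_2)
fixed = merge_clearing_probes(container_4_3, container_4_2)
print("readinessProbe" in stale)  # True: the 4.3 probe survives the downgrade
print("readinessProbe" in fixed)  # False: the probe is cleared
```

Under this assumption, the image is rolled back to 4.2 while the 4.3 probe keeps polling an endpoint the 4.2 binary does not serve, which would match the 404 above.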

The workaround would be to edit the console-operator deployment YAML after the downgrade and manually remove the liveness and readiness probes.
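
As a rough sketch of that workaround, the helper below builds JSON Patch `remove` operations for any probes found on a deployment's containers. The helper name and the sample deployment are hypothetical; the field paths follow the standard Kubernetes Deployment layout.

```python
import json


def probe_removal_patch(deployment):
    """Return JSON Patch ops that remove liveness/readiness probes from each container."""
    ops = []
    containers = deployment["spec"]["template"]["spec"]["containers"]
    for index, container in enumerate(containers):
        for field in ("livenessProbe", "readinessProbe"):
            # Only emit a remove op for fields that exist; removing a missing
            # path makes a JSON Patch fail.
            if field in container:
                ops.append({
                    "op": "remove",
                    "path": f"/spec/template/spec/containers/{index}/{field}",
                })
    return ops


# Hypothetical deployment fragment carrying the leftover 4.3 readiness probe.
sample_deployment = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "console-operator",
         "readinessProbe": {"httpGet": {"path": "/readyz"}}},
    ]}}}
}

patch_ops = probe_removal_patch(sample_deployment)
print(json.dumps(patch_ops))
```

The printed JSON could then be passed to something like `oc patch deployment console-operator -n openshift-console-operator --type=json -p '<ops>'`, assuming the deployment is named `console-operator`.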

Comment 4 liujia 2020-02-24 10:14:59 UTC
Version:
4.2.0-0.nightly-2020-02-23-045604

1. Install OCP 4.2.0-0.nightly-2020-02-23-045604.
2. Trigger an upgrade from 4.2.0-0.nightly-2020-02-23-045604 to 4.3.3.
3. Monitor the upgrade progress and the status of all cluster operators.
4. Abort the upgrade and trigger a downgrade to 4.2.0-0.nightly-2020-02-23-045604 after all operators have updated to the target version but before the upgrade reaches 100%.
5. Check the downgrade from 4.3.3 to 4.2.0-0.nightly-2020-02-23-045604: it failed because of a separate issue with machine-config, not because of openshift-console-operator.
# ./oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.0-0.nightly-2020-02-23-045604   True        False         False      81m
cloud-credential                           4.2.0-0.nightly-2020-02-23-045604   True        False         False      92m
cluster-autoscaler                         4.2.0-0.nightly-2020-02-23-045604   True        False         False      86m
console                                    4.2.0-0.nightly-2020-02-23-045604   True        False         False      23m
dns                                        4.2.0-0.nightly-2020-02-23-045604   True        False         False      91m
image-registry                             4.2.0-0.nightly-2020-02-23-045604   True        False         False      46m
ingress                                    4.2.0-0.nightly-2020-02-23-045604   True        False         False      43m
insights                                   4.2.0-0.nightly-2020-02-23-045604   True        False         False      92m
kube-apiserver                             4.2.0-0.nightly-2020-02-23-045604   True        False         False      89m
kube-controller-manager                    4.2.0-0.nightly-2020-02-23-045604   True        False         False      89m
kube-scheduler                             4.2.0-0.nightly-2020-02-23-045604   True        False         False      89m
machine-api                                4.2.0-0.nightly-2020-02-23-045604   True        False         False      92m
machine-config                             4.3.3                               False       True          True       3m54s
marketplace                                4.2.0-0.nightly-2020-02-23-045604   True        False         False      24m
monitoring                                 4.2.0-0.nightly-2020-02-23-045604   True        False         False      25m
network                                    4.2.0-0.nightly-2020-02-23-045604   True        False         False      91m
node-tuning                                4.2.0-0.nightly-2020-02-23-045604   True        False         False      25m
openshift-apiserver                        4.2.0-0.nightly-2020-02-23-045604   True        False         False      37m
openshift-controller-manager               4.2.0-0.nightly-2020-02-23-045604   True        False         False      89m
openshift-samples                          4.2.0-0.nightly-2020-02-23-045604   True        False         False      19m
operator-lifecycle-manager                 4.2.0-0.nightly-2020-02-23-045604   True        False         False      91m
operator-lifecycle-manager-catalog         4.2.0-0.nightly-2020-02-23-045604   True        False         False      91m
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2020-02-23-045604   True        False         False      24m
service-ca                                 4.2.0-0.nightly-2020-02-23-045604   True        False         False      91m
service-catalog-apiserver                  4.2.0-0.nightly-2020-02-23-045604   True        False         False      88m
service-catalog-controller-manager         4.2.0-0.nightly-2020-02-23-045604   True        False         False      88m
storage                                    4.2.0-0.nightly-2020-02-23-045604   True        False         False      26m
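
For a check like the one above, a small script can pick out operators that have not settled on the target version. This is only a sketch: the function names are hypothetical, and the parser assumes the whitespace-separated column layout shown in this output, with no spaces inside any value.

```python
def parse_cluster_operators(text):
    """Parse `oc get co` table output into a list of dicts keyed by column header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def lagging_operators(rows, target_version):
    """Names of operators not at the target version, unavailable, or degraded."""
    return [
        row["NAME"]
        for row in rows
        if row["VERSION"] != target_version
        or row["AVAILABLE"] != "True"
        or row["DEGRADED"] != "False"
    ]


# Three rows copied from the output above.
sample = """\
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
console                                    4.2.0-0.nightly-2020-02-23-045604   True        False         False      23m
machine-config                             4.3.3                               False       True          True       3m54s
storage                                    4.2.0-0.nightly-2020-02-23-045604   True        False         False      26m
"""

rows = parse_cluster_operators(sample)
print(lagging_operators(rows, "4.2.0-0.nightly-2020-02-23-045604"))  # ['machine-config']
```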

A new bug, bz1806483, was filed to track the new issue; this one can be closed since the original issue has been fixed.

Comment 6 errata-xmlrpc 2020-03-04 04:51:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0614