
Bug 1792015

Summary: Cannot abort an upgrade from 4.2 to 4.3 and rollback to 4.2 - probe changes are not correctly applied (console-operator cannot be reverted)
Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: Cluster Version Operator
Assignee: W. Trevor King <wking>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.1.z
CC: aos-bugs, bpeterse, ccoleman, jiajliu, jokerman, spadgett, wking
Target Milestone: ---
Target Release: 4.1.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1792005
Environment:
Last Closed: 2020-04-22 16:24:19 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1792005
Bug Blocks:

Description W. Trevor King 2020-01-16 21:02:04 UTC
+++ This bug was initially created as a clone of Bug #1792005 +++

+++ This bug was initially created as a clone of Bug #1792004 +++

+++ This bug was initially created as a clone of Bug #1791863 +++

--- Additional comment from Samuel Padgett on 2020-01-16 19:53:02 UTC ---

I believe this is happening because the CVO is not removing the console-operator readiness probe added in 4.3 when downgrading to 4.2.
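
To illustrate the suspected failure mode, here is a minimal Python sketch of an additive merge versus one that also clears probes missing from the target manifest. This is not the CVO's actual code (the CVO is written in Go); the function names, image strings, and port number are hypothetical placeholders.

```python
import copy

PROBE_KEYS = ("readinessProbe", "livenessProbe")


def merge_additive(existing, required):
    """Copy every field the required manifest sets; never remove anything."""
    merged = copy.deepcopy(existing)
    for key, value in required.items():
        merged[key] = copy.deepcopy(value)
    return merged


def merge_with_probe_removal(existing, required):
    """Like merge_additive, but drop probes the required manifest omits."""
    merged = merge_additive(existing, required)
    for key in PROBE_KEYS:
        if key not in required:
            merged.pop(key, None)
    return merged


# Container spec left in the cluster by 4.3 (abbreviated; values are placeholders).
in_cluster = {
    "name": "console-operator",
    "image": "example.invalid/console-operator:4.3",
    "readinessProbe": {"httpGet": {"path": "/readyz", "port": 8443}},
}

# Container spec from the 4.2 manifest, which defines no readiness probe.
from_4_2_manifest = {
    "name": "console-operator",
    "image": "example.invalid/console-operator:4.2",
}

stale = merge_additive(in_cluster, from_4_2_manifest)
fixed = merge_with_probe_removal(in_cluster, from_4_2_manifest)

print("readinessProbe" in stale)  # True: the 4.3 probe survives the downgrade
print("readinessProbe" in fixed)  # False: the probe is cleared
```

With the additive merge, the image is rolled back to 4.2 but the 4.3 probe stays attached to it, which matches the state seen in the failing pod.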

The 4.2 operator deployment manifest does not have a readiness probe:
https://github.com/openshift/console-operator/blob/release-4.2/manifests/07-operator.yaml

The 4.3 operator deployment does:
https://github.com/openshift/console-operator/blob/release-4.3/manifests/07-operator.yaml#L66-L75

The failing 4.2 console-operator pod has the probe when it shouldn't:

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.2-to-4.3/235/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-57eb844d3650921acacb016df97664a30e55a45a554fe71a4fda297015321d0e/namespaces/openshift-console-operator/pods/console-operator-7bb76df6d6-m4qh7/console-operator-7bb76df6d6-m4qh7.yaml

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.2-to-4.3/235/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-57eb844d3650921acacb016df97664a30e55a45a554fe71a4fda297015321d0e/namespaces/openshift-console-operator/apps/deployments.yaml

Since the 4.2 console operator has no `/readyz` endpoint, the readiness probe fails:
> Jan 16 03:34:39.681 W ns/openshift-console-operator pod/console-operator-7bb76df6d6-m4qh7 Readiness probe failed: HTTP probe failed with statuscode: 404 (390 times)
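
As a rough illustration of why a 404 keeps the pod unready: Kubernetes HTTP probes treat any status code from 200 up to (but not including) 400 as success. The route tables below are hypothetical stand-ins for the endpoints each operator version serves, not taken from the operator's source.

```python
def http_probe_succeeds(status_code: int) -> bool:
    """An HTTP probe passes for any status code >= 200 and < 400."""
    return 200 <= status_code < 400


# Hypothetical route tables; only the absence of /readyz in 4.2 comes from this report.
ROUTES_4_2 = {"/healthz": 200}
ROUTES_4_3 = {"/healthz": 200, "/readyz": 200}


def probe(routes: dict, path: str) -> bool:
    """Simulate a probe: unknown paths answer 404, as in the logged failure."""
    return http_probe_succeeds(routes.get(path, 404))


print(probe(ROUTES_4_3, "/readyz"))  # True
print(probe(ROUTES_4_2, "/readyz"))  # False: 404 is outside the success range
```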

The workaround would be to edit the console-operator deployment YAML after the downgrade and manually remove the liveness and readiness probes.
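
One way to script that edit is with a JSON Patch (RFC 6902) instead of hand-editing the YAML. The sketch below only builds the patch document; the container index 0 and the helper name are assumptions, so check the deployment before applying anything.

```python
import json


def probe_removal_patch(container_index=0, probes=("livenessProbe", "readinessProbe")):
    """Build JSON Patch 'remove' operations for the named probes.

    container_index is an assumption: it must point at the operator
    container within the deployment's pod template.
    """
    base = f"/spec/template/spec/containers/{container_index}"
    return [{"op": "remove", "path": f"{base}/{name}"} for name in probes]


patch = json.dumps(probe_removal_patch())
print(patch)
```

The resulting string could then be passed to something like `oc patch deployment console-operator -n openshift-console-operator --type=json -p "$PATCH"`, assuming the deployment is named `console-operator`. A JSON Patch `remove` fails if the target path does not exist, so list only the probes that are actually present on the deployment.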

Comment 1 Scott Dodson 2020-01-30 20:45:49 UTC
We don't support chained rollbacks, so no one would expect to upgrade from 4.1 to 4.2 to 4.3 and then eventually roll back to 4.1. If the probe wasn't added in 4.2, should we really be fixing this in 4.1 at this point in the lifecycle?

Comment 2 Scott Dodson 2020-01-30 20:47:31 UTC
I guess other as of yet unknown 4.1 to 4.2 upgrades could be adding a liveness probe we're worried about?

Comment 3 W. Trevor King 2020-01-31 05:34:55 UTC
> I guess other as of yet unknown 4.1 to 4.2 upgrades could be adding a liveness probe we're worried about?

Yeah, this would be the only reason we'd need this in 4.1.  Mostly backporting is easier than worrying if we'll need the fix, because this change is ~9 safe lines.

Comment 6 liujia 2020-03-19 09:24:10 UTC
Since we don't support upgrading from 4.1 to 4.2 to 4.3 and then eventually rolling back to 4.1 (see comment 1), and no probe is currently added during the 4.1-to-4.2 upgrade, I ran a regression test against the 4.1-4.2-4.1 path to verify this bug.

Version: 4.2.0-0.nightly-2020-03-18-143046

Verified on the 4.1-4.2-4.1 path: the console operator downgraded successfully, but other operators failed. I will track those failures in another bug and close this one.

Comment 8 errata-xmlrpc 2020-04-22 16:24:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1446