Bug 1792015 - Cannot abort an upgrade from 4.2 to 4.3 and rollback to 4.2 - probe changes are not correctly applied (console-operator cannot be reverted)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.1.z
Assignee: W. Trevor King
QA Contact: liujia
URL:
Whiteboard:
Depends On: 1792005
Blocks:
 
Reported: 2020-01-16 21:02 UTC by W. Trevor King
Modified: 2020-04-22 16:24 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1792005
Environment:
Last Closed: 2020-04-22 16:24:19 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 301 0 None closed Bug 1792015: lib/resourcemerge/core: Clear livenessProbe and readinessProbe if nil in required 2020-06-15 07:45:28 UTC
Red Hat Product Errata RHBA-2020:1446 0 None None None 2020-04-22 16:24:23 UTC

Description W. Trevor King 2020-01-16 21:02:04 UTC
+++ This bug was initially created as a clone of Bug #1792005 +++

+++ This bug was initially created as a clone of Bug #1792004 +++

+++ This bug was initially created as a clone of Bug #1791863 +++

--- Additional comment from Samuel Padgett on 2020-01-16 19:53:02 UTC ---

I believe this is happening because the CVO is not removing the console-operator readiness probe added in 4.3 when downgrading to 4.2.

The 4.2 operator deployment manifest does not have a readiness probe:
https://github.com/openshift/console-operator/blob/release-4.2/manifests/07-operator.yaml

The 4.3 operator deployment does:
https://github.com/openshift/console-operator/blob/release-4.3/manifests/07-operator.yaml#L66-L75

The failing 4.2 console-operator pod has the probe when it shouldn't:

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.2-to-4.3/235/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-57eb844d3650921acacb016df97664a30e55a45a554fe71a4fda297015321d0e/namespaces/openshift-console-operator/pods/console-operator-7bb76df6d6-m4qh7/console-operator-7bb76df6d6-m4qh7.yaml

https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-rollback-4.2-to-4.3/235/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-57eb844d3650921acacb016df97664a30e55a45a554fe71a4fda297015321d0e/namespaces/openshift-console-operator/apps/deployments.yaml

Since the 4.2 console operator has no `/readyz` endpoint, the readiness probe fails:
> Jan 16 03:34:39.681 W ns/openshift-console-operator pod/console-operator-7bb76df6d6-m4qh7 Readiness probe failed: HTTP probe failed with statuscode: 404 (390 times)
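
To illustrate why a leftover probe fails this way, here is a minimal Go sketch. It assumes a stand-in HTTP handler that, like the 4.2 operator, registers no `/readyz` route; the `/healthz` route and the helper names `oldOperatorHandler`, `probeStatus`, and `probeSucceeds` are hypothetical and not taken from the console-operator source. HTTP probes treat any status from 200 up to (but not including) 400 as success, so a 404 counts as a failure.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// oldOperatorHandler is a hypothetical stand-in for a 4.2-style server:
// it serves /healthz (an assumption for illustration) but has no /readyz.
func oldOperatorHandler() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	return mux
}

// probeStatus sends an in-memory GET request to the handler and
// returns the HTTP status code it answers with.
func probeStatus(h http.Handler, path string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

// probeSucceeds applies the HTTP probe rule: 200 <= code < 400 is success.
func probeSucceeds(code int) bool {
	return code >= 200 && code < 400
}

func main() {
	h := oldOperatorHandler()
	for _, path := range []string{"/healthz", "/readyz"} {
		code := probeStatus(h, path)
		fmt.Printf("%s -> %d, success=%v\n", path, code, probeSucceeds(code))
	}
}
```

With this stand-in, the `/readyz` request is answered with 404 by the default mux, which matches the status code in the event above.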

The workaround would be to edit the console-operator deployment YAML after the downgrade and manually remove the liveness and readiness probes.
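
The behavior the CVO needs on downgrade can be sketched roughly as follows: when the required (manifest) container has a nil probe, clear the probe on the existing (in-cluster) container rather than leaving it in place. This is a simplified illustration, not the actual `lib/resourcemerge` code; `Probe` and `Container` are cut-down stand-ins for the Kubernetes types, the function names are hypothetical, and port 8443 is an arbitrary example value.

```go
package main

import "fmt"

// Probe and Container are simplified stand-ins for the Kubernetes
// core/v1 types, reduced to the fields needed for this sketch.
type Probe struct {
	Path string
	Port int
}

type Container struct {
	Name           string
	LivenessProbe  *Probe
	ReadinessProbe *Probe
}

// ensureProbePtr makes *existing match required. The important case is
// required == nil: the existing probe is cleared instead of being kept.
func ensureProbePtr(modified *bool, existing **Probe, required *Probe) {
	if required == nil {
		if *existing != nil {
			*existing = nil
			*modified = true
		}
		return
	}
	if *existing == nil || **existing != *required {
		p := *required // copy so the two containers do not share a pointer
		*existing = &p
		*modified = true
	}
}

// ensureContainer reconciles both probes of one container.
func ensureContainer(modified *bool, existing *Container, required Container) {
	ensureProbePtr(modified, &existing.LivenessProbe, required.LivenessProbe)
	ensureProbePtr(modified, &existing.ReadinessProbe, required.ReadinessProbe)
}

func main() {
	// In-cluster state left behind by the 4.3 manifest (example values).
	existing := Container{
		Name:           "console-operator",
		ReadinessProbe: &Probe{Path: "/readyz", Port: 8443},
	}
	// Desired state from the 4.2 manifest: no probes at all.
	required := Container{Name: "console-operator"}

	modified := false
	ensureContainer(&modified, &existing, required)
	fmt.Println(modified, existing.ReadinessProbe == nil) // true true
}
```

A merge that only copies non-nil fields from the manifest would leave `modified` false here and keep the 4.3 readiness probe, which is the failure described above.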

Comment 1 Scott Dodson 2020-01-30 20:45:49 UTC
We don't support chained rollbacks, so no one would expect to upgrade from 4.1 to 4.2 to 4.3 and then eventually roll back to 4.1. If the probe wasn't added in 4.2, should we really be fixing this in 4.1 at this point in the lifecycle?

Comment 2 Scott Dodson 2020-01-30 20:47:31 UTC
I guess other as of yet unknown 4.1 to 4.2 upgrades could be adding a liveness probe we're worried about?

Comment 3 W. Trevor King 2020-01-31 05:34:55 UTC
> I guess other as of yet unknown 4.1 to 4.2 upgrades could be adding a liveness probe we're worried about?

Yeah, this would be the only reason we'd need this in 4.1.  Mostly backporting is easier than worrying if we'll need the fix, because this change is ~9 safe lines.

Comment 6 liujia 2020-03-19 09:24:10 UTC
Since we don't support upgrading from 4.1 to 4.2 to 4.3 and then eventually rolling back to 4.1 (see comment 1), and no probe is currently added during the 4.1 to 4.2 upgrade, I ran a regression test on the 4.1-4.2-4.1 path to verify this bug.

Version: 4.2.0-0.nightly-2020-03-18-143046

Verified on path 4.1-4.2-4.1: the console operator downgraded successfully, but other operators failed. I will track that in another bug. Closing this one.

Comment 8 errata-xmlrpc 2020-04-22 16:24:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1446

