Bug 1973983 - 4.7 nightly upgrade to 4.8 and then downgrade back to 4.7 nightly doesn't work - ingresscontroller "default" is degraded
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.z
Assignee: Stephen Greene
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On: 1975964
Blocks:
Reported: 2021-06-19 16:44 UTC by To Hung Sze
Modified: 2022-08-04 22:32 UTC
CC List: 4 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Cloned As: 1975964
Environment:
Last Closed: 2021-07-26 17:35:21 UTC
Target Upstream Version:
Embargoed:


Links
- GitHub openshift/cluster-ingress-operator pull 627 (open): [release-4.7] Bug 1973983: Canary: Handle downgrades from 4.8 to 4.7 properly (last updated 2021-06-22 19:56:13 UTC)
- Red Hat Product Errata RHBA-2021:2762 (last updated 2021-07-26 17:35:45 UTC)

Description To Hung Sze 2021-06-19 16:44:40 UTC
Description of problem:
Upgrade from 4.7 nightly -> 4.8.0-fc.8 works.
Then downgrading back to 4.7 nightly fails.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-06-17-173140

How reproducible: always (2 out of 2 tries)

Steps to Reproduce:
1. Install IPI on GCP.
2. Upgrade to 4.8.0-fc.8 (succeeds).
3. Downgrade back to the 4.7 nightly (fails).

OpenShift release version:
4.7.0-0.nightly-2021-06-17-173140

Cluster Platform:
GCP


Actual results:
$ ./oc adm upgrade
info: An upgrade is in progress. Unable to apply 4.7.0-0.nightly-2021-06-17-173140: an unknown error has occurred: MultipleErrors

$ ./oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2021-06-17-173140   True        False         False      155m
baremetal                                  4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
cloud-credential                           4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
cluster-autoscaler                         4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
config-operator                            4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
console                                    4.7.0-0.nightly-2021-06-17-173140   True        False         False      23h
csi-snapshot-controller                    4.7.0-0.nightly-2021-06-17-173140   True        False         True       27h
dns                                        4.8.0-0.nightly-2021-06-18-055840   True        False         False      25h
etcd                                       4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
image-registry                             4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
ingress                                    4.7.0-0.nightly-2021-06-17-173140   True        False         True       23h
insights                                   4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
kube-apiserver                             4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
kube-controller-manager                    4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
kube-scheduler                             4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
kube-storage-version-migrator              4.7.0-0.nightly-2021-06-17-173140   True        False         False      24h
machine-api                                4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
machine-approver                           4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
machine-config                             4.8.0-0.nightly-2021-06-18-055840   True        False         False      27h
marketplace                                4.7.0-0.nightly-2021-06-17-173140   True        False         False      23h
monitoring                                 4.7.0-0.nightly-2021-06-17-173140   True        False         False      23h
network                                    4.8.0-0.nightly-2021-06-18-055840   True        False         False      27h
node-tuning                                4.8.0-0.nightly-2021-06-18-055840   True        False         False      25h
openshift-apiserver                        4.7.0-0.nightly-2021-06-17-173140   True        False         False      24h
openshift-controller-manager               4.7.0-0.nightly-2021-06-17-173140   True        False         False      27h
openshift-samples                          4.7.0-0.nightly-2021-06-17-173140   True        False         False      23h
operator-lifecycle-manager                 4.8.0-0.nightly-2021-06-18-055840   True        False         False      27h
operator-lifecycle-manager-catalog         4.8.0-0.nightly-2021-06-18-055840   True        False         False      27h
operator-lifecycle-manager-packageserver   4.8.0-0.nightly-2021-06-18-055840   True        False         False      27h
service-ca                                 4.8.0-0.nightly-2021-06-18-055840   True        False         False      27h
storage                                    4.7.0-0.nightly-2021-06-17-173140   True        False         False      24h
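When reading a listing like the one above, it helps to pull out which operators report DEGRADED=True and which still report a version other than the downgrade target. The Python sketch below parses the plain-text `oc get co` table; the function names are hypothetical helpers, not part of `oc`, and for real scripting `oc get co -o json` would be a more robust input than the human-readable table.

```python
from typing import Dict, List, Tuple


def parse_clusteroperators(table: str) -> List[Dict[str, str]]:
    """Turn `oc get co` text output into one dict per operator, keyed by column header.

    Assumes every column is populated (true for the listings in this report);
    a row with an empty VERSION column would be misparsed.
    """
    lines = [line for line in table.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def summarize(rows: List[Dict[str, str]], target: str) -> Tuple[List[str], List[str]]:
    """Return (degraded operators, operators whose VERSION differs from target)."""
    degraded = [r["NAME"] for r in rows if r["DEGRADED"] == "True"]
    not_at_target = [r["NAME"] for r in rows if r["VERSION"] != target]
    return degraded, not_at_target


# Excerpt of the listing above.
SAMPLE = """\
NAME                      VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
csi-snapshot-controller   4.7.0-0.nightly-2021-06-17-173140   True        False         True       27h
dns                       4.8.0-0.nightly-2021-06-18-055840   True        False         False      25h
ingress                   4.7.0-0.nightly-2021-06-17-173140   True        False         True       23h
"""

if __name__ == "__main__":
    degraded, lagging = summarize(
        parse_clusteroperators(SAMPLE), "4.7.0-0.nightly-2021-06-17-173140"
    )
    print("degraded:", degraded)
    print("not at target version:", lagging)
```

Applied to the full listing, the same check would flag csi-snapshot-controller and ingress as degraded and list the operators (dns, machine-config, network, and others) that still report a 4.8 version.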



Expected results:
Downgrade succeeds. While downgrades may not be officially supported, they have worked for the last few releases.

Impact of the problem:
Downgrade fails

Additional info:
must-gather shows

When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: 9d668a63-310a-45b1-b5f6-0af9fe23caab
ClusterVersion: Updating to "4.7.0-0.nightly-2021-06-17-173140" from "4.8.0-0.nightly-2021-06-18-055840" for 4 hours: Unable to apply 4.7.0-0.nightly-2021-06-17-173140: an unknown error has occurred: MultipleErrors
ClusterOperators:
	clusteroperator/csi-snapshot-controller is degraded because CSISnapshotStaticResourceControllerDegraded: "csi_controller_deployment_pdb.yaml" (string): the server could not find the requested resource
CSISnapshotStaticResourceControllerDegraded: "webhook_deployment_pdb.yaml" (string): the server could not find the requested resource
CSISnapshotStaticResourceControllerDegraded: 
	clusteroperator/ingress is degraded because Some ingresscontrollers are degraded: ingresscontroller "default" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)


Comment 1 To Hung Sze 2021-06-19 16:45:30 UTC
Previous downgrade bug
https://bugzilla.redhat.com/show_bug.cgi?id=1971087

Comment 2 To Hung Sze 2021-06-19 16:47:06 UTC
The must-gather archive exceeds the attachment size limit, but it is available to share. Please let me know with whom I should share it.

Comment 5 To Hung Sze 2021-06-22 19:15:04 UTC
Shared must-gather with you @sgreene @mmasters

Comment 6 Stephen Greene 2021-06-22 19:55:24 UTC
I have identified the problem and posted a fix

https://github.com/openshift/cluster-ingress-operator/pull/627

The manual workaround would be to delete the command values set on the canary daemonset.
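The comment does not spell out the root cause beyond the workaround. One plausible reading, stated here purely as an assumption and not as the operator's actual code, is that the 4.7 operator's check for whether the canary daemonset needs an update did not compare the container `command` field, so a `command` set while running 4.8 survived the downgrade. The Python sketch below models a DaemonSet as a plain dict; `canary_daemonset_changed`, `remove_command_patch`, the container name, image, and command values are all hypothetical. The second function builds an RFC 6902 JSON Patch corresponding to the manual workaround of deleting those `command` values.

```python
from copy import deepcopy
from typing import Any, Dict, List, Sequence, Tuple

DaemonSet = Dict[str, Any]


def containers(ds: DaemonSet) -> List[Dict[str, Any]]:
    """Pod template containers of a DaemonSet given as a plain dict."""
    return ds["spec"]["template"]["spec"]["containers"]


def canary_daemonset_changed(
    current: DaemonSet, desired: DaemonSet, fields: Sequence[str] = ("image", "command")
) -> Tuple[bool, DaemonSet]:
    """Compare the listed container fields (hypothetical logic, not the real operator).

    Returns (changed, updated), where updated is a copy of current with each
    compared field taken from desired, or removed if desired does not set it.
    """
    updated = deepcopy(current)
    changed = False
    for cur, want in zip(containers(updated), containers(desired)):
        for field in fields:
            if cur.get(field) == want.get(field):
                continue
            changed = True
            if field in want:
                cur[field] = deepcopy(want[field])
            else:
                # Desired spec leaves the field unset: drop the stale value.
                cur.pop(field, None)
    return changed, updated


def remove_command_patch(ds: DaemonSet) -> List[Dict[str, str]]:
    """RFC 6902 JSON Patch ops that delete every container's `command`."""
    return [
        {"op": "remove", "path": f"/spec/template/spec/containers/{i}/command"}
        for i, c in enumerate(containers(ds))
        if "command" in c
    ]


def make_ds(container: Dict[str, Any]) -> DaemonSet:
    """Wrap a single container spec in a minimal DaemonSet-shaped dict."""
    return {"spec": {"template": {"spec": {"containers": [container]}}}}


# Hypothetical values: the same image, with and without a leftover command.
current_ds = make_ds({"name": "canary", "image": "canary-image", "command": ["leftover-command"]})
desired_ds = make_ds({"name": "canary", "image": "canary-image"})

if __name__ == "__main__":
    # Comparing only the image misses the stale command; adding "command" catches it.
    print(canary_daemonset_changed(current_ds, desired_ds, fields=("image",))[0])
    print(canary_daemonset_changed(current_ds, desired_ds)[0])
    print(remove_command_patch(current_ds))
```

The patch list has the shape that `oc patch --type=json -p '<json>'` accepts; the canary daemonset's namespace and name are not stated in this report, so they are left out here.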

Comment 9 Hongan Li 2021-07-02 09:09:22 UTC
Upgraded from 4.7.0-0.nightly-2021-06-30-221453 to 4.8.0-0.nightly-2021-07-01-185624, then downgraded to 4.7.0-0.nightly-2021-06-30-221453; co/ingress is not degraded.

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2021-06-30-221453   True        False         False      80m
baremetal                                  4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h10m
cloud-credential                           4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h15m
cluster-autoscaler                         4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h9m
config-operator                            4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h10m
console                                    4.7.0-0.nightly-2021-06-30-221453   True        False         False      81m
csi-snapshot-controller                    4.7.0-0.nightly-2021-06-30-221453   True        False         True       6h9m
dns                                        4.8.0-0.nightly-2021-07-01-185624   True        False         False      3h25m
etcd                                       4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h9m
image-registry                             4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h
ingress                                    4.7.0-0.nightly-2021-06-30-221453   True        False         False      82m
insights                                   4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h2m
kube-apiserver                             4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h6m
kube-controller-manager                    4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h8m
kube-scheduler                             4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h7m
kube-storage-version-migrator              4.7.0-0.nightly-2021-06-30-221453   True        False         False      3h9m
machine-api                                4.7.0-0.nightly-2021-06-30-221453   True        False         False      5h58m
machine-approver                           4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h10m
machine-config                             4.8.0-0.nightly-2021-07-01-185624   True        False         False      178m
marketplace                                4.7.0-0.nightly-2021-06-30-221453   True        False         False      81m
monitoring                                 4.7.0-0.nightly-2021-06-30-221453   True        False         False      81m
network                                    4.8.0-0.nightly-2021-07-01-185624   True        False         False      6h10m
node-tuning                                4.7.0-0.nightly-2021-06-30-221453   True        False         False      82m
openshift-apiserver                        4.7.0-0.nightly-2021-06-30-221453   True        False         False      3h
openshift-controller-manager               4.7.0-0.nightly-2021-06-30-221453   True        False         False      3h31m
openshift-samples                          4.7.0-0.nightly-2021-06-30-221453   True        False         False      82m
operator-lifecycle-manager                 4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h9m
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h9m
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-06-30-221453   True        False         False      6h2m
service-ca                                 4.8.0-0.nightly-2021-07-01-185624   True        False         False      6h10m
storage                                    4.7.0-0.nightly-2021-06-30-221453   True        False         False      3h3m



Please note: the downgrade is still stuck on co/csi-snapshot-controller; that issue is tracked by another BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1973986

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-07-01-185624   True        True          102m    Unable to apply 4.7.0-0.nightly-2021-06-30-221453: wait has exceeded 40 minutes for these operators: csi-snapshot-controller

Comment 12 Siddharth Sharma 2021-07-09 21:07:49 UTC
OCP engineering has decided not to ship 4.7.20 due to a blocker. This bug fix will ship as part of the next z-stream release, 4.7.21, planned for July 27th.

Comment 15 errata-xmlrpc 2021-07-26 17:35:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.21 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2762

