Description of problem:
The upgrade from 4.7.0-0.nightly-2021-05-17-040457 to 4.8.0-0.nightly-2021-05-19-092807 fails with the authentication operator degraded.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Upgrade from 4.7.0-0.nightly-2021-05-17-040457 to 4.8.0-0.nightly-2021-05-19-092807 hangs with authentication degraded:
oc describe co authentication shows:
Last Transition Time: 2021-05-19T14:41:33Z
Message: OAuthServiceEndpointsCheckEndpointAccessibleControllerDegraded: Get "https://10.129.0.17:6443/healthz": context canceled
Last Transition Time: 2021-05-19T14:42:48Z
Message: All is well
Last Transition Time: 2021-05-19T14:44:48Z
Message: All is well
According to the must-gather log, 10.129.0.17:6443 is the IP of a pod that belongs to openshift-authentication, but that pod was deleted at "May 19 14:40:35.293321" and new pods were created around "2021-05-19T14:40+" with new IPs 10.130.0.38|10.128.0.53|10.129.0.56.
According to the upgrade CI log, the health check happens at '[2021-05-19T17:12:11.721Z]', more than 2 hours later, but it still uses the previous pod IP (10.129.0.17) rather than one of the new pod IPs (10.130.0.38|10.128.0.53|10.129.0.56). That is why the health check fails.
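The comparison above can be reproduced with a small shell sketch. This is only an illustration of the check, not the operator's own logic: the message and the IP list are copied by hand from this report, and the variable names (`msg`, `current_ips`, `probed_ip`, `status`) are made up for the example.

```shell
#!/bin/sh
# Degraded message copied from the `oc describe co authentication` output in this report.
msg='OAuthServiceEndpointsCheckEndpointAccessibleControllerDegraded: Get "https://10.129.0.17:6443/healthz": context canceled'

# IPs of the replacement pods, copied from the must-gather analysis in this report.
# On a live cluster they could be read with something like:
#   oc get pods -n openshift-authentication -o wide
current_ips="10.130.0.38 10.128.0.53 10.129.0.56"

# Extract the IP that the health check actually probed from the URL in the message.
probed_ip=$(printf '%s\n' "$msg" | sed -n 's#.*https://\([0-9.]*\):[0-9]*/healthz.*#\1#p')

# The probed IP is "stale" unless it matches one of the current pod IPs.
status="stale"
for ip in $current_ips; do
    if [ "$ip" = "$probed_ip" ]; then
        status="current"
    fi
done

echo "probed IP $probed_ip is $status"
```

With the values from this report, the script prints `probed IP 10.129.0.17 is stale`, which matches the analysis: the check is still aimed at the deleted pod.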
must gather log link: http://file.rdu.redhat.com/~xxia/bug_1967398_must-gather.local.5095653185111688673.tar.gz
Upgrade from 4.7.0-0.nightly-2021-05-17-040457 to 4.8.0-0.nightly-2021-05-19-092807 succeeds.
matrix: 27_UPI on GCP with RHCOS && XPN
Test upgrade from 4.7.0-0.nightly-2021-06-07-095830 to 4.8.0-0.nightly-2021-06-07-180258
$ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-06-07-180258 --force=true --allow-explicit-upgrade=true
During the upgrade, force-update the oauth configuration 5 times to redeploy new pods with new IPs. The original issue, hanging on the old pod's IP, no longer occurs.
$ oc edit oauth cluster
Check the cluster version after the upgrade has finished
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-07-180258   True        False         123m    Cluster version is 4.8.0-0.nightly-2021-06-07-180258
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.