Bug 1967398
| Summary: | authentication operator still uses previous deleted pod ip rather than the new created pod ip to do health check | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | liyao |
| Component: | apiserver-auth | Assignee: | Standa Laznicka <slaznick> |
| Status: | CLOSED ERRATA | QA Contact: | pmali |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.8 | CC: | aos-bugs, mfojtik, scuppett, surbania |
| Target Milestone: | --- | Keywords: | Upgrades |
| Target Release: | 4.8.0 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause:
A stale condition might have caused the authentication operator to appear degraded after an upgrade even though there were no problems.
Consequence:
False-positive cluster degradation.
Fix:
Remove old and unused conditions from the operator's status.
Result:
The authentication operator should correctly report as "Degraded" only when there is an actual problem.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-07-27 23:11:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Test upgrade from 4.7.0-0.nightly-2021-06-07-095830 to 4.8.0-0.nightly-2021-06-07-180258:

$ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-06-07-180258 --force=true --allow-explicit-upgrade=true

During the upgrade, force-update the oauth configuration 5 times to redeploy new pods with new IPs. The original issue, where the upgrade hung on the old pod's IP, is gone.

$ oc edit oauth cluster

Check the cluster version after the upgrade has finished:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-07-180258   True        False         123m    Cluster version is 4.8.0-0.nightly-2021-06-07-180258

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
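The Doc Text above describes the fix as removing old and unused conditions from the operator's status. The following is a minimal Python sketch of that idea only, not the operator's actual code. The function names, the dictionary layout, and the choice of which condition type counts as stale are assumptions for illustration. The report does not say which condition types the real fix removes.

```python
# Hypothetical sketch: drop conditions that no controller maintains anymore,
# so that a leftover "True" value cannot keep the operator Degraded.
# The field names mirror the Type/Status/Reason/Message output of
# `oc describe co authentication`; everything else here is illustrative.


def prune_stale_conditions(conditions, stale_types):
    """Return only the conditions whose type is not listed in stale_types."""
    return [c for c in conditions if c["type"] not in stale_types]


def is_degraded(conditions):
    """Report degraded if any condition ending in 'Degraded' is True."""
    return any(
        c["type"].endswith("Degraded") and c["status"] == "True"
        for c in conditions
    )


conditions = [
    {
        # Assumed per-controller condition type, taken from the message
        # prefix quoted in this report.
        "type": "OAuthServiceEndpointsCheckEndpointAccessibleControllerDegraded",
        "status": "True",
        "reason": "SyncError",
        "message": 'Get "https://10.129.0.17:6443/healthz": context canceled',
    },
    {
        "type": "Available",
        "status": "True",
        "reason": "AsExpected",
        "message": "All is well",
    },
]

# Assumption: this condition type is treated as stale for the example.
stale = {"OAuthServiceEndpointsCheckEndpointAccessibleControllerDegraded"}
pruned = prune_stale_conditions(conditions, stale)

print(is_degraded(conditions))  # stale condition still present
print(is_degraded(pruned))      # stale condition removed
```

Filtering by an explicit set of condition types keeps the cleanup narrow: conditions that are still maintained are never touched, so a real problem continues to surface as Degraded.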
Description of problem:
During the upgrade from 4.7.0-0.nightly-2021-05-17-040457 to 4.8.0-0.nightly-2021-05-19-092807, the upgrade fails with authentication degraded.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-05-17-040457 to 4.8.0-0.nightly-2021-05-19-092807

How reproducible:
Not sure

Steps to Reproduce:
1.
2.
3.

Actual results:
The upgrade from 4.7.0-0.nightly-2021-05-17-040457 to 4.8.0-0.nightly-2021-05-19-092807 hangs with authentication degraded. oc describe co authentication shows:

Conditions:
  Last Transition Time: 2021-05-19T14:41:33Z
  Message: OAuthServiceEndpointsCheckEndpointAccessibleControllerDegraded: Get "https://10.129.0.17:6443/healthz": context canceled
  Reason: OAuthServiceEndpointsCheckEndpointAccessibleController_SyncError
  Status: True
  Type: Degraded
  Last Transition Time: 2021-05-19T14:42:48Z
  Message: All is well
  Reason: AsExpected
  Status: False
  Type: Progressing
  Last Transition Time: 2021-05-19T14:44:48Z
  Message: All is well
  Reason: AsExpected
  Status: True
  Type: Available

According to the must-gather log, 10.129.0.17:6443 is the IP of a pod that belonged to openshift-authentication, but that pod was deleted at "May 19 14:40:35.293321" and new pods were created around "2021-05-19T14:40+" with the new IPs 10.130.0.38, 10.128.0.53 and 10.129.0.56.

In the upgrade CI log, the health check happens at '[2021-05-19T17:12:11.721Z]', more than 2 hours later, but it still uses the previous pod IP (10.129.0.17) rather than the new pod IPs (10.130.0.38, 10.128.0.53, 10.129.0.56). That is why the health check fails.

must-gather log link: http://file.rdu.redhat.com/~xxia/bug_1967398_must-gather.local.5095653185111688673.tar.gz

Expected results:
The upgrade from 4.7.0-0.nightly-2021-05-17-040457 to 4.8.0-0.nightly-2021-05-19-092807 succeeds.

Additional info:
matrix: 27_UPI on GCP with RHCOS && XPN
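The diagnosis in the description compares the IP in the Degraded message with the IPs of the current pods, and compares the pod deletion time with the time of the health check. A small Python sketch of that comparison follows, using only values quoted in this report. The helper names are hypothetical, and treating the must-gather timestamp as UTC is an assumption, since the report gives it without a time zone.

```python
import re
from datetime import datetime, timezone

# Values copied from this report.
DEGRADED_MESSAGE = (
    "OAuthServiceEndpointsCheckEndpointAccessibleControllerDegraded: "
    'Get "https://10.129.0.17:6443/healthz": context canceled'
)
CURRENT_POD_IPS = {"10.130.0.38", "10.128.0.53", "10.129.0.56"}


def checked_ip(message):
    """Extract the IPv4 address from the /healthz URL in a condition message."""
    match = re.search(r"https://(\d{1,3}(?:\.\d{1,3}){3}):\d+/healthz", message)
    return match.group(1) if match else None


def is_stale(message, current_ips):
    """True if the message refers to an IP that no current pod owns."""
    ip = checked_ip(message)
    return ip is not None and ip not in current_ips


# Assumption: both timestamps are UTC; fractional seconds are dropped.
deleted_at = datetime(2021, 5, 19, 14, 40, 35, tzinfo=timezone.utc)
checked_at = datetime(2021, 5, 19, 17, 12, 11, tzinfo=timezone.utc)
gap = checked_at - deleted_at

print(checked_ip(DEGRADED_MESSAGE))                 # 10.129.0.17
print(is_stale(DEGRADED_MESSAGE, CURRENT_POD_IPS))  # True
print(gap)                                          # 2:31:36
```

Under these assumptions the checked address belongs to none of the current pods, and the check ran about two and a half hours after the old pod was deleted, which matches the "more than 2 hours later" observation.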