Bug 1740366 - Authentication operator stuck in degraded state (RouteHealthDegraded)
Summary: Authentication operator stuck in degraded state (RouteHealthDegraded)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.2.0
Assignee: Matt Rogers
QA Contact: Wei Sun
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-12 18:22 UTC by brad.williams
Modified: 2019-10-24 03:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:35:39 UTC
Target Upstream Version:
Embargoed:
brad.williams: needinfo-




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-authentication-operator pull 184 | 0 | 'None' | closed | Bug 1740366: Add strict validation for router certs | 2021-02-11 17:24:11 UTC
Red Hat Product Errata | RHBA-2019:2922 | 0 | None | None | None | 2019-10-16 06:35:53 UTC

Description brad.williams 2019-08-12 18:22:49 UTC
Description of problem:
We successfully created a new OpenShift cluster (4.1.9) and then successfully upgraded it to version 4.1.10.  The cluster was configured to use the following domain:
     online-starter.openshiftapps.com
We have a weekly CI job that ran and attempted to update the certificates to use the following domain:
     online-starter.openshift.com

The job eventually failed with the following error:
  ClusterOperator not fully ready: authentication
 	Degraded=True  :: RouteHealthDegraded: failed to GET route: x509: certificate is valid for *.apps.us-west-1.online-starter.openshift.com, not oauth-openshift.apps.us-west-1.online-starter.openshiftapps.com
  	Progressing=False  :: 
  	Available=True  :: 
  	Upgradeable=True  :: 
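
To illustrate why the route check fails, here is a minimal sketch of wildcard hostname matching. It is a simplified stand-in for real x509 hostname verification, not the operator's actual code: the function name `san_matches` is hypothetical, and `fixed_host` is an assumed hostname under the new domain used only for comparison.

```python
def san_matches(pattern: str, hostname: str) -> bool:
    """Return True if a certificate DNS name (possibly a wildcard) covers hostname.

    Simplified rule: only a whole left-most label "*" is treated as a
    wildcard, and it stands in for exactly one label.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # Every label after the wildcard must match exactly.
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


# Names taken from the error message above.
cert_name = "*.apps.us-west-1.online-starter.openshift.com"
route_host = "oauth-openshift.apps.us-west-1.online-starter.openshiftapps.com"
# Assumption: what the route host would look like under the new domain.
fixed_host = "oauth-openshift.apps.us-west-1.online-starter.openshift.com"

print(san_matches(cert_name, route_host))  # False
print(san_matches(cert_name, fixed_host))  # True
```

The wildcard covers only the left-most label, so the differing `openshiftapps` label causes the mismatch even though the rest of the name lines up.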

I then attempted to correct the issue by re-updating the certificate with the correct domain.  Unfortunately, the cluster is now wedged and the Authentication operator is stuck with the same error as above:

  ClusterOperator not fully ready: authentication
	Degraded=True  :: RouteHealthDegraded: failed to GET route: x509: certificate is valid for *.apps.us-west-1.online-starter.openshift.com, not oauth-openshift.apps.us-west-1.online-starter.openshiftapps.com
	Progressing=False  :: 
	Available=True  :: 
	Upgradeable=True  :: 
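
For a CI job that needs to act on this output, the status lines can be split into condition type, status, reason, and message. The sketch below assumes the `Type=Status :: Reason: message` layout shown above. The names `Condition`, `LINE_RE`, and `parse_condition` are hypothetical helpers and are not part of any OpenShift tooling.

```python
import re
from typing import NamedTuple, Optional


class Condition(NamedTuple):
    type: str
    status: bool
    reason: str
    message: str


# Matches lines such as "Degraded=True  :: Reason: message".
LINE_RE = re.compile(r"^\s*(?P<type>\w+)=(?P<status>True|False)\s*::\s*(?P<detail>.*)$")


def parse_condition(line: str) -> Optional[Condition]:
    """Parse one status line; return None for lines that are not conditions."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    detail = match.group("detail").strip()
    # The reason is everything before the first ": "; the rest is the message.
    reason, sep, message = detail.partition(": ")
    if not sep:
        reason, message = detail, ""
    return Condition(match.group("type"), match.group("status") == "True", reason, message)


lines = [
    "ClusterOperator not fully ready: authentication",
    "\tDegraded=True  :: RouteHealthDegraded: failed to GET route: x509: certificate is valid for "
    "*.apps.us-west-1.online-starter.openshift.com, not "
    "oauth-openshift.apps.us-west-1.online-starter.openshiftapps.com",
    "\tProgressing=False  :: ",
    "\tAvailable=True  :: ",
    "\tUpgradeable=True  :: ",
]

conditions = [c for c in map(parse_condition, lines) if c is not None]
degraded = [c for c in conditions if c.type == "Degraded" and c.status]
print(degraded[0].reason)  # RouteHealthDegraded
```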


Version-Release number of selected component (if applicable):
version   4.1.10    True        False         45h     Cluster version is quay.io/openshift-release-dev/ocp-release:4.1.10


How reproducible:
Unknown

Steps to Reproduce:
1. Install 4.1.9 cluster
2. Upgrade cluster to 4.1.10
3. Update cluster certificates with a different domain name

Actual results:
The Authentication operator becomes wedged and does not accept any further updates.

Expected results:
The cluster should be able to recover from an errant configuration involving an incorrect domain.

Additional info:

Comment 12 Chuan Yu 2019-08-27 08:55:50 UTC
Verified on 4.2.0-0.nightly-2019-08-26-202352
The authentication operator now prints a specific message and reason when it is degraded because of the router.

Comment 14 errata-xmlrpc 2019-10-16 06:35:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

