Bug 1740366

Summary: Authentication operator stuck in degraded state (RouteHealthDegraded)
Product: OpenShift Container Platform
Reporter: brad.williams
Component: apiserver-auth
Assignee: Matt Rogers <mrogers>
Status: CLOSED ERRATA
QA Contact: Wei Sun <wsun>
Severity: unspecified
Priority: unspecified
Version: 4.1.z
CC: aos-bugs, deads, jupierce, mfojtik, mrogers, nagrawal, rsandu, scheng, sttts
Target Milestone: ---
Flags: brad.williams: needinfo-
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-10-16 06:35:39 UTC
Type: Bug

Description brad.williams 2019-08-12 18:22:49 UTC
Description of problem:
We successfully created a new OpenShift cluster (4.1.9) and then successfully upgraded it to 4.1.10. The cluster was configured to use the following domain:
     online-starter.openshiftapps.com
We have a weekly CI job that attempted to update the cluster certificates to use the following domain:
     online-starter.openshift.com

The job eventually failed with the following error:
  ClusterOperator not fully ready: authentication
    Degraded=True  :: RouteHealthDegraded: failed to GET route: x509: certificate is valid for *.apps.us-west-1.online-starter.openshift.com, not oauth-openshift.apps.us-west-1.online-starter.openshiftapps.com
    Progressing=False  ::
    Available=True  ::
    Upgradeable=True  ::

I then attempted to correct the issue by updating the certificate again with the correct domain. Unfortunately, the cluster is now wedged, and the authentication operator is stuck with the same error as above:

  ClusterOperator not fully ready: authentication
    Degraded=True  :: RouteHealthDegraded: failed to GET route: x509: certificate is valid for *.apps.us-west-1.online-starter.openshift.com, not oauth-openshift.apps.us-west-1.online-starter.openshiftapps.com
    Progressing=False  ::
    Available=True  ::
    Upgradeable=True  ::
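
The RouteHealthDegraded text has the same form as the hostname-mismatch error from Go's crypto/x509 package. The certificate's only DNS name is a wildcard for the new openshift.com apps domain, while the oauth-openshift route host is still on openshiftapps.com. A `*` matches only the single leftmost label, so `openshift` vs `openshiftapps` can never match. The sketch below reproduces that mismatch with a recent Go toolchain. It assumes the serving certificate carries just the one SAN shown in the log. `certSANs` and `checkRouteHost` are hypothetical names for illustration, and this is not the authentication operator's actual route health check.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// certSANs is the single DNS SAN reported in the log (assumption: the
// certificate has no other names).
var certSANs = []string{"*.apps.us-west-1.online-starter.openshift.com"}

// checkRouteHost is a hypothetical helper: it builds an in-memory certificate
// carrying only the given DNS SANs and asks crypto/x509 whether the
// certificate is valid for host. No key or signature is needed for this check.
func checkRouteHost(dnsNames []string, host string) error {
	cert := &x509.Certificate{DNSNames: dnsNames}
	return cert.VerifyHostname(host)
}

func main() {
	hosts := []string{
		// Route host from the log, still on the old openshiftapps.com domain.
		"oauth-openshift.apps.us-west-1.online-starter.openshiftapps.com",
		// Hypothetical route host on the new domain, for comparison.
		"oauth-openshift.apps.us-west-1.online-starter.openshift.com",
	}
	for _, h := range hosts {
		if err := checkRouteHost(certSANs, h); err != nil {
			fmt.Println("mismatch:", err)
		} else {
			fmt.Println("ok:", h)
		}
	}
}
```

When run, the sketch should reject the openshiftapps.com host with the same x509 message as the log and accept the openshift.com host. This is consistent with the route host not having been moved to the new domain, while the certificate already had been.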


Version-Release number of selected component (if applicable):
version   4.1.10    True        False         45h     Cluster version is quay.io/openshift-release-dev/ocp-release:4.1.10


How reproducible:
Unknown

Steps to Reproduce:
1. Install 4.1.9 cluster
2. Upgrade cluster to 4.1.10
3. Update cluster certificates with a different domain name

Actual results:
The authentication operator gets wedged and does not accept any further updates.

Expected results:
The cluster can recover from an erroneous configuration that uses an incorrect domain.

Additional info:

Comment 12 Chuan Yu 2019-08-27 08:55:50 UTC
Verified on 4.2.0-0.nightly-2019-08-26-202352
The authentication operator now reports a specific message and reason when it is degraded because of the router.

Comment 14 errata-xmlrpc 2019-10-16 06:35:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922