Bug 1874713

Summary: OAuthServerDeploymentProgressing stays true even though the deployment is ready and at the expected generation
Product: OpenShift Container Platform Reporter: Praveen Kumar <prkumar>
Component: apiserver-auth Assignee: Standa Laznicka <slaznick>
Status: CLOSED ERRATA QA Contact: pmali
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.6 CC: aos-bugs, cfergeau, mfojtik, mfuruta, pasik
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-27 16:36:55 UTC Type: Bug

Description Praveen Kumar 2020-09-02 04:48:43 UTC
Description of problem: When running the latest OpenShift 4.6 nightly in a single-node setup, the installation does not succeed and fails with the following error: `Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 1`


Version-Release number of selected component (if applicable):
http://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/4.6.0-0.nightly-2020-09-01-070508/


Steps to Reproduce:
```
$ git clone https://github.com/code-ready/snc
$ cd snc
$ export MIRROR=https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/
$ export OPENSHIFT_VERSION=4.6.0-0.nightly-2020-08-10-233406
$ export OPENSHIFT_PULL_SECRET_PATH=<pull_secret_path>
$ ./snc.sh
```


Actual results:
DEBUG Still waiting for the cluster to initialize: Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 1


Expected results:
The cluster should be able to provision successfully.


Additional info: Looking at the code changes in commit https://github.com/openshift/cluster-authentication-operator/commit/a08be2324f36, we tried setting `useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer: true` in the operator spec under the `UnsupportedConfigOverrides` section, but the operator still does not become available.

```
$ oc get co authentication -oyaml
status:
  conditions:
  - lastTransitionTime: "2020-09-02T03:28:55Z"
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2020-09-02T03:10:11Z"
    message: 'OAuthServerDeploymentProgressing: Waiting for OAuth server observed
      generation 0 to match expected generation 2'
    reason: OAuthServerDeployment_GenerationNotObserved
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-09-02T02:44:30Z"
    message: 'OAuthServerDeploymentAvailable: availableReplicas==0'
    reason: OAuthServerDeployment_NoReplicas
    status: "False"
    type: Available
  - lastTransitionTime: "2020-09-02T03:28:51Z"
    message: 'UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]'
    reason: UnsupportedConfigOverrides_UnsupportedConfigOverridesSet
    status: "False"
    type: Upgradeable
```
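
For reference, one way the override mentioned above might be applied is a JSON merge patch against the operator config. This is only a sketch: the helper name `override_patch` is hypothetical, and the target resource (`authentications.operator.openshift.io` named `cluster`) is an assumption that this report does not spell out.

```shell
# Hypothetical helper: prints a JSON merge patch that sets one boolean key
# to true under spec.unsupportedConfigOverrides.
override_patch() {
    printf '{"spec":{"unsupportedConfigOverrides":{"%s":true}}}\n' "$1"
}

# Usage against a live cluster (not executed here; resource name is an assumption):
#   oc patch authentications.operator.openshift.io cluster --type=merge \
#     -p "$(override_patch useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer)"
```

Building the patch in a function keeps the long key name in one place; as the Upgradeable condition above shows, setting any such override marks the operator as not upgradeable.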

Must-gather logs: https://www.dropbox.com/s/gbbqz74dng2e5h0/must-gather.local.4356597088010527217.tar.gz?dl=0

Comment 1 Standa Laznicka 2020-09-02 07:57:25 UTC
Please read the logs more carefully next time and check the current status of the operator; you misled me when I tried to troubleshoot this. Your option was properly honored:

```
2020-09-02T03:27:48.788621894Z E0902 03:27:48.788488       1 base_controller.go:220] "WellKnownReadyController" controller failed to sync "key", err: need at least 3 kube-apiservers, got 1
2020-09-02T03:28:18.789036493Z E0902 03:28:18.788883       1 base_controller.go:220] "WellKnownReadyController" controller failed to sync "key", err: need at least 3 kube-apiservers, got 1
2020-09-02T03:28:48.791781180Z E0902 03:28:48.791688       1 base_controller.go:220] "WellKnownReadyController" controller failed to sync "key", err: need at least 3 kube-apiservers, got 1  /* HERE YOU CONFIGURED THE UNSUPPORTED OPTION ---V */
2020-09-02T03:28:51.874892131Z I0902 03:28:51.866258       1 status_controller.go:172] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2020-09-02T02:46:15Z","message":"WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 1","reason":"WellKnownReadyController_SyncError","status":"True","type":"Degraded"},{"lastTransitionTime":"2020-09-02T03:10:11Z","message":"OAuthServerDeploymentProgressing: Waiting for OAuth server observed generation 0 to match expected generation 2","reason":"OAuthServerDeployment_GenerationNotObserved","status":"True","type":"Progressing"},{"lastTransitionTime":"2020-09-02T02:44:30Z","message":"WellKnownAvailable: The well-known endpoint is not yet available: need at least 3 kube-apiservers, got 1\nOAuthServerDeploymentAvailable: availableReplicas==0","reason":"OAuthServerDeployment_NoReplicas::WellKnown_NotReady","status":"False","type":"Available"},{"lastTransitionTime":"2020-09-02T03:28:51Z","message":"UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]","reason":"UnsupportedConfigOverrides_UnsupportedConfigOverridesSet","status":"False","type":"Upgradeable"}]}}
2020-09-02T03:28:51.886430023Z I0902 03:28:51.880036       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"637b9fbe-22ae-41b6-abd3-562b39643cda", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Upgradeable changed from True to False ("UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]")
2020-09-02T03:28:53.429528688Z I0902 03:28:53.429413       1 request.go:645] Throttling request took 1.143988842s, request: GET:https://172.25.0.1:443/api/v1/namespaces/openshift-oauth-apiserver
2020-09-02T03:28:55.592946849Z I0902 03:28:55.592807       1 status_controller.go:172] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2020-09-02T02:46:15Z","message":"WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 1","reason":"WellKnownReadyController_SyncError","status":"True","type":"Degraded"},{"lastTransitionTime":"2020-09-02T03:10:11Z","message":"OAuthServerDeploymentProgressing: Waiting for OAuth server observed generation 0 to match expected generation 2","reason":"OAuthServerDeployment_GenerationNotObserved","status":"True","type":"Progressing"},{"lastTransitionTime":"2020-09-02T02:44:30Z","message":"OAuthServerDeploymentAvailable: availableReplicas==0","reason":"OAuthServerDeployment_NoReplicas","status":"False","type":"Available"},{"lastTransitionTime":"2020-09-02T03:28:51Z","message":"UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]","reason":"UnsupportedConfigOverrides_UnsupportedConfigOverridesSet","status":"False","type":"Upgradeable"}]}}
2020-09-02T03:28:55.604884728Z I0902 03:28:55.604779       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"637b9fbe-22ae-41b6-abd3-562b39643cda", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Available message changed from "WellKnownAvailable: The well-known endpoint is not yet available: need at least 3 kube-apiservers, got 1\nOAuthServerDeploymentAvailable: availableReplicas==0" to "OAuthServerDeploymentAvailable: availableReplicas==0"
2020-09-02T03:28:55.639954439Z I0902 03:28:55.639800       1 status_controller.go:172] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2020-09-02T03:28:55Z","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2020-09-02T03:10:11Z","message":"OAuthServerDeploymentProgressing: Waiting for OAuth server observed generation 0 to match expected generation 2","reason":"OAuthServerDeployment_GenerationNotObserved","status":"True","type":"Progressing"},{"lastTransitionTime":"2020-09-02T02:44:30Z","message":"OAuthServerDeploymentAvailable: availableReplicas==0","reason":"OAuthServerDeployment_NoReplicas","status":"False","type":"Available"},{"lastTransitionTime":"2020-09-02T03:28:51Z","message":"UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]","reason":"UnsupportedConfigOverrides_UnsupportedConfigOverridesSet","status":"False","type":"Upgradeable"}]}}
2020-09-02T03:28:55.664729629Z I0902 03:28:55.664650       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"637b9fbe-22ae-41b6-abd3-562b39643cda", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Degraded changed from True to False ("") 
2020-09-02T03:28:56.630817683Z I0902 03:28:56.630241       1 request.go:645] Throttling request took 1.038637861s, request: GET:https://172.25.0.1:443/api/v1/namespaces/openshift-oauth-apiserver
2020-09-02T03:28:57.829527333Z I0902 03:28:57.829444       1 request.go:645] Throttling request took 1.190364164s, request: GET:https://172.25.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/services/api
```
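
To pick the condition transitions out of operator log output like the excerpt above, a small filter along these lines may help. It is a sketch only: the function name `status_changes` is hypothetical, and it assumes each transition is logged on a single line containing the text `Status for clusteroperator/`, as in the excerpt.

```shell
# Hypothetical helper: reads operator log lines on stdin and prints only the
# "Status for clusteroperator/..." part of OperatorStatusChanged events.
status_changes() {
    sed -n 's|.*\(Status for clusteroperator/.*\)|\1|p'
}

# Usage against a live cluster (not executed here; namespace and deployment
# name are taken from the event lines in the log excerpt):
#   oc logs -n openshift-authentication-operator deployment/authentication-operator \
#     | status_changes
```

Applied to the excerpt, this would leave only the three transitions: Upgradeable changing to False, the Available message changing, and Degraded changing to False.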

Current status:
```
  - lastTransitionTime: "2020-09-02T03:28:55Z"
    reason: AsExpected
    status: "True"
    type: WellKnownAvailable
  - lastTransitionTime: "2020-09-02T03:28:55Z"
    reason: AsExpected
    status: "False"
    type: WellKnownReadyControllerDegraded
```

I found a different bug in setting the deployment generations (the actual reason the operator stays progressing), which I'll fix as part of this BZ.
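
To check by hand whether the OAuth server deployment has really observed its latest generation, independently of what the operator reports, a comparison like the following could be used. This is a sketch under assumptions, not the operator's actual code or the fix: the helper name `generation_state` is hypothetical, the "expected" generation is assumed to be the Deployment's `metadata.generation`, and the deployment name and namespace in the usage comment are assumptions as well.

```shell
# Hypothetical helper: compares a Deployment's status.observedGeneration ($1)
# with the expected generation ($2) and prints the resulting state.
generation_state() {
    observed=$1
    expected=$2
    if [ "$observed" -eq "$expected" ]; then
        echo "observed generation $observed matches expected generation $expected"
    else
        echo "Waiting for OAuth server observed generation $observed to match expected generation $expected"
    fi
}

# Usage against a live cluster (not executed here; names are assumptions):
#   set -- $(oc get deployment oauth-openshift -n openshift-authentication \
#       -o jsonpath='{.status.observedGeneration} {.metadata.generation}')
#   generation_state "$1" "$2"
```

If the deployment itself reports matching generations while the operator still shows `OAuthServerDeployment_GenerationNotObserved`, that would be consistent with the problem being in the operator's generation bookkeeping rather than in the deployment.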

Comment 2 Praveen Kumar 2020-09-02 09:00:19 UTC
> Please read the logs more carefully next time and check the current status of the operator; you misled me when I tried to troubleshoot this. Your option was properly honored:

@Standa, are the logs you shared from the auth operator pod? Also, as I said in the BZ, if we don't update the auth operator config with the unsupported override, we get the error I posted earlier. I am not sure which other logs to look into. After making the change, I checked the CVO status for the authentication operator, and it is what I put in the BZ along with the must-gather logs.

Comment 5 Praveen Kumar 2020-09-03 12:37:24 UTC
I just tested the latest nightly, which includes this PR, and I can now see the auth operator's status as `Available` and not progressing after the `useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer` config is set to true.

Comment 8 errata-xmlrpc 2020-10-27 16:36:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196