Bug 1874713 - OAuthServerDeploymentProgressing stays true even though the deployment is ready and at the expected generation
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Standa Laznicka
QA Contact: pmali
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-02 04:48 UTC by Praveen Kumar
Modified: 2020-09-15 12:25 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Links
| System | ID | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|
| Github | openshift cluster-authentication-operator pull 332 | None | closed | Bug 1874713: deployment: don't panic when applying deployment fails | 2020-09-15 07:44:19 UTC |

Description Praveen Kumar 2020-09-02 04:48:43 UTC
Description of problem: When running the latest OpenShift 4.6 nightly in a single-node setup, the installation does not succeed and fails with the following error: `Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 1`


Version-Release number of selected component (if applicable):
http://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/4.6.0-0.nightly-2020-09-01-070508/


Steps to Reproduce:
```
$ git clone https://github.com/code-ready/snc
$ cd snc
$ export MIRROR=https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/
$ export OPENSHIFT_VERSION=4.6.0-0.nightly-2020-08-10-233406
$ export OPENSHIFT_PULL_SECRET_PATH=<pull_secret_path>
$ ./snc.sh
```


Actual results:
DEBUG Still waiting for the cluster to initialize: Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 1


Expected results:
The cluster should provision successfully.


Additional info: Based on the code changes in commit https://github.com/openshift/cluster-authentication-operator/commit/a08be2324f36 , we tried setting `useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer: true` in the `UnsupportedConfigOverrides` section of the operator spec, but the operator still does not become available.
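
For reference, the override described above amounts to a merge patch on the operator's spec. The following is a minimal Python sketch that only builds the patch body. The helper name `build_override_patch` is hypothetical, and the resource mentioned in the comments (`authentications.operator.openshift.io`, object `cluster`) is an assumption that is not stated in this report.

```python
import json

# Key named in this report; it is set under the operator spec's
# unsupportedConfigOverrides section.
OVERRIDE_KEY = "useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer"


def build_override_patch(enabled: bool = True) -> str:
    """Return a JSON merge-patch body that sets the unsupported override.

    Hypothetical helper: it only serializes the patch and never contacts a cluster.
    """
    patch = {"spec": {"unsupportedConfigOverrides": {OVERRIDE_KEY: enabled}}}
    return json.dumps(patch)


if __name__ == "__main__":
    # Assumed usage: pass the printed body to `oc patch ... --type=merge -p '<body>'`
    # against the authentication operator config (assumed to be named "cluster").
    print(build_override_patch())
```

Building the body separately makes it easy to check that the key lands under `spec.unsupportedConfigOverrides` before anything is applied to a cluster.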

```
$ oc get co authentication -oyaml
status:
  conditions:
  - lastTransitionTime: "2020-09-02T03:28:55Z"
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2020-09-02T03:10:11Z"
    message: 'OAuthServerDeploymentProgressing: Waiting for OAuth server observed
      generation 0 to match expected generation 2'
    reason: OAuthServerDeployment_GenerationNotObserved
    status: "True"
    type: Progressing
  - lastTransitionTime: "2020-09-02T02:44:30Z"
    message: 'OAuthServerDeploymentAvailable: availableReplicas==0'
    reason: OAuthServerDeployment_NoReplicas
    status: "False"
    type: Available
  - lastTransitionTime: "2020-09-02T03:28:51Z"
    message: 'UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]'
    reason: UnsupportedConfigOverrides_UnsupportedConfigOverridesSet
    status: "False"
    type: Upgradeable
```
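
The conditions above can be triaged mechanically. The sketch below is a hedged illustration, not operator code: it assumes that a healthy operator reports `Available=True`, `Progressing=False` and `Degraded=False`, and it ignores `Upgradeable` on the assumption that it does not matter for this check. The function name `blocking_conditions` is hypothetical; the condition data is copied from the output above.

```python
from typing import Dict, List

# Assumed healthy state for each condition type considered in this sketch.
HEALTHY = {"Available": "True", "Progressing": "False", "Degraded": "False"}

# Conditions copied from the `oc get co authentication -oyaml` output above
# (Upgradeable left out on purpose, see the assumption in the lead-in).
CONDITIONS = [
    {"type": "Degraded", "status": "False", "reason": "AsExpected"},
    {"type": "Progressing", "status": "True",
     "reason": "OAuthServerDeployment_GenerationNotObserved"},
    {"type": "Available", "status": "False",
     "reason": "OAuthServerDeployment_NoReplicas"},
]


def blocking_conditions(conditions: List[Dict[str, str]]) -> List[str]:
    """Return "Type=Status (Reason)" for each condition not in its healthy state."""
    blocking = []
    for cond in conditions:
        want = HEALTHY.get(cond["type"])
        if want is not None and cond["status"] != want:
            blocking.append(
                f'{cond["type"]}={cond["status"]} ({cond.get("reason", "")})'
            )
    return blocking


if __name__ == "__main__":
    for line in blocking_conditions(CONDITIONS):
        print(line)
```

For the data above this reports the `Progressing` and `Available` conditions and skips `Degraded`, which is already `False`.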

Must-gather logs: https://www.dropbox.com/s/gbbqz74dng2e5h0/must-gather.local.4356597088010527217.tar.gz?dl=0

Comment 1 Standa Laznicka 2020-09-02 07:57:25 UTC
Please read the logs more carefully next time and check the current status of the operator; you misled me when I tried to troubleshoot this. Your option was properly honored:

```
2020-09-02T03:27:48.788621894Z E0902 03:27:48.788488       1 base_controller.go:220] "WellKnownReadyController" controller failed to sync "key", err: need at least 3 kube-apiservers, got 1
2020-09-02T03:28:18.789036493Z E0902 03:28:18.788883       1 base_controller.go:220] "WellKnownReadyController" controller failed to sync "key", err: need at least 3 kube-apiservers, got 1
2020-09-02T03:28:48.791781180Z E0902 03:28:48.791688       1 base_controller.go:220] "WellKnownReadyController" controller failed to sync "key", err: need at least 3 kube-apiservers, got 1  /* HERE YOU CONFIGURED THE UNSUPPORTED OPTION ---V */
2020-09-02T03:28:51.874892131Z I0902 03:28:51.866258       1 status_controller.go:172] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2020-09-02T02:46:15Z","message":"WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 1","reason":"WellKnownReadyController_SyncError","status":"True","type":"Degraded"},{"lastTransitionTime":"2020-09-02T03:10:11Z","message":"OAuthServerDeploymentProgressing: Waiting for OAuth server observed generation 0 to match expected generation 2","reason":"OAuthServerDeployment_GenerationNotObserved","status":"True","type":"Progressing"},{"lastTransitionTime":"2020-09-02T02:44:30Z","message":"WellKnownAvailable: The well-known endpoint is not yet available: need at least 3 kube-apiservers, got 1\nOAuthServerDeploymentAvailable: availableReplicas==0","reason":"OAuthServerDeployment_NoReplicas::WellKnown_NotReady","status":"False","type":"Available"},{"lastTransitionTime":"2020-09-02T03:28:51Z","message":"UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]","reason":"UnsupportedConfigOverrides_UnsupportedConfigOverridesSet","status":"False","type":"Upgradeable"}]}}
2020-09-02T03:28:51.886430023Z I0902 03:28:51.880036       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"637b9fbe-22ae-41b6-abd3-562b39643cda", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Upgradeable changed from True to False ("UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]")
2020-09-02T03:28:53.429528688Z I0902 03:28:53.429413       1 request.go:645] Throttling request took 1.143988842s, request: GET:https://172.25.0.1:443/api/v1/namespaces/openshift-oauth-apiserver
2020-09-02T03:28:55.592946849Z I0902 03:28:55.592807       1 status_controller.go:172] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2020-09-02T02:46:15Z","message":"WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 1","reason":"WellKnownReadyController_SyncError","status":"True","type":"Degraded"},{"lastTransitionTime":"2020-09-02T03:10:11Z","message":"OAuthServerDeploymentProgressing: Waiting for OAuth server observed generation 0 to match expected generation 2","reason":"OAuthServerDeployment_GenerationNotObserved","status":"True","type":"Progressing"},{"lastTransitionTime":"2020-09-02T02:44:30Z","message":"OAuthServerDeploymentAvailable: availableReplicas==0","reason":"OAuthServerDeployment_NoReplicas","status":"False","type":"Available"},{"lastTransitionTime":"2020-09-02T03:28:51Z","message":"UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]","reason":"UnsupportedConfigOverrides_UnsupportedConfigOverridesSet","status":"False","type":"Upgradeable"}]}}
2020-09-02T03:28:55.604884728Z I0902 03:28:55.604779       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"637b9fbe-22ae-41b6-abd3-562b39643cda", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Available message changed from "WellKnownAvailable: The well-known endpoint is not yet available: need at least 3 kube-apiservers, got 1\nOAuthServerDeploymentAvailable: availableReplicas==0" to "OAuthServerDeploymentAvailable: availableReplicas==0"
2020-09-02T03:28:55.639954439Z I0902 03:28:55.639800       1 status_controller.go:172] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2020-09-02T03:28:55Z","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2020-09-02T03:10:11Z","message":"OAuthServerDeploymentProgressing: Waiting for OAuth server observed generation 0 to match expected generation 2","reason":"OAuthServerDeployment_GenerationNotObserved","status":"True","type":"Progressing"},{"lastTransitionTime":"2020-09-02T02:44:30Z","message":"OAuthServerDeploymentAvailable: availableReplicas==0","reason":"OAuthServerDeployment_NoReplicas","status":"False","type":"Available"},{"lastTransitionTime":"2020-09-02T03:28:51Z","message":"UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]","reason":"UnsupportedConfigOverrides_UnsupportedConfigOverridesSet","status":"False","type":"Upgradeable"}]}}
2020-09-02T03:28:55.664729629Z I0902 03:28:55.664650       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"637b9fbe-22ae-41b6-abd3-562b39643cda", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Degraded changed from True to False ("") 
2020-09-02T03:28:56.630817683Z I0902 03:28:56.630241       1 request.go:645] Throttling request took 1.038637861s, request: GET:https://172.25.0.1:443/api/v1/namespaces/openshift-oauth-apiserver
2020-09-02T03:28:57.829527333Z I0902 03:28:57.829444       1 request.go:645] Throttling request took 1.190364164s, request: GET:https://172.25.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/services/api
```

current status:
```
  - lastTransitionTime: "2020-09-02T03:28:55Z"
    reason: AsExpected
    status: "True"
    type: WellKnownAvailable
  - lastTransitionTime: "2020-09-02T03:28:55Z"
    reason: AsExpected
    status: "False"
    type: WellKnownReadyControllerDegraded
```

I found a different bug in setting the deployment generations (the actual reason the operator stays progressing) that I'll fix as a part of this BZ.
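
To illustrate why a stale generation keeps the condition set, here is a minimal Python sketch of the kind of comparison behind the `OAuthServerDeployment_GenerationNotObserved` message. It is not the operator's code and does not describe the actual fix; the function name `progressing_message` and its parameters are hypothetical, and only the message text is taken from the status output in this report.

```python
from typing import Optional


def progressing_message(observed_generation: int,
                        expected_generation: int) -> Optional[str]:
    """Return a Progressing message, or None once the generations match.

    Hypothetical sketch: `observed_generation` stands for the generation the
    operator believes the deployment has observed, `expected_generation` for
    the generation it last applied.
    """
    if observed_generation != expected_generation:
        return (
            "Waiting for OAuth server observed generation "
            f"{observed_generation} to match expected generation "
            f"{expected_generation}"
        )
    return None


if __name__ == "__main__":
    # A stale observed value of 0 reproduces the message from the report.
    print(progressing_message(0, 2))
    # Once the observed value catches up, the condition would clear.
    print(progressing_message(2, 2))
```

If the observed value is never refreshed, a check of this shape keeps returning the message even when the deployment itself is ready.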

Comment 2 Praveen Kumar 2020-09-02 09:00:19 UTC
> Please read the logs more carefully next time and check the current status of the operator; you misled me when I tried to troubleshoot this. Your option was properly honored:

@Standa, are the logs you shared from the auth operator pod? Also, as I said in the BZ, if we don't update the auth operator config with the unsupported option, we get the error I posted earlier. I am not sure which other logs to look at; after making the change, I checked the CVO status for auth, and it is what I posted in the BZ along with the must-gather logs.

Comment 5 Praveen Kumar 2020-09-03 12:37:24 UTC
I just tested the latest nightly, which includes this PR, and I can now see the auth operator's status as `Available` and not progressing once the `useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer` config is set to true.

