Bug 1874713 - OAuthServerDeploymentProgressing stays true even though the deployment is ready and at the expected generation
Summary: OAuthServerDeploymentProgressing stays true even though the deployment is rea...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Standa Laznicka
QA Contact: pmali
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-02 04:48 UTC by Praveen Kumar
Modified: 2020-11-02 08:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:36:55 UTC
Target Upstream Version:
Embargoed:




Links

| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| Github | openshift cluster-authentication-operator pull 332 | 0 | None | closed | Bug 1874713: deployment: don't panic when applying deployment fails | 2021-02-09 10:19:17 UTC |
| Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:37:10 UTC |

Description Praveen Kumar 2020-09-02 04:48:43 UTC
Description of problem: When running the latest OpenShift 4.6 nightly in a single-node setup, the installation does not succeed and fails with the following error: `Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 1`


Version-Release number of selected component (if applicable):
http://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/4.6.0-0.nightly-2020-09-01-070508/


Steps to Reproduce:
```
$ git clone https://github.com/code-ready/snc
$ cd snc
$ export MIRROR=https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/
$ export OPENSHIFT_VERSION=4.6.0-0.nightly-2020-08-10-233406
$ export OPENSHIFT_PULL_SECRET_PATH=<pull_secret_path>
$ ./snc.sh
```


Actual results:
DEBUG Still waiting for the cluster to initialize: Cluster operator authentication is reporting a failure: WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 1


Expected results:
The cluster should provision successfully.


Additional info: Looking at the code changes in commit https://github.com/openshift/cluster-authentication-operator/commit/a08be2324f36 , we tried setting `useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer: true` in the operator spec under the `UnsupportedConfigOverrides` section, but the operator still does not become available.

```
$ oc get co authentication -oyaml
status:                                                             
  conditions:                                                                         
  - lastTransitionTime: "2020-09-02T03:28:55Z"                               
    reason: AsExpected                                                              
    status: "False"                                                      
    type: Degraded                                                                   
  - lastTransitionTime: "2020-09-02T03:10:11Z"                                    
    message: 'OAuthServerDeploymentProgressing: Waiting for OAuth server observed
      generation 0 to match expected generation 2'                            
    reason: OAuthServerDeployment_GenerationNotObserved                             
    status: "True"                                                    
    type: Progressing
  - lastTransitionTime: "2020-09-02T02:44:30Z"
    message: 'OAuthServerDeploymentAvailable: availableReplicas==0'
    reason: OAuthServerDeployment_NoReplicas
    status: "False"
    type: Available                                                     
  - lastTransitionTime: "2020-09-02T03:28:51Z"                 
    message: 'UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]'
    reason: UnsupportedConfigOverrides_UnsupportedConfigOverridesSet       
    status: "False"                                                
    type: Upgradeable 
```
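
For reference, the override described above can be written as a JSON merge patch body for the operator config. The Go sketch below only builds that body. The option name comes from this report; the lower-camel-case path `spec.unsupportedConfigOverrides`, the helper name `overridePatch`, and the choice of a merge patch are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// overridePatch builds a JSON merge patch body that sets the unsupported
// non-HA OAuth server option under spec.unsupportedConfigOverrides.
// Hypothetical helper: the field path is an assumption, not confirmed here.
func overridePatch() (string, error) {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"unsupportedConfigOverrides": map[string]interface{}{
				"useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer": true,
			},
		},
	}
	b, err := json.Marshal(patch)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	p, err := overridePatch()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the patch body so it can be inspected before being applied.
	fmt.Println(p)
}
```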

Must-gather logs: https://www.dropbox.com/s/gbbqz74dng2e5h0/must-gather.local.4356597088010527217.tar.gz?dl=0

Comment 1 Standa Laznicka 2020-09-02 07:57:25 UTC
Please read the logs more carefully next time and check the current status of the operator; you misled me when I tried to troubleshoot this. Your option was properly honored:

```
2020-09-02T03:27:48.788621894Z E0902 03:27:48.788488       1 base_controller.go:220] "WellKnownReadyController" controller failed to sync "key", err: need at least 3 kube-apiservers, got 1
2020-09-02T03:28:18.789036493Z E0902 03:28:18.788883       1 base_controller.go:220] "WellKnownReadyController" controller failed to sync "key", err: need at least 3 kube-apiservers, got 1
2020-09-02T03:28:48.791781180Z E0902 03:28:48.791688       1 base_controller.go:220] "WellKnownReadyController" controller failed to sync "key", err: need at least 3 kube-apiservers, got 1  /* HERE YOU CONFIGURED THE UNSUPPORTED OPTION ---V */
2020-09-02T03:28:51.874892131Z I0902 03:28:51.866258       1 status_controller.go:172] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2020-09-02T02:46:15Z","message":"WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 1","reason":"WellKnownReadyController_SyncError","status":"True","type":"Degraded"},{"lastTransitionTime":"2020-09-02T03:10:11Z","message":"OAuthServerDeploymentProgressing: Waiting for OAuth server observed generation 0 to match expected generation 2","reason":"OAuthServerDeployment_GenerationNotObserved","status":"True","type":"Progressing"},{"lastTransitionTime":"2020-09-02T02:44:30Z","message":"WellKnownAvailable: The well-known endpoint is not yet available: need at least 3 kube-apiservers, got 1\nOAuthServerDeploymentAvailable: availableReplicas==0","reason":"OAuthServerDeployment_NoReplicas::WellKnown_NotReady","status":"False","type":"Available"},{"lastTransitionTime":"2020-09-02T03:28:51Z","message":"UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]","reason":"UnsupportedConfigOverrides_UnsupportedConfigOverridesSet","status":"False","type":"Upgradeable"}]}}
2020-09-02T03:28:51.886430023Z I0902 03:28:51.880036       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"637b9fbe-22ae-41b6-abd3-562b39643cda", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Upgradeable changed from True to False ("UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]")
2020-09-02T03:28:53.429528688Z I0902 03:28:53.429413       1 request.go:645] Throttling request took 1.143988842s, request: GET:https://172.25.0.1:443/api/v1/namespaces/openshift-oauth-apiserver
2020-09-02T03:28:55.592946849Z I0902 03:28:55.592807       1 status_controller.go:172] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2020-09-02T02:46:15Z","message":"WellKnownReadyControllerDegraded: need at least 3 kube-apiservers, got 1","reason":"WellKnownReadyController_SyncError","status":"True","type":"Degraded"},{"lastTransitionTime":"2020-09-02T03:10:11Z","message":"OAuthServerDeploymentProgressing: Waiting for OAuth server observed generation 0 to match expected generation 2","reason":"OAuthServerDeployment_GenerationNotObserved","status":"True","type":"Progressing"},{"lastTransitionTime":"2020-09-02T02:44:30Z","message":"OAuthServerDeploymentAvailable: availableReplicas==0","reason":"OAuthServerDeployment_NoReplicas","status":"False","type":"Available"},{"lastTransitionTime":"2020-09-02T03:28:51Z","message":"UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]","reason":"UnsupportedConfigOverrides_UnsupportedConfigOverridesSet","status":"False","type":"Upgradeable"}]}}
2020-09-02T03:28:55.604884728Z I0902 03:28:55.604779       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"637b9fbe-22ae-41b6-abd3-562b39643cda", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Available message changed from "WellKnownAvailable: The well-known endpoint is not yet available: need at least 3 kube-apiservers, got 1\nOAuthServerDeploymentAvailable: availableReplicas==0" to "OAuthServerDeploymentAvailable: availableReplicas==0"
2020-09-02T03:28:55.639954439Z I0902 03:28:55.639800       1 status_controller.go:172] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2020-09-02T03:28:55Z","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2020-09-02T03:10:11Z","message":"OAuthServerDeploymentProgressing: Waiting for OAuth server observed generation 0 to match expected generation 2","reason":"OAuthServerDeployment_GenerationNotObserved","status":"True","type":"Progressing"},{"lastTransitionTime":"2020-09-02T02:44:30Z","message":"OAuthServerDeploymentAvailable: availableReplicas==0","reason":"OAuthServerDeployment_NoReplicas","status":"False","type":"Available"},{"lastTransitionTime":"2020-09-02T03:28:51Z","message":"UnsupportedConfigOverridesUpgradeable: setting: [useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer]","reason":"UnsupportedConfigOverrides_UnsupportedConfigOverridesSet","status":"False","type":"Upgradeable"}]}}
2020-09-02T03:28:55.664729629Z I0902 03:28:55.664650       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"637b9fbe-22ae-41b6-abd3-562b39643cda", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Degraded changed from True to False ("") 
2020-09-02T03:28:56.630817683Z I0902 03:28:56.630241       1 request.go:645] Throttling request took 1.038637861s, request: GET:https://172.25.0.1:443/api/v1/namespaces/openshift-oauth-apiserver
2020-09-02T03:28:57.829527333Z I0902 03:28:57.829444       1 request.go:645] Throttling request took 1.190364164s, request: GET:https://172.25.0.1:443/api/v1/namespaces/openshift-oauth-apiserver/services/api
```
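
The `clusteroperator/authentication diff` lines in the log above carry the conditions as JSON, so they can be checked mechanically instead of by eye. The following Go sketch is only an illustration: the type and function names are hypothetical, and the rule it applies (Degraded=True, Progressing=True, or Available=False means the operator has not settled) is the usual reading of cluster operator conditions, not code taken from the operator.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition mirrors the fields that appear in each logged condition.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// statusDiff matches the JSON payload of a status_controller "diff" log line.
type statusDiff struct {
	Status struct {
		Conditions []condition `json:"conditions"`
	} `json:"status"`
}

// unsettled returns the conditions that keep the operator from being
// considered settled. Upgradeable is deliberately ignored here.
func unsettled(raw string) ([]condition, error) {
	var d statusDiff
	if err := json.Unmarshal([]byte(raw), &d); err != nil {
		return nil, err
	}
	var out []condition
	for _, c := range d.Status.Conditions {
		switch {
		case c.Type == "Degraded" && c.Status == "True",
			c.Type == "Progressing" && c.Status == "True",
			c.Type == "Available" && c.Status == "False":
			out = append(out, c)
		}
	}
	return out, nil
}

func main() {
	// Abbreviated from the 03:28:55.639 diff in the log above
	// (timestamps and messages dropped for brevity).
	raw := `{"status":{"conditions":[` +
		`{"reason":"AsExpected","status":"False","type":"Degraded"},` +
		`{"reason":"OAuthServerDeployment_GenerationNotObserved","status":"True","type":"Progressing"},` +
		`{"reason":"OAuthServerDeployment_NoReplicas","status":"False","type":"Available"},` +
		`{"reason":"UnsupportedConfigOverrides_UnsupportedConfigOverridesSet","status":"False","type":"Upgradeable"}]}}`

	conds, err := unsettled(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range conds {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```

Run against that last diff, the sketch reports only the Progressing and Available conditions: once the override was applied, the well-known check no longer degraded the operator, and what remained was the deployment generation problem.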

current status:
```
  - lastTransitionTime: "2020-09-02T03:28:55Z"
    reason: AsExpected
    status: "True"
    type: WellKnownAvailable
  - lastTransitionTime: "2020-09-02T03:28:55Z"
    reason: AsExpected
    status: "False"
    type: WellKnownReadyControllerDegraded
```

I found a different bug in setting the deployment generations (the actual reason the operator stays progressing), which I'll fix as part of this BZ.
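
To illustrate the kind of check behind the `GenerationNotObserved` message, here is a minimal Go sketch of a generation comparison. It is not the operator's code and not the fix from the linked PR; the struct, the function name, and the message wording are hypothetical, and it assumes only the standard Deployment semantics that `status.observedGeneration` catches up with `metadata.generation` once the controller has processed the latest spec.

```go
package main

import "fmt"

// deploymentState holds the few Deployment fields this sketch looks at.
// Hypothetical type; a real controller would read an apps/v1 Deployment.
type deploymentState struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	Replicas           int32 // spec.replicas
	UpdatedReplicas    int32 // status.updatedReplicas
	AvailableReplicas  int32 // status.availableReplicas
}

// progressing reports whether a rollout still appears to be in flight and,
// if so, a short human-readable reason.
func progressing(d deploymentState) (bool, string) {
	if d.ObservedGeneration < d.Generation {
		return true, fmt.Sprintf("observed generation %d does not match expected generation %d",
			d.ObservedGeneration, d.Generation)
	}
	if d.UpdatedReplicas < d.Replicas {
		return true, fmt.Sprintf("%d of %d replicas updated", d.UpdatedReplicas, d.Replicas)
	}
	if d.AvailableReplicas < d.Replicas {
		return true, fmt.Sprintf("%d of %d replicas available", d.AvailableReplicas, d.Replicas)
	}
	return false, ""
}

func main() {
	// A stale observed generation keeps the result true even though all
	// replicas are updated and available.
	stale := deploymentState{Generation: 2, ObservedGeneration: 0, Replicas: 1, UpdatedReplicas: 1, AvailableReplicas: 1}
	fmt.Println(progressing(stale))

	// With the generation recorded correctly, the rollout counts as done.
	done := deploymentState{Generation: 2, ObservedGeneration: 2, Replicas: 1, UpdatedReplicas: 1, AvailableReplicas: 1}
	fmt.Println(progressing(done))
}
```

The first case mirrors the symptom in this BZ: if the generation value fed into the comparison is never updated, Progressing stays true regardless of how healthy the deployment is.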

Comment 2 Praveen Kumar 2020-09-02 09:00:19 UTC
> Please read the logs more carefully next time and check the current status of the operator; you misled me when I tried to troubleshoot this. Your option was properly honored:

@Standa, are the logs you shared from the auth operator pod? Also, as I said in the BZ, if we don't update the auth operator config with the unsupported override, we get the error I posted earlier. I am not sure which other logs to look into; after making the change, I checked the CVO status for auth, and it is what I put in the BZ along with the must-gather logs.

Comment 5 Praveen Kumar 2020-09-03 12:37:24 UTC
I just tested the latest nightly, which includes this PR, and I can now see the auth operator's status as `Available` and not progressing after setting the `useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer` config to true.

Comment 8 errata-xmlrpc 2020-10-27 16:36:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

