Description of problem:
After a successful IPI install of an OCP 4.2 cluster (3 masters, 2 workers) on Azure had been left running for longer than 24 hours, several operators (authentication, monitoring, console, openshift-apiserver) became degraded and/or stayed Progressing/Not available. The openshift-cluster-version operator logs show several "Unauthorized" errors for these degraded operators. Additionally, I am no longer able to run some oc commands such as "oc get projects", and I am unable to log in as kubeadmin with the kubeadmin password at the api-server URL.

Errors from CVO logs:
E0805 16:32:19.198913 1 memcache.go:135] couldn't get resource list for template.openshift.io/v1: Unauthorized
E0805 16:32:19.200984 1 memcache.go:135] couldn't get resource list for user.openshift.io/v1: Unauthorized
. . .
E0805 16:32:39.460766 1 task.go:77] error running apply for clusteroperator "monitoring" (250 of 431): Cluster operator monitoring is reporting a failure: Failed to rollout the stack. Error: running task Updating configuration sharing failed: failed to retrieve Prometheus host: getting Route object failed: Unauthorized
E0805 16:32:39.460867 1 task.go:77] error running apply for clusteroperator "openshift-apiserver" (106 of 431): Cluster operator openshift-apiserver has not yet reported success
E0805 16:32:39.462231 1 sync_worker.go:311] unable to synchronize image (waiting 2m50.956499648s): Cluster operator monitoring is reporting a failure: Failed to rollout the stack. Error: running task Updating configuration sharing failed: failed to retrieve Prometheus host: getting Route object failed: Unauthorized

# oc get co | grep -v "True False False"
NAME                  VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication        4.2.0-0.nightly-2019-08-01-113533   True        False         True       2d4h
console               4.2.0-0.nightly-2019-08-01-113533   True        True          True       2d4h
monitoring            4.2.0-0.nightly-2019-08-01-113533   False       False         True       33h
openshift-apiserver   4.2.0-0.nightly-2019-08-01-113533   False       False         False      33h

For Authentication Operator:
- lastTransitionTime: "2019-08-05T15:21:28Z"
  message: 'OAuthClientsDegraded: Unauthorized'
  reason: OAuthClientsDegradedError
  status: "True"

Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-08-01-113533   True        False         2d4h    Error while reconciling 4.2.0-0.nightly-2019-08-01-113533: the cluster operator monitoring is degraded

# oc version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"v0.0.0-alpha.0-43-g86a09cad", GitCommit:"86a09cad3831361c2f1efb70c0faa1aac611d3e0", GitTreeState:"clean", BuildDate:"2019-07-31T23:47:33Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0+bf9534a", GitCommit:"bf9534a", GitTreeState:"clean", BuildDate:"2019-07-31T23:43:56Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
OpenShift Version: 4.2.0-0.nightly-2019-08-01-113533

How reproducible:
Happened once so far

Steps to Reproduce:
1. IPI install of OCP 4.2.0-0.nightly-2019-08-01-113533 on Azure
2. Initially all the cluster operators are running and available
3. Wait at least 24 hours

Actual results:
Some operators are degraded or Progressing/Not available

# oc get co | grep -v "True False False"
NAME                  VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication        4.2.0-0.nightly-2019-08-01-113533   True        False         True       2d4h
console               4.2.0-0.nightly-2019-08-01-113533   True        True          True       2d4h
monitoring            4.2.0-0.nightly-2019-08-01-113533   False       False         True       33h
openshift-apiserver   4.2.0-0.nightly-2019-08-01-113533   False       False         False      33h

Expected results:
All cluster operators should remain Available, not Progressing, and not Degraded after install

Additional info:
Links to must-gather logs and individual operator pod logs are provided in the next comment
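
For anyone re-checking this over a long soak run, the grep filter used above can be approximated in a small script. The following is only a sketch, not part of the original report: the function name unhealthy_operators and the SAMPLE constant are hypothetical, the input is assumed to be the plain-text output of "oc get co", and every row is assumed to carry a VERSION value (a row without one would shift the columns and be misread).

```python
from typing import Dict, List


def unhealthy_operators(oc_get_co_output: str) -> List[Dict[str, str]]:
    """Return operators whose AVAILABLE/PROGRESSING/DEGRADED is not True/False/False.

    Assumption: each data row has at least NAME, VERSION, AVAILABLE,
    PROGRESSING and DEGRADED columns, separated by whitespace.
    """
    unhealthy = []
    for line in oc_get_co_output.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, version, available, progressing, degraded = fields[:5]
        if (available, progressing, degraded) != ("True", "False", "False"):
            unhealthy.append(
                {
                    "name": name,
                    "version": version,
                    "available": available,
                    "progressing": progressing,
                    "degraded": degraded,
                }
            )
    return unhealthy


# Rows copied from the "oc get co" output in this report.
SAMPLE = """\
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.2.0-0.nightly-2019-08-01-113533 True False True 2d4h
console 4.2.0-0.nightly-2019-08-01-113533 True True True 2d4h
monitoring 4.2.0-0.nightly-2019-08-01-113533 False False True 33h
openshift-apiserver 4.2.0-0.nightly-2019-08-01-113533 False False False 33h
"""

if __name__ == "__main__":
    for op in unhealthy_operators(SAMPLE):
        print(op["name"], op["available"], op["progressing"], op["degraded"])
```

Parsing the columns rather than matching the string "True False False" avoids depending on the exact column spacing of the oc output.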
Blocks long-running reliability tests.
*** This bug has been marked as a duplicate of bug 1736800 ***