Description of problem:
While diagnosing a situation where the openshift-apiserver operator was reporting that it was not available, progressing, or degraded, the configmaps, secrets, and the operator pod in the openshift-apiserver-operator namespace were deleted in an attempt to remediate the problem. The operator pod is now in CrashLoopBackOff.

Impact: Numerous services in the cluster are failing with a 503 because the openshift-apiserver is not available.

Version-Release number of selected component (if applicable):
4.3.10, Azure IPI

How reproducible:
Consistently

Steps to Reproduce:
1. oc delete --all secrets -n openshift-apiserver-operator
2. oc delete --all cm -n openshift-apiserver-operator
3. oc delete --all pods -n openshift-apiserver-operator

Actual results:
The operator does not restore the resources it manages.

Expected results:
The operator should restore the resources it manages.

Additional info:

$ oc logs -f openshift-apiserver-operator-7b5648c9dd-8spms
I0416 20:53:28.199998       1 cmd.go:188] Using service-serving-cert provided certificates
I0416 20:53:28.200982       1 observer_polling.go:137] Starting file observer
W0416 20:54:28.215408       1 builder.go:181] unable to get owner reference (falling back to namespace): the server was unable to return a response in the time allotted, but may still be processing the request (get pods)
F0416 20:55:28.666769       1 cmd.go:120] the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps extension-apiserver-authentication)

$ oc logs -f openshift-apiserver-operator-7b5648c9dd-8spms
I0416 21:07:54.099684       1 cmd.go:188] Using service-serving-cert provided certificates
I0416 21:07:54.100152       1 observer_polling.go:137] Starting file observer
W0416 21:07:54.127321       1 builder.go:181] unable to get owner reference (falling back to namespace): Unauthorized
F0416 21:08:38.887015       1 cmd.go:120] Unauthorized
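For anyone triaging this, a quick way to confirm the state described above. These are illustrative commands only; they assume the default deployment name openshift-apiserver-operator, and the exact pod name will differ per cluster:

$ oc get clusteroperator openshift-apiserver                  # check the Available/Progressing/Degraded conditions
$ oc get pods -n openshift-apiserver-operator                 # operator pod should show CrashLoopBackOff as described
$ oc logs -n openshift-apiserver-operator deployment/openshift-apiserver-operator   # operator logs without needing the exact pod name (deployment name assumed)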
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.
Tried a few times on a 4.5.0-0.ci-2020-05-06-053625 cluster but I was unable to reproduce the issue.
Today I tried a few times on a 4.4 cluster (registry.svc.ci.openshift.org/ocp/release@sha256:baa687f29b0ac155d8f4c6914056d36d68f343feb9c1e82b46eef95819d00be5) but I was unable to reproduce the issue. One thing worth mentioning is that most of the time it took ~7 min for the operator's pod to come back up because it was waiting for a configmap. During that time the cluster was fully operational; no operator/service went degraded.

k get secret,configmaps,po -n openshift-apiserver-operator
NAME                                                   TYPE                                  DATA   AGE
secret/builder-dockercfg-v2kjz                         kubernetes.io/dockercfg               1      5m55s
secret/builder-token-2d9j2                             kubernetes.io/service-account-token   4      5m55s
secret/builder-token-drq82                             kubernetes.io/service-account-token   4      5m55s
secret/default-dockercfg-8v4l8                         kubernetes.io/dockercfg               1      5m55s
secret/default-token-mtg4x                             kubernetes.io/service-account-token   4      5m54s
secret/default-token-s2jl9                             kubernetes.io/service-account-token   4      5m55s
secret/deployer-dockercfg-99xxg                        kubernetes.io/dockercfg               1      5m54s
secret/deployer-token-2xvbq                            kubernetes.io/service-account-token   4      5m54s
secret/deployer-token-knmdr                            kubernetes.io/service-account-token   4      5m54s
secret/openshift-apiserver-operator-dockercfg-v29gm    kubernetes.io/dockercfg               1      5m54s
secret/openshift-apiserver-operator-serving-cert       kubernetes.io/tls                     2      5m54s
secret/openshift-apiserver-operator-token-v5r7r        kubernetes.io/service-account-token   4      5m54s
secret/openshift-apiserver-operator-token-z4dvz        kubernetes.io/service-account-token   4      5m54s

NAME                                            DATA   AGE
configmap/openshift-apiserver-operator-config   1      35s
configmap/trusted-ca-bundle                     1      35s

NAME                                                READY   STATUS              RESTARTS   AGE
pod/openshift-apiserver-operator-8596449546-g9ffn   0/1     ContainerCreating   0          5m50s

k get secret,configmaps,po -n openshift-apiserver-operator
NAME                                                   TYPE                                  DATA   AGE
secret/builder-dockercfg-v2kjz                         kubernetes.io/dockercfg               1      6m36s
secret/builder-token-2d9j2                             kubernetes.io/service-account-token   4      6m36s
secret/builder-token-drq82                             kubernetes.io/service-account-token   4      6m36s
secret/default-dockercfg-8v4l8                         kubernetes.io/dockercfg               1      6m36s
secret/default-token-mtg4x                             kubernetes.io/service-account-token   4      6m35s
secret/default-token-s2jl9                             kubernetes.io/service-account-token   4      6m36s
secret/deployer-dockercfg-99xxg                        kubernetes.io/dockercfg               1      6m35s
secret/deployer-token-2xvbq                            kubernetes.io/service-account-token   4      6m35s
secret/deployer-token-knmdr                            kubernetes.io/service-account-token   4      6m35s
secret/openshift-apiserver-operator-dockercfg-v29gm    kubernetes.io/dockercfg               1      6m35s
secret/openshift-apiserver-operator-serving-cert       kubernetes.io/tls                     2      6m35s
secret/openshift-apiserver-operator-token-v5r7r        kubernetes.io/service-account-token   4      6m35s
secret/openshift-apiserver-operator-token-z4dvz        kubernetes.io/service-account-token   4      6m35s

NAME                                            DATA   AGE
configmap/openshift-apiserver-operator-config   1      76s
configmap/openshift-apiserver-operator-lock     0      15s
configmap/trusted-ca-bundle                     1      76s

NAME                                                READY   STATUS    RESTARTS   AGE
pod/openshift-apiserver-operator-8596449546-g9ffn   1/1     Running   0          6m31s
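If anyone wants to see what the pod is waiting on while it sits in ContainerCreating (as in the first snapshot above), something along these lines should work. The app=openshift-apiserver-operator label selector is an assumption about the deployment's labels, not taken from this report:

$ oc describe pod -n openshift-apiserver-operator -l app=openshift-apiserver-operator   # Events section lists the missing configmap/secret mounts (label selector assumed)
$ oc get events -n openshift-apiserver-operator --sort-by=.lastTimestamp                # same information, namespace-wide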
Making this low priority and moving it off the 4.5 blocker list because this is not fatal or critical; it only means that events won't be reported against the deployment but against the namespace, which is pretty much cosmetic.
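If we want to double-check which object the operator's events end up attached to (deployment vs. namespace, per the note above), something like the following should show it; the jsonpath expression is just an illustration:

$ oc get events -n openshift-apiserver-operator -o jsonpath='{range .items[*]}{.involvedObject.kind}/{.involvedObject.name}{": "}{.reason}{"\n"}{end}'
# While the owner-reference lookup fails, entries are recorded against the namespace rather than the operator deployment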
I'm moving this to the next sprint as I still want to try to reproduce it on a 4.3 cluster.
Tried a few times on a 4.3.35 cluster but I was unable to reproduce the issue.

oc delete --all secrets -n openshift-apiserver-operator; oc delete --all cm -n openshift-apiserver-operator; oc delete --all pods -n openshift-apiserver-operator
secret "builder-dockercfg-wlvkw" deleted
secret "builder-token-khhql" deleted
secret "default-dockercfg-ptxf7" deleted
secret "default-token-74lsl" deleted
secret "deployer-dockercfg-gqfmg" deleted
secret "deployer-token-hcs42" deleted
secret "openshift-apiserver-operator-dockercfg-mznqd" deleted
secret "openshift-apiserver-operator-serving-cert" deleted
secret "openshift-apiserver-operator-token-ddtxx" deleted
configmap "openshift-apiserver-operator-config" deleted
configmap "openshift-apiserver-operator-lock" deleted
configmap "trusted-ca-bundle" deleted
pod "openshift-apiserver-operator-9fc94f644-9rrjf" deleted

k get secret,configmaps,po -n openshift-apiserver-operator
NAME                                                   TYPE                                  DATA   AGE
secret/builder-dockercfg-s52ct                         kubernetes.io/dockercfg               1      78s
secret/builder-token-g5sfw                             kubernetes.io/service-account-token   4      77s
secret/builder-token-jrzx5                             kubernetes.io/service-account-token   4      78s
secret/default-dockercfg-vsshh                         kubernetes.io/dockercfg               1      77s
secret/default-token-22sxw                             kubernetes.io/service-account-token   4      77s
secret/default-token-vhc9d                             kubernetes.io/service-account-token   4      77s
secret/deployer-dockercfg-kntjf                        kubernetes.io/dockercfg               1      77s
secret/deployer-token-phstj                            kubernetes.io/service-account-token   4      77s
secret/deployer-token-wf9g6                            kubernetes.io/service-account-token   4      77s
secret/openshift-apiserver-operator-dockercfg-58lj4    kubernetes.io/dockercfg               1      76s
secret/openshift-apiserver-operator-serving-cert       kubernetes.io/tls                     2      76s
secret/openshift-apiserver-operator-token-6g9tn        kubernetes.io/service-account-token   4      76s
secret/openshift-apiserver-operator-token-kq5vn        kubernetes.io/service-account-token   4      76s

NAME                                            DATA   AGE
configmap/openshift-apiserver-operator-config   1      25s
configmap/openshift-apiserver-operator-lock     0      3s
configmap/trusted-ca-bundle                     1      25s

NAME                                               READY   STATUS    RESTARTS   AGE
pod/openshift-apiserver-operator-9fc94f644-swqpn   1/1     Running   0          73s

k get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.3.35    True        False         False      10m
cloud-credential                           4.3.35    True        False         False      26m
cluster-autoscaler                         4.3.35    True        False         False      19m
console                                    4.3.35    True        False         False      13m
dns                                        4.3.35    True        False         False      25m
image-registry                             4.3.35    True        False         False      17m
ingress                                    4.3.35    True        False         False      16m
insights                                   4.3.35    True        False         False      20m
kube-apiserver                             4.3.35    True        False         False      24m
kube-controller-manager                    4.3.35    True        False         False      23m
kube-scheduler                             4.3.35    True        False         False      23m
machine-api                                4.3.35    True        False         False      20m
machine-config                             4.3.35    True        False         False      24m
marketplace                                4.3.35    True        False         False      19m
monitoring                                 4.3.35    True        False         False      14m
network                                    4.3.35    True        False         False      25m
node-tuning                                4.3.35    True        False         False      20m
openshift-apiserver                        4.3.35    True        False         False      21m
openshift-controller-manager               4.3.35    True        False         False      22m
openshift-samples                          4.3.35    True        False         False      16m
operator-lifecycle-manager                 4.3.35    True        False         False      20m
operator-lifecycle-manager-catalog         4.3.35    True        False         False      20m
operator-lifecycle-manager-packageserver   4.3.35    True        False         False      16m
service-ca                                 4.3.35    True        False         False      25m
service-catalog-apiserver                  4.3.35    True        False         False      17m
service-catalog-controller-manager         4.3.35    True        False         False      20m
storage                                    4.3.35    True        False         False      20m
I'm closing this issue as I wasn't able to reproduce it on 4.5, 4.4, or 4.3 (note that I was testing on AWS, while the original report was against Azure IPI).