Description of problem:
Projects that have been deleted stay in "Terminating" forever.

Version-Release number of selected component (if applicable):
4.1.11

How reproducible:
Randomly. Once it starts happening, 100%.

Steps to Reproduce:
1. Create and delete projects
2. At some point projects will refuse to terminate

Actual results:

Expected results:

Additional info:
Had a conversation with David Eads about this topic. He walked me through some debugging.

oc get apiservices shows all API servers to be Available. However, the kube-controller-manager pod logs show a few errors.

Initially there are a lot of these:

kube-controller-manager-ip-10-0-144-91.us-west-2.compute.internal kube-controller-manager-12 E0827 17:57:55.771687 1 namespace_controller.go:148] unable to retrieve the complete list of server APIs: apps.openshift.io/v1: the server is currently unable to handle the request, authorization.openshift.io/v1: the server is currently unable to handle the request, build.openshift.io/v1: the server is currently unable to handle the request, image.openshift.io/v1: the server is currently unable to handle the request, mutators.kubedb.com/v1alpha1: the server is currently unable to handle the request, oauth.openshift.io/v1: the server is currently unable to handle the request, packages.operators.coreos.com/v1: the server is currently unable to handle the request, project.openshift.io/v1: the server is currently unable to handle the request, quota.openshift.io/v1: the server is currently unable to handle the request, route.openshift.io/v1: the server is currently unable to handle the request, security.openshift.io/v1: the server is currently unable to handle the request, servicecatalog.k8s.io/v1beta1: the server is currently unable to handle the request, template.openshift.io/v1: the server is currently unable to handle the request, user.openshift.io/v1: the server is currently unable to handle the request, validators.kubedb.com/v1alpha1: the server is currently unable to handle the request

After a while the error changes to:

kube-controller-manager-ip-10-0-144-91.us-west-2.compute.internal kube-controller-manager-12 E0827 17:59:24.537248 1 namespace_controller.go:148] unable to retrieve the complete list of server APIs: mutators.kubedb.com/v1alpha1: the server could not find the requested resource, packages.operators.coreos.com/v1: the server is currently unable to handle the request, validators.kubedb.com/v1alpha1: the server could not find the requested resource

Also, commit 3193c39722126914c05a6f6d22c1eb3c04a2b9d6 should have produced an annotation outlining the deletion failure in the status of the namespace:

namespace-controller.kcm.openshift.io/deletion-error

This annotation is missing even in 4.1.11.
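The two controller errors quoted above differ in which API groups fail and why, and that is hard to see in a single long log line. As a rough aid, here is a minimal Python sketch that splits such a line by failure reason. The helper name failing_groups is made up for illustration, and the parsing assumes the "group/version: reason" entries are separated by ", " exactly as in the lines above.

```python
from collections import defaultdict

# Marker text taken from the namespace_controller.go error quoted above.
PREFIX = "unable to retrieve the complete list of server APIs: "


def failing_groups(log_line):
    """Return {reason: [group/version, ...]} for one discovery error line.

    Hypothetical helper: assumes entries look like "group/version: reason"
    and are joined with ", ", as in the log lines in this report.
    """
    _, found, tail = log_line.partition(PREFIX)
    if not found:
        return {}
    by_reason = defaultdict(list)
    for entry in tail.split(", "):
        group_version, _, reason = entry.partition(": ")
        by_reason[reason.strip()].append(group_version.strip())
    return dict(by_reason)


# The second error line from this report (pod name prefix omitted).
line = (
    "E0827 17:59:24.537248 1 namespace_controller.go:148] "
    "unable to retrieve the complete list of server APIs: "
    "mutators.kubedb.com/v1alpha1: the server could not find the requested resource, "
    "packages.operators.coreos.com/v1: the server is currently unable to handle the request, "
    "validators.kubedb.com/v1alpha1: the server could not find the requested resource"
)

for reason, groups in failing_groups(line).items():
    print(reason, "->", groups)
```

Run against the later error line, this separates the two kubedb groups (resource not found) from packages.operators.coreos.com/v1 (server unable to handle the request), which may point to two different causes.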
We need must-gather output to analyse this (https://docs.openshift.com/container-platform/4.1/cli_reference/administrator-cli-commands.html#must-gather).
Uploaded must-gather output from one of our worst clusters to
https://drive.google.com/open?id=1xTguCi9pHZ6IkWqhvq_NybPikdyHAIkq

Also shared with Stefan and David directly (just in case).
Analyzed the two clusters:

- test cluster: kubedb is installed, which provides mutating and validating admission webhooks served through an aggregated API server. This API server does not serve /apis/{mutators,validators}.kubedb.com/v1alpha1. Hence, the namespace controller inside kube-controller-manager cannot retrieve the complete list of server APIs, falls over, and stops deleting namespaces.

- prod cluster: service catalog leaves serviceinstance objects with the kubernetes-incubator/service-catalog finalizer and does not do its job of deleting them. Hence, the namespace controller cannot finish namespace deletion.
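For the prod cluster case, one way to confirm which objects are holding a namespace open is to look for items that are already marked for deletion but still carry finalizers. Below is a minimal Python sketch of that check. It assumes the input is the parsed JSON of a list as printed by something like oc get serviceinstances -n <namespace> -o json; the function name and the sample data are made up for illustration and are not taken from the affected cluster.

```python
def stuck_objects(object_list):
    """Return (name, finalizers) for items that have a deletionTimestamp
    but still list finalizers, i.e. deletion was requested but not completed.

    Hypothetical helper: expects a dict shaped like a Kubernetes List
    ({"items": [{"metadata": {...}}, ...]}).
    """
    stuck = []
    for item in object_list.get("items", []):
        meta = item.get("metadata", {})
        finalizers = meta.get("finalizers") or []
        if meta.get("deletionTimestamp") and finalizers:
            stuck.append((meta.get("name"), finalizers))
    return stuck


# Made-up sample data, shaped like "oc get ... -o json" output.
sample = {
    "items": [
        {
            "metadata": {
                "name": "example-instance-a",
                "deletionTimestamp": "2019-08-28T15:00:00Z",
                "finalizers": ["kubernetes-incubator/service-catalog"],
            }
        },
        {"metadata": {"name": "example-instance-b", "finalizers": []}},
    ]
}

for name, finalizers in stuck_objects(sample):
    print(name, "is waiting on", finalizers)
```

Anything this reports is waiting on whichever controller owns the listed finalizer, here presumably service catalog.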
From the service catalog controller manager:

0828 15:16:00.829683 1 event.go:221] Event(v1.ObjectReference{Kind:"ServiceInstance", Namespace:"f025-demo-templates", Name:"f025-nodejs-mongodb-demo", UID:"03af3c2a-c8e2-11e9-9770-0a580a800134", APIVersion:"servicecatalog.k8s.io/v1beta1", ResourceVersion:"64034520", FieldPath:""}): type: 'Warning' reason: 'DeprovisionBlockedByExistingCredentials' All associated ServiceBindings must be removed before this ServiceInstance can be deleted