Description of problem:
After provisioning a service using the Ansible service broker (dh-es-apb), the cluster was flooded with more than 150 dh-* namespaces, each containing a failed pod. This caused the cluster to become unresponsive until I scaled down dc/asb in openshift-ansible-service-broker.

Version-Release number of selected component (if applicable):
The broker was running image: docker.io/ansibleplaybookbundle/origin-ansible-service-broker:latest

Steps to Reproduce:
1. Provision a gcp-dev cluster following the steps at https://github.com/openshift/release/tree/master/cluster/test-deploy
2. Make sure to enable service catalog in gcp-dev/vars-origin.yaml
3. Create a service instance for dh-es-apb with the ephemeral plan
Project names matched this pattern:

namespace "dh-es-apb-depr-255fm" deleted
namespace "dh-es-apb-depr-25x6h" deleted
namespace "dh-es-apb-depr-25zs8" deleted
namespace "dh-es-apb-depr-26xtg" deleted

It looks like it was deprovisioning the instance. Robb, do you remember the steps you took?
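For anyone cleaning up after the same problem, here is a rough sketch of how the leftover namespaces could be picked out of a list of names. The regular expression is an assumption inferred only from the four names quoted above (the prefix dh-es-apb-depr- followed by a five-character suffix), and the function name is hypothetical. Check the matches by hand before deleting anything.

```python
import re

# Assumed pattern, inferred only from the four namespace names quoted above:
# the prefix "dh-es-apb-depr-" followed by five lowercase letters or digits.
LEFTOVER_NAMESPACE = re.compile(r"^dh-es-apb-depr-[a-z0-9]{5}$")


def leftover_namespaces(names):
    """Return the names that look like leftover deprovision namespaces."""
    return [name for name in names if LEFTOVER_NAMESPACE.match(name)]


if __name__ == "__main__":
    sample = [
        "default",
        "dh-es-apb-depr-255fm",
        "dh-es-apb-depr-25x6h",
        "openshift-ansible-service-broker",
    ]
    for name in leftover_namespaces(sample):
        print(name)
```

The pattern is anchored at both ends so that unrelated projects whose names merely contain the prefix are not matched.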
> Robb, do you remember the steps you took?

I provisioned Elasticsearch with all the defaults and bound at creation time. The provisioning was never successful, so I deleted the binding and service instance and re-provisioned. The first instance never successfully deprovisioned.

The exact same thing happened with the second instance. It never successfully provisioned, so I deleted the binding and service instance and got the same results (the second instance also never successfully deprovisioned).

YAML for both failed deprovisioning instances is at https://gist.github.com/rhamilto/2cb50068af8aa23726cd710dbc314d3d
We will disable keeping namespaces on error in 3.11 by default and investigate another solution for 3.11.z+.

https://github.com/openshift/openshift-ansible/pull/9977
https://github.com/openshift/ansible-service-broker/pull/1075
openshift-ansible 3.11.2 and above should prevent the buildup of namespaces and pods from a bad APB. There is a separate issue with the service catalog that is causing the unresponsiveness. It will have to be addressed separately, and there is a BZ to track the problem at https://bugzilla.redhat.com/show_bug.cgi?id=1628235
@Jason, in #comment11 & #comment6, do you mean that 'keep_namespace_on_error' will be set to 'false' by default? I used openshift-ansible 3.11.4 to build the environment, but the config in the configmap is still:

keep_namespace: false
keep_namespace_on_error: true
With 3.11.6 I see the expected value:

[root@192 ~]# rpm -q openshift-ansible
openshift-ansible-3.11.6-1.git.0.22084b3.el7_5.noarch
[root@192 ~]# oc describe configmap -n openshift-ansible-service-broker | grep _on_error
keep_namespace_on_error: false

Bear in mind as well that this value can be overridden in your inventory with

ansible_service_broker_keep_namespace_on_error: true

Please ensure you don't have this set while testing.
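If it helps when checking several environments, the comparison can be scripted. This is only a minimal sketch: it assumes the broker config text has already been captured into a string (for example from the oc describe output), and read_flag is a hypothetical helper, not part of the broker or openshift-ansible.

```python
def read_flag(config_text, key):
    """Return True or False for a 'key: true|false' line, or None if the key is absent."""
    for line in config_text.splitlines():
        name, sep, value = line.strip().partition(":")
        # Require an exact key match so that "keep_namespace" is not
        # confused with "keep_namespace_on_error".
        if sep and name.strip() == key:
            return value.strip().lower() == "true"
    return None


if __name__ == "__main__":
    # Sample text mirroring the values reported with openshift-ansible 3.11.4.
    captured = "keep_namespace: false\nkeep_namespace_on_error: true\n"
    if read_flag(captured, "keep_namespace_on_error"):
        print("keep_namespace_on_error is still enabled")
    else:
        print("keep_namespace_on_error is disabled or not set")
```

A result of None means the key was not found in the captured text, which is kept distinct from an explicit false.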
Verified with a default installation of openshift-ansible-3.11.6. The value in broker-config is:

keep_namespace_on_error: false

Marking this bug as verified. The other issue described in this bug is tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1627473#c1
Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.