Bug 1618547 - Ansible service broker filled cluster with dh-* namespaces and Failed pods
Summary: Ansible service broker filled cluster with dh-* namespaces and Failed pods
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Broker
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.0
Assignee: Jason Montleon
QA Contact: Zihan Tang
URL:
Whiteboard:
Depends On:
Blocks: 1627480
Reported: 2018-08-16 22:32 UTC by Samuel Padgett
Modified: 2018-12-21 15:23 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1627473 1627480
Environment:
Last Closed: 2018-12-21 15:23:18 UTC
Target Upstream Version:
Embargoed:



Description Samuel Padgett 2018-08-16 22:32:01 UTC
Description of problem:

After provisioning a service using the Ansible service broker (dh-es-apb), the cluster was flooded with 150+ dh-* namespaces, each containing a failed pod. This caused the cluster to become unresponsive until I scaled down dc/asb in the openshift-ansible-service-broker namespace.
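
For reference, the scale-down described above could be sketched roughly as follows. The dc/asb name and the openshift-ansible-service-broker namespace come from this report; the function names are only illustrative, and the sketch assumes a logged-in oc client with permission to scale that deployment config.

```shell
# Hypothetical helpers, not part of the broker or of openshift-ansible.
# Namespace and deployment config name are taken from this bug report.
BROKER_NS="openshift-ansible-service-broker"

pause_broker() {
    # Scale the broker to zero replicas so it stops launching APB sandboxes.
    oc scale dc/asb --replicas=0 -n "$BROKER_NS"
}

resume_broker() {
    # Scale the broker back up once the leftover namespaces are cleaned up.
    # Assumption: a single replica is the desired running state.
    oc scale dc/asb --replicas=1 -n "$BROKER_NS"
}
```

Calling pause_broker first and resume_broker after cleanup mirrors the workaround in the description; it does not address the underlying broker behavior.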

Version-Release number of selected component (if applicable):

The broker was running image:

docker.io/ansibleplaybookbundle/origin-ansible-service-broker:latest

Steps to Reproduce:
1. Provision a gcp-dev cluster following steps at https://github.com/openshift/release/tree/master/cluster/test-deploy
2. Make sure to enable service catalog in gcp-dev/vars-origin.yaml
3. Create a service instance for dh-es-apb with the ephemeral plan

Comment 2 Samuel Padgett 2018-08-16 22:39:26 UTC
Project names matched this pattern:

namespace "dh-es-apb-depr-255fm" deleted
namespace "dh-es-apb-depr-25x6h" deleted
namespace "dh-es-apb-depr-25zs8" deleted
namespace "dh-es-apb-depr-26xtg" deleted

It looks like it was deprovisioning the instance.
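
One rough way to find leftover namespaces like the ones listed above is to filter namespace names on the dh-es-apb-depr- prefix. The function names are hypothetical, and the five-character lowercase alphanumeric suffix is an assumption drawn only from the four names shown here, so the resulting list should be reviewed before anything is deleted.

```shell
# Hypothetical helper: keep only names that look like the leftover
# deprovision sandboxes from this report (dh-es-apb-depr-XXXXX).
# Reads one name per line on stdin and strips an optional
# "namespace/" or "namespaces/" prefix as printed by `-o name`.
filter_depr_namespaces() {
    sed -E 's|^namespaces?/||' | grep -E '^dh-es-apb-depr-[a-z0-9]{5}$'
}

# List matching namespaces on the current cluster (needs a logged-in oc).
list_depr_namespaces() {
    oc get namespaces -o name | filter_depr_namespaces
}
```

After review, the output of list_depr_namespaces could be passed to oc delete namespace, for example through xargs.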

Robb, do you remember the steps you took?

Comment 3 Robb Hamilton 2018-08-17 13:02:21 UTC
> Robb, do you remember the steps you took?

I provisioned Elasticsearch with all the defaults and bound at creation time.  The provisioning was never successful, so I deleted the binding and service instance and re-provisioned.  The first instance never successfully deprovisioned.  The exact same thing happened with the second instance.  It never successfully provisioned, so I deleted the binding and service instance and got the same results (the second instance also never successfully deprovisioned).

YAML for both failed deprovisioning instances at https://gist.github.com/rhamilto/2cb50068af8aa23726cd710dbc314d3d

Comment 5 Jason Montleon 2018-09-10 15:35:12 UTC
We will disable keeping namespaces on error by default in 3.11 and investigate another solution for 3.11.z+:
https://github.com/openshift/openshift-ansible/pull/9977
https://github.com/openshift/ansible-service-broker/pull/1075

Comment 11 Jason Montleon 2018-09-13 17:46:17 UTC
openshift-ansible 3.11.2 and above should prevent the buildup of namespaces and pods from a bad APB.

There is a separate issue with the service catalog that is causing the unresponsiveness. This will have to be addressed separately and there is a BZ to track the problem at https://bugzilla.redhat.com/show_bug.cgi?id=1628235

Comment 12 Zihan Tang 2018-09-14 07:15:22 UTC
@Jason, in #comment11 and #comment6, do you mean that 'keep_namespace_on_error' will be set to 'false' by default?

However, I used openshift-ansible 3.11.4 to build the environment, and the config in the configmap is still:
  keep_namespace: false
  keep_namespace_on_error: true

Comment 13 Jason Montleon 2018-09-14 13:14:20 UTC
With 3.11.6 I see the expected value:
[root@192 ~]# rpm -q openshift-ansible
openshift-ansible-3.11.6-1.git.0.22084b3.el7_5.noarch
[root@192 ~]# oc describe configmap -n openshift-ansible-service-broker | grep _on_error
  keep_namespace_on_error: false

Bear in mind as well that this value can be overridden in your inventory with ansible_service_broker_keep_namespace_on_error: true

Please ensure you don't have this set while testing.
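
The grep check above could be wrapped in a small function so it can be reused while testing. This is only a sketch: the function names are hypothetical, and it assumes the setting appears on a line of the form "keep_namespace_on_error: <value>", as in the output shown in this comment.

```shell
# Hypothetical helper: read broker config text on stdin and print the value
# of keep_namespace_on_error (prints nothing if the key is not present).
keep_ns_on_error_value() {
    sed -n 's/^[[:space:]]*keep_namespace_on_error:[[:space:]]*\([a-z]*\).*$/\1/p' | head -n 1
}

# Succeeds only when the broker is configured to delete sandbox namespaces
# after a failed APB run (needs a logged-in oc).
broker_cleans_up_on_error() {
    [ "$(oc describe configmap -n openshift-ansible-service-broker | keep_ns_on_error_value)" = "false" ]
}
```

If broker_cleans_up_on_error fails on a fresh install, the inventory is worth checking for the override mentioned above.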

Comment 14 Zihan Tang 2018-09-17 02:48:48 UTC
Verified with openshift-ansible-3.11.6.
With a default installation, the value in broker-config is:
 keep_namespace_on_error: false

So I am marking this bug as verified.
The other issue described in this bug is tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1627473#c1

Comment 15 Luke Meyer 2018-12-21 15:23:18 UTC
Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.

