Bug 1570603

Summary: Should not launch multi depr sandboxes while provision failed
Product: OpenShift Container Platform Reporter: Zhang Cheng <chezhang>
Component: Service BrokerAssignee: David Zager <dzager>
Status: CLOSED ERRATA QA Contact: Zhang Cheng <chezhang>
Severity: high Docs Contact:
Priority: medium    
Version: 3.10.0CC: aos-bugs, dzager, jesusr, jiazha, pmorie, zhsun, zitang
Target Milestone: ---   
Target Release: 3.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
undefined
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-07-30 19:13:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Zhang Cheng 2018-04-23 10:20:11 UTC
Description of problem: 
Should not lanuch so many depr sandboxs while provision failed.


service-catalog & asb image using images from brew registry:
# service-catalog --version
v3.10.0-0.27.0;Upstream:v0.1.13
# asbd --version
1.2.6

How reproducible:
Always


Steps to Reproduce:
1. Provison a mediawiki apb with same username and passwd (provision should failed by expect)
2. Look at namespace by `oc get ns | grep mediawiki`

Actual results:
1. Provison a mediawiki apb failed by expect.
2. Auto lanuch many depr sandboxs
# oc get ns | grep mediawiki
rh-mediawiki-apb-depr-4kf9z         Active    5m
rh-mediawiki-apb-depr-4vzlq         Active    2s
rh-mediawiki-apb-depr-5z5sr         Active    3m
rh-mediawiki-apb-depr-7kjsl         Active    2m
rh-mediawiki-apb-depr-b4p9t         Active    6m
rh-mediawiki-apb-depr-bms7h         Active    2m
rh-mediawiki-apb-depr-d74sl         Active    5m
rh-mediawiki-apb-depr-dvqx4         Active    1m
rh-mediawiki-apb-depr-fhdv4         Active    42s
rh-mediawiki-apb-depr-fkhr7         Active    4m
rh-mediawiki-apb-depr-fq4cc         Active    2m
rh-mediawiki-apb-depr-hh5lj         Active    1m
rh-mediawiki-apb-depr-hqj2k         Active    5m
rh-mediawiki-apb-depr-j5jn5         Active    4m
rh-mediawiki-apb-depr-j85sm         Active    16s
rh-mediawiki-apb-depr-kfrnc         Active    3m
rh-mediawiki-apb-depr-kwnf8         Active    5m
rh-mediawiki-apb-depr-mf5m9         Active    4m
rh-mediawiki-apb-depr-mxsff         Active    29s
rh-mediawiki-apb-depr-n7qqx         Active    2m
rh-mediawiki-apb-depr-p5s28         Active    3m
rh-mediawiki-apb-depr-prvf7         Active    56s
rh-mediawiki-apb-depr-rc8sv         Active    1m
rh-mediawiki-apb-depr-rgj7v         Active    6m
rh-mediawiki-apb-depr-rqz62         Active    3m
rh-mediawiki-apb-depr-rt7dk         Active    4m
rh-mediawiki-apb-depr-z8wfv         Active    4m
rh-mediawiki-apb-depr-zhmsk         Active    2m
rh-mediawiki-apb-depr-zqzbf         Active    1m
rh-mediawiki-apb-prov-r7jlc         Active    6m

Expected results:
2. should not launch any depr sandbox.


Addition info: 
Attach asb logs:
time="2018-04-23T10:08:44Z" level=info msg="============================================================"
time="2018-04-23T10:08:44Z" level=info msg="                       PROVISIONING                         "
time="2018-04-23T10:08:44Z" level=info msg="============================================================"
time="2018-04-23T10:08:44Z" level=info msg="Spec.ID: 03b69500305d9859bb9440d9f9023784"
time="2018-04-23T10:08:44Z" level=info msg="Spec.Name: rh-mediawiki-apb"
time="2018-04-23T10:08:44Z" level=info msg="Spec.Image: registry.access.stage.redhat.com/openshift3/mediawiki-apb:v3.10"
time="2018-04-23T10:08:44Z" level=info msg="Spec.Description: Mediawiki apb implementation"
time="2018-04-23T10:08:44Z" level=info msg="============================================================"
time="2018-04-23T10:08:44Z" level=info msg="Checking if namespace %s exists.cover"
time="2018-04-23T10:08:44Z" level=info msg="Creating RoleBinding apb-4c98884c-4d97-4342-ac56-ca61ce224205"
time="2018-04-23T10:08:44Z" level=info msg="Creating RoleBinding apb-4c98884c-4d97-4342-ac56-ca61ce224205"
time="2018-04-23T10:08:44Z" level=info msg="Successfully created apb sandbox: [ %s ], with %s permissions in namespace %sapb-4c98884c-4d97-4342-ac56-ca61ce224205editrh-mediawiki-apb-prov-r7jlc"
time="2018-04-23T10:08:44Z" level=info msg="Running post create sandbox fuctions if defined."
time="2018-04-23T10:08:44Z" level=info msg="Proxy configuration present. Applying to APB before execution:"
time="2018-04-23T10:08:44Z" level=info msg="HTTP_PROXY=\"\""
time="2018-04-23T10:08:44Z" level=info msg="HTTPS_PROXY=\"\""
time="2018-04-23T10:08:44Z" level=info msg="NO_PROXY=\"\""
time="2018-04-23T10:08:44Z" level=info msg="Creating pod \"apb-4c98884c-4d97-4342-ac56-ca61ce224205\" in the rh-mediawiki-apb-prov-r7jlc namespace"
[2018-04-23T10:08:44.898Z] [INFO] - All Jobs for instance: 4c8d3c6b-46de-11e8-9590-0a580a80000b in state:  in progress - 
[]apb.JobState{apb.JobState{Token:"7e728f5c-15a7-4716-8869-d6a45d15a875", State:"in progress", Podname:"", Method:"provision", Error:"", Description:"action started"}}
[2018-04-23T10:08:44.898Z] [INFO] - Provision requested for instance 4c8d3c6b-46de-11e8-9590-0a580a80000b, but job is already in progress
[2018-04-23T10:08:44.898Z] [ERROR] - provision error provision in progress
10.128.0.1 - - [23/Apr/2018:10:08:44 +0000] "PUT /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b?accepts_incomplete=true HTTP/1.1" 202 58
10.128.0.1 - - [23/Apr/2018:10:08:45 +0000] "GET /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b/last_operation?operation=7e728f5c-15a7-4716-8869-d6a45d15a875&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 64
10.128.0.1 - - [23/Apr/2018:10:08:45 +0000] "GET /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b/last_operation?operation=7e728f5c-15a7-4716-8869-d6a45d15a875&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 64
10.128.0.1 - - [23/Apr/2018:10:08:49 +0000] "GET /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b/last_operation?operation=7e728f5c-15a7-4716-8869-d6a45d15a875&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 64
time="2018-04-23T10:08:50Z" level=error msg="Provision or Update action failed - Pod [ apb-4c98884c-4d97-4342-ac56-ca61ce224205 ] failed with exit code [2]"
time="2018-04-23T10:08:50Z" level=info msg="Destroying APB sandbox..."
time="2018-04-23T10:08:50Z" level=info msg="Successfully deleted rolebinding apb-4c98884c-4d97-4342-ac56-ca61ce224205, namespace rh-mediawiki-apb-prov-r7jlc"
time="2018-04-23T10:08:50Z" level=info msg="Successfully deleted rolebinding apb-4c98884c-4d97-4342-ac56-ca61ce224205, namespace cover"
[2018-04-23T10:08:50.709Z] [ERROR] - broker::provision error occurred. Pod [ apb-4c98884c-4d97-4342-ac56-ca61ce224205 ] failed with exit code [2]
time="2018-04-23T10:08:50Z" level=error msg="Provision APB error: Pod [ apb-4c98884c-4d97-4342-ac56-ca61ce224205 ] failed with exit code [2]"
10.128.0.1 - - [23/Apr/2018:10:08:57 +0000] "GET /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b/last_operation?operation=7e728f5c-15a7-4716-8869-d6a45d15a875&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 129
[2018-04-23T10:08:57.485Z] [INFO] - All Jobs for instance: 4c8d3c6b-46de-11e8-9590-0a580a80000b in state:  in progress - 
[]apb.JobState{}
time="2018-04-23T10:08:57Z" level=error msg="Unable to load secret '4c8d3c6b-46de-11e8-9590-0a580a80000b' from namespace 'openshift-ansible-service-broker'"
time="2018-04-23T10:08:57Z" level=error msg="unable to get secret data for 4c8d3c6b-46de-11e8-9590-0a580a80000b, in namespace: openshift-ansible-service-broker"
[2018-04-23T10:08:57.488Z] [INFO] - ASYNC deprovision in progress
10.128.0.1 - - [23/Apr/2018:10:08:57 +0000] "DELETE /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b?accepts_incomplete=true&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 202 58
time="2018-04-23T10:08:57Z" level=info msg="============================================================"
time="2018-04-23T10:08:57Z" level=info msg="                      DEPROVISIONING                        "
time="2018-04-23T10:08:57Z" level=info msg="============================================================"
time="2018-04-23T10:08:57Z" level=info msg="ServiceInstance.Id: 03b69500305d9859bb9440d9f9023784"
time="2018-04-23T10:08:57Z" level=info msg="ServiceInstance.Name: rh-mediawiki-apb"
time="2018-04-23T10:08:57Z" level=info msg="ServiceInstance.Image: registry.access.stage.redhat.com/openshift3/mediawiki-apb:v3.10"
time="2018-04-23T10:08:57Z" level=info msg="ServiceInstance.Description: Mediawiki apb implementation"
time="2018-04-23T10:08:57Z" level=info msg="============================================================"
time="2018-04-23T10:08:57Z" level=info msg="Creating RoleBinding apb-a80e3b0e-ac46-419e-a12b-5c265cbd9626"
time="2018-04-23T10:08:57Z" level=info msg="Creating RoleBinding apb-a80e3b0e-ac46-419e-a12b-5c265cbd9626"
time="2018-04-23T10:08:57Z" level=info msg="Successfully created apb sandbox: [ %s ], with %s permissions in namespace %sapb-a80e3b0e-ac46-419e-a12b-5c265cbd9626editrh-mediawiki-apb-depr-rgj7v"
time="2018-04-23T10:08:57Z" level=info msg="Running post create sandbox fuctions if defined."
time="2018-04-23T10:08:57Z" level=info msg="Proxy configuration present. Applying to APB before execution:"
time="2018-04-23T10:08:57Z" level=info msg="HTTP_PROXY=\"\""
time="2018-04-23T10:08:57Z" level=info msg="HTTPS_PROXY=\"\""
time="2018-04-23T10:08:57Z" level=info msg="NO_PROXY=\"\""
time="2018-04-23T10:08:57Z" level=info msg="Creating pod \"apb-a80e3b0e-ac46-419e-a12b-5c265cbd9626\" in the rh-mediawiki-apb-depr-rgj7v namespace"
[2018-04-23T10:09:06.482Z] [ERROR] - broker::deprovision error occurred. Pod [ apb-a80e3b0e-ac46-419e-a12b-5c265cbd9626 ] failed with exit code [2]
time="2018-04-23T10:09:06Z" level=error msg="Deprovision action failed - Pod [ apb-a80e3b0e-ac46-419e-a12b-5c265cbd9626 ] failed with exit code [2]"
time="2018-04-23T10:09:06Z" level=info msg="Destroying APB sandbox..."
time="2018-04-23T10:09:06Z" level=info msg="Successfully deleted rolebinding apb-a80e3b0e-ac46-419e-a12b-5c265cbd9626, namespace rh-mediawiki-apb-depr-rgj7v"
time="2018-04-23T10:09:06Z" level=info msg="Successfully deleted rolebinding apb-a80e3b0e-ac46-419e-a12b-5c265cbd9626, namespace cover"
10.128.0.1 - - [23/Apr/2018:10:09:13 +0000] "GET /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b/last_operation?operation=45453ceb-3f11-43ac-b100-1da5c40a7cf3&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 131
10.128.0.1 - - [23/Apr/2018:10:09:13 +0000] "GET /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b/last_operation?operation=45453ceb-3f11-43ac-b100-1da5c40a7cf3&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 131
10.128.0.1 - - [23/Apr/2018:10:09:13 +0000] "GET /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b/last_operation?operation=45453ceb-3f11-43ac-b100-1da5c40a7cf3&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 131
10.128.0.1 - - [23/Apr/2018:10:09:13 +0000] "GET /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b/last_operation?operation=45453ceb-3f11-43ac-b100-1da5c40a7cf3&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 131
10.128.0.1 - - [23/Apr/2018:10:09:13 +0000] "GET /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b/last_operation?operation=45453ceb-3f11-43ac-b100-1da5c40a7cf3&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 131
[2018-04-23T10:09:13.692Z] [INFO] - All Jobs for instance: 4c8d3c6b-46de-11e8-9590-0a580a80000b in state:  in progress - 
[]apb.JobState{}
time="2018-04-23T10:09:13Z" level=error msg="Unable to load secret '4c8d3c6b-46de-11e8-9590-0a580a80000b' from namespace 'openshift-ansible-service-broker'"
time="2018-04-23T10:09:13Z" level=error msg="unable to get secret data for 4c8d3c6b-46de-11e8-9590-0a580a80000b, in namespace: openshift-ansible-service-broker"
[2018-04-23T10:09:13.694Z] [INFO] - ASYNC deprovision in progress
10.128.0.1 - - [23/Apr/2018:10:09:13 +0000] "DELETE /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b?accepts_incomplete=true&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 202 58
time="2018-04-23T10:09:13Z" level=info msg="============================================================"
time="2018-04-23T10:09:13Z" level=info msg="                      DEPROVISIONING                        "
time="2018-04-23T10:09:13Z" level=info msg="============================================================"
time="2018-04-23T10:09:13Z" level=info msg="ServiceInstance.Id: 03b69500305d9859bb9440d9f9023784"
time="2018-04-23T10:09:13Z" level=info msg="ServiceInstance.Name: rh-mediawiki-apb"
time="2018-04-23T10:09:13Z" level=info msg="ServiceInstance.Image: registry.access.stage.redhat.com/openshift3/mediawiki-apb:v3.10"
time="2018-04-23T10:09:13Z" level=info msg="ServiceInstance.Description: Mediawiki apb implementation"
time="2018-04-23T10:09:13Z" level=info msg="============================================================"
time="2018-04-23T10:09:13Z" level=info msg="Creating RoleBinding apb-48edc83b-5ea4-45c1-a0bb-2a7d347d65b5"
[2018-04-23T10:09:13.923Z] [INFO] - All Jobs for instance: 4c8d3c6b-46de-11e8-9590-0a580a80000b in state:  in progress - 
[]apb.JobState{apb.JobState{Token:"05436063-4983-4437-85b3-2945e1ce48e2", State:"in progress", Podname:"", Method:"deprovision", Error:"", Description:"action started"}}
[2018-04-23T10:09:13.923Z] [INFO] - Deprovision requested for instance 4c8d3c6b-46de-11e8-9590-0a580a80000b, but job is already in progress
10.128.0.1 - - [23/Apr/2018:10:09:13 +0000] "DELETE /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b?accepts_incomplete=true&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 202 3
time="2018-04-23T10:09:13Z" level=info msg="Creating RoleBinding apb-48edc83b-5ea4-45c1-a0bb-2a7d347d65b5"
time="2018-04-23T10:09:13Z" level=info msg="Successfully created apb sandbox: [ %s ], with %s permissions in namespace %sapb-48edc83b-5ea4-45c1-a0bb-2a7d347d65b5editrh-mediawiki-apb-depr-b4p9t"
time="2018-04-23T10:09:13Z" level=info msg="Running post create sandbox fuctions if defined."
time="2018-04-23T10:09:14Z" level=info msg="Proxy configuration present. Applying to APB before execution:"
time="2018-04-23T10:09:14Z" level=info msg="HTTP_PROXY=\"\""
time="2018-04-23T10:09:14Z" level=info msg="HTTPS_PROXY=\"\""
time="2018-04-23T10:09:14Z" level=info msg="NO_PROXY=\"\""
time="2018-04-23T10:09:14Z" level=info msg="Creating pod \"apb-48edc83b-5ea4-45c1-a0bb-2a7d347d65b5\" in the rh-mediawiki-apb-depr-b4p9t namespace"
10.128.0.1 - - [23/Apr/2018:10:09:14 +0000] "GET /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b/last_operation?operation=05436063-4983-4437-85b3-2945e1ce48e2&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 64
10.128.0.1 - - [23/Apr/2018:10:09:14 +0000] "GET /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b/last_operation?operation=05436063-4983-4437-85b3-2945e1ce48e2&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 64
10.128.0.1 - - [23/Apr/2018:10:09:18 +0000] "GET /ansible-service-broker/v2/service_instances/4c8d3c6b-46de-11e8-9590-0a580a80000b/last_operation?operation=05436063-4983-4437-85b3-2945e1ce48e2&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 64
time="2018-04-23T10:09:22Z" level=error msg="Deprovision action failed - Pod [ apb-48edc83b-5ea4-45c1-a0bb-2a7d347d65b5 ] failed with exit code [2]"
time="2018-04-23T10:09:22Z" level=info msg="Destroying APB sandbox..."
[2018-04-23T10:09:22.964Z] [ERROR] - broker::deprovision error occurred. Pod [ apb-48edc83b-5ea4-45c1-a0bb-2a7d347d65b5 ] failed with exit code [2]
time="2018-04-23T10:09:22Z" level=info msg="Successfully deleted rolebinding apb-48edc83b-5ea4-45c1-a0bb-2a7d347d65b5, namespace rh-mediawiki-apb-depr-b4p9t"
time="2018-04-23T10:09:22Z" level=info msg="Successfully deleted rolebinding apb-48edc83b-5ea4-45c1-a0bb-2a7d347d65b5, namespace cover"

Comment 1 Shawn Hurley 2018-04-29 16:37:35 UTC
I am seeing this behavior in service catalog 0.1.13 but not in 0.1.11.

@pmorie Do you know which behavior is correct for 3.10?

Comment 2 John Matthews 2018-05-02 13:32:42 UTC
Clearing needinfo

Plan for this BZ is to update the mediawiki APB so it's deprovision playbook is idempotent.

That is the first issue here, the deprovision should be able to run multiple times and ensure it has cleaned up the artifacts, if they do not exist it should _not_ treat that as an error.

For longer term, we will investigate and determine how we want to handle failure cases and a possible rate limiter so we will control how often we attempt a deprovision.
Trello for longer term approach here (post 3.10):  https://trello.com/c/hSGqK7wm

Summary:
For 3.10, we will update the deprovision playbook in mediawiki so it is idempotent

Comment 6 David Zager 2018-05-24 17:33:13 UTC
https://errata.devel.redhat.com/advisory/33505 moved to QE

openshift-enterprise-asb-container-v3.10.0-0.51.0.1
openshift-enterprise-mediawiki-apb-v3.10.0-0.51.0.1
openshift-enterprise-postgresql-apb-v3.10.0-0.51.0.1
openshift-enterprise-mysql-apb-v3.10.0-0.51.0.1
openshift-enterprise-mariadb-apb-v3.10.0-0.51.0.1
openshift-enterprise-apb-tools-v3.10.0-0.32.0.2

Comment 7 Zhang Cheng 2018-05-28 07:48:12 UTC
I want to know what is our expected in currently. I tried with openshift-enterprise-mediawiki-apb-v3.10.0-0.51.0.1 and latest asb:1.2.14, but still get similar result.
1. Will automatic launch depr sandbox if mediawiki provision failed
2. Still launch multi depr sandboxes while provision failed, but not frequently.

rh-mediawiki-apb-depr-4w8c6         Active    33m
rh-mediawiki-apb-depr-kpdwq         Active    40m
rh-mediawiki-apb-depr-kxjtd         Active    1h
rh-mediawiki-apb-depr-mnsqr         Active    29m
rh-mediawiki-apb-prov-fnjl5         Active    1h

Comment 8 David Zager 2018-05-29 02:27:12 UTC
From jmatthews: 


Plan for this BZ is to update the mediawiki APB so it's deprovision playbook is idempotent.




You should expect to see the multiple sandboxes, however they should now be idempotent.

Comment 9 Zhang Cheng 2018-05-29 02:43:49 UTC
That is OK. Verified in here.

Comment 11 errata-xmlrpc 2018-07-30 19:13:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816