Bug 1550418

Summary: [ASB] Zombie project issue - while deleting a project that contains deprovision failed instance
Product: OpenShift Container Platform Reporter: Jian Zhang <jiazha>
Component: Service Broker Assignee: Erik Nelson <ernelson>
Status: CLOSED ERRATA QA Contact: Jian Zhang <jiazha>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.9.0 CC: aos-bugs, chezhang, ernelson, jesusr, jmatthew, rhallise
Target Milestone: ---   
Target Release: 3.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-07-30 19:10:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1566924    
Bug Blocks:    

Description Jian Zhang 2018-03-01 08:50:05 UTC
Description of problem:

Reference bug: https://bugzilla.redhat.com/show_bug.cgi?id=1548311, a similar issue.
Zombie project issue: deleting a project that contains an instance whose deprovision failed leaves the project stuck in "Terminating".
Error message:
[2018-03-01T08:25:05.968Z] [ERROR] - Error occurred getting service instance [ 63b3d030-d91e-4dbe-902e-ca593e8597d5 ] after deprovision job:

Version-Release number of selected component (if applicable):
The ASB version: 1.1.15
[root@host-172-16-120-67 ~]# docker run --rm --entrypoint=asbd registry.reg-aws.openshift.com:443/openshift3/ose-ansible-service-broker:v3.9.1 --version
1.1.15

How reproducible:
Always

Steps to Reproduce:
1, Configure the broker with the registry below, which hosts an example APB whose deprovision fails:

  - type: dockerhub
    name: dh
    url: registry.hub.docker.com
    org: zjianbjz
    tag: latest
    white_list:
    - ".*-apb$"

2, In the web UI, provision the "Hello Test" APB in a project called "test" and select the "faildeprovision" plan.
3, Provisioning succeeds.
4, Deprovision the instance (this fails, as expected), then delete the "test" project.
5, Check the project status.
 
Actual results: 
The serviceinstance and the project "test" cannot be deleted:

[root@host-172-16-120-87 ~]# oc get ns
NAME                                STATUS        AGE
default                             Active        5h
dh-hello-test-apb-depr-lsn72        Active        35m
install-test                        Active        5h
kube-public                         Active        5h
kube-service-catalog                Active        5h
kube-system                         Active        5h
logging                             Active        5h
management-infra                    Active        5h
openshift                           Active        5h
openshift-ansible-service-broker    Active        5h
openshift-infra                     Active        5h
openshift-node                      Active        5h
openshift-template-service-broker   Active        5h
openshift-web-console               Active        5h
test                                Terminating   41m

[root@host-172-16-120-87 ~]# oc get serviceinstance -n test
NAME                      AGE
dh-hello-test-apb-q97lw   21m

Expected results:
The serviceinstance and the project "test" should be deleted successfully.

Additional info:
The ASB logs:
[2018-03-01T08:25:05.953Z] [INFO] - ASYNC deprovision in progress
[2018-03-01T08:25:05.954Z] [DEBUG] - skipping deprovision and sending complete msg to channel
[2018-03-01T08:25:05.954Z] [DEBUG] - received deprovision message from buffer
10.128.0.1 - - [01/Mar/2018:08:25:05 +0000] "DELETE /ansible-service-broker/v2/service_instances/63b3d030-d91e-4dbe-902e-ca593e8597d5?accepts_incomplete=true&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 202 58
[2018-03-01T08:25:05.968Z] [ERROR] - Error occurred getting service instance [ 63b3d030-d91e-4dbe-902e-ca593e8597d5 ] after deprovision job:
[2018-03-01T08:25:05.972Z] [WARNING] - Broker configured to *NOT* launch and run APB unbind
[2018-03-01T08:25:05.972Z] [DEBUG] - Dao::DeleteBindInstance -> [ 61d0749b-19d4-4eb4-911d-099ad9c8c04c ]
[2018-03-01T08:25:05.986Z] [INFO] - Could not find a service instance in dao - 100: Key not found (/service_instance/63b3d030-d91e-4dbe-902e-ca593e8597d5) [722]
10.128.0.1 - - [01/Mar/2018:08:25:05 +0000] "DELETE /ansible-service-broker/v2/service_instances/63b3d030-d91e-4dbe-902e-ca593e8597d5?accepts_incomplete=true&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 410 3
[2018-03-01T08:25:06.189Z] [DEBUG] - service_id: 03b69500305d9859bb9440d9f9023784
[2018-03-01T08:25:06.189Z] [DEBUG] - plan_id: 43d3e23d214c26dbebc0879e44425db4
[2018-03-01T08:25:06.189Z] [DEBUG] - operation:  96130bde-6cfa-4008-b4d0-3599b3a41c16
[2018-03-01T08:25:06.189Z] [DEBUG] - state: succeeded
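
To make the request sequence in the log above easier to follow, here is a small sketch that pulls the instance ID and HTTP status out of each DELETE access-log line. It assumes the access-log format shown above; `delete_statuses` is a hypothetical helper name used for illustration, not part of the broker.

```shell
# Read broker log lines on stdin and print "<instance-id> <http-status>"
# for every DELETE request against /v2/service_instances/<id>.
# Assumes the access-log layout seen in this report.
delete_statuses() {
    sed -n 's|.*"DELETE [^ ]*/service_instances/\([0-9a-f-]*\)[^"]*" \([0-9][0-9][0-9]\) .*|\1 \2|p'
}
```

Applied to the log above, it prints the same instance ID twice, first with status 202 and then with status 410. The log lines can be piped in from wherever the broker log is stored.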

Comment 1 John Matthews 2018-03-02 17:29:45 UTC
Aligning to 3.10.0

The issue below is related:
https://github.com/openshift/ansible-service-broker/issues/666

Comment 2 Ryan Hallisey 2018-03-02 22:03:13 UTC
workaround: for i in $(oc get projects  | grep Terminating| awk '{print $1}'); do echo $i; oc get serviceinstance -n $i -o yaml | sed "/kubernetes-incubator/d"| oc apply -f - ; done
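
For readability, the one-liner above can be unrolled as in the sketch below. It assumes, as the one-liner does, that the only lines mentioning `kubernetes-incubator` in the ServiceInstance YAML are the service catalog finalizer entries. The function names are hypothetical and chosen only for illustration. Because this strips the finalizer by hand, it skips the catalog's own cleanup, so treat it as a workaround rather than a fix.

```shell
# Print the name (first column) of every project whose listing line
# mentions "Terminating"; reads `oc get projects` output on stdin.
terminating_projects() {
    awk '/Terminating/ {print $1}'
}

# Drop every line mentioning "kubernetes-incubator" from YAML on stdin.
# Assumption: these lines are the service catalog finalizer entries.
strip_catalog_finalizer() {
    sed '/kubernetes-incubator/d'
}

# For each Terminating project, re-apply its ServiceInstances with the
# finalizer lines removed so that deletion can proceed.
unstick_terminating_projects() {
    for ns in $(oc get projects | terminating_projects); do
        echo "$ns"
        oc get serviceinstance -n "$ns" -o yaml \
            | strip_catalog_finalizer \
            | oc apply -f -
    done
}
```

Nothing runs until `unstick_terminating_projects` is called, so the two filter functions can be tried on saved output first.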

Comment 3 Erik Nelson 2018-05-01 15:30:52 UTC
There have likely been enough changes in both the catalog and the broker since this bz was filed that the problem does not exist anymore for a 3.10 release.

Can you please retest this with the latest catalog (v0.1.16) and the latest broker (1.2.8-1)?

Comment 4 Jian Zhang 2018-05-02 07:35:49 UTC
Erik,

OK, I will test it later. I changed the status to "MODIFIED" because the latest available version of the service catalog is "v3.10.0-0.31.0;Upstream:v0.1.13". In addition, for ASB version 1.2.8, bug 1566924 also blocks it.

Comment 5 Erik Nelson 2018-05-09 18:21:30 UTC
While debugging a related issue, tracked on Trello (https://trello.com/c/Kb3CVqkH), I discovered a broker bug that *may* have prevented the zombie project referenced in this bz from being deleted. I patched the broker and confirmed that the following PR ensures the broker responds correctly during failed async deprovisions. I'm also able to fully delete the project that the failed service was deployed to; no ServiceInstance resources are left behind.

https://github.com/openshift/ansible-service-broker/pull/942

Comment 7 Jian Zhang 2018-05-16 08:28:56 UTC
Verified successfully.

The target namespace that the failed serviceinstance was deployed to can now be deleted successfully.

The ASB version: 1.2.11
Service catalog: v3.10.0-0.46.0;Upstream:v0.1.18

Comment 9 errata-xmlrpc 2018-07-30 19:10:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816