Bug 1548311

Summary: Zombie project issue - deleting a project that contains a provision-failed instance
Product: OpenShift Container Platform
Reporter: Zhang Cheng <chezhang>
Component: Service Broker
Assignee: Erik Nelson <ernelson>
Status: CLOSED CURRENTRELEASE
QA Contact: Zhang Cheng <chezhang>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.9.0
CC: aos-bugs, dzager, ernelson, jiazha, jmatthew, jmontleo, pmorie, smunilla, tcarlin, zitang
Target Milestone: ---
Target Release: 3.9.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: asb image 1.1.15
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-18 18:10:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Zhang Cheng 2018-02-23 06:56:19 UTC
Description of problem:
Zombie project issue: a project that contains a provision-failed instance cannot be deleted.
Error message:
[2018-02-23T06:33:03.244Z] [ERROR] - failed to delete extracted credentials - 100: Key not found (/extracted_credentials/ac81d124-d14a-4c0c-a4d3-84617ec9eef8) [1286]
[2018-02-23T06:33:03.244Z] [ERROR] - Failed cleaning up deprovision after job, error: 100: Key not found (/extracted_credentials/ac81d124-d14a-4c0c-a4d3-84617ec9eef8) [1286]


ansible-service-broker: 1.1.13
mediawiki-apb:v3.9.0-0.47.0.0 (From stage registry)


How reproducible:
Always


Steps to Reproduce:
1. From the web console, provision mediawiki (APB) into a new project "test1", using the same value for "Mediawiki Admin User" and "Mediawiki Admin User Password". (The provision fails, as expected.)

2. Delete project test1 directly.

3. Check the status of the serviceinstance and the project.


Actual results:  
3. The serviceinstance and project test1 cannot be deleted.


Expected results: 
3. The serviceinstance and project test1 should be deleted successfully.


Additional info:
Attached asb logs:
[2018-02-23T06:32:22.611Z] [NOTICE] - ============================================================
[2018-02-23T06:32:22.611Z] [NOTICE] -                        PROVISIONING                         
[2018-02-23T06:32:22.611Z] [NOTICE] - ============================================================
[2018-02-23T06:32:22.611Z] [NOTICE] - Spec.ID: 03b69500305d9859bb9440d9f9023784
[2018-02-23T06:32:22.611Z] [NOTICE] - Spec.Name: rh-mediawiki-apb
[2018-02-23T06:32:22.611Z] [NOTICE] - Spec.Image: registry.access.stage.redhat.com/openshift3/mediawiki-apb:v3.9
[2018-02-23T06:32:22.611Z] [NOTICE] - Spec.Description: Mediawiki apb implementation
[2018-02-23T06:32:22.611Z] [NOTICE] - ============================================================
[2018-02-23T06:32:22.611Z] [INFO] - Checking if namespace provision-fail exists.
[2018-02-23T06:32:22.625Z] [NOTICE] - Creating RoleBinding apb-d8dbf724-b05d-409b-bf90-a9d910a399c0
10.128.0.1 - - [23/Feb/2018:06:32:22 +0000] "PUT /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8?accepts_incomplete=true HTTP/1.1" 202 58
[2018-02-23T06:32:22.694Z] [NOTICE] - Creating RoleBinding apb-d8dbf724-b05d-409b-bf90-a9d910a399c0
[2018-02-23T06:32:22.822Z] [INFO] - Successfully created apb sandbox: [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ], with edit permissions in namespace rh-mediawiki-apb-prov-s5nnj
[2018-02-23T06:32:22.822Z] [INFO] - Running post create sandbox fuctions if defined.
[2018-02-23T06:32:22.822Z] [NOTICE] - Creating pod "apb-d8dbf724-b05d-409b-bf90-a9d910a399c0" in the rh-mediawiki-apb-prov-s5nnj namespace
[2018-02-23T06:32:22.884Z] [INFO] - Provision requested for instance ac81d124-d14a-4c0c-a4d3-84617ec9eef8, but job is already in progress
10.128.0.1 - - [23/Feb/2018:06:32:22 +0000] "PUT /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8?accepts_incomplete=true HTTP/1.1" 202 58
[2018-02-23T06:32:22.902Z] [INFO] - Watch pod [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ] tick 1
10.128.0.1 - - [23/Feb/2018:06:32:22 +0000] "GET /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8/last_operation?operation=cd31b5d4-2a79-4352-864a-b509678752ea&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 29
10.128.0.1 - - [23/Feb/2018:06:32:23 +0000] "GET /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8/last_operation?operation=cd31b5d4-2a79-4352-864a-b509678752ea&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 29
10.128.0.1 - - [23/Feb/2018:06:32:27 +0000] "GET /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8/last_operation?operation=cd31b5d4-2a79-4352-864a-b509678752ea&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 29
[2018-02-23T06:32:27.907Z] [INFO] - Watch pod [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ] tick 2
[2018-02-23T06:32:27.91Z] [ERROR] - Provision or Update action failed - Pod [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ] failed with exit code [2]
[2018-02-23T06:32:27.91Z] [INFO] - Destroying APB sandbox...
[2018-02-23T06:32:27.915Z] [NOTICE] - Successfully deleted rolebinding apb-d8dbf724-b05d-409b-bf90-a9d910a399c0, namespace rh-mediawiki-apb-prov-s5nnj
[2018-02-23T06:32:27.918Z] [NOTICE] - Successfully deleted rolebinding apb-d8dbf724-b05d-409b-bf90-a9d910a399c0, namespace provision-fail
[2018-02-23T06:32:27.921Z] [ERROR] - broker::Provision error occurred. Pod [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ] failed with exit code [2]
10.128.0.1 - - [23/Feb/2018:06:32:35 +0000] "GET /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8/last_operation?operation=cd31b5d4-2a79-4352-864a-b509678752ea&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 122
[2018-02-23T06:33:03.226Z] [INFO] - ASYNC deprovision in progress
10.128.0.1 - - [23/Feb/2018:06:33:03 +0000] "DELETE /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8?accepts_incomplete=true&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 202 58
[2018-02-23T06:33:03.244Z] [ERROR] - failed to delete extracted credentials - 100: Key not found (/extracted_credentials/ac81d124-d14a-4c0c-a4d3-84617ec9eef8) [1286]
[2018-02-23T06:33:03.244Z] [ERROR] - Failed cleaning up deprovision after job, error: 100: Key not found (/extracted_credentials/ac81d124-d14a-4c0c-a4d3-84617ec9eef8) [1286]
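
The two errors at 06:33:03 appear consistent with the deprovision cleanup trying to delete an extracted-credentials key that was never written, since the provision had already failed. Below is a minimal sketch of a cleanup step that treats a missing key as "already clean". It is written in Python for brevity, not in the broker's own language. The in-memory `store` dict, the `KeyNotFound` exception, and both function names are hypothetical stand-ins for illustration; they are not the broker's real API, and this is not necessarily how the fix in asb image 1.1.15 works.

```python
class KeyNotFound(Exception):
    """Hypothetical stand-in for the datastore's 'Key not found' error."""


def delete_key(store, key):
    """Strict delete: raises KeyNotFound when the key is absent."""
    if key not in store:
        raise KeyNotFound(key)
    del store[key]


def cleanup_extracted_credentials(store, instance_id):
    """Idempotent cleanup of the credentials key for one service instance.

    Returns True if a key was removed, False if it was already absent.
    """
    key = f"/extracted_credentials/{instance_id}"
    try:
        delete_key(store, key)
    except KeyNotFound:
        # Assumption: a failed provision may never have stored credentials,
        # so a missing key is treated as "nothing to delete", not as a failure.
        return False
    return True
```

With this shape, deprovisioning an instance whose provision failed would finish its cleanup instead of reporting an error for a key that does not exist.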

Comment 1 Paul Morie 2018-02-23 16:42:51 UTC
I believe the issue here is that the ansible broker has to make some resources in the namespace that is being deleted - which can't happen because the namespace is in a terminating state.

Of course, I'm not an expert in the ansible broker's behavior, but I wanted to call out that we've talked about an issue that sounds extremely similar to this before.

Comment 2 Erik Nelson 2018-02-23 20:18:53 UTC
@Paul, I dug into this a little today, and I believe we're erroneously sending {"state": "failed"} back from the last_operation endpoint instead of {"state": "succeeded"}. However, I am seeing the catalog continue to poll last_operation. The spec states:

"A response with "state": "succeeded" or "state": "failed" MUST cause the Platform to cease polling."

Am I correct that, regardless of whether we return succeeded or failed, the catalog should stop polling in this case?
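
The spec sentence quoted above can be read as the following platform-side loop. This is only a sketch of that one rule, not the service catalog's actual code: `fetch_state` is a hypothetical callable standing in for a GET on the last_operation endpoint, and `interval_seconds` and `max_polls` are illustrative parameters.

```python
import time

# States that, per the quoted spec sentence, must end polling.
TERMINAL_STATES = {"succeeded", "failed"}


def poll_last_operation(fetch_state, interval_seconds=1.0, max_polls=100):
    """Poll until a terminal state is returned.

    fetch_state is a hypothetical callable returning the "state" string
    from one last_operation response. Returns (state, polls_made).
    """
    for polls in range(1, max_polls + 1):
        state = fetch_state()
        if state in TERMINAL_STATES:
            # Either terminal state stops the loop, whichever one it is.
            return state, polls
        time.sleep(interval_seconds)
    raise TimeoutError(f"no terminal state after {max_polls} polls")
```

Under this reading, a "failed" response ends polling just as "succeeded" does, so continued polling after either one would be unexpected.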

Comment 6 Zhang Cheng 2018-03-01 06:32:00 UTC
Verified with the latest asb image 1.1.15 from the brew registry.
Test steps followed the description of this bug.