Bug 1548311 - Zombie project issue when deleting a project that contains a provision-failed instance
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Broker
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assignee: Erik Nelson
QA Contact: Zhang Cheng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-02-23 06:56 UTC by Zhang Cheng
Modified: 2018-06-18 18:29 UTC
CC List: 10 users

Fixed In Version: asb image 1.1.15
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-18 18:10:44 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Bugzilla
ID: 1562961
Private: 0
Priority: medium
Status: CLOSED
Summary: Unable to force delete of zombie resources (including projects)
Last Updated: 2021-02-22 00:41:40 UTC

Internal Links: 1562961

Description Zhang Cheng 2018-02-23 06:56:19 UTC
Description of problem:
Zombie project issue - deleting a project that contains a service instance whose provision failed leaves the project and the serviceinstance stuck, and neither can be deleted.
Error message:
[2018-02-23T06:33:03.244Z] [ERROR] - failed to delete extracted credentials - 100: Key not found (/extracted_credentials/ac81d124-d14a-4c0c-a4d3-84617ec9eef8) [1286]
[2018-02-23T06:33:03.244Z] [ERROR] - Failed cleaning up deprovision after job, error: 100: Key not found (/extracted_credentials/ac81d124-d14a-4c0c-a4d3-84617ec9eef8) [1286]


ansible-service-broker: 1.1.13
mediawiki-apb:v3.9.0-0.47.0.0 (From stage registry)


How reproducible:
Always


Steps to Reproduce:
1. From the web console, provision mediawiki (APB) into a new project "test1", using the same value for "Mediawiki Admin User" and "Mediawiki Admin User Password". (The provision fails, as expected.)

2. Delete project test1 directly.

3. Check the status of the serviceinstance and the project.


Actual results:
3. The serviceinstance and project test1 cannot be deleted.


Expected results:
3. The serviceinstance and project test1 should be deleted successfully.
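
Steps 2 and 3 can be scripted roughly as follows. This is only a sketch: it assumes the `oc` CLI is installed and logged in to the cluster, and that step 1 was already done from the web console. The helper names `repro_commands` and `run_all` are hypothetical and not part of any tool mentioned in this bug.

```python
import subprocess
from typing import List, Tuple


def repro_commands(project: str) -> List[List[str]]:
    """Build the oc invocations for steps 2 and 3 (hypothetical helper)."""
    return [
        # Step 2: delete the project directly.
        ["oc", "delete", "project", project],
        # Step 3: check whether the serviceinstance and the project are gone.
        ["oc", "get", "serviceinstances", "-n", project],
        ["oc", "get", "project", project],
    ]


def run_all(commands: List[List[str]]) -> List[Tuple[int, str]]:
    """Run each command and collect (exit code, combined output)."""
    results = []
    for argv in commands:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((proc.returncode, proc.stdout + proc.stderr))
    return results
```

Calling `run_all(repro_commands("test1"))` against the affected cluster would run the three commands in order and return their exit codes and output for inspection.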


Additional info:
ASB logs:
[2018-02-23T06:32:22.611Z] [NOTICE] - ============================================================
[2018-02-23T06:32:22.611Z] [NOTICE] -                        PROVISIONING                         
[2018-02-23T06:32:22.611Z] [NOTICE] - ============================================================
[2018-02-23T06:32:22.611Z] [NOTICE] - Spec.ID: 03b69500305d9859bb9440d9f9023784
[2018-02-23T06:32:22.611Z] [NOTICE] - Spec.Name: rh-mediawiki-apb
[2018-02-23T06:32:22.611Z] [NOTICE] - Spec.Image: registry.access.stage.redhat.com/openshift3/mediawiki-apb:v3.9
[2018-02-23T06:32:22.611Z] [NOTICE] - Spec.Description: Mediawiki apb implementation
[2018-02-23T06:32:22.611Z] [NOTICE] - ============================================================
[2018-02-23T06:32:22.611Z] [INFO] - Checking if namespace provision-fail exists.
[2018-02-23T06:32:22.625Z] [NOTICE] - Creating RoleBinding apb-d8dbf724-b05d-409b-bf90-a9d910a399c0
10.128.0.1 - - [23/Feb/2018:06:32:22 +0000] "PUT /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8?accepts_incomplete=true HTTP/1.1" 202 58
[2018-02-23T06:32:22.694Z] [NOTICE] - Creating RoleBinding apb-d8dbf724-b05d-409b-bf90-a9d910a399c0
[2018-02-23T06:32:22.822Z] [INFO] - Successfully created apb sandbox: [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ], with edit permissions in namespace rh-mediawiki-apb-prov-s5nnj
[2018-02-23T06:32:22.822Z] [INFO] - Running post create sandbox fuctions if defined.
[2018-02-23T06:32:22.822Z] [NOTICE] - Creating pod "apb-d8dbf724-b05d-409b-bf90-a9d910a399c0" in the rh-mediawiki-apb-prov-s5nnj namespace
[2018-02-23T06:32:22.884Z] [INFO] - Provision requested for instance ac81d124-d14a-4c0c-a4d3-84617ec9eef8, but job is already in progress
10.128.0.1 - - [23/Feb/2018:06:32:22 +0000] "PUT /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8?accepts_incomplete=true HTTP/1.1" 202 58
[2018-02-23T06:32:22.902Z] [INFO] - Watch pod [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ] tick 1
10.128.0.1 - - [23/Feb/2018:06:32:22 +0000] "GET /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8/last_operation?operation=cd31b5d4-2a79-4352-864a-b509678752ea&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 29
10.128.0.1 - - [23/Feb/2018:06:32:23 +0000] "GET /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8/last_operation?operation=cd31b5d4-2a79-4352-864a-b509678752ea&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 29
10.128.0.1 - - [23/Feb/2018:06:32:27 +0000] "GET /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8/last_operation?operation=cd31b5d4-2a79-4352-864a-b509678752ea&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 29
[2018-02-23T06:32:27.907Z] [INFO] - Watch pod [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ] tick 2
[2018-02-23T06:32:27.91Z] [ERROR] - Provision or Update action failed - Pod [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ] failed with exit code [2]
[2018-02-23T06:32:27.91Z] [INFO] - Destroying APB sandbox...
[2018-02-23T06:32:27.915Z] [NOTICE] - Successfully deleted rolebinding apb-d8dbf724-b05d-409b-bf90-a9d910a399c0, namespace rh-mediawiki-apb-prov-s5nnj
[2018-02-23T06:32:27.918Z] [NOTICE] - Successfully deleted rolebinding apb-d8dbf724-b05d-409b-bf90-a9d910a399c0, namespace provision-fail
[2018-02-23T06:32:27.921Z] [ERROR] - broker::Provision error occurred. Pod [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ] failed with exit code [2]
10.128.0.1 - - [23/Feb/2018:06:32:35 +0000] "GET /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8/last_operation?operation=cd31b5d4-2a79-4352-864a-b509678752ea&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 122
[2018-02-23T06:33:03.226Z] [INFO] - ASYNC deprovision in progress
10.128.0.1 - - [23/Feb/2018:06:33:03 +0000] "DELETE /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8?accepts_incomplete=true&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 202 58
[2018-02-23T06:33:03.244Z] [ERROR] - failed to delete extracted credentials - 100: Key not found (/extracted_credentials/ac81d124-d14a-4c0c-a4d3-84617ec9eef8) [1286]
[2018-02-23T06:33:03.244Z] [ERROR] - Failed cleaning up deprovision after job, error: 100: Key not found (/extracted_credentials/ac81d124-d14a-4c0c-a4d3-84617ec9eef8) [1286]
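
One possible reading of the last two ERROR lines, not confirmed in this report, is that the deprovision cleanup treats a missing `/extracted_credentials/<instance id>` key as fatal, even though a failed provision never stored credentials. The sketch below illustrates a cleanup that tolerates an absent key. It is not the broker's code (the broker is written in Go); `InMemoryStore`, `KeyNotFound`, and `delete_extracted_credentials` are hypothetical names used only for illustration.

```python
class KeyNotFound(Exception):
    """Raised when a key is absent (stand-in for the '100: Key not found' error)."""


class InMemoryStore:
    """Hypothetical stand-in for the broker's key/value store."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        if key not in self._data:
            raise KeyNotFound(key)
        del self._data[key]


def delete_extracted_credentials(store, instance_id):
    """Delete stored credentials; a missing key counts as already clean.

    Returns True if a key was deleted, False if there was nothing to delete.
    """
    key = "/extracted_credentials/" + instance_id
    try:
        store.delete(key)
        return True
    except KeyNotFound:
        # The provision failed before credentials were stored, so there is
        # nothing to clean up. Do not fail the deprovision because of it.
        return False
```

With this shape, deprovisioning an instance whose provision failed would finish the cleanup step instead of reporting an error.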

Comment 1 Paul Morie 2018-02-23 16:42:51 UTC
I believe the issue here is that the ansible broker has to make some resources in the namespace that is being deleted - which can't happen because the namespace is in a terminating state.

Of course, I'm not an expert in the ansible broker's behavior, but I wanted to call out that we've talked about an issue that sounds extremely similar to this before.

Comment 2 Erik Nelson 2018-02-23 20:18:53 UTC
@Paul, I dug a little bit into this today and I believe we're erroneously sending {"state": "failed"} back from the last_operation endpoint, instead of {"state": "succeeded"}. However, I am seeing the catalog continue to poll last_operation. The spec states:

"A response with "state": "succeeded" or "state": "failed" MUST cause the Platform to cease polling."

Am I correct in that regardless of whether or not we return success or failed, the catalog should stop polling in this case?
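
The polling rule quoted from the spec can be sketched as below. This is an illustration under stated assumptions, not the service catalog's actual implementation: `poll_last_operation` and the `fetch_state` callable (standing in for a GET on the last_operation endpoint) are hypothetical names.

```python
import time
from typing import Callable, Dict

# Per the quoted spec text, these two states must end polling.
TERMINAL_STATES = {"succeeded", "failed"}


def poll_last_operation(
    fetch_state: Callable[[], Dict[str, str]],
    interval_s: float = 1.0,
    max_polls: int = 60,
) -> str:
    """Poll until the broker reports a terminal state and return that state."""
    for _ in range(max_polls):
        body = fetch_state()
        state = body.get("state")
        if state in TERMINAL_STATES:
            # Either terminal state stops polling, success or not.
            return state
        time.sleep(interval_s)
    raise TimeoutError("no terminal state after %d polls" % max_polls)
```

Under this reading, a platform that keeps polling after receiving "failed" is not following the quoted rule, whichever state the broker should have returned.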

Comment 6 Zhang Cheng 2018-03-01 06:32:00 UTC
Verified with the latest asb image 1.1.15 from the brew registry.
The test steps followed the description of this bug.

