Description of problem:
Zombie project issue when deleting a project that contains an instance whose provisioning failed.

Error message:
[2018-02-23T06:33:03.244Z] [ERROR] - failed to delete extracted credentials - 100: Key not found (/extracted_credentials/ac81d124-d14a-4c0c-a4d3-84617ec9eef8) [1286]
[2018-02-23T06:33:03.244Z] [ERROR] - Failed cleaning up deprovision after job, error: 100: Key not found (/extracted_credentials/ac81d124-d14a-4c0c-a4d3-84617ec9eef8) [1286]

Versions:
ansible-service-broker: 1.1.13
mediawiki-apb:v3.9.0-0.47.0.0 (from stage registry)

How reproducible:
Always

Steps to Reproduce:
1. From the web console, provision mediawiki (APB) into a new project "test1", setting "Mediawiki Admin User" and "Mediawiki Admin User Password" to the same value. (Provisioning fails, as expected.)
2. Delete project test1 directly.
3. Check the status of the serviceinstance and the project.

Actual results:
3. The serviceinstance and project test1 cannot be deleted.

Expected results:
3. The serviceinstance and project test1 should be deleted successfully.

Additional info:
Attached ASB logs:
[2018-02-23T06:32:22.611Z] [NOTICE] - ============================================================
[2018-02-23T06:32:22.611Z] [NOTICE] - PROVISIONING
[2018-02-23T06:32:22.611Z] [NOTICE] - ============================================================
[2018-02-23T06:32:22.611Z] [NOTICE] - Spec.ID: 03b69500305d9859bb9440d9f9023784
[2018-02-23T06:32:22.611Z] [NOTICE] - Spec.Name: rh-mediawiki-apb
[2018-02-23T06:32:22.611Z] [NOTICE] - Spec.Image: registry.access.stage.redhat.com/openshift3/mediawiki-apb:v3.9
[2018-02-23T06:32:22.611Z] [NOTICE] - Spec.Description: Mediawiki apb implementation
[2018-02-23T06:32:22.611Z] [NOTICE] - ============================================================
[2018-02-23T06:32:22.611Z] [INFO] - Checking if namespace provision-fail exists.
[2018-02-23T06:32:22.625Z] [NOTICE] - Creating RoleBinding apb-d8dbf724-b05d-409b-bf90-a9d910a399c0
10.128.0.1 - - [23/Feb/2018:06:32:22 +0000] "PUT /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8?accepts_incomplete=true HTTP/1.1" 202 58
[2018-02-23T06:32:22.694Z] [NOTICE] - Creating RoleBinding apb-d8dbf724-b05d-409b-bf90-a9d910a399c0
[2018-02-23T06:32:22.822Z] [INFO] - Successfully created apb sandbox: [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ], with edit permissions in namespace rh-mediawiki-apb-prov-s5nnj
[2018-02-23T06:32:22.822Z] [INFO] - Running post create sandbox fuctions if defined.
[2018-02-23T06:32:22.822Z] [NOTICE] - Creating pod "apb-d8dbf724-b05d-409b-bf90-a9d910a399c0" in the rh-mediawiki-apb-prov-s5nnj namespace
[2018-02-23T06:32:22.884Z] [INFO] - Provision requested for instance ac81d124-d14a-4c0c-a4d3-84617ec9eef8, but job is already in progress
10.128.0.1 - - [23/Feb/2018:06:32:22 +0000] "PUT /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8?accepts_incomplete=true HTTP/1.1" 202 58
[2018-02-23T06:32:22.902Z] [INFO] - Watch pod [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ] tick 1
10.128.0.1 - - [23/Feb/2018:06:32:22 +0000] "GET /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8/last_operation?operation=cd31b5d4-2a79-4352-864a-b509678752ea&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 29
10.128.0.1 - - [23/Feb/2018:06:32:23 +0000] "GET /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8/last_operation?operation=cd31b5d4-2a79-4352-864a-b509678752ea&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 29
10.128.0.1 - - [23/Feb/2018:06:32:27 +0000] "GET /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8/last_operation?operation=cd31b5d4-2a79-4352-864a-b509678752ea&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 29
[2018-02-23T06:32:27.907Z] [INFO] - Watch pod [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ] tick 2
[2018-02-23T06:32:27.91Z] [ERROR] - Provision or Update action failed - Pod [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ] failed with exit code [2]
[2018-02-23T06:32:27.91Z] [INFO] - Destroying APB sandbox...
[2018-02-23T06:32:27.915Z] [NOTICE] - Successfully deleted rolebinding apb-d8dbf724-b05d-409b-bf90-a9d910a399c0, namespace rh-mediawiki-apb-prov-s5nnj
[2018-02-23T06:32:27.918Z] [NOTICE] - Successfully deleted rolebinding apb-d8dbf724-b05d-409b-bf90-a9d910a399c0, namespace provision-fail
[2018-02-23T06:32:27.921Z] [ERROR] - broker::Provision error occurred. Pod [ apb-d8dbf724-b05d-409b-bf90-a9d910a399c0 ] failed with exit code [2]
10.128.0.1 - - [23/Feb/2018:06:32:35 +0000] "GET /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8/last_operation?operation=cd31b5d4-2a79-4352-864a-b509678752ea&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 200 122
[2018-02-23T06:33:03.226Z] [INFO] - ASYNC deprovision in progress
10.128.0.1 - - [23/Feb/2018:06:33:03 +0000] "DELETE /ansible-service-broker/v2/service_instances/ac81d124-d14a-4c0c-a4d3-84617ec9eef8?accepts_incomplete=true&plan_id=43d3e23d214c26dbebc0879e44425db4&service_id=03b69500305d9859bb9440d9f9023784 HTTP/1.1" 202 58
[2018-02-23T06:33:03.244Z] [ERROR] - failed to delete extracted credentials - 100: Key not found (/extracted_credentials/ac81d124-d14a-4c0c-a4d3-84617ec9eef8) [1286]
[2018-02-23T06:33:03.244Z] [ERROR] - Failed cleaning up deprovision after job, error: 100: Key not found (/extracted_credentials/ac81d124-d14a-4c0c-a4d3-84617ec9eef8) [1286]
I believe the issue here is that the ansible broker needs to create some resources in the namespace that is being deleted, which can't happen because the namespace is in a Terminating state. I'm not an expert in the ansible broker's behavior, but I wanted to point out that we have discussed an issue that sounds very similar to this one before.
@Paul, I dug into this a little today, and I believe we're erroneously sending {"state": "failed"} back from the last_operation endpoint instead of {"state": "succeeded"}. However, I am also seeing the catalog continue to poll last_operation. The spec states: "A response with "state": "succeeded" or "state": "failed" MUST cause the Platform to cease polling." Am I correct that, regardless of whether we return succeeded or failed, the catalog should stop polling in this case?
PR: https://github.com/openshift/ansible-service-broker/pull/790
http://pkgs.devel.redhat.com/cgit/rpms/ansible-service-broker/commit/?h=rhaos-3.9-asb-rhel-7&id=baca022a5ed5e86a12e6a07b5fe3ffa69b5b2077
Verified with the latest ASB image 1.1.15 from the brew registry. Test steps follow the description of this bug.