Description of problem:
When a user goes to deprovision a ServiceInstance and the deprovision fails, the ServiceInstance should move into an error state rather than being removed as if the operation were successful.

How reproducible:
Whenever deprovision fails.

Steps to Reproduce:
1. Launch OpenShift Origin with the service catalog and the Ansible Service Broker (in developer mode so you can push an APB that will reliably fail on deprovision).

2. Create an APB that will fail on deprovision and push it to the broker:

$ docker run --rm --privileged -v $PWD:/mnt -v $HOME/.kube:/.kube -v /var/run/docker.sock:/var/run/docker.sock -u $UID docker.io/ansibleplaybookbundle/apb init
$ cd fail-apb

# Modify the deprovision playbook so that it will always fail
$ cat playbooks/deprovision.yml
- name: fail-apb playbook to deprovision the application
  hosts: localhost
  gather_facts: false
  connection: local
  tasks:
    - command: /bin/false

$ docker run --rm --privileged -v $PWD:/mnt -v $HOME/.kube:/.kube -v /var/run/docker.sock:/var/run/docker.sock -u $UID docker.io/ansibleplaybookbundle/apb push

3. Provision the APB.
4. Deprovision the APB.

Actual results:
$ oc get serviceinstances --all-namespaces
No resources found.

Expected results:
The ServiceInstance should still exist but have an error status.

Additional info:
The relevant portion of the logs can be found here: https://gist.github.com/djzager/d8cd444a0f5a9fa057417277de148f7d
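For reference, the expected behavior is for the instance to remain visible with a failed/unknown condition after the deprovision fails. A minimal way to check that (the instance name and namespace below are placeholders, not taken from this run):

$ oc get serviceinstances -n <project>
$ oc describe serviceinstance <instance-name> -n <project>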
This is definitely a bug - I created https://github.com/kubernetes-incubator/service-catalog/issues/1437 for this.
Fix will be delivered into origin in https://github.com/openshift/origin/pull/17075.
Can users not deprovision a ServiceInstance from the web UI? I only found a "delete" button, and the ASB log does not show a "DEPROVISION" process when I click it. Thanks for your info. I'm going to follow the steps you provided.
I can answer question #2 on @mstaeble's behalf as best I can. While nothing can be guaranteed about a ServiceInstance that ends up in the Unknown or Error state, what you are seeing with Pods and RCs remaining is related to https://bugzilla.redhat.com/show_bug.cgi?id=1508969.
Presumably, with the always-failing command in the deprovision playbook, the broker is responding to the deprovision request with a 500 Internal Server Error status. The service catalog does not know at that point what the state of the instance is in the broker, hence the Ready/Unknown condition. The service catalog will continue to make deprovision requests for the ServiceInstance to the broker until either (1) a response is received that tells the service catalog definitively what the state of the instance is in the broker or (2) the reconciliation retry duration elapses.
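For illustration only, once the broker has answered with a 500, the Ready condition on the instance would be expected to show Unknown. One way to check it (instance name and namespace are placeholders, and the exact reason/message strings depend on the service catalog version):

$ oc get serviceinstance <instance-name> -n <project> \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
Unknown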
@David @Matthew Thanks for the clarification! Question #2 of this bug will depend on https://bugzilla.redhat.com/show_bug.cgi?id=1508969.
This looks to me to be working as expected. When you delete the ServiceInstance, the service catalog attempts to deprovision the resource on the broker. The deprovision request is failing. The service catalog will retry the deprovision request until the reconciliation retry duration elapses. The service catalog will not remove the finalizer for the ServiceInstance unless the deprovision request completes successfully. This is done so that the user has an opportunity to see that the ServiceInstance was not deprovisioned and coordinate with the broker directly to delete whatever resources need to be deleted.

In Step 3, you say:

> the serviceinstance was kept as expected, but I could not force-delete it

What commands were run to force-delete the ServiceInstance? And what was the output from the attempted force-delete?
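For reference, a quick way to confirm that the finalizer is what is keeping the object around after the delete (names are placeholders):

$ oc get serviceinstance <instance-name> -n <project> -o jsonpath='{.metadata.finalizers}'
$ oc get serviceinstance <instance-name> -n <project> -o jsonpath='{.metadata.deletionTimestamp}'

A non-empty deletionTimestamp together with the kubernetes-incubator/service-catalog finalizer means the delete was accepted but the object is being held until deprovisioning succeeds (or the finalizer is removed).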
@Matthew Yeah... I see, but as https://github.com/kubernetes-incubator/service-catalog/issues/1437 describes:

> Instead, we should leave the instance in a failed state until the user force-deletes it,

I think the "force-delete" here just means the "oc delete xxx" commands a user runs. The "force-delete" in step 3 means this too. Sorry for not being clear here. So, for the above example, if I want to delete the ServiceInstance, how do I do that? In my opinion, as a user or administrator, if I want to delete a resource created by the provision, the resource should be deleted.
A force-delete is an "oc delete" with the parameters "--grace-period=0" and "--force". For example:

oc delete serviceinstance dh-rhscl-mysql-apb-dbwdr \
    -n instance6 \
    --grace-period=0 \
    --force
I think it should be possible to delete the failed ServiceInstance manually. The current test result does not look good to us. I'm changing the status to ASSIGNED. You can move it back if we have made a mistake. Thanks.
There is an issue with force deleting ServiceInstances and ServiceBindings that is being tracked upstream with https://github.com/kubernetes-incubator/service-catalog/issues/1551. However, the basic intention of this bug is that a ServiceInstance that cannot be deprovisioned successfully should not be removed from storage until it is force deleted. That basic intention is working as expected.
Updated the title of the bug to better track the current issue. Furthermore, I hit a similar issue and am not sure whether it is covered by https://github.com/kubernetes-incubator/service-catalog/issues/1551:

1. Create a ServiceInstance
2. Edit the clusterservicebroker URL to an invalid value
3. Delete the ServiceInstance
4. Force-delete the ServiceInstance

Actual result:
Cannot force-delete the ServiceInstance.

Expected result:
The force-delete should succeed.
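For clarity, the rough commands behind those steps (broker and instance names are placeholders; spec.url is edited by hand in step 2):

$ oc edit clusterservicebroker <broker-name>        # change spec.url to an invalid value
$ oc delete serviceinstance <instance-name> -n <project>
$ oc delete serviceinstance <instance-name> -n <project> --grace-period=0 --force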
Is this still a bug, given the clarification made in https://bugzilla.redhat.com/show_bug.cgi?id=1541350#c5 ?
Changed status to ON_QA following Paul's clarification.
Per Paul's description, we need to delete the "finalizers" first before deleting a failed ServiceInstance.

1. A failed ServiceInstance:

[root@host-172-16-120-7 ~]# oc get serviceinstance -n jian
NAME                      AGE
dh-hello-test-apb-tn8nk   14m
dh-hello-test-apb-xpkls   12m

[root@host-172-16-120-7 ~]# oc describe serviceinstance dh-hello-test-apb-tn8nk -n jian
Name:         dh-hello-test-apb-tn8nk
Namespace:    jian
Labels:       <none>
Annotations:  <none>
API Version:  servicecatalog.k8s.io/v1beta1
Kind:         ServiceInstance
Metadata:
  Creation Timestamp:  2018-02-27T07:00:10Z
  Finalizers:
    kubernetes-incubator/service-catalog
  Generate Name:     dh-hello-test-apb-
  Generation:        1
  Resource Version:  43725
  Self Link:         /apis/servicecatalog.k8s.io/v1beta1/namespaces/jian/serviceinstances/dh-hello-test-apb-tn8nk
  UID:               d9f0ed18-1b8b-11e8-8928-0a580a800004
Spec:
  Cluster Service Class External Name:  dh-hello-test-apb
  Cluster Service Class Ref:
    Name:                              0a8f417c71d090c39dc2ba73f538c148
  Cluster Service Plan External Name:  faildeprovision
  Cluster Service Plan Ref:
    Name:           befb688fbb048a00128951e8b68913b4
  External ID:      ba841d04-713c-4b49-a758-f425de1ec28b
  Update Requests:  0
  User Info:
    Extra:
      Scopes . Authorization . Openshift . Io:
        user:full
    Groups:
      system:authenticated:oauth
      system:authenticated
    UID:
    Username:  jiazha
Status:
  Async Op In Progress:  false
  Conditions:
    Last Transition Time:           2018-02-27T07:00:11Z
    Message:                        Provision call failed: Error occurred during provision. Please contact administrator if it persists.
    Reason:                         ProvisionCallFailed
    Status:                         False
    Type:                           Ready
    Last Transition Time:           2018-02-27T07:01:13Z
    Message:                        Provision call failed: Error occurred during provision. Please contact administrator if it persists.
    Reason:                         ProvisionCallFailed
    Status:                         True
    Type:                           Failed
  Deprovision Status:               Required
  Orphan Mitigation In Progress:    false
  Reconciled Generation:            1
Events:
  Type     Reason               Age                From                                Message
  ----     ------               ----               ----                                -------
  Normal   Provisioning         14m                service-catalog-controller-manager  The instance is being provisioned asynchronously
  Warning  ProvisionCallFailed  13m (x2 over 13m)  service-catalog-controller-manager  Provision call failed: Error occurred during provision. Please contact administrator if it persists.

2. Delete the "finalizers". Remove the content below:

finalizers:
- kubernetes-incubator/service-catalog

[root@host-172-16-120-7 ~]# oc edit serviceinstance dh-hello-test-apb-tn8nk -n jian
serviceinstance "dh-hello-test-apb-tn8nk" edited

3. Delete this ServiceInstance:

[root@host-172-16-120-7 ~]# oc delete serviceinstance dh-hello-test-apb-tn8nk -n jian
serviceinstance "dh-hello-test-apb-tn8nk" deleted

4. Check whether it still exists:

[root@host-172-16-120-7 ~]# oc get serviceinstance -n jian
No resources found.

So, LGTM. Changed the status to "VERIFIED". Furthermore, we have a doc bug to track this clarification here: https://bugzilla.redhat.com/show_bug.cgi?id=1548618
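As an aside, instead of removing the finalizer interactively with "oc edit", the same thing can be done non-interactively with a patch. This is only a sketch (instance name and namespace are placeholders), and it bypasses the catalog's own cleanup, so use it with care:

$ oc patch serviceinstance <instance-name> -n <project> \
    --type merge -p '{"metadata":{"finalizers":null}}'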
Assigning back to Paul as I actually had nothing to do with this issue.