Description of problem:
When a user goes to deprovision a ServiceInstance and the deprovision fails, the ServiceInstance should move into an error state rather than being removed as if the operation were successful.

How reproducible:
Whenever deprovision fails.

Steps to Reproduce:
1. Launch OpenShift Origin with the service catalog and the Ansible Service Broker (in developer mode so you can push an APB that will reliably fail on deprovision).

2. Create an APB that will fail on deprovision and push it to the broker:

$ docker run --rm --privileged -v $PWD:/mnt -v $HOME/.kube:/.kube -v /var/run/docker.sock:/var/run/docker.sock -u $UID docker.io/ansibleplaybookbundle/apb init
$ cd fail-apb

# Modify the deprovision playbook so that it will always fail
$ cat playbooks/deprovision.yml
- name: fail-apb playbook to deprovision the application
  hosts: localhost
  gather_facts: false
  connection: local
  tasks:
    - command: /bin/false

$ docker run --rm --privileged -v $PWD:/mnt -v $HOME/.kube:/.kube -v /var/run/docker.sock:/var/run/docker.sock -u $UID docker.io/ansibleplaybookbundle/apb push

3. Provision the APB.
4. Deprovision the APB.

Actual results:
$ oc get serviceinstances --all-namespaces
No resources found.

Expected results:
The ServiceInstance should still exist but have an error status.

Additional info:
The relevant portion of the logs can be found here: https://gist.github.com/djzager/d8cd444a0f5a9fa057417277de148f7d
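For reference, the expected behavior is for the instance to remain visible with a failed/unknown condition after the deprovision fails. A minimal way to check that (the instance name and namespace below are placeholders, not taken from this run):

$ oc get serviceinstances -n <project>
$ oc describe serviceinstance <instance-name> -n <project>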
This is definitely a bug - I created https://github.com/kubernetes-incubator/service-catalog/issues/1437 for this.
Fix will be delivered into origin in https://github.com/openshift/origin/pull/17075.
Can users not deprovision a ServiceInstance from the web UI? I only found a "delete" button, and the ASB log does not show a "DEPROVISION" process when I click it. Thanks for your info. I'm going to follow the steps you provided.
I can answer question #2 on @mstaeble's behalf as best I can. While nothing can be guaranteed about a ServiceInstance that ends up in the Unknown or Error state, what you are seeing with Pods and RCs remaining is related to https://bugzilla.redhat.com/show_bug.cgi?id=1508969.
Presumably, with the always-failing command in the deprovision playbook, the broker is responding to the deprovision request with a 500 Internal Server Error status. The service catalog does not know at that point what the state of the instance is in the broker, hence the Ready/Unknown condition. The service catalog will continue to make deprovision requests for the ServiceInstance to the broker until either (1) a response is received that tells the service catalog definitively what the state of the instance is in the broker or (2) the reconciliation retry duration elapses.
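For illustration only, once the broker has answered with a 500, the Ready condition on the instance would be expected to show Unknown. One way to check it (instance name and namespace are placeholders, and the exact reason/message strings depend on the service catalog version):

$ oc get serviceinstance <instance-name> -n <project> \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
Unknown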
@David @Matthew Thanks for the clarification! Question #2 of this bug will depend on https://bugzilla.redhat.com/show_bug.cgi?id=1508969.
This looks to me to be working as expected. When you delete the ServiceInstance, the service catalog attempts to deprovision the resource on the broker. The deprovision request is failing. The service catalog will retry the deprovision request until the reconciliation retry duration elapses. The service catalog will not remove the finalizer for the ServiceInstance unless the deprovision request completes successfully. This is done so that the user has an opportunity to see that the ServiceInstance was not deprovisioned and coordinate with the broker directly to delete whatever resources need to be deleted.

In Step 3, you say:

> the serviceinstance was kept as expected, but I could not force-delete it

What commands were run to force-delete the ServiceInstance? And what was the output from the attempted force-delete?
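For reference, a quick way to confirm that the finalizer is what is keeping the object around after the delete (names are placeholders):

$ oc get serviceinstance <instance-name> -n <project> -o jsonpath='{.metadata.finalizers}'
$ oc get serviceinstance <instance-name> -n <project> -o jsonpath='{.metadata.deletionTimestamp}'

A non-empty deletionTimestamp together with the kubernetes-incubator/service-catalog finalizer means the delete was accepted but the object is being held until deprovisioning succeeds (or the finalizer is removed).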
@Matthew Yeah... I see, but as https://github.com/kubernetes-incubator/service-catalog/issues/1437 describes:

> Instead, we should leave the instance in a failed state until the user force-deletes it,

I think the "force-delete" here just means the "oc delete xxx" commands a user runs. The "force-delete" in step 3 means this too. Sorry for not being clear here. So, for the above example, if I want to delete the ServiceInstance, how do I do that? In my opinion, as a user or administrator, if I want to delete a resource created by the provision, the resource should be deleted.
A force-delete is an "oc delete" with the parameters "--grace-period=0" and "--force". For example:

oc delete serviceinstance dh-rhscl-mysql-apb-dbwdr \
    -n instance6 \
    --grace-period=0 \
    --force
I think it should be possible to delete the failed ServiceInstance manually. The current test result does not look good to us. I'm changing the status to ASSIGNED. You can move it back if we have made a mistake. Thanks.
There is an issue with force deleting ServiceInstances and ServiceBindings that is being tracked upstream with https://github.com/kubernetes-incubator/service-catalog/issues/1551. However, the basic intention of this bug is that a ServiceInstance that cannot be deprovisioned successfully should not be removed from storage until it is force deleted. That basic intention is working as expected.
Updated the title of the bug to better track the current issue. Furthermore, I hit a similar issue and am not sure whether it is covered by https://github.com/kubernetes-incubator/service-catalog/issues/1551:

1. Create a ServiceInstance
2. Edit the clusterservicebroker URL to an invalid value
3. Delete the ServiceInstance
4. Force-delete the ServiceInstance

Actual result:
Cannot force-delete the ServiceInstance.

Expected result:
The force-delete should succeed.
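For clarity, the rough commands behind those steps (broker and instance names are placeholders; spec.url is edited by hand in step 2):

$ oc edit clusterservicebroker <broker-name>        # change spec.url to an invalid value
$ oc delete serviceinstance <instance-name> -n <project>
$ oc delete serviceinstance <instance-name> -n <project> --grace-period=0 --force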
Is this still a bug, given the clarification made in https://bugzilla.redhat.com/show_bug.cgi?id=1541350#c5 ?
Changed status to ON_QA following Paul's clarification.
Per Paul's description, we need to delete the "finalizers" first before deleting a failed ServiceInstance.

1. A failed ServiceInstance:

[root@host-172-16-120-7 ~]# oc get serviceinstance -n jian
NAME                      AGE
dh-hello-test-apb-tn8nk   14m
dh-hello-test-apb-xpkls   12m

[root@host-172-16-120-7 ~]# oc describe serviceinstance dh-hello-test-apb-tn8nk -n jian
Name:         dh-hello-test-apb-tn8nk
Namespace:    jian
Labels:       <none>
Annotations:  <none>
API Version:  servicecatalog.k8s.io/v1beta1
Kind:         ServiceInstance
Metadata:
  Creation Timestamp:  2018-02-27T07:00:10Z
  Finalizers:
    kubernetes-incubator/service-catalog
  Generate Name:     dh-hello-test-apb-
  Generation:        1
  Resource Version:  43725
  Self Link:         /apis/servicecatalog.k8s.io/v1beta1/namespaces/jian/serviceinstances/dh-hello-test-apb-tn8nk
  UID:               d9f0ed18-1b8b-11e8-8928-0a580a800004
Spec:
  Cluster Service Class External Name:  dh-hello-test-apb
  Cluster Service Class Ref:
    Name:                              0a8f417c71d090c39dc2ba73f538c148
  Cluster Service Plan External Name:  faildeprovision
  Cluster Service Plan Ref:
    Name:           befb688fbb048a00128951e8b68913b4
  External ID:      ba841d04-713c-4b49-a758-f425de1ec28b
  Update Requests:  0
  User Info:
    Extra:
      Scopes . Authorization . Openshift . Io:
        user:full
    Groups:
      system:authenticated:oauth
      system:authenticated
    UID:
    Username:  jiazha
Status:
  Async Op In Progress:  false
  Conditions:
    Last Transition Time:           2018-02-27T07:00:11Z
    Message:                        Provision call failed: Error occurred during provision. Please contact administrator if it persists.
    Reason:                         ProvisionCallFailed
    Status:                         False
    Type:                           Ready
    Last Transition Time:           2018-02-27T07:01:13Z
    Message:                        Provision call failed: Error occurred during provision. Please contact administrator if it persists.
    Reason:                         ProvisionCallFailed
    Status:                         True
    Type:                           Failed
  Deprovision Status:               Required
  Orphan Mitigation In Progress:    false
  Reconciled Generation:            1
Events:
  Type     Reason               Age                From                                Message
  ----     ------               ----               ----                                -------
  Normal   Provisioning         14m                service-catalog-controller-manager  The instance is being provisioned asynchronously
  Warning  ProvisionCallFailed  13m (x2 over 13m)  service-catalog-controller-manager  Provision call failed: Error occurred during provision. Please contact administrator if it persists.

2. Delete the "finalizers". Remove the content below:

finalizers:
- kubernetes-incubator/service-catalog

[root@host-172-16-120-7 ~]# oc edit serviceinstance dh-hello-test-apb-tn8nk -n jian
serviceinstance "dh-hello-test-apb-tn8nk" edited

3. Delete this ServiceInstance:

[root@host-172-16-120-7 ~]# oc delete serviceinstance dh-hello-test-apb-tn8nk -n jian
serviceinstance "dh-hello-test-apb-tn8nk" deleted

4. Check whether it still exists:

[root@host-172-16-120-7 ~]# oc get serviceinstance -n jian
No resources found.

So, LGTM. Changed the status to "VERIFIED". Furthermore, we have a doc bug to track this clarification here: https://bugzilla.redhat.com/show_bug.cgi?id=1548618
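As an aside, instead of removing the finalizer interactively with "oc edit", the same thing can be done non-interactively with a patch. This is only a sketch (instance name and namespace are placeholders), and it bypasses the catalog's own cleanup, so use it with care:

$ oc patch serviceinstance <instance-name> -n <project> \
    --type merge -p '{"metadata":{"finalizers":null}}'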
Assigning back to Paul as I actually had nothing to do with this issue.