1507440 – some servicebindings will be blocked to unbind status when delete multi servicebindings at same time

Bug 1507440 - some servicebindings will be blocked to unbind status when delete multi servicebindings at same time

Summary: some servicebindings will be blocked to unbind status when delete multi servi...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Service Broker
Sub Component:
Version:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	3.9.0
Assignee:	Jay Boyd
QA Contact:	XiuJuan Wang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2017-10-30 09:26 UTC by XiuJuan Wang
Modified:	2022-03-13 14:30 UTC (History)
CC List:	19 users (show)
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:	undefined
Clone Of:
Environment:
Last Closed:	2018-08-28 14:46:39 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
failedtodeprovisionserviceinstance (124.70 KB, image/png) 2017-10-30 09:58 UTC, XiuJuan Wang	no flags	Details
three x controller logs (8.64 MB, application/zip) 2018-02-08 16:39 UTC, David Caldwell	no flags	Details
oc describe all -n kube-service-catalog (15.98 KB, text/plain) 2018-02-08 16:40 UTC, David Caldwell	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	3305241	0	None	None	None	2018-01-02 07:08:52 UTC

Internal Links: 1743273

Description XiuJuan Wang 2017-10-30 09:26:20 UTC

Description of problem:
Provision a clusterserviceclass via browse catalog,after provision is ready,create binding twice.Can't deprovision serviceinstance via browse catalog after binding 2 mins.
Could deprovision serviceinstance successfully after binding immediately

Version-Release number of selected component (if applicable):
brew-*/ose-service-catalog   v3.7.0-0.185.0.0    ef0dd815574a        2 days ago          266 MB


How reproducible:
always

Steps to Reproduce:
1.Provision jenkins-ephemral clusterserviceclass via browse catalog
2.After serviceinstance is ready,creating binging twice
3.Wait 2 mins, deprovision serviceinstance via browse catalog

Actual results:
Can't deprovision successfully due to 'Delete instance blocked by existing ServiceBindings associated with this instance. All credentials must be removed first'

Bindings have been deleted when checked from cli side.

Expected results:
Can't deprovision successfully 

Additional info:

Comment 1 XiuJuan Wang 2017-10-30 09:58:07 UTC

Created attachment 1345358 [details]
failedtodeprovisionserviceinstance

Comment 2 Paul Morie 2017-10-30 14:49:23 UTC

This sounds to me like this is working as expected - deprovision of the instance is blocked until there are no bindings to the instance remaining.  I'm putting NEEDINFO on this so we can dig into why the bindings were not fully deleted after so much time.  Would you please provide:

- service catalog controller logs
- service catalog api server logs
- output of `oc get servicebindings -o yaml`
- output of `oc get serviceinstances -o yaml`

Comment 3 Matthew Staebler 2017-10-31 02:16:24 UTC

I have reproduced this bug.

When the Jenkins (Ephemeral) service is deleted via the web console, the two ServiceBindings and the ServiceInstance are all deleted in rapid succession. The service catalog controller sends Unbind requests to the template service broker for the two ServiceBindings in rapid succession. When the template service broker receives the first Unbind request, the template service broker makes modifications to the provisioned instance that temporarily makes it unavailable. When the template service broker receives the second Unbind request, the template service broker rejects the request with a 500 response indicating that the instance has been modified and the request should be retried with the latest version [1].

Subsequent Unbind requests for the binding succeed. The binding is deleted from the template service broker. However, the service catalog retains the ServiceBinding and thinks that it has not been unbound. The only reason that the service catalog sends another Unbind request is because there is an outstanding bug in the service catalog that causes the controller to repeat OSB requests. If that bug were fixed, then the service catalog would not send any additional Unibind requests after receiving the 500 response from the initial Unbind request for the binding.

[1] This is the full text of the 500 response from the template service broker. "f95fb251-289-4002-9323-0cfe751d2040" is the ID of the instance.

Operation cannot be fulfilled on brokertemplateinstances.template.openshift.io "f95fb251-289-4002-9323-0cfe751d2040": the object has been modified; please apply your changes to the latest version and try again

Comment 5 Matthew Staebler 2017-10-31 16:55:20 UTC

Upstream issue created to address this bug.

https://github.com/kubernetes-incubator/service-catalog/issues/1503

Comment 6 Paul Morie 2017-11-06 15:13:56 UTC

Upstream PR: https://github.com/kubernetes-incubator/service-catalog/pull/1505

Fixed in origin with: https://github.com/openshift/origin/pull/17166

Comment 9 XiuJuan Wang 2017-11-10 09:47:40 UTC

Update my bug description.

some servicebindings will be blocked to unbind status when delete multi servicebindings at same time.

#oc  get servicebinding  -n xiu12 
NAME                            AGE
mongodb-ephemeral-ncd5b-j78hl   51s
mongodb-ephemeral-ncd5b-mhhhw   48s
mongodb-ephemeral-ncd5b-wbvbk   45s
# oc  delete servicebinding  --all  -n xiu12 
servicebinding "mongodb-ephemeral-ncd5b-j78hl" deleted
servicebinding "mongodb-ephemeral-ncd5b-mhhhw" deleted
servicebinding "mongodb-ephemeral-ncd5b-wbvbk" deleted

oc  get servicebinding  -n xiu12 
NAME                            AGE
mongodb-ephemeral-ncd5b-mhhhw   3m
mongodb-ephemeral-ncd5b-wbvbk   3m

# oc  get  servicebinding mongodb-ephemeral-ncd5b-mhhhw  -n xiu12  -o yaml 
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceBinding
metadata:
  creationTimestamp: 2017-11-10T09:42:15Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2017-11-10T09:43:12Z
  finalizers:
  - kubernetes-incubator/service-catalog
  generateName: mongodb-ephemeral-ncd5b-
  generation: 2
  name: mongodb-ephemeral-ncd5b-mhhhw
  namespace: xiu12
  resourceVersion: "16515"
  selfLink: /apis/servicecatalog.k8s.io/v1beta1/namespaces/xiu12/servicebindings/mongodb-ephemeral-ncd5b-mhhhw
  uid: 6f8e8246-c5fb-11e7-a8a8-0a580a800003
spec:
  externalID: 15f1c433-03a3-4912-b6d6-30bf29003a29
  instanceRef:
    name: mongodb-ephemeral-ncd5b
  secretName: mongodb-ephemeral-ncd5b-credentials-gk1s7
  userInfo:
    groups:
    - system:cluster-admins
    - system:authenticated
    uid: ""
    username: system:admin
status:
  conditions:
  - lastTransitionTime: 2017-11-10T09:43:12Z
    message: 'Unbind call failed. Error unbinding from ServiceInstance "xiu12/mongodb-ephemeral-ncd5b"
      of ClusterServiceClass (K8S: "4926f6cb-c5ec-11e7-9993-fa163e1a3633" ExternalName:
      "mongodb-ephemeral") at ClusterServiceBroker "template-service-broker": Status:
      500; ErrorMessage: <nil>; Description: Operation cannot be fulfilled on brokertemplateinstances.template.openshift.io
      "e9fd4c45-e565-42be-80bc-2a67165011bb": the object has been modified; please
      apply your changes to the latest version and try again; ResponseError: <nil>'
    reason: UnbindCallFailed
    status: Unknown
    type: Ready
  - lastTransitionTime: 2017-11-10T09:43:12Z
    message: 'Unbind call failed. Error unbinding from ServiceInstance "xiu12/mongodb-ephemeral-ncd5b"
      of ClusterServiceClass (K8S: "4926f6cb-c5ec-11e7-9993-fa163e1a3633" ExternalName:
      "mongodb-ephemeral") at ClusterServiceBroker "template-service-broker": Status:
      500; ErrorMessage: <nil>; Description: Operation cannot be fulfilled on brokertemplateinstances.template.openshift.io
      "e9fd4c45-e565-42be-80bc-2a67165011bb": the object has been modified; please
      apply your changes to the latest version and try again; ResponseError: <nil>'
    reason: UnbindCallFailed
    status: "True"
    type: Failed
  externalProperties:
    userInfo:
      extra:
        scopes.authorization.openshift.io:
        - user:full
      groups:
      - system:authenticated:oauth
      - system:authenticated
      uid: ""
      username: xiuwang
  orphanMitigationInProgress: false
  reconciledGeneration: 2

Comment 10 XiuJuan Wang 2017-11-10 09:58:57 UTC

Depend on https://bugzilla.redhat.com/show_bug.cgi?id=1511847.
When deprovision serviceinstance from web console,multi servicebindings will be removed firstly at same time.So there is the behavior as comment #0 description.

Comment 14 Jay Boyd 2018-01-25 14:28:55 UTC

I reproduce this in MASTER, one of the unbind requests fails with error 500:

I0125 13:59:28.209966       1 event.go:218] Event(v1.ObjectReference{Kind:"ServiceBinding", Namespace:"proj", Name:"jenkins-ephemeral-gxs6s-9q5jk", UID:"9acd5e9f-0087-11e8-a158-0242ac110004", APIVersion:"servic
ecatalog.k8s.io", ResourceVersion:"686", FieldPath:""}): type: 'Warning' reason: 'UnbindCallFailed' Error unbinding from ServiceInstance "proj/jenkins-ephemeral-gxs6s" of ClusterServiceClass (K8S: "a61a7f29-008
6-11e8-a43f-c85b76aab8ec" ExternalName: "jenkins-ephemeral") at ClusterServiceBroker "template-service-broker": Status: 500; ErrorMessage: <nil>; Description: Operation cannot be fulfilled on brokertemplateinst
ances.template.openshift.io "958386a6-dd59-4f9e-b526-a0b3c73a1225": the object has been modified; please apply your changes to the latest version and try again; ResponseError: <nil>


and any subsequent delete attempts are rejected:
I0125 14:01:50.966188       1 controller_binding.go:615] ServiceBinding "proj/jenkins-ephemeral-gxs6s-9q5jk": Processing Delete
I0125 14:01:50.966207       1 controller_binding.go:625] ServiceBinding "proj/jenkins-ephemeral-gxs6s-9q5jk": Not processing delete event because unbinding has failed


This has been recently fixed upstream by https://github.com/kubernetes-incubator/service-catalog/pull/1653  on January 8, 2018.

There is no work around, --force=true does not work.

Comment 19 Jay Boyd 2018-02-07 15:35:25 UTC

fixed by https://github.com/openshift/origin/pull/18480

Comment 23 David Caldwell 2018-02-08 16:39:27 UTC

Created attachment 1393263 [details]
three x controller logs

Attach in response to comment #22

Comment 24 David Caldwell 2018-02-08 16:40:59 UTC

Created attachment 1393264 [details]
oc describe all -n kube-service-catalog

Attached in response to comment #22

Comment 30 XiuJuan Wang 2018-02-13 02:47:12 UTC

The fix has merged into ocp v3.9.0-0.42.0.
And can't reproduce this bug,could deprovision serviceinstance successfully when multiple servicebinding  existing.
Please help set this bug to on_qa when you are free.
Thanks

Comment 31 XiuJuan Wang 2018-02-14 02:31:24 UTC

As comment #30, move this bug to verified

Comment 42 Alexander Rydekull 2018-11-02 11:00:50 UTC

Issue still exists in 3.11:

[root@ocpmaster1 ~]# oc get projects
NAME                                DISPLAY NAME        STATUS
default                                                 Active
kube-public                                             Active
kube-service-catalog                                    Active
kube-system                                             Active
management-infra                                        Active
openshift                                               Active
openshift-ansible-service-broker                        Active
openshift-console                                       Active
openshift-infra                                         Active
openshift-logging                                       Active
openshift-monitoring                                    Active
openshift-node                                          Active
openshift-sdn                                           Active
openshift-storage                                       Active
openshift-template-service-broker                       Active
openshift-web-console                                   Active
test                                                    Terminating
[root@ocpmaster1 ~]# oc get projects | grep Terminating
test                                                    Terminating
[root@ocpmaster1 ~]# oc get serviceinstance -n test
NAME                           CLASS                                        PLAN      STATUS         AGE
django-psql-persistent-qjqnb   ClusterServiceClass/django-psql-persistent   default   Provisioning   1d
[root@ocpmaster1 ~]# 

Implement workaround from: https://github.com/openshift/origin/issues/19660#issuecomment-388797440

[root@ocpmaster1 ~]# for i in $(oc get projects  | grep Terminating| awk '{print $1}'); do echo $i; oc get serviceinstance -n $i -o yaml | sed "/kubernetes-incubator/d"| oc apply -f - ; done
test
Warning: oc apply should be used on resource created by either oc create --save-config or oc apply
serviceinstance.servicecatalog.k8s.io/django-psql-persistent-qjqnb configured
[root@ocpmaster1 ~]#

Verify that it is gone.

[root@ocpmaster1 ~]# oc get projects
NAME                                DISPLAY NAME        STATUS
default                                                 Active
kube-public                                             Active
kube-service-catalog                                    Active
kube-system                                             Active
management-infra                                        Active
openshift                                               Active
openshift-ansible-service-broker                        Active
openshift-console                                       Active
openshift-infra                                         Active
openshift-logging                                       Active
openshift-monitoring                                    Active
openshift-node                                          Active
openshift-sdn                                           Active
openshift-storage                                       Active
openshift-template-service-broker                       Active
openshift-web-console                                   Active
[root@ocpmaster1 ~]#


Success!

Comment 43 Jay Boyd 2018-11-02 14:49:53 UTC

Hey Alex, this is a thorny issue.  Initially there were some bugs in earlier releases attributing to these issues where we had Service Instances or Service Bindings that clould not be deleted.  What we primarily see now though is the Service Broker is having issues freeing up resources when the user asks for them to be deleted.  Customers may be charged for any resources allocated by a Broker so we don't want to proceed with forgetting about them.

Because the Broker is telling Service Catalog that the resources may not be properly deleted or are in some unknown state, Service Catalog is not going to free up the tracking of them.

Take a look at the status of the resources - try `oc describe django-psql-persistent-qjqnb` and review the events and status.  

This is well documented by https://access.redhat.com/articles/3441161

Without knowing the status of your service instance, I'd initially say this is working as designed.   That work around you quoted is a really big hammer and would not be advisable in any production environment.  If the status shows some issues/errors from the Broker, please take it up with the Broker team.   If you think you have other issues, please open a new BZ.

You may want to review https://bugzilla.redhat.com/show_bug.cgi?id=1623918#c9 (comments 9 & 10) - my guess is your issue falls into this category.

Comment 44 Alexander Rydekull 2018-11-22 09:19:43 UTC

I can try and catch that information(oc describe) once the issue is seen again. With that being said.

I looked around in the project, and there was literally nothing left then the "Provisioned Service" blocking the deletion of the project.

Comment 45 Alexander Rydekull 2018-11-22 09:36:57 UTC

Well, arent we lucky. Just looked in the environment and found the following:

[root@ocpmaster1 ~]# oc get projects | grep Terminating
my-buildtest                                                Terminating
my-jenkins                                                  Terminating
my-jenkins2                                                 Terminating
[root@ocpmaster1 ~]#


Looks like we have some projects hanging around.

[root@ocpmaster1 ~]# oc project my-jenkins
Now using project "my-jenkins" on server "https://ocp.example.com:443".
[root@ocpmaster1 ~]# oc get serviceinstance
NAME                       CLASS                                    PLAN      STATUS                      AGE
jenkins-persistent-5zwfr   ClusterServiceClass/jenkins-persistent   default   ErrorPollingLastOperation   16h
[root@ocpmaster1 ~]# oc describe serviceinstance jenkins-persistent-5zwfr
Name:         jenkins-persistent-5zwfr
Namespace:    my-jenkins
Labels:       <none>
Annotations:  <none>
API Version:  servicecatalog.k8s.io/v1beta1
Kind:         ServiceInstance
Metadata:
  Creation Timestamp:             2018-11-21T16:44:16Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2018-11-22T09:27:30Z
  Finalizers:
    kubernetes-incubator/service-catalog
  Generate Name:     jenkins-persistent-
  Generation:        2
  Resource Version:  318119
  Self Link:         /apis/servicecatalog.k8s.io/v1beta1/namespaces/my-jenkins/serviceinstances/jenkins-persistent-5zwfr
  UID:               af4bdf63-edac-11e8-9e9c-0a580a810008
Spec:
  Cluster Service Class External Name:  jenkins-persistent
  Cluster Service Class Ref:
    Name:                              c001f40b-eda3-11e8-8886-0050569122cd
  Cluster Service Plan External Name:  default
  Cluster Service Plan Ref:
    Name:       c001f40b-eda3-11e8-8886-0050569122cd
  External ID:  af4bdebd-edac-11e8-9e9c-0a580a810008
  Parameters From:
    Secret Key Ref:
      Key:          parameters
      Name:         jenkins-persistent-parametersljdmx
  Update Requests:  0
  User Info:
    Groups:
      system:serviceaccounts
      system:serviceaccounts:kube-system
      system:authenticated
    UID:
    Username:  system:serviceaccount:kube-system:namespace-controller
Status:
  Async Op In Progress:  true
  Conditions:
    Last Transition Time:  2018-11-21T16:44:16Z
    Message:               Error polling last operation: Status: 500; ErrorMessage: <nil>; Description: templateinstances.template.openshift.io "af4bdebd-edac-11e8-9e9c-0a580a810008" not found; ResponseError: <nil>
    Reason:                ErrorPollingLastOperation
    Status:                False
    Type:                  Ready
  Current Operation:       Provision
  Deprovision Status:      Required
  In Progress Properties:
    Cluster Service Plan External ID:    c001f40b-eda3-11e8-8886-0050569122cd
    Cluster Service Plan External Name:  default
    Parameter Checksum:                  b327e98626e9dfb9fc9c944227ebe27f780e0e72e535c8df4ff272fd1072fd2a
    Parameters:
      DISABLE _ ADMINISTRATIVE _ MONITORS:  <redacted>
      ENABLE _ FATAL _ ERROR _ LOG _ FILE:  <redacted>
      ENABLE _ OAUTH:                       <redacted>
      JENKINS _ IMAGE _ STREAM _ TAG:       <redacted>
      JENKINS _ SERVICE _ NAME:             <redacted>
      JNLP _ SERVICE _ NAME:                <redacted>
      MEMORY _ LIMIT:                       <redacted>
      NAMESPACE:                            <redacted>
      VOLUME _ CAPACITY:                    <redacted>
    User Info:
      Extra:
        Scopes . Authorization . Openshift . Io:
          user:full
      Groups:
        system:authenticated:oauth
        system:authenticated
      UID:
      Username:                   my
  Last Operation:                 provisioning
  Observed Generation:            1
  Operation Start Time:           2018-11-22T08:26:22Z
  Orphan Mitigation In Progress:  false
  Provision Status:               NotProvisioned
  Reconciled Generation:          0
Events:                           <none>
[root@ocpmaster1 ~]#





[root@ocpmaster1 ~]# oc project my-jenkins2
Now using project "my-jenkins2" on server "https://ocp.example.com:443".
[root@ocpmaster1 ~]# oc get serviceinstance
NAME                       CLASS                                    PLAN      STATUS         AGE
jenkins-persistent-pq76s   ClusterServiceClass/jenkins-persistent   default   Provisioning   29m
[root@ocpmaster1 ~]# oc describe serviceinstance jenkins-persistent-pq76s
Name:         jenkins-persistent-pq76s
Namespace:    my-jenkins2
Labels:       <none>
Annotations:  <none>
API Version:  servicecatalog.k8s.io/v1beta1
Kind:         ServiceInstance
Metadata:
  Creation Timestamp:             2018-11-22T09:02:18Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2018-11-22T09:32:09Z
  Finalizers:
    kubernetes-incubator/service-catalog
  Generate Name:     jenkins-persistent-
  Generation:        2
  Resource Version:  319521
  Self Link:         /apis/servicecatalog.k8s.io/v1beta1/namespaces/my-jenkins2/serviceinstances/jenkins-persistent-pq76s
  UID:               50b9da97-ee35-11e8-9e9c-0a580a810008
Spec:
  Cluster Service Class External Name:  jenkins-persistent
  Cluster Service Class Ref:
    Name:                              c001f40b-eda3-11e8-8886-0050569122cd
  Cluster Service Plan External Name:  default
  Cluster Service Plan Ref:
    Name:       c001f40b-eda3-11e8-8886-0050569122cd
  External ID:  50b9d9d7-ee35-11e8-9e9c-0a580a810008
  Parameters From:
    Secret Key Ref:
      Key:          parameters
      Name:         jenkins-persistent-parametersoic39
  Update Requests:  0
  User Info:
    Groups:
      system:serviceaccounts
      system:serviceaccounts:kube-system
      system:authenticated
    UID:
    Username:  system:serviceaccount:kube-system:namespace-controller
Status:
  Async Op In Progress:  true
  Conditions:
    Last Transition Time:  2018-11-22T09:02:19Z
    Message:               The instance is being provisioned asynchronously
    Reason:                Provisioning
    Status:                False
    Type:                  Ready
  Current Operation:       Provision
  Deprovision Status:      Required
  In Progress Properties:
    Cluster Service Plan External ID:    c001f40b-eda3-11e8-8886-0050569122cd
    Cluster Service Plan External Name:  default
    Parameter Checksum:                  b6719d10b1549e6544afd18b8fa1475c80a6c6e3a8083a97cf588bb8f6676d7d
    Parameters:
      DISABLE _ ADMINISTRATIVE _ MONITORS:  <redacted>
      ENABLE _ FATAL _ ERROR _ LOG _ FILE:  <redacted>
      ENABLE _ OAUTH:                       <redacted>
      JENKINS _ IMAGE _ STREAM _ TAG:       <redacted>
      JENKINS _ SERVICE _ NAME:             <redacted>
      JNLP _ SERVICE _ NAME:                <redacted>
      MEMORY _ LIMIT:                       <redacted>
      NAMESPACE:                            <redacted>
      VOLUME _ CAPACITY:                    <redacted>
    User Info:
      Extra:
        Scopes . Authorization . Openshift . Io:
          user:full
      Groups:
        ak000899_app00602_mgmt
        ak000899_team
        system:authenticated:oauth
        system:authenticated
      UID:
      Username:                   my
  Last Operation:                 provisioning
  Observed Generation:            1
  Operation Start Time:           2018-11-22T09:02:19Z
  Orphan Mitigation In Progress:  false
  Provision Status:
  Reconciled Generation:          0
Events:                           <none>
[root@ocpmaster1 ~]#

Comment 46 Alexander Rydekull 2018-11-22 09:51:01 UTC

I looked at the other BZ mentioned, and I cannot agree with the conclusion from comment #9 that it isn't a bug. When a project is supposed to terminate and it doesn't, its definitely a bug when its blocked by this without any feedback.

According to comment #10 I understand that it is working as designed, but the problem is also the reasonable expectation from the action which the user executes. And someone who deletes a project expects it to be deleted, all resources in it also. Not blocking.

And its the lack of any feedback or possibility to actually execute the delete that merits it as a bug.

---

But I'll gladly try and figure out the underlying reason why its happening, so if you want me to dig into some information please give me some pointers and i'll look for it.

Comment 47 Alexander Rydekull 2018-12-17 09:43:44 UTC

This is appearing over and over again in several clusters that I cross paths with. 

I still have no 'fool proof' way to recreate it, but it happens every now and then.

Note You need to log in before you can comment on or make changes to this bug.