Description of problem:
Provision a clusterserviceclass via the browse catalog. After the provision is ready, create a binding twice. The serviceinstance cannot be deprovisioned via the browse catalog 2 minutes after binding, although it can be deprovisioned successfully immediately after binding.

Version-Release number of selected component (if applicable):
brew-*/ose-service-catalog   v3.7.0-0.185.0.0   ef0dd815574a   2 days ago   266 MB

How reproducible:
Always

Steps to Reproduce:
1. Provision the jenkins-ephemeral clusterserviceclass via the browse catalog
2. After the serviceinstance is ready, create a binding twice
3. Wait 2 minutes, then deprovision the serviceinstance via the browse catalog

Actual results:
Deprovision fails with 'Delete instance blocked by existing ServiceBindings associated with this instance. All credentials must be removed first', even though the bindings have already been deleted when checked from the CLI side.

Expected results:
The serviceinstance can be deprovisioned successfully.

Additional info:
Created attachment 1345358 [details] failedtodeprovisionserviceinstance
This sounds to me like it is working as expected - deprovisioning of the instance is blocked until no bindings to the instance remain. I'm putting NEEDINFO on this so we can dig into why the bindings were not fully deleted after so much time. Would you please provide:
- service catalog controller logs
- service catalog API server logs
- output of `oc get servicebindings -o yaml`
- output of `oc get serviceinstances -o yaml`
I have reproduced this bug. When the Jenkins (Ephemeral) service is deleted via the web console, the two ServiceBindings and the ServiceInstance are all deleted in rapid succession. The service catalog controller sends Unbind requests to the template service broker for the two ServiceBindings in rapid succession.

When the template service broker receives the first Unbind request, it makes modifications to the provisioned instance that temporarily make it unavailable. When it receives the second Unbind request, it rejects the request with a 500 response indicating that the instance has been modified and the request should be retried with the latest version [1]. Subsequent Unbind requests for the binding succeed, and the binding is deleted from the template service broker. However, the service catalog retains the ServiceBinding and thinks that it has not been unbound.

The only reason the service catalog sends another Unbind request at all is an outstanding bug in the service catalog that causes the controller to repeat OSB requests. If that bug were fixed, the service catalog would not send any additional Unbind requests after receiving the 500 response from the initial Unbind request for the binding.

[1] This is the full text of the 500 response from the template service broker. "f95fb251-289-4002-9323-0cfe751d2040" is the ID of the instance.

Operation cannot be fulfilled on brokertemplateinstances.template.openshift.io "f95fb251-289-4002-9323-0cfe751d2040": the object has been modified; please apply your changes to the latest version and try again
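The broker's 500 here is the standard Kubernetes optimistic-concurrency conflict: the second write was based on a stale resourceVersion. A minimal sketch of the re-read-and-retry pattern the error message asks for (the `Store`, `ConflictError`, and `unbind_with_retry` names are illustrative, not service-catalog or broker code):

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale resourceVersion."""

class Store:
    """Toy object store with optimistic concurrency, loosely mimicking etcd semantics."""
    def __init__(self):
        self.version = 1
        self.bindings = {"b1", "b2"}

    def get(self):
        # Return the current version along with a snapshot of the data.
        return self.version, set(self.bindings)

    def update(self, based_on_version, bindings):
        if based_on_version != self.version:
            raise ConflictError("the object has been modified; "
                                "please apply your changes to the latest version and try again")
        self.bindings = bindings
        self.version += 1

def unbind_with_retry(store, binding, max_retries=5):
    """On conflict, fetch the latest version and retry, as the 500 message instructs."""
    for _ in range(max_retries):
        version, bindings = store.get()
        bindings.discard(binding)
        try:
            store.update(version, bindings)
            return True
        except ConflictError:
            continue  # someone else wrote first; re-read and retry
    return False
```

Two rapid unbinds against the same instance reproduce the failure mode: a writer holding a stale version gets the conflict error, while a writer that re-reads before each attempt succeeds.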
Upstream issue created to address this bug. https://github.com/kubernetes-incubator/service-catalog/issues/1503
Upstream PR: https://github.com/kubernetes-incubator/service-catalog/pull/1505 Fixed in origin with: https://github.com/openshift/origin/pull/17166
Updating my bug description: some servicebindings get stuck in a failed unbind state when multiple servicebindings are deleted at the same time.

# oc get servicebinding -n xiu12
NAME                            AGE
mongodb-ephemeral-ncd5b-j78hl   51s
mongodb-ephemeral-ncd5b-mhhhw   48s
mongodb-ephemeral-ncd5b-wbvbk   45s

# oc delete servicebinding --all -n xiu12
servicebinding "mongodb-ephemeral-ncd5b-j78hl" deleted
servicebinding "mongodb-ephemeral-ncd5b-mhhhw" deleted
servicebinding "mongodb-ephemeral-ncd5b-wbvbk" deleted

# oc get servicebinding -n xiu12
NAME                            AGE
mongodb-ephemeral-ncd5b-mhhhw   3m
mongodb-ephemeral-ncd5b-wbvbk   3m

# oc get servicebinding mongodb-ephemeral-ncd5b-mhhhw -n xiu12 -o yaml
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceBinding
metadata:
  creationTimestamp: 2017-11-10T09:42:15Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2017-11-10T09:43:12Z
  finalizers:
  - kubernetes-incubator/service-catalog
  generateName: mongodb-ephemeral-ncd5b-
  generation: 2
  name: mongodb-ephemeral-ncd5b-mhhhw
  namespace: xiu12
  resourceVersion: "16515"
  selfLink: /apis/servicecatalog.k8s.io/v1beta1/namespaces/xiu12/servicebindings/mongodb-ephemeral-ncd5b-mhhhw
  uid: 6f8e8246-c5fb-11e7-a8a8-0a580a800003
spec:
  externalID: 15f1c433-03a3-4912-b6d6-30bf29003a29
  instanceRef:
    name: mongodb-ephemeral-ncd5b
  secretName: mongodb-ephemeral-ncd5b-credentials-gk1s7
  userInfo:
    groups:
    - system:cluster-admins
    - system:authenticated
    uid: ""
    username: system:admin
status:
  conditions:
  - lastTransitionTime: 2017-11-10T09:43:12Z
    message: 'Unbind call failed. Error unbinding from ServiceInstance "xiu12/mongodb-ephemeral-ncd5b"
      of ClusterServiceClass (K8S: "4926f6cb-c5ec-11e7-9993-fa163e1a3633" ExternalName:
      "mongodb-ephemeral") at ClusterServiceBroker "template-service-broker": Status:
      500; ErrorMessage: <nil>; Description: Operation cannot be fulfilled on brokertemplateinstances.template.openshift.io
      "e9fd4c45-e565-42be-80bc-2a67165011bb": the object has been modified; please
      apply your changes to the latest version and try again; ResponseError: <nil>'
    reason: UnbindCallFailed
    status: Unknown
    type: Ready
  - lastTransitionTime: 2017-11-10T09:43:12Z
    message: 'Unbind call failed. Error unbinding from ServiceInstance "xiu12/mongodb-ephemeral-ncd5b"
      of ClusterServiceClass (K8S: "4926f6cb-c5ec-11e7-9993-fa163e1a3633" ExternalName:
      "mongodb-ephemeral") at ClusterServiceBroker "template-service-broker": Status:
      500; ErrorMessage: <nil>; Description: Operation cannot be fulfilled on brokertemplateinstances.template.openshift.io
      "e9fd4c45-e565-42be-80bc-2a67165011bb": the object has been modified; please
      apply your changes to the latest version and try again; ResponseError: <nil>'
    reason: UnbindCallFailed
    status: "True"
    type: Failed
  externalProperties:
    userInfo:
      extra:
        scopes.authorization.openshift.io:
        - user:full
      groups:
      - system:authenticated:oauth
      - system:authenticated
      uid: ""
      username: xiuwang
  orphanMitigationInProgress: false
  reconciledGeneration: 2
Depends on https://bugzilla.redhat.com/show_bug.cgi?id=1511847. When a serviceinstance is deprovisioned from the web console, multiple servicebindings are removed first, all at the same time, which produces the behavior described in comment #0.
I reproduced this on MASTER. One of the unbind requests fails with error 500:

I0125 13:59:28.209966       1 event.go:218] Event(v1.ObjectReference{Kind:"ServiceBinding", Namespace:"proj", Name:"jenkins-ephemeral-gxs6s-9q5jk", UID:"9acd5e9f-0087-11e8-a158-0242ac110004", APIVersion:"servicecatalog.k8s.io", ResourceVersion:"686", FieldPath:""}): type: 'Warning' reason: 'UnbindCallFailed' Error unbinding from ServiceInstance "proj/jenkins-ephemeral-gxs6s" of ClusterServiceClass (K8S: "a61a7f29-0086-11e8-a43f-c85b76aab8ec" ExternalName: "jenkins-ephemeral") at ClusterServiceBroker "template-service-broker": Status: 500; ErrorMessage: <nil>; Description: Operation cannot be fulfilled on brokertemplateinstances.template.openshift.io "958386a6-dd59-4f9e-b526-a0b3c73a1225": the object has been modified; please apply your changes to the latest version and try again; ResponseError: <nil>

and any subsequent delete attempts are rejected:

I0125 14:01:50.966188       1 controller_binding.go:615] ServiceBinding "proj/jenkins-ephemeral-gxs6s-9q5jk": Processing Delete
I0125 14:01:50.966207       1 controller_binding.go:625] ServiceBinding "proj/jenkins-ephemeral-gxs6s-9q5jk": Not processing delete event because unbinding has failed

This was recently fixed upstream by https://github.com/kubernetes-incubator/service-catalog/pull/1653 on January 8, 2018. There is no workaround; --force=true does not work.
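The log above amounts to a terminal-failure guard in the controller: once a binding's unbind is marked failed, later delete events are dropped instead of retried. A simplified, hypothetical sketch of that state machine (illustrative only, not actual service-catalog code; the upstream fix effectively makes the failed state retriable):

```python
# Simplified unbind states; names are illustrative, not the real condition types.
PENDING, FAILED, SUCCEEDED = "Pending", "Failed", "Succeeded"

class BindingController:
    def __init__(self, broker_unbind, retry_failed_unbinds):
        self.broker_unbind = broker_unbind
        # Pre-fix behavior: retry_failed_unbinds=False, so a failed unbind is terminal.
        self.retry_failed_unbinds = retry_failed_unbinds
        self.state = PENDING

    def process_delete(self):
        if self.state == FAILED and not self.retry_failed_unbinds:
            # Matches the controller log: the delete event is silently dropped.
            return "Not processing delete event because unbinding has failed"
        try:
            self.broker_unbind()
            self.state = SUCCEEDED
            return "Unbound"
        except RuntimeError:
            self.state = FAILED
            return "UnbindCallFailed"

def flaky_broker():
    """Broker whose first unbind fails with a conflict-style 500, then succeeds."""
    calls = {"n": 0}
    def unbind():
        calls["n"] += 1
        if calls["n"] == 1:
            raise RuntimeError("500: the object has been modified")
    return unbind
```

With the old behavior the binding is stuck forever after one transient 500; with retries enabled, the second delete attempt succeeds because the broker conflict was transient.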
Fixed by https://github.com/openshift/origin/pull/18480
Created attachment 1393263 [details] three x controller logs Attached in response to comment #22
Created attachment 1393264 [details] oc describe all -n kube-service-catalog Attached in response to comment #22
The fix has merged into OCP v3.9.0-0.42.0, and I can no longer reproduce this bug: the serviceinstance deprovisions successfully when multiple servicebindings exist. Please help set this bug to ON_QA when you have time. Thanks
Per comment #30, moving this bug to VERIFIED.
Issue still exists in 3.11:

[root@ocpmaster1 ~]# oc get projects
NAME                                DISPLAY NAME   STATUS
default                                            Active
kube-public                                        Active
kube-service-catalog                               Active
kube-system                                        Active
management-infra                                   Active
openshift                                          Active
openshift-ansible-service-broker                   Active
openshift-console                                  Active
openshift-infra                                    Active
openshift-logging                                  Active
openshift-monitoring                               Active
openshift-node                                     Active
openshift-sdn                                      Active
openshift-storage                                  Active
openshift-template-service-broker                  Active
openshift-web-console                              Active
test                                               Terminating
[root@ocpmaster1 ~]# oc get projects | grep Terminating
test                                               Terminating
[root@ocpmaster1 ~]# oc get serviceinstance -n test
NAME                           CLASS                                        PLAN      STATUS         AGE
django-psql-persistent-qjqnb   ClusterServiceClass/django-psql-persistent   default   Provisioning   1d
[root@ocpmaster1 ~]#

Implemented the workaround from https://github.com/openshift/origin/issues/19660#issuecomment-388797440:

[root@ocpmaster1 ~]# for i in $(oc get projects | grep Terminating | awk '{print $1}'); do echo $i; oc get serviceinstance -n $i -o yaml | sed "/kubernetes-incubator/d" | oc apply -f -; done
test
Warning: oc apply should be used on resource created by either oc create --save-config or oc apply
serviceinstance.servicecatalog.k8s.io/django-psql-persistent-qjqnb configured
[root@ocpmaster1 ~]#

Verify that it is gone:

[root@ocpmaster1 ~]# oc get projects
NAME                                DISPLAY NAME   STATUS
default                                            Active
kube-public                                        Active
kube-service-catalog                               Active
kube-system                                        Active
management-infra                                   Active
openshift                                          Active
openshift-ansible-service-broker                   Active
openshift-console                                  Active
openshift-infra                                    Active
openshift-logging                                  Active
openshift-monitoring                               Active
openshift-node                                     Active
openshift-sdn                                      Active
openshift-storage                                  Active
openshift-template-service-broker                  Active
openshift-web-console                              Active
[root@ocpmaster1 ~]#

Success!
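The workaround above works because a Kubernetes object with a deletionTimestamp is only actually removed once its finalizers list is empty; the sed strips the kubernetes-incubator/service-catalog finalizer, letting garbage collection proceed (at the cost of skipping the broker-side cleanup the finalizer guards, which is why it is risky). A toy simulation of that finalizer semantics (illustrative only, not apiserver code):

```python
def garbage_collect(objects):
    """Keep every object except those that are both marked for deletion
    (deletionTimestamp set) and have no remaining finalizers - a loose
    model of how the apiserver finalizes deleted resources."""
    return [o for o in objects
            if not (o.get("deletionTimestamp") and not o.get("finalizers"))]

instance = {
    "name": "django-psql-persistent-qjqnb",
    "deletionTimestamp": "2018-05-14T00:00:00Z",  # deletion already requested
    "finalizers": ["kubernetes-incubator/service-catalog"],
}

# Stuck: deletion is requested, but the finalizer blocks actual removal,
# so the owning namespace stays in Terminating.
assert garbage_collect([instance]) == [instance]

# The sed workaround removes the finalizer; the object can now be collected.
instance["finalizers"] = []
assert garbage_collect([instance]) == []
```

The same mechanism explains why the project itself hangs in Terminating: namespace deletion waits for every contained resource to be finalized.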
Hey Alex, this is a thorny issue. Initially there were some bugs in earlier releases contributing to these issues, where we had Service Instances or Service Bindings that could not be deleted. What we primarily see now, though, is the Service Broker having trouble freeing up resources when the user asks for them to be deleted. Customers may be charged for any resources allocated by a Broker, so we don't want to proceed with forgetting about them. Because the Broker is telling Service Catalog that the resources may not be properly deleted or are in some unknown state, Service Catalog is not going to free up the tracking of them.

Take a look at the status of the resources - try `oc describe serviceinstance django-psql-persistent-qjqnb` and review the events and status. This is well documented by https://access.redhat.com/articles/3441161

Without knowing the status of your service instance, I'd initially say this is working as designed. The workaround you quoted is a really big hammer and would not be advisable in any production environment. If the status shows issues/errors from the Broker, please take it up with the Broker team. If you think you have other issues, please open a new BZ. You may want to review https://bugzilla.redhat.com/show_bug.cgi?id=1623918#c9 (comments 9 & 10) - my guess is your issue falls into this category.
I can try to catch that information (oc describe) once the issue is seen again. That said, I looked around in the project, and there was literally nothing left other than the "Provisioned Service" blocking the deletion of the project.
Well, aren't we lucky. I just looked in the environment and found the following:

[root@ocpmaster1 ~]# oc get projects | grep Terminating
my-buildtest   Terminating
my-jenkins     Terminating
my-jenkins2    Terminating
[root@ocpmaster1 ~]#

Looks like we have some projects hanging around.

[root@ocpmaster1 ~]# oc project my-jenkins
Now using project "my-jenkins" on server "https://ocp.example.com:443".
[root@ocpmaster1 ~]# oc get serviceinstance
NAME                       CLASS                                    PLAN      STATUS                      AGE
jenkins-persistent-5zwfr   ClusterServiceClass/jenkins-persistent   default   ErrorPollingLastOperation   16h
[root@ocpmaster1 ~]# oc describe serviceinstance jenkins-persistent-5zwfr
Name:         jenkins-persistent-5zwfr
Namespace:    my-jenkins
Labels:       <none>
Annotations:  <none>
API Version:  servicecatalog.k8s.io/v1beta1
Kind:         ServiceInstance
Metadata:
  Creation Timestamp:             2018-11-21T16:44:16Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2018-11-22T09:27:30Z
  Finalizers:
    kubernetes-incubator/service-catalog
  Generate Name:     jenkins-persistent-
  Generation:        2
  Resource Version:  318119
  Self Link:         /apis/servicecatalog.k8s.io/v1beta1/namespaces/my-jenkins/serviceinstances/jenkins-persistent-5zwfr
  UID:               af4bdf63-edac-11e8-9e9c-0a580a810008
Spec:
  Cluster Service Class External Name:  jenkins-persistent
  Cluster Service Class Ref:
    Name:                              c001f40b-eda3-11e8-8886-0050569122cd
  Cluster Service Plan External Name:  default
  Cluster Service Plan Ref:
    Name:       c001f40b-eda3-11e8-8886-0050569122cd
  External ID:  af4bdebd-edac-11e8-9e9c-0a580a810008
  Parameters From:
    Secret Key Ref:
      Key:          parameters
      Name:         jenkins-persistent-parametersljdmx
  Update Requests:  0
  User Info:
    Groups:
      system:serviceaccounts
      system:serviceaccounts:kube-system
      system:authenticated
    UID:
    Username:  system:serviceaccount:kube-system:namespace-controller
Status:
  Async Op In Progress:  true
  Conditions:
    Last Transition Time:  2018-11-21T16:44:16Z
    Message:               Error polling last operation: Status: 500; ErrorMessage: <nil>; Description: templateinstances.template.openshift.io "af4bdebd-edac-11e8-9e9c-0a580a810008" not found; ResponseError: <nil>
    Reason:                ErrorPollingLastOperation
    Status:                False
    Type:                  Ready
  Current Operation:       Provision
  Deprovision Status:      Required
  In Progress Properties:
    Cluster Service Plan External ID:    c001f40b-eda3-11e8-8886-0050569122cd
    Cluster Service Plan External Name:  default
    Parameter Checksum:                  b327e98626e9dfb9fc9c944227ebe27f780e0e72e535c8df4ff272fd1072fd2a
    Parameters:
      DISABLE _ ADMINISTRATIVE _ MONITORS:  <redacted>
      ENABLE _ FATAL _ ERROR _ LOG _ FILE:  <redacted>
      ENABLE _ OAUTH:                       <redacted>
      JENKINS _ IMAGE _ STREAM _ TAG:       <redacted>
      JENKINS _ SERVICE _ NAME:             <redacted>
      JNLP _ SERVICE _ NAME:                <redacted>
      MEMORY _ LIMIT:                       <redacted>
      NAMESPACE:                            <redacted>
      VOLUME _ CAPACITY:                    <redacted>
    User Info:
      Extra:
        Scopes . Authorization . Openshift . Io:
          user:full
      Groups:
        system:authenticated:oauth
        system:authenticated
      UID:
      Username:                   my
  Last Operation:                 provisioning
  Observed Generation:            1
  Operation Start Time:           2018-11-22T08:26:22Z
  Orphan Mitigation In Progress:  false
  Provision Status:               NotProvisioned
  Reconciled Generation:          0
Events:                           <none>
[root@ocpmaster1 ~]#

[root@ocpmaster1 ~]# oc project my-jenkins2
Now using project "my-jenkins2" on server "https://ocp.example.com:443".
[root@ocpmaster1 ~]# oc get serviceinstance
NAME                       CLASS                                    PLAN      STATUS         AGE
jenkins-persistent-pq76s   ClusterServiceClass/jenkins-persistent   default   Provisioning   29m
[root@ocpmaster1 ~]# oc describe serviceinstance jenkins-persistent-pq76s
Name:         jenkins-persistent-pq76s
Namespace:    my-jenkins2
Labels:       <none>
Annotations:  <none>
API Version:  servicecatalog.k8s.io/v1beta1
Kind:         ServiceInstance
Metadata:
  Creation Timestamp:             2018-11-22T09:02:18Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2018-11-22T09:32:09Z
  Finalizers:
    kubernetes-incubator/service-catalog
  Generate Name:     jenkins-persistent-
  Generation:        2
  Resource Version:  319521
  Self Link:         /apis/servicecatalog.k8s.io/v1beta1/namespaces/my-jenkins2/serviceinstances/jenkins-persistent-pq76s
  UID:               50b9da97-ee35-11e8-9e9c-0a580a810008
Spec:
  Cluster Service Class External Name:  jenkins-persistent
  Cluster Service Class Ref:
    Name:                              c001f40b-eda3-11e8-8886-0050569122cd
  Cluster Service Plan External Name:  default
  Cluster Service Plan Ref:
    Name:       c001f40b-eda3-11e8-8886-0050569122cd
  External ID:  50b9d9d7-ee35-11e8-9e9c-0a580a810008
  Parameters From:
    Secret Key Ref:
      Key:          parameters
      Name:         jenkins-persistent-parametersoic39
  Update Requests:  0
  User Info:
    Groups:
      system:serviceaccounts
      system:serviceaccounts:kube-system
      system:authenticated
    UID:
    Username:  system:serviceaccount:kube-system:namespace-controller
Status:
  Async Op In Progress:  true
  Conditions:
    Last Transition Time:  2018-11-22T09:02:19Z
    Message:               The instance is being provisioned asynchronously
    Reason:                Provisioning
    Status:                False
    Type:                  Ready
  Current Operation:       Provision
  Deprovision Status:      Required
  In Progress Properties:
    Cluster Service Plan External ID:    c001f40b-eda3-11e8-8886-0050569122cd
    Cluster Service Plan External Name:  default
    Parameter Checksum:                  b6719d10b1549e6544afd18b8fa1475c80a6c6e3a8083a97cf588bb8f6676d7d
    Parameters:
      DISABLE _ ADMINISTRATIVE _ MONITORS:  <redacted>
      ENABLE _ FATAL _ ERROR _ LOG _ FILE:  <redacted>
      ENABLE _ OAUTH:                       <redacted>
      JENKINS _ IMAGE _ STREAM _ TAG:       <redacted>
      JENKINS _ SERVICE _ NAME:             <redacted>
      JNLP _ SERVICE _ NAME:                <redacted>
      MEMORY _ LIMIT:                       <redacted>
      NAMESPACE:                            <redacted>
      VOLUME _ CAPACITY:                    <redacted>
    User Info:
      Extra:
        Scopes . Authorization . Openshift . Io:
          user:full
      Groups:
        ak000899_app00602_mgmt
        ak000899_team
        system:authenticated:oauth
        system:authenticated
      UID:
      Username:                   my
  Last Operation:                 provisioning
  Observed Generation:            1
  Operation Start Time:           2018-11-22T09:02:19Z
  Orphan Mitigation In Progress:  false
  Provision Status:
  Reconciled Generation:          0
Events:                           <none>
[root@ocpmaster1 ~]#
I looked at the other BZ mentioned, and I cannot agree with the conclusion from comment #9 that it isn't a bug. When a project is supposed to terminate and it doesn't, it's definitely a bug when it's blocked by this without any feedback. From comment #10 I understand that it is working as designed, but the problem is also the reasonable expectation behind the action the user executes: someone who deletes a project expects it to be deleted, along with all resources in it - not for it to block. It's the lack of any feedback, or of any way to actually execute the delete, that merits calling this a bug.

That said, I'll gladly try to figure out the underlying reason why it's happening, so if you want me to dig into some information, please give me some pointers and I'll look for it.
This is appearing over and over again in several clusters that I cross paths with. I still have no foolproof way to recreate it, but it happens every now and then.