Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1551988

Summary: Serviceinstance keeps pending when deprovision templateinstance before serviceinstance goes to ready
Product: OpenShift Container Platform Reporter: XiuJuan Wang <xiuwang>
Component: Service Broker Assignee: jkim
Status: CLOSED DUPLICATE QA Contact: XiuJuan Wang <xiuwang>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.9.0 CC: aos-bugs, chezhang, jaboyd, jmatthew, maupadhy, pmorie, travi, wzheng
Target Milestone: ---   
Target Release: 3.9.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-07 20:02:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description XiuJuan Wang 2018-03-06 09:56:49 UTC
Description of problem:
The serviceinstance stays in pending status indefinitely with a "templateinstance not found" error.
During the first hour, service-catalog-controller-manager re-checks the templateinstance status every 6 minutes. After an hour, it increases the check interval, but the serviceinstance still stays pending.

The servicebinding also stays pending due to "Binding cannot begin because referenced ServiceInstance "xiu1/jenkins-ephemeral-fjnrl" is not ready".


Version-Release number of selected component (if applicable):
openshift v3.9.2
kubernetes v1.9.1+a0ce1bc657

How reproducible:
always

Steps to Reproduce:
1. Provision a serviceclass from the OCP web console
2. Create a binding
3. Deprovision the templateinstance before the serviceinstance is ready
4. Check the serviceinstance

Actual results:
The serviceinstance stays pending indefinitely:
oc describe  serviceinstance jenkins-ephemeral-fjnrl -n xiu1
<-----------------snip--------------->
Events:
  Type     Reason                               Age                From                                Message
  ----     ------                               ----               ----                                -------
  Warning  ErrorWithParameters                  1h                 service-catalog-controller-manager  failed to prepare parameters nil: secrets "jenkins-ephemeral-parametersqvvi0" not found
  Normal   Provisioning                         1h                 service-catalog-controller-manager  The instance is being provisioned asynchronously
  Warning  ProvisionCallFailed                  1h                 service-catalog-controller-manager  Error provisioning ServiceInstance of ClusterServiceClass (K8S: "0fc3a9d1-2103-11e8-8498-42010af00043" ExternalName: "jenkins-ephemeral") at ClusterServiceBroker "template-service-broker": Status: 409; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>
  Warning  ClusterServiceBrokerReturnedFailure  1h                 service-catalog-controller-manager  Error provisioning ServiceInstance of ClusterServiceClass (K8S: "0fc3a9d1-2103-11e8-8498-42010af00043" ExternalName: "jenkins-ephemeral") at ClusterServiceBroker "template-service-broker": Status: 409; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>
  Warning  ErrorPollingLastOperation            19m (x10 over 1h)  service-catalog-controller-manager  Error polling last operation: Status: 500; ErrorMessage: <nil>; Description: templateinstances.template.openshift.io "abd4499c-8bbf-4f6c-83e6-3446a1e4936c" not found; ResponseError: <nil>


Expected results:
The serviceinstance should go to failed status after a timeout.

Additional info:

Comment 1 Jay Boyd 2018-04-03 17:11:23 UTC
I tried reproducing this in Origin 3.9 and then again in OSE 3.9.2.1 and could not.  Both times the instance was provisioned within several minutes and then a deprovision on the instance was attempted.  This failed because of the binding ("All associated ServiceBindings must be removed before this ServiceInstance can be deleted").  The binding was in a ready=false state: "Binding cannot begin because referenced ServiceInstance "one/jenkins-persistent-zdrgk" is not ready".  After I deleted the binding, the instance was removed successfully.


Reviewing the original information in the description, this looks like a problem with the broker.  In the describe instance info, we see the provision, then two 409 messages (these usually indicate that the Service Catalog sent subsequent provision requests to the broker and the additional requests were ignored).  Then the broker starts returning error 500 in response to the last operation polls.  I believe these will be retried for a week before being abandoned.  So I don't believe the reporter's Expected Results apply within this time period.

Brokers should not be returning error 500 unless they were unable to handle the request.  I think we probably need to review the broker logs.

For this type of issue it would be helpful to collect the Service Catalog controller logs and the Broker logs and attach them to the bug report.
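
To make the polling behavior discussed in this comment concrete, here is a minimal Python sketch of a last-operation poll loop that retries on errors until a maximum retry window expires and then gives up. It is not the Service Catalog controller's actual code: the names PollResult, poll_last_operation, max_retry_seconds, and interval_seconds are hypothetical, and the fixed interval is a simplification (the description reports that the real interval grows after the first hour).

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PollResult:
    """Hypothetical view of one last-operation response from a broker."""
    status: int                  # HTTP status code returned by the broker
    state: Optional[str] = None  # "in progress", "succeeded", or "failed" on a 200


def poll_last_operation(
    fetch: Callable[[], PollResult],
    max_retry_seconds: float,
    interval_seconds: float,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the broker reports a terminal state or the retry window ends.

    Returns "Ready", "Failed", or "Failed: timeout".
    """
    deadline = now() + max_retry_seconds
    while True:
        result = fetch()
        if result.status == 200 and result.state == "succeeded":
            return "Ready"
        if result.status == 200 and result.state == "failed":
            return "Failed"
        # Anything else ("in progress", or an error such as a 500) is retried,
        # but only until the retry window is used up.
        if now() >= deadline:
            return "Failed: timeout"
        sleep(interval_seconds)
```

With a retry window of about a week, an instance whose broker keeps answering 500 would stay pending for that whole period before reaching the timeout branch, which matches the behavior described above. The clock and sleep functions are injectable so the loop can be exercised without waiting.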

Comment 2 Paul Morie 2018-04-04 18:56:46 UTC
I'm not sure I understand the premise of this issue - the problem is when you delete a resource created by the template broker before the service is fully provisioned?

Comment 3 XiuJuan Wang 2018-04-08 03:09:49 UTC
@Paul
Yes, I removed the templateinstance before the serviceinstance was provisioned successfully.

The expected result is that the serviceinstance should fail due to a timeout, since the templateinstance is not found, rather than staying pending forever.


@Jay
I am not deprovisioning the serviceinstance; I just want the serviceinstance to reach an expected status.
While the serviceinstance stays pending due to "templateinstances.template.openshift.io ***** not found", the servicebinding also stays pending due to "ServiceInstance **** is not ready". These pending statuses should not last this long.

Comment 4 jkim 2018-11-07 20:02:37 UTC
This is a duplicate of two bugs.

Provision failing with a 409 status error:
https://bugzilla.redhat.com/show_bug.cgi?id=1579194
This is because the broker was not handling multiple provision requests properly. This was fixed in v3.11.

ServiceInstance stays around after a bad/failed provision:
https://bugzilla.redhat.com/show_bug.cgi?id=1623918
If a provision has not successfully completed and has not errored out (or the templateinstance was manually removed as described in the comments), the serviceinstance can only be deleted by removing the finalizer.
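
As a rough illustration of the finalizer workaround mentioned above, the following Python sketch builds a JSON merge patch that drops one finalizer from a serviceinstance's metadata and keeps any others. The helper name finalizer_removal_patch is hypothetical, and the finalizer string is an assumption; check metadata.finalizers on the stuck object for the actual value before using anything like this.

```python
import json
from typing import Any, Dict, List

# Assumed finalizer name, for illustration only; verify it against the
# metadata.finalizers list of the stuck serviceinstance.
ASSUMED_FINALIZER = "kubernetes-incubator/service-catalog"


def finalizer_removal_patch(
    instance: Dict[str, Any], finalizer: str = ASSUMED_FINALIZER
) -> str:
    """Build a JSON merge patch that removes one finalizer and keeps the rest."""
    current: List[str] = instance.get("metadata", {}).get("finalizers") or []
    remaining = [f for f in current if f != finalizer]
    # In a JSON merge patch a list value replaces the existing list wholesale,
    # so the patch carries the full list of finalizers that should remain.
    return json.dumps({"metadata": {"finalizers": remaining}})
```

The resulting string could then be supplied as the patch body to something like `oc patch serviceinstance <name> -n <namespace> --type=merge -p '<patch>'`. Removing a finalizer bypasses the controller's cleanup, so any resources the broker did create would need to be checked by hand.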

I will close this bug, but please feel free to re-open it if further discussion is required.

*** This bug has been marked as a duplicate of bug 1579194 ***