Bug 1577076

Summary: Should be successfully deprovision the instance which returned an orphaned resource
Product: OpenShift Container Platform Reporter: Jian Zhang <jiazha>
Component: Service CatalogAssignee: Jay Boyd <jaboyd>
Status: CLOSED CURRENTRELEASE QA Contact: Jian Zhang <jiazha>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.10.0CC: chezhang, jaboyd, pmorie, zitang
Target Milestone: ---   
Target Release: 3.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: If Service Catalog hits a non-terminal error during provisioning (for example if the request to the broker times out or the broker returns a 500 series error) it would immediately retry provisioning (potentially sending a request to delete the instance first to ensure its not in some invalid state). There was no pause between these operations and it would repeat until the instance was successfully provisioned by the broker or the OpenShift service instance was deleted by the admin. Consequence: Service Catalog would continuously retry provisioning with no backoff. This may cause high CPU on both the Service Catalog controller pod and the Broker. Fix: Service Catalog will do an exponential backoff between retries. The maximum delay is 20 minutes before retrying.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-12-21 15:16:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jian Zhang 2018-05-11 07:23:07 UTC
Description of problem:
Could NOT deprovision the instance that an orphaned resource and the instance always stay in the "Pending" status. And, its log is repeating all the time.

Version-Release number of selected component (if applicable):
Service catalog version: v3.10.0-0.38.0;Upstream:v0.1.16

How reproducible:
always

Steps to Reproduce:
1. create the ups-broker, as below:
#oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/svc-catalog/ups-broker-deploy.yaml
#oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/svc-catalog/ups-broker-svc.yaml
#oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/svc-catalog/ups-broker-3.7.yam

2. Modify the deployment of the ups-broker so that it can return an orphaned resource, such as `205`.
...
      containers:
      - args:
        - -alsologtostderr
        - --port
        - "8080"
        - --provision
        - "205"
        image: docker.io/zjianbjz/user-broker:latest

About the source modification of the above ups-broker, please see here: https://github.com/jianzhangbjz/service-catalog/commit/1e4cc3def02c8c9af22ec1f1e53e73a869ba703a

3. Provision the "user-provided-service" instance on Web console, watching the logs of the ups-brokers.

Actual results:
This instance always stays in the "Pending" status, could NOT deprovision success.

Logs:
[root@qe-jiazha-10master-etcd-1 ~]# oc logs -f ups-broker-c7994d88f-wcqzq
...
I0511 06:46:30.005984       1 server.go:138] CreateServiceInstance e7c0e0ef-54e4-11e8-8eee-0a580a800003...
W0511 06:46:30.006140       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 205
I0511 06:46:30.006175       1 controller.go:81] CreateServiceInstance()
I0511 06:46:30.006188       1 controller.go:112] Created User Provided Service Instance:
&{e7c0e0ef-54e4-11e8-8eee-0a580a800003 0xc420086b20}
I0511 06:46:30.202213       1 server.go:177] RemoveServiceInstance e7c0e0ef-54e4-11e8-8eee-0a580a800003...
W0511 06:46:30.202259       1 server.go:180] (ignore)removeServiceInstance operation, the fake status code is: 200
I0511 06:46:30.202269       1 controller.go:132] RemoveServiceInstance()
I0511 06:46:30.404632       1 server.go:177] RemoveServiceInstance e7c0e0ef-54e4-11e8-8eee-0a580a800003...
W0511 06:46:30.404663       1 server.go:180] (ignore)removeServiceInstance operation, the fake status code is: 200
I0511 06:46:30.404673       1 controller.go:132] RemoveServiceInstance()
I0511 06:46:31.005264       1 server.go:138] CreateServiceInstance e7c0e0ef-54e4-11e8-8eee-0a580a800003...
W0511 06:46:31.005376       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 205
I0511 06:46:31.005387       1 controller.go:81] CreateServiceInstance()
I0511 06:46:31.005396       1 controller.go:112] Created User Provided Service Instance:
&{e7c0e0ef-54e4-11e8-8eee-0a580a800003 0xc420150c88}
I0511 06:46:31.207235       1 server.go:138] CreateServiceInstance e7c0e0ef-54e4-11e8-8eee-0a580a800003...
W0511 06:46:31.207353       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 205
I0511 06:46:31.207371       1 controller.go:81] CreateServiceInstance()
I0511 06:46:31.207386       1 controller.go:112] Created User Provided Service Instance:
&{e7c0e0ef-54e4-11e8-8eee-0a580a800003 0xc42000ed40}
I0511 06:46:31.408540       1 server.go:177] RemoveServiceInstance e7c0e0ef-54e4-11e8-8eee-0a580a800003...
W0511 06:46:31.408568       1 server.go:180] (ignore)removeServiceInstance operation, the fake status code is: 200
I0511 06:46:31.408580       1 controller.go:132] RemoveServiceInstance()
I0511 06:46:31.604866       1 server.go:177] RemoveServiceInstance e7c0e0ef-54e4-11e8-8eee-0a580a800003...
W0511 06:46:31.604960       1 server.go:180] (ignore)removeServiceInstance operation, the fake status code is: 200
I0511 06:46:31.604975       1 controller.go:132] RemoveServiceInstance()
...

Expected results:
Deprovision the instance success

Additional info:

# oc describe serviceinstance user-provided-service-q59mt -n jian
...
Events:
  Type     Reason                            Age                  From                                Message
  ----     ------                            ----                 ----                                -------
  Warning  StartingInstanceOrphanMitigation  24m (x970 over 34m)  service-catalog-controller-manager  The instance provision call failed with an ambiguous error; attempting to deprovision the instance in order to mitigate an orphaned resource
  Normal   OrphanMitigationSuccessful        14m (x941 over 34m)  service-catalog-controller-manager  Orphan mitigation was completed successfully
  Warning  ProvisionCallFailed               4m (x2758 over 34m)  service-catalog-controller-manager  Error provisioning ServiceInstance of ClusterServiceClass (K8S: "4f6e6cf6-ffdd-425f-a2c7-3c9258ad2468" ExternalName: "user-provided-service") at ClusterServiceBroker "ups-broker": Status: 205; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>

Comment 1 Jay Boyd 2018-05-11 16:39:27 UTC
reported upstream as https://github.com/kubernetes-incubator/service-catalog/issues/2025

The issue is the Catalog is in a tight loop attempting to provision the instance, hitting a failure error from the broker, deprovisions the instance, and then immediately catalog attempts to provision the instance again.

Issue 2025 is asking for an exponential backoff and retry which I think is the correct behavior.

Comment 2 Jian Zhang 2018-05-14 05:36:31 UTC
Jay,

Yes, I agree with you. The correct behavior should be like this: `(start provisioning -> fail, requires orphan mitigation -> perform orphan mitigation ->  exponential delay) x maxRetries`.

Comment 3 Jay Boyd 2018-08-21 15:14:22 UTC
This is now in place service instance provisioning and updates.  Only for service instances.  This is a temporary work around, there are still issues with properly backing off and retrying for other operations, see https://github.com/kubernetes-incubator/service-catalog/issues/2076 for further details.

fixed by https://github.com/kubernetes-incubator/service-catalog/pull/2065

For verification, you need a build of Service Catalog with v0.1.29 or greater.  This was merged into OpenShift on August 20.

Comment 4 Jay Boyd 2018-08-21 15:15:32 UTC
regarding comment 3 - this is fixed in 3.11, not yet back ported to 3.10.z

Comment 5 Jian Zhang 2018-08-27 07:29:33 UTC
Jay,

I test it with the latest version: v0.1.29 on OCP 3.11 but still got the same errors.

[root@sharer4-master-etcd-1 ~]# oc get serviceinstance -n jian
...
Events:
  Type     Reason                            Age              From                                Message
  ----     ------                            ----             ----                                -------
  Warning  RetryBackoff                      2m               service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-08-27 07:20:33.919087076 +0000 UTC m=+15392.451927528
  Warning  RetryBackoff                      2m (x2 over 2m)  service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-08-27 07:20:36.281338712 +0000 UTC m=+15394.814179159
  Warning  RetryBackoff                      2m (x2 over 2m)  service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-08-27 07:20:40.364137474 +0000 UTC m=+15398.896977916
  Warning  RetryBackoff                      2m (x2 over 2m)  service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-08-27 07:20:48.565192547 +0000 UTC m=+15407.098032980
  Warning  StartingInstanceOrphanMitigation  2m (x5 over 2m)  service-catalog-controller-manager  The instance provision call failed with an ambiguous error; attempting to deprovision the instance in order to mitigate an orphaned resource
  Normal   OrphanMitigationSuccessful        2m (x5 over 2m)  service-catalog-controller-manager  Orphan mitigation was completed successfully
  Warning  RetryBackoff                      2m (x2 over 2m)  service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-08-27 07:21:04.759359755 +0000 UTC m=+15423.292200224
  Warning  ProvisionCallFailed               1m (x6 over 2m)  service-catalog-controller-manager  Error provisioning ServiceInstance of ClusterServiceClass (K8S: "4f6e6cf6-ffdd-425f-a2c7-3c9258ad2468" ExternalName: "user-provided-service") at ClusterServiceBroker "ups-broker": Status: 205; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>
[root@sharer4-master-etcd-1 ~]# oc get pods -n jian
No resources found.

[root@sharer4-master-etcd-1 ~]# oc logs -f ups-broker-6b69ddfbb8-fwxd5
I0827 06:42:08.089938       1 server.go:95] Starting server on %!d(string=:8080)
I0827 06:44:16.780775       1 server.go:112] Get Service Broker Catalog...
I0827 06:44:16.781254       1 controller.go:57] Catalog()
...
I0827 07:20:32.799704       1 server.go:138] CreateServiceInstance ac9cfc99-a9c9-11e8-a183-0a580a020004...
W0827 07:20:32.801562       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 205
I0827 07:20:32.803770       1 controller.go:81] CreateServiceInstance()
I0827 07:20:32.803979       1 controller.go:112] Created User Provided Service Instance:
&{ac9cfc99-a9c9-11e8-a183-0a580a020004 0xc42000e5d8}
I0827 07:20:32.932481       1 server.go:177] RemoveServiceInstance ac9cfc99-a9c9-11e8-a183-0a580a020004...
W0827 07:20:32.945228       1 server.go:180] (ignore)removeServiceInstance operation, the fake status code is: 200
I0827 07:20:32.945804       1 controller.go:132] RemoveServiceInstance()
I0827 07:20:34.238594       1 server.go:138] CreateServiceInstance ac9cfc99-a9c9-11e8-a183-0a580a020004...
W0827 07:20:34.239318       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 205
I0827 07:20:34.239531       1 controller.go:81] CreateServiceInstance()
I0827 07:20:34.239670       1 controller.go:112] Created User Provided Service Instance:
&{ac9cfc99-a9c9-11e8-a183-0a580a020004 0xc420094488}
I0827 07:20:34.315865       1 server.go:177] RemoveServiceInstance ac9cfc99-a9c9-11e8-a183-0a580a020004...
W0827 07:20:34.316126       1 server.go:180] (ignore)removeServiceInstance operation, the fake status code is: 200
I0827 07:20:34.316160       1 controller.go:132] RemoveServiceInstance()
I0827 07:20:36.321635       1 server.go:138] CreateServiceInstance ac9cfc99-a9c9-11e8-a183-0a580a020004...
W0827 07:20:36.322405       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 205
I0827 07:20:36.322638       1 controller.go:81] CreateServiceInstance()
...
And, this instance always stays in the "Pending" status, could NOT deprovision success.

Comment 6 Jay Boyd 2018-08-27 14:24:15 UTC
From the excerpt above it looks like the backoff is working properly.  It's doing a retry after an exponentially increasing backoff.  It's a 20 minute maximum backoff and will keep retrying indefinitely once every 20 minutes.

You said it can not deprovision - can you please provide Service Catalog controller logs and the full output from the describe serviceinstance after you attempt to delete it?

Thanks!

Comment 7 Jian Zhang 2018-09-26 07:07:30 UTC
Jay

Sorry for the late to reply. The serviceinstance can be deleted successfully by manual running the Deprovision command.

And, the output from the describe serviceinstance looks good.
  Warning  RetryBackoff                      1m (x2 over 1m)  service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-09-26 06:50:27.005226809 +0000 UTC m=+17950.124172904
  Warning  RetryBackoff                      1m (x2 over 1m)  service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-09-26 06:50:29.171831166 +0000 UTC m=+17952.290777279
  Warning  RetryBackoff                      1m (x2 over 1m)  service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-09-26 06:50:33.373362083 +0000 UTC m=+17956.492308194
  Warning  RetryBackoff                      1m (x2 over 1m)  service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-09-26 06:50:41.610866126 +0000 UTC m=+17964.729812273
  Warning  StartingInstanceOrphanMitigation  1m (x5 over 1m)  service-catalog-controller-manager  The instance provision call failed with an ambiguous error; attempting to deprovision the instance in order to mitigate an orphaned resource
  Normal   OrphanMitigationSuccessful        1m (x5 over 1m)  service-catalog-controller-manager  Orphan mitigation was completed successfully
  Warning  ProvisionCallFailed               1m (x5 over 1m)  service-catalog-controller-manager  Error provisioning ServiceInstance of ClusterServiceClass (K8S: "4f6e6cf6-ffdd-425f-a2c7-3c9258ad2468" ExternalName: "user-provided-service") at ClusterServiceBroker "ups-broker": Status: 205; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>
  Warning  RetryBackoff                      1m (x2 over 1m)  service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-09-26 06:50:57.82222646 +0000 UTC m=+17980.941172566

But, the output from the broker pods always pop the "RemoveServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b...", is this as expected?
[root@qe-jiazha-master-etcd-1 ~]# oc logs -f ups-broker-6b69ddfbb8-2cwg4 -n kube-service-catalog
I0926 06:47:30.850742       1 server.go:95] Starting server on %!d(string=:8080)
I0926 06:48:06.332782       1 server.go:112] Get Service Broker Catalog...
I0926 06:48:06.332811       1 controller.go:57] Catalog()
I0926 06:50:25.988339       1 server.go:138] CreateServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b...
W0926 06:50:25.988548       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 205
I0926 06:50:25.988790       1 controller.go:81] CreateServiceInstance()
I0926 06:50:25.988804       1 controller.go:112] Created User Provided Service Instance:
&{72629cc8-c158-11e8-9145-0a580a80000b 0xc42000e1a8}
I0926 06:50:26.148178       1 server.go:177] RemoveServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b...
W0926 06:50:26.148204       1 server.go:180] (ignore)removeServiceInstance operation, the fake status code is: 200
I0926 06:50:26.148216       1 controller.go:132] RemoveServiceInstance()
I0926 06:50:27.152162       1 server.go:138] CreateServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b...
W0926 06:50:27.152298       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 205
I0926 06:50:27.152317       1 controller.go:81] CreateServiceInstance()
I0926 06:50:27.152328       1 controller.go:112] Created User Provided Service Instance:
&{72629cc8-c158-11e8-9145-0a580a80000b 0xc4200fe280}
I0926 06:50:27.347895       1 server.go:177] RemoveServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b...
W0926 06:50:27.347919       1 server.go:180] (ignore)removeServiceInstance operation, the fake status code is: 200
I0926 06:50:27.347931       1 controller.go:132] RemoveServiceInstance()
I0926 06:50:29.350544       1 server.go:138] CreateServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b...
W0926 06:50:29.350680       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 205
I0926 06:50:29.350725       1 controller.go:81] CreateServiceInstance()
I0926 06:50:29.350740       1 controller.go:112] Created User Provided Service Instance:
&{72629cc8-c158-11e8-9145-0a580a80000b 0xc42000e228}
I0926 06:50:29.548142       1 server.go:177] RemoveServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b...
W0926 06:50:29.548170       1 server.go:180] (ignore)removeServiceInstance operation, the fake status code is: 200
I0926 06:50:29.548182       1 controller.go:132] RemoveServiceInstance()
I0926 06:50:33.577449       1 server.go:138] CreateServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b...
W0926 06:50:33.577658       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 205
I0926 06:50:33.577700       1 controller.go:81] CreateServiceInstance()
I0926 06:50:33.577711       1 controller.go:112] Created User Provided Service Instance:
&{72629cc8-c158-11e8-9145-0a580a80000b 0xc42000e238}
I0926 06:50:33.749339       1 server.go:177] RemoveServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b...
W0926 06:50:33.749366       1 server.go:180] (ignore)removeServiceInstance operation, the fake status code is: 200
I0926 06:50:33.749378       1 controller.go:132] RemoveServiceInstance()
I0926 06:50:41.785505       1 server.go:138] CreateServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b...
W0926 06:50:41.785749       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 205
I0926 06:50:41.785784       1 controller.go:81] CreateServiceInstance()
I0926 06:50:41.785796       1 controller.go:112] Created User Provided Service Instance:
&{72629cc8-c158-11e8-9145-0a580a80000b 0xc4200fe2a0}
I0926 06:50:41.949806       1 server.go:177] RemoveServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b...
W0926 06:50:41.949982       1 server.go:180] (ignore)removeServiceInstance operation, the fake status code is: 200
I0926 06:50:41.950039       1 controller.go:132] RemoveServiceInstance()
I0926 06:50:57.978349       1 server.go:138] CreateServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b...
W0926 06:50:57.978466       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 205
I0926 06:50:57.978479       1 controller.go:81] CreateServiceInstance()
I0926 06:50:57.978489       1 controller.go:112] Created User Provided Service Instance:
&{72629cc8-c158-11e8-9145-0a580a80000b 0xc4200fe2c0}

Comment 8 Jian Zhang 2018-09-26 07:31:05 UTC
One more thing, the orphan mitigation status of the returned code 408 is " NotRequired", is it correct? Per doc: https://github.com/openservicebrokerapi/servicebroker/blob/v2.12/spec.md#orphans, the platform should run the orphan mitigation. COrrect me if I'm wrong.

The version info:
sh-4.2$ service-catalog --version
v3.11.15;Upstream:v0.1.31

# oc logs -f ups-broker-587776d4f9-fw7c8
...
I0926 07:18:51.286501       1 server.go:138] CreateServiceInstance 6ac1cd63-c15c-11e8-9145-0a580a80000b...
W0926 07:18:51.286627       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 408
I0926 07:18:51.286641       1 controller.go:81] CreateServiceInstance()
I0926 07:18:51.286653       1 controller.go:112] Created User Provided Service Instance:
&{6ac1cd63-c15c-11e8-9145-0a580a80000b 0xc42000e1d8}
I0926 07:18:52.459793       1 server.go:138] CreateServiceInstance 6ac1cd63-c15c-11e8-9145-0a580a80000b...
W0926 07:18:52.459890       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 408
I0926 07:18:52.459903       1 controller.go:81] CreateServiceInstance()
I0926 07:18:52.459914       1 controller.go:112] Created User Provided Service Instance:
&{6ac1cd63-c15c-11e8-9145-0a580a80000b 0xc42000e1e8}
I0926 07:18:54.659717       1 server.go:138] CreateServiceInstance 6ac1cd63-c15c-11e8-9145-0a580a80000b...
W0926 07:18:54.659852       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 408
I0926 07:18:54.659866       1 controller.go:81] CreateServiceInstance()
I0926 07:18:54.659877       1 controller.go:112] Created User Provided Service Instance:
&{6ac1cd63-c15c-11e8-9145-0a580a80000b 0xc42000e1f8}
I0926 07:18:58.886426       1 server.go:138] CreateServiceInstance 6ac1cd63-c15c-11e8-9145-0a580a80000b...
W0926 07:18:58.886549       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 408
I0926 07:18:58.886562       1 controller.go:81] CreateServiceInstance()
I0926 07:18:58.886574       1 controller.go:112] Created User Provided Service Instance:
&{6ac1cd63-c15c-11e8-9145-0a580a80000b 0xc42000e208}
I0926 07:19:07.089188       1 server.go:138] CreateServiceInstance 6ac1cd63-c15c-11e8-9145-0a580a80000b...
W0926 07:19:07.089279       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 408
I0926 07:19:07.089296       1 controller.go:81] CreateServiceInstance()
I0926 07:19:07.089305       1 controller.go:112] Created User Provided Service Instance:
&{6ac1cd63-c15c-11e8-9145-0a580a80000b 0xc42007a4a0}
I0926 07:19:23.288389       1 server.go:138] CreateServiceInstance 6ac1cd63-c15c-11e8-9145-0a580a80000b...
W0926 07:19:23.288492       1 server.go:155] (ignore)createServiceInstance operation, the fake status code is: 408
I0926 07:19:23.288506       1 controller.go:81] CreateServiceInstance()
I0926 07:19:23.288518       1 controller.go:112] Created User Provided Service Instance:
&{6ac1cd63-c15c-11e8-9145-0a580a80000b 0xc42007a4b8}


[root@qe-jiazha-master-etcd-1 ~]# oc get serviceinstance -n jian
NAME                          CLASS                                       PLAN      STATUS                AGE
user-provided-service-pwf6l   ClusterServiceClass/user-provided-service   default   ProvisionCallFailed   1m
[root@qe-jiazha-master-etcd-1 ~]# oc describe serviceinstance user-provided-service-pwf6l -n jian
Name:         user-provided-service-pwf6l
Namespace:    jian
Labels:       <none>
Annotations:  <none>
API Version:  servicecatalog.k8s.io/v1beta1
Kind:         ServiceInstance
Metadata:
  Creation Timestamp:  2018-09-26T07:18:50Z
  Finalizers:
    kubernetes-incubator/service-catalog
  Generate Name:     user-provided-service-
  Generation:        1
  Resource Version:  51665
  Self Link:         /apis/servicecatalog.k8s.io/v1beta1/namespaces/jian/serviceinstances/user-provided-service-pwf6l
  UID:               6ac1cdf5-c15c-11e8-9145-0a580a80000b
Spec:
  Cluster Service Class External Name:  user-provided-service
  Cluster Service Class Ref:
    Name:                              4f6e6cf6-ffdd-425f-a2c7-3c9258ad2468
  Cluster Service Plan External Name:  default
  Cluster Service Plan Ref:
    Name:           86064792-7ea2-467b-af93-ac9694d96d52
  External ID:      6ac1cd63-c15c-11e8-9145-0a580a80000b
  Update Requests:  0
  User Info:
    Extra:
      Scopes . Authorization . Openshift . Io:
        user:full
    Groups:
      system:authenticated:oauth
      system:authenticated
    UID:       
    Username:  jiazha
Status:
  Async Op In Progress:  false
  Conditions:
    Last Transition Time:  2018-09-26T07:18:51Z
    Message:               Error provisioning ServiceInstance of ClusterServiceClass (K8S: "4f6e6cf6-ffdd-425f-a2c7-3c9258ad2468" ExternalName: "user-provided-service") at ClusterServiceBroker "ups-broker": Status: 408; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>
    Reason:                ProvisionCallFailed
    Status:                False
    Type:                  Ready
  Current Operation:       Provision
  Deprovision Status:      NotRequired
  In Progress Properties:
    Cluster Service Plan External ID:    86064792-7ea2-467b-af93-ac9694d96d52
    Cluster Service Plan External Name:  default
    User Info:
      Extra:
        Scopes . Authorization . Openshift . Io:
          user:full
      Groups:
        system:authenticated:oauth
        system:authenticated
      UID:                        
      Username:                   jiazha
  Observed Generation:            1
  Operation Start Time:           2018-09-26T07:18:51Z
  Orphan Mitigation In Progress:  false
  Provision Status:               
  Reconciled Generation:          0
Events:
  Type     Reason               Age                From                                Message
  ----     ------               ----               ----                                -------
  Warning  RetryBackoff         1m (x2 over 1m)    service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-09-26 07:18:52.302415558 +0000 UTC m=+19655.421361658
  Warning  RetryBackoff         1m (x2 over 1m)    service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-09-26 07:18:54.47884747 +0000 UTC m=+19657.597793563
  Warning  RetryBackoff         1m (x2 over 1m)    service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-09-26 07:18:58.676695836 +0000 UTC m=+19661.795641932
  Warning  RetryBackoff         1m (x2 over 1m)    service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-09-26 07:19:06.905254366 +0000 UTC m=+19670.024200470
  Warning  RetryBackoff         1m (x2 over 1m)    service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-09-26 07:19:23.107375213 +0000 UTC m=+19686.226321316
  Warning  RetryBackoff         44s (x2 over 44s)  service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-09-26 07:19:55.317066994 +0000 UTC m=+19718.436013105
  Warning  ProvisionCallFailed  12s (x7 over 1m)   service-catalog-controller-manager  Error provisioning ServiceInstance of ClusterServiceClass (K8S: "4f6e6cf6-ffdd-425f-a2c7-3c9258ad2468" ExternalName: "user-provided-service") at ClusterServiceBroker "ups-broker": Status: 408; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>
  Warning  RetryBackoff         12s (x2 over 12s)  service-catalog-controller-manager  Delaying provision retry, next attempt will be after 2018-09-26 07:20:59.49015212 +0000 UTC m=+19782.609098228

Comment 9 Jay Boyd 2018-09-26 13:37:47 UTC
Regarding comment #7:  this behavior looks good, exactly what I was hoping to see.  When the Controller attempts to create the instance, the broker is returning a status 205.  The Controller interprets this as an error, this causes the controller to do orphan mitigation, so in the broker you see the "RemoveServiceInstance 72629cc8-c158-11e8-9145-0a580a80000b..." immediately following.   After orphan mitigation, we see an exponential backoff and then a retry, and this repeats over and over.  This all looks good to me, working as we would expect.  Additionally, if you delete the K8s ServiceInstance while it's doing a backoff (waiting/sleeping before the next attempt), the K8s instance is successfully deleted and this ends the retries.



In comment #8 you are referencing version 2.12 of the the OSB API spec here.  If you take a look at master:  https://github.com/openservicebrokerapi/servicebroker/blob/master/spec.md  and search on 408 you will see the spec was updated - - at one point (version 2.12) a 408 was supposed to do orphan mitigation, but in 2.13 that was changed.  The correct behavior as specified in the latest spec is to NOT do orphan mitigation for any 400 series status code.   PR https://github.com/openservicebrokerapi/servicebroker/pull/456  contains the change for that.


Thanks for all the details in comments 7 & 8 - - with this information, I'm going to update this as fixed.

Comment 10 Jian Zhang 2018-09-27 05:31:53 UTC
Jay,

Thanks a lot for your clarification! it works on version Upstream:v0.1.31, 3.11, LGTM, verify it.

Comment 11 Luke Meyer 2018-12-21 15:16:43 UTC
Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.