1507595 – Plan can't restore to the previous good state or update to another acceptable plan

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Service Broker
Sub Component:
Version:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.9.0
Assignee:	Jeff Peeler
QA Contact:	Zihan Tang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-10-30 16:27 UTC by Qixuan Wang
Modified:	2018-12-13 19:26 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	There were several problems related to updates: spec changes for instances were blocked even if there wasn't an ongoing operation, deleting a service instance that was updated to an invalid service plan would cause a crash, and instances weren't updated properly if a previous update had failed.
Clone Of:
Environment:
Last Closed:	2018-12-13 19:26:48 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:3748	0	None	None	None	2018-12-13 19:26:58 UTC

Description Qixuan Wang 2017-10-30 16:27:22 UTC

Description of problem:

The following negative tests of plan updates can crash the controller-manager and leave the ServiceInstance stuck on the bad plan.

1) Update the plan to a nonexistent one, then roll back to the previous good state (e.g. dev->invalid->dev)
2) Update the plan to a nonexistent one, then update to another acceptable plan (e.g. dev->invalid->prod)
3) Downgrade the plan (e.g. prod->dev->prod)



Version-Release number of selected component (if applicable):
openshift v3.7.0-0.184.0
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8
ose-service-catalog   v3.7.0-0.185.0.0
ose-ansible-service-broker   v3.7.0-0.185.0.0


How reproducible:
Always


Steps to Reproduce:
1. Provision a PostgreSQL APB from the web UI and choose the development (dev) plan.
2. Confirm that the ClusterServiceClass has Plan Updatable: true (the default).
3. Edit the ServiceInstance to change the plan from dev to an invalid one, then check the ServiceInstance status and the broker log.
4. Change the invalid plan back to dev, then check the ServiceInstance status and the broker log.
5. Deprovision the ServiceInstance.
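
Steps 3 and 4 use the interactive `oc edit`. For a scripted reproduction, the same plan change could probably be sent as a JSON merge patch instead. The sketch below only builds the patch body. The field name `clusterServicePlanExternalName` is taken from the YAML output in Comment 9 of this report; the helper name `plan_patch` and the idea of using a merge patch are assumptions and were not part of the original test.

```python
import json


def plan_patch(plan_name: str) -> str:
    """Return a JSON merge patch body that sets the instance's plan by external name.

    Hypothetical helper for illustration; not part of oc or service-catalog.
    """
    return json.dumps({"spec": {"clusterServicePlanExternalName": plan_name}})


if __name__ == "__main__":
    # Step 3: dev -> dev-abc (a plan that does not exist).
    print(plan_patch("dev-abc"))
    # Step 4: dev-abc -> dev (attempted restore).
    print(plan_patch("dev"))
```

Each printed body could then be passed to something like `oc patch serviceinstance <name> --type=merge -p '<body>'`. This has not been tried against the cluster in this report.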


Actual results:
3. Plan: dev -> dev-abc
[root@qe-chezhang-1030master-etcd-1 ~]# oc edit serviceinstance dh-rhscl-postgresql-apb-b7gbq
serviceinstance "dh-rhscl-postgresql-apb-b7gbq" edited

[root@qe-chezhang-1030master-etcd-1 ~]# oc describe serviceinstance dh-rhscl-postgresql-apb-b7gbq
Name:		dh-rhscl-postgresql-apb-b7gbq
Namespace:	qwang4
Labels:		<none>
Annotations:	<none>
API Version:	servicecatalog.k8s.io/v1beta1
Kind:		ServiceInstance
Metadata:
  Creation Timestamp:	2017-10-30T14:38:22Z
  Finalizers:
    kubernetes-incubator/service-catalog
  Generate Name:	dh-rhscl-postgresql-apb-
  Generation:		2
  Resource Version:	92565
  Self Link:		/apis/servicecatalog.k8s.io/v1beta1/namespaces/qwang4/serviceinstances/dh-rhscl-postgresql-apb-b7gbq
  UID:			fa91d52c-bd7f-11e7-bc55-0a580a800004
Spec:
  Cluster Service Class External Name:	dh-rhscl-postgresql-apb
  Cluster Service Class Ref:
    Name:				27793015fe45db2fbc1deb7372cc4036
  Cluster Service Plan External Name:	dev-abc
  External ID:				6333e58b-33fc-4e4c-9670-85208a0c58b4
  Parameters From:
    Secret Key Ref:
      Key:		parameters
      Name:		dh-rhscl-postgresql-apb-parameterszx24a
  Update Requests:	0
  User Info:
    Groups:
      system:cluster-admins
      system:authenticated
    UID:	
    Username:	system:admin
Status:
  Async Op In Progress:	false
  Conditions:
    Last Transition Time:	2017-10-30T14:40:39Z
    Message:			The instance references a ClusterServicePlan that does not exist. References a non-existent ClusterServicePlan (K8S: "" ExternalName: "dev-abc") on ClusterServiceClass (K8S: "27793015fe45db2fbc1deb7372cc4036" ExternalName: "dh-rhscl-postgresql-apb") or there is more than one (found: 0)
    Reason:			ReferencesNonexistentServicePlan
    Status:			False
    Type:			Ready
  External Properties:
    Cluster Service Plan External Name:	dev
    Parameter Checksum:			f511137c0021f5169de49e662f0ec2830219a26e50968a57f2faa280408dfaa7
    Parameters:
      Postgresql _ Database:	<redacted>
      Postgresql _ User:	<redacted>
      Postgresql _ Version:	<redacted>
    User Info:
      Extra:
        Scopes . Authorization . Openshift . Io:
          user:full
      Groups:
        system:authenticated:oauth
        system:authenticated
      UID:				
      Username:				qwang
  Orphan Mitigation In Progress:	false
  Reconciled Generation:		1
Events:
  FirstSeen	LastSeen	Count	From					SubObjectPath	Type		Reason					Message
  ---------	--------	-----	----					-------------	--------	------					-------
  2m		2m		1	service-catalog-controller-manager			Warning		ErrorWithParameters			Failed to prepare ServiceInstance parameters nil: secrets "dh-rhscl-postgresql-apb-parameterszx24a" not found
  2m		2m		1	service-catalog-controller-manager			Normal		Provisioning				The instance is being provisioned asynchronously
  2m		2m		1	service-catalog-controller-manager			Normal		ProvisionedSuccessfully			The instance was provisioned successfully
  37s		18s		13	service-catalog-controller-manager			Warning		ReferencesNonexistentServicePlan	References a non-existent ClusterServicePlan (K8S: "" ExternalName: "dev-abc") on ClusterServiceClass (K8S: "27793015fe45db2fbc1deb7372cc4036" ExternalName: "dh-rhscl-postgresql-apb") or there is more than one (found: 0)


4. Plan: dev-abc -> dev
The update is forbidden, and the spec remains dev-abc.
[root@qe-chezhang-1030master-etcd-1 ~]# oc edit serviceinstance dh-rhscl-postgresql-apb-b7gbq
error: serviceinstances "dh-rhscl-postgresql-apb-b7gbq" is invalid
A copy of your changes has been stored to "/tmp/oc-edit-5yy6g.yaml"
error: Edit cancelled, no valid changes were saved.


[root@qe-chezhang-1030master-etcd-1 ~]# oc describe serviceinstance dh-rhscl-postgresql-apb-b7gbq
Name:		dh-rhscl-postgresql-apb-b7gbq
Namespace:	qwang4
Labels:		<none>
Annotations:	<none>
API Version:	servicecatalog.k8s.io/v1beta1
Kind:		ServiceInstance
Metadata:
  Creation Timestamp:	2017-10-30T14:38:22Z
  Finalizers:
    kubernetes-incubator/service-catalog
  Generate Name:	dh-rhscl-postgresql-apb-
  Generation:		2
  Resource Version:	92565
  Self Link:		/apis/servicecatalog.k8s.io/v1beta1/namespaces/qwang4/serviceinstances/dh-rhscl-postgresql-apb-b7gbq
  UID:			fa91d52c-bd7f-11e7-bc55-0a580a800004
Spec:
  Cluster Service Class External Name:	dh-rhscl-postgresql-apb
  Cluster Service Class Ref:
    Name:				27793015fe45db2fbc1deb7372cc4036
  Cluster Service Plan External Name:	dev-abc
  External ID:				6333e58b-33fc-4e4c-9670-85208a0c58b4
  Parameters From:
    Secret Key Ref:
      Key:		parameters
      Name:		dh-rhscl-postgresql-apb-parameterszx24a
  Update Requests:	0
  User Info:
    Groups:
      system:cluster-admins
      system:authenticated
    UID:	
    Username:	system:admin
Status:
  Async Op In Progress:	false
  Conditions:
    Last Transition Time:	2017-10-30T14:40:39Z
    Message:			The instance references a ClusterServicePlan that does not exist. References a non-existent ClusterServicePlan (K8S: "" ExternalName: "dev-abc") on ClusterServiceClass (K8S: "27793015fe45db2fbc1deb7372cc4036" ExternalName: "dh-rhscl-postgresql-apb") or there is more than one (found: 0)
    Reason:			ReferencesNonexistentServicePlan
    Status:			False
    Type:			Ready
  External Properties:
    Cluster Service Plan External Name:	dev
    Parameter Checksum:			f511137c0021f5169de49e662f0ec2830219a26e50968a57f2faa280408dfaa7
    Parameters:
      Postgresql _ Database:	<redacted>
      Postgresql _ User:	<redacted>
      Postgresql _ Version:	<redacted>
    User Info:
      Extra:
        Scopes . Authorization . Openshift . Io:
          user:full
      Groups:
        system:authenticated:oauth
        system:authenticated
      UID:				
      Username:				qwang
  Orphan Mitigation In Progress:	false
  Reconciled Generation:		1
Events:
  FirstSeen	LastSeen	Count	From					SubObjectPath	Type		Reason					Message
  ---------	--------	-----	----					-------------	--------	------					-------
  6m		6m		1	service-catalog-controller-manager			Warning		ErrorWithParameters			Failed to prepare ServiceInstance parameters nil: secrets "dh-rhscl-postgresql-apb-parameterszx24a" not found
  6m		6m		1	service-catalog-controller-manager			Normal		Provisioning				The instance is being provisioned asynchronously
  6m		6m		1	service-catalog-controller-manager			Normal		ProvisionedSuccessfully			The instance was provisioned successfully
  4m		1m		16	service-catalog-controller-manager			Warning		ReferencesNonexistentServicePlan	References a non-existent ClusterServicePlan (K8S: "" ExternalName: "dev-abc") on ClusterServiceClass (K8S: "27793015fe45db2fbc1deb7372cc4036" ExternalName: "dh-rhscl-postgresql-apb") or there is more than one (found: 0)


5. The ServiceInstance's project hangs in Terminating. The controller-manager panics as follows:
E1030 15:52:42.323667       1 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/builddir/build/BUILD/atomic-openshift-git-0.e975556/cmd/service-catalog/go/src/github.com/kubernetes-incubator/service-catalog/_output/local/go/src/github.com/kubernetes-incubator/service-catalog/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/builddir/build/BUILD/atomic-openshift-git-0.e975556/cmd/service-catalog/go/src/github.com/kubernetes-incubator/service-catalog/_output/local/go/src/github.com/kubernetes-incubator/service-catalog/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/builddir/build/BUILD/atomic-openshift-git-0.e975556/cmd/service-catalog/go/src/github.com/kubernetes-incubator/service-catalog/_output/local/go/src/github.com/kubernetes-incubator/service-catalog/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/lib/golang/src/runtime/asm_amd64.s:514
/usr/lib/golang/src/runtime/panic.go:489
/usr/lib/golang/src/runtime/panic.go:63
/usr/lib/golang/src/runtime/signal_unix.go:290
/builddir/build/BUILD/atomic-openshift-git-0.e975556/cmd/service-catalog/go/src/github.com/kubernetes-incubator/service-catalog/_output/local/go/src/github.com/kubernetes-incubator/service-catalog/pkg/controller/controller.go:269



Expected results:
3. It would be better to guide users toward a correct update (knowing the correct plan doesn't guarantee entering it correctly, so I think limiting the available operations/options would be better).
4. A bad plan can be updated to a good one.
5. The controller-manager must not crash.


Additional info:

Comment 1 Matthew Staebler 2017-10-30 18:09:01 UTC

https://github.com/kubernetes-incubator/service-catalog/issues/1487 tracks the issue with updating a ServiceInstance after a failed update.

https://github.com/kubernetes-incubator/service-catalog/issues/1499 tracks controller-manager crashing when deleting a ServiceInstance with a plan name of a non-existent plan.
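
The stack trace in the description ends in a nil pointer dereference, and issue 1499 ties the crash to deleting an instance whose plan name matches no plan. The report does not show the code at controller.go:269, so the following is only a Python model of that class of bug (service-catalog itself is written in Go): a lookup that can return nothing, and a delete path that checks for that before using the result. All names here (`Plan`, `find_plan`, `deprovision_request`) are hypothetical, and falling back to the plan recorded in status is one possible policy, not necessarily what the upstream fix does.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Plan:
    external_name: str
    external_id: str


def find_plan(plans: Dict[str, Plan], external_name: str) -> Optional[Plan]:
    """Look up a plan by external name; returns None when no plan matches."""
    return plans.get(external_name)


def deprovision_request(plans: Dict[str, Plan], spec_plan: str, last_good_plan: str) -> dict:
    """Build a deprovision request without dereferencing a missing plan.

    Falls back to the last successfully provisioned plan (the one recorded in
    status) when the plan named in spec does not exist.
    """
    plan = find_plan(plans, spec_plan)
    if plan is None:
        # Unguarded access such as plan.external_id at this point is the
        # Python analogue of the nil pointer dereference in the trace.
        plan = find_plan(plans, last_good_plan)
    if plan is None:
        raise LookupError(f"no usable plan for deprovisioning: {spec_plan!r}")
    return {"plan_id": plan.external_id}


if __name__ == "__main__":
    plans = {"dev": Plan("dev", "id-dev"), "prod": Plan("prod", "id-prod")}
    # spec names the nonexistent plan dev-abc; status still records dev.
    print(deprovision_request(plans, "dev-abc", "dev"))
```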

Comment 2 Paul Morie 2017-11-06 15:12:57 UTC

Upstream PRs:

https://github.com/kubernetes-incubator/service-catalog/pull/1501

https://github.com/kubernetes-incubator/service-catalog/pull/1502

Fixed in origin with: https://github.com/openshift/origin/pull/17166

Comment 4 Qixuan Wang 2017-11-07 11:30:20 UTC

Tested on OCP(openshift v3.7.0-0.196.0, kubernetes v1.7.6+a08f5eeb62, etcd 3.2.8, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-service-catalog:v3.7.0-0.196.0.0, brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-ansible-service-broker:v3.7.0-0.196.0.0)

In items 5 and 6 below, the downgrade is not allowed, and the plan can then be rolled back. That is correct. However, in items 2 and 4, restoring from an invalid plan is blocked. Is this expected?

1. [Edit] spec: dev->dev-123
[Describe] Message: The instance references a ClusterServicePlan that does not exist. spec: dev-123, status:dev

2. [Edit] spec: dev-123->dev/prod

# serviceinstances "rh-rhscl-postgresql-apb-mh5s2" was not valid:
# * spec: Forbidden: Another update for this service instance is in progress

[root@host-172-16-120-8 ~]# oc edit serviceinstance rh-rhscl-postgresql-apb-mh5s2
error: serviceinstances "rh-rhscl-postgresql-apb-mh5s2" is invalid
A copy of your changes has been stored to "/tmp/oc-edit-huczk.yaml"
error: Edit cancelled, no valid changes were saved.

[Describe] Message: The instance references a ClusterServicePlan that does not exist. spec: dev-123, status:dev

3. [Edit] spec: prod->prod-456
[Describe] Message: The instance references a ClusterServicePlan that does not exist. spec: prod-456, status:prod

4. [Edit] spec: prod-456->dev/prod
# serviceinstances "rh-rhscl-postgresql-apb-xq2ns" was not valid:
# * spec: Forbidden: Another update for this service instance is in progress

[root@host-172-16-120-8 ~]# oc edit serviceinstance rh-rhscl-postgresql-apb-xq2ns
error: serviceinstances "rh-rhscl-postgresql-apb-xq2ns" is invalid
A copy of your changes has been stored to "/tmp/oc-edit-zrq5w.yaml"
error: Edit cancelled, no valid changes were saved.


Downgrade and rollback
5. [Edit] spec: prod->dev
[Describe] Message: plan update not possible, spec:dev, status:prod

6. [Edit] spec: dev->prod
[Describe] Message: The instance is being updated asynchronously, spec:prod, status:prod

Comment 5 Matthew Staebler 2017-11-07 15:06:48 UTC

The failure of 2 and 4 is not expected. This bug was unfortunately not addressed completely. The failures are captured upstream in https://github.com/kubernetes-incubator/service-catalog/issues/1533.
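
The rejected edits in items 2 and 4 carry the message "Another update for this service instance is in progress". For the same kind of state, the `oc describe` output in the description shows `Async Op In Progress: false` together with `Generation: 2` and `Reconciled Generation: 1`. The model below contrasts two possible admission checks: one that treats any unreconciled generation as an update in progress, and one that looks only at whether an operation is actually running. Which check the API server really used is not stated in this report; both functions and the `InstanceStatus` type are hypothetical, and the sketch is in Python rather than the project's Go.

```python
from dataclasses import dataclass


@dataclass
class InstanceStatus:
    generation: int             # metadata.generation
    reconciled_generation: int  # status.reconciledGeneration
    async_op_in_progress: bool  # status.asyncOpInProgress


def blocks_on_generation(s: InstanceStatus) -> bool:
    """Reject spec changes whenever the last spec change was never reconciled."""
    return s.reconciled_generation != s.generation


def blocks_on_operation(s: InstanceStatus) -> bool:
    """Reject spec changes only while an operation is actually running."""
    return s.async_op_in_progress


if __name__ == "__main__":
    # State after editing the plan to a nonexistent one, per the describe output.
    stuck = InstanceStatus(generation=2, reconciled_generation=1, async_op_in_progress=False)
    print(blocks_on_generation(stuck))  # True: the restore is forbidden
    print(blocks_on_operation(stuck))   # False: the restore is allowed
```

Under the first check an instance that failed on a nonexistent plan can never be edited again, because the failed generation is never reconciled. That matches the symptom reported here, though it is only one reading of it.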

Comment 8 Qixuan Wang 2018-01-15 09:54:30 UTC

Version-Release number of selected component (if applicable):
openshift v3.9.0-0.19.0
kubernetes v1.9.0-beta1
etcd 3.2.8
ose-ansible-service-broker:v3.9
ose-service-catalog:v3.9

Plan rollback from a bad state (dev-123 -> dev, or prod456 -> prod) and downgrade (prod -> dev) are now supported. However, I found that the plan can't be updated from a nonexistent one to a different valid plan, for example:
1) dev-123 -> prod (x) -> dev (x)
2) prod-456 -> dev (x) -> prod (x)

Comment 9 Jeff Peeler 2018-01-22 17:32:07 UTC

I'm not finding the previous comment to be true with the latest code:

$ kubectl get serviceinstances -n test-ns -o yaml
apiVersion: v1
items:
- apiVersion: servicecatalog.k8s.io/v1beta1
  kind: ServiceInstance
  metadata:
    creationTimestamp: 2018-01-22T17:26:27Z
    finalizers:
    - kubernetes-incubator/service-catalog
    generation: 1
    name: ups-instance
    namespace: test-ns
    resourceVersion: "816"
    selfLink: /apis/servicecatalog.k8s.io/v1beta1/namespaces/test-ns/serviceinstances/ups-instance
    uid: 60b76eec-ff99-11e7-9b7f-0242ac110005
  spec:
    clusterServiceClassExternalName: user-provided-service
    clusterServicePlanExternalName: invalid-default
    externalID: 2542f01d-751b-45a5-ba5c-5d0986c42f08
    parameters:
      param-1: value-1
      param-2: value-2
    updateRequests: 0
  status:
    asyncOpInProgress: false
    conditions:
    - lastTransitionTime: 2018-01-22T17:26:27Z
      message: 'The instance references a ClusterServicePlan that does not exist.
        References a non-existent ClusterServicePlan (K8S: "" ExternalName: "invalid-default")
        on ClusterServiceClass (K8S: "4f6e6cf6-ffdd-425f-a2c7-3c9258ad2468" ExternalName:
        "user-provided-service") or there is more than one (found: 0)'
      reason: ReferencesNonexistentServicePlan
      status: "False"
      type: Ready
    deprovisionStatus: NotRequired
    orphanMitigationInProgress: false
    reconciledGeneration: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Next, I edited the plan to "default":

$ kubectl get serviceinstances -n test-ns -o yaml                                                                        
apiVersion: v1
items:
- apiVersion: servicecatalog.k8s.io/v1beta1
  kind: ServiceInstance
  metadata:
    creationTimestamp: 2018-01-22T17:26:27Z
    finalizers:
    - kubernetes-incubator/service-catalog
    generation: 2
    name: ups-instance
    namespace: test-ns
    resourceVersion: "821"
    selfLink: /apis/servicecatalog.k8s.io/v1beta1/namespaces/test-ns/serviceinstances/ups-instance
    uid: 60b76eec-ff99-11e7-9b7f-0242ac110005
  spec:
    clusterServiceClassExternalName: user-provided-service
    clusterServiceClassRef:
      name: 4f6e6cf6-ffdd-425f-a2c7-3c9258ad2468
    clusterServicePlanExternalName: default
    clusterServicePlanRef:
      name: 86064792-7ea2-467b-af93-ac9694d96d52
    externalID: 2542f01d-751b-45a5-ba5c-5d0986c42f08
    parameters:
      param-1: value-1
      param-2: value-2
    updateRequests: 0
  status:
    asyncOpInProgress: false
    conditions:
    - lastTransitionTime: 2018-01-22T17:27:42Z
      message: The instance was provisioned successfully
      reason: ProvisionedSuccessfully
      status: "True"
      type: Ready
    deprovisionStatus: Required
    externalProperties:
      clusterServicePlanExternalID: 86064792-7ea2-467b-af93-ac9694d96d52
      clusterServicePlanExternalName: default
      parameterChecksum: 4fa544b50ca7a33fe5e8bc0780f1f36aa0c2c7098242db27bc8a3e21f4b4ab55
      parameters:
        param-1: value-1
        param-2: value-2
    orphanMitigationInProgress: false
    reconciledGeneration: 2
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Will look at confirming with openshift next.

Comment 13 Zihan Tang 2018-02-09 08:47:05 UTC

Verified using the latest downstream image.
openshift v3.9.0-0.41.0
kubernetes v1.9.1+a0ce1bc657

ASB : 1.1.9  ; 
     brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-ansible-service-broker:v3.9 
Service-catalog :  0.1.3
   brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-service-catalog:v3.9

Update instance:
dev -> dev123 -> prod
prod -> prod123 -> dev
Both sequences succeed.

Comment 16 errata-xmlrpc 2018-12-13 19:26:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748
