Bug 1554141 - Unable to delete serviceinstance
Summary: Unable to delete serviceinstance
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Broker
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.9.0
Assignee: Ben Parees
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Duplicates: 1555053 1567047
Depends On:
Blocks: 1563491
Reported: 2018-03-11 20:57 UTC by Eric Paris
Modified: 2021-09-09 13:23 UTC
CC List: 19 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When a namespace is deleted, the objects within it are deleted by the namespace controller, not by the user. Consequence: When a service binding is deleted, it is unbound via an unbind request associated with the identity that performed the deletion. As a result, the unbind request came from the namespace controller, which did not have all the permissions required to perform an unbind. Fix: The permissions required for unbind were changed to align with the permissions the namespace controller has. Result: The unbind triggered by the namespace controller deleting the binding is permitted and succeeds.
Clone Of:
Clones: 1563491
Environment:
Last Closed: 2018-06-27 18:01:34 UTC
Target Upstream Version:
Embargoed:


Attachments
oc get servicebindings -n demo-project -o yaml (2.25 KB, text/plain)
2018-03-11 20:57 UTC, Eric Paris


Links
Red Hat Product Errata RHSA-2018:2013 (last updated 2018-06-27 18:02:09 UTC)

Description Eric Paris 2018-03-11 20:57:13 UTC
Created attachment 1406979
oc get servicebindings -n demo-project -o yaml

I'm using v3.9 images for everything:

docker.io/openshift/origin-service-catalog:v3.9:
docker.io/openshift/origin-service-catalog@sha256:8fcc079089663c5f31a9764a4ad6d6cafe7e577dcf85d5c01840523b8e767b67

docker.io/openshift/origin-template-service-broker:v3.9:
docker.io/openshift/origin-template-service-broker@sha256:7cce2960621d567161f5e72421b3d11972b0975683b042900fd8d74a5b532f8b

I used the web UI to create a (non-ephemeral) Jenkins and told it to create a secret/binding. I played around with Jenkins, then deleted the namespace. The namespace won't delete. Looking at the servicebinding, I see:

'Error unbinding from ServiceInstance "demo-project/jenkins-persistent-wdbkx"
        of ClusterServiceClass (K8S: "93cfef20-2480-11e8-b305-0adf9c2d8066" ExternalName:
        "jenkins-persistent") at ClusterServiceBroker "template-service-broker": Status:
        403; ErrorMessage: <nil>; Description: templateinstances.template.openshift.io
        "f6fa12e8-a5e1-4a20-be3f-085ff85d6177" is forbidden: User "system:serviceaccount:kube-system:namespace-controller"
        cannot update templateinstances.template.openshift.io in project "demo-project";
        ResponseError: <nil>'


Now I can't delete the serviceinstance at all...

Comment 1 Zhang Cheng 2018-03-12 03:02:51 UTC
I think the reporter is not using a downstream APB registry, and the instance was created by a Jenkins APB that is not one of our published APB images in OCP 3.9.

Comment 2 Zhang Cheng 2018-03-12 03:08:27 UTC
BTW, I logged a doc bug for service-catalog resource cleanup: https://bugzilla.redhat.com/show_bug.cgi?id=1548618

As a workaround for this issue, you can fix a resource in this state by editing it and deleting the finalizer token.

apiVersion: servicecatalog.k8s.io/xxx
kind: ServiceInstance
metadata:
  creationTimestamp: null
  deletionGracePeriodSeconds: 0
  deletionTimestamp: null
  finalizers:
  - kubernetes-incubator/service-catalog
  generateName: xxx
  generation: 2

to be like so:

apiVersion: servicecatalog.k8s.io/xxx
kind: ServiceInstance
metadata:
  creationTimestamp: null
  deletionGracePeriodSeconds: 0
  deletionTimestamp: null
  generateName: xxx
  generation: 2

After this, the resource should be deleted immediately. This should be a viable stopgap measure to unblock people experiencing this issue.
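The hand edit above boils down to deleting one entry from metadata.finalizers. A minimal Python sketch of that transformation, assuming the object has already been fetched and parsed into a dict (for example from oc get servicebinding <name> -o json); strip_finalizer is a hypothetical helper, not part of oc or the service catalog. On a live cluster the same change is made with oc edit, and per Comment 6 the stuck object may be the servicebinding rather than the serviceinstance.

```python
import copy

# Finalizer token named in the workaround above.
CATALOG_FINALIZER = "kubernetes-incubator/service-catalog"


def strip_finalizer(obj, finalizer=CATALOG_FINALIZER):
    """Return a copy of a Kubernetes-style object with `finalizer` removed
    from metadata.finalizers; all other fields are left untouched."""
    result = copy.deepcopy(obj)
    metadata = result.setdefault("metadata", {})
    remaining = [f for f in metadata.get("finalizers", []) if f != finalizer]
    if remaining:
        metadata["finalizers"] = remaining
    else:
        # No finalizers left: drop the key, as in the hand-edited YAML above.
        metadata.pop("finalizers", None)
    return result


if __name__ == "__main__":
    binding = {
        "kind": "ServiceBinding",
        "metadata": {
            "generateName": "xxx",
            "generation": 2,
            "finalizers": [CATALOG_FINALIZER],
        },
    }
    print(strip_finalizer(binding)["metadata"])  # prints {'generateName': 'xxx', 'generation': 2}
```

Removing the finalizer lets the API server finish the delete without the catalog's unbind call, so anything the broker created for the binding may need to be cleaned up separately.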

Comment 3 Jessica Forrester 2018-03-12 12:01:28 UTC
@chezhang - from eparis's description on the bug I can tell this was not created using the ASB and APBs; it was done using the Template Service Broker and the Jenkins ephemeral template.

The error seems to be coming from the TSB not having permissions to delete template instances.

@eparis - how did you install this cluster?

Comment 4 John Matthews 2018-03-12 13:30:17 UTC
Moving over to Ben as this looks to be a template broker issue with 3.9

Comment 5 Jessica Forrester 2018-03-12 15:42:08 UTC
eparis confirmed this was an Ansible install.

Comment 6 Eric Paris 2018-03-12 16:16:11 UTC
@zhang, thanks, you are mostly correct. In this case it was the servicebinding, not the serviceinstance, where I had to delete the finalizer. But doing so caused things to clean up.

Comment 7 Ben Parees 2018-03-12 16:36:23 UTC
To summarize the theory from the irc discussion:

1) namespace is deleted, finalizer (kubernetes-incubator/service-catalog) is invoked
2) finalizer attempts to unbind the service instance; it apparently does so using the system:serviceaccount:kube-system:namespace-controller user
3) the TSB rejects the unbind request because the system:serviceaccount:kube-system:namespace-controller user does not have permission to update templateinstances (which is a necessary step in performing an unbind)

imho the TSB is doing the right thing here. If you want to unbind something, you need to do so with a user that has appropriate permissions to perform an unbind.
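The three steps above can be modeled in a few lines. A toy Python sketch, not the service catalog or TSB source: it borrows the detail from Comment 13 that the unbind runs as the identity in spec.userInfo, and the permission table (including the hypothetical user alice and the assumption that the namespace controller may delete but not update templateinstances) is illustrative only.

```python
from dataclasses import dataclass

NAMESPACE_CONTROLLER = "system:serviceaccount:kube-system:namespace-controller"

# Hypothetical permission table: username -> verbs allowed on templateinstances.
# "alice" stands in for the user who did the bind; the namespace controller is
# assumed to hold delete but not update.
TEMPLATEINSTANCE_VERBS = {
    "alice": {"get", "update", "delete"},
    NAMESPACE_CONTROLLER: {"get", "list", "delete"},
}


@dataclass
class UnbindResult:
    status: int
    description: str


def tsb_unbind(user, namespace):
    """Toy TSB unbind: like the 3.9 behavior in this report, it requires update."""
    if "update" not in TEMPLATEINSTANCE_VERBS.get(user, set()):
        return UnbindResult(
            403,
            f'User "{user}" cannot update templateinstances.template.openshift.io '
            f'in project "{namespace}"',
        )
    return UnbindResult(200, "unbound")


def finalize_binding(binding):
    """Step 2: the catalog finalizer unbinds as the user recorded in spec.userInfo."""
    user = binding["spec"]["userInfo"]["username"]
    return tsb_unbind(user, binding["metadata"]["namespace"])


if __name__ == "__main__":
    stuck = {
        "metadata": {"namespace": "demo-project"},
        "spec": {"userInfo": {"username": NAMESPACE_CONTROLLER}},
    }
    result = finalize_binding(stuck)
    print(result.status, result.description)
```

In this model the same binding unbinds cleanly when spec.userInfo still names alice, which is why the identity recorded on the binding matters as much as the TSB's check.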

Comment 8 Ben Parees 2018-03-12 16:40:19 UTC
Possible solutions:

1) give the finalizer account (system:serviceaccount:kube-system:namespace-controller?) permission to update templateinstances
2) find a way to make the finalizer run as a different user (or at least invoke the unbind as a different user - namely the user who deleted the namespace)
3) update the TSB to allow unbinds to occur w/o confirming the user has templateinstance update permissions

(2) is the right answer imho, because this won't necessarily be the last time we hit an issue like this.  We're essentially saying "system:serviceaccount:kube-system:namespace-controller" should be able to invoke any broker's deprovision and unbind apis, successfully.  That means it needs to have any permission those apis require, which could be different for any given broker, or change over time.

Comment 9 Ben Parees 2018-03-12 16:42:10 UTC
Also, I'm unable to recreate this behavior on a v3.9 cluster using oc cluster up, though I do see the finalizer reference on both the serviceinstance and the servicebinding. It would be good to understand whether there is a possible race condition here, or whether something has changed in the finalizer behavior (either the service catalog's specific finalizer implementation or the openshift finalizer framework).

Comment 10 Eric Paris 2018-03-12 16:47:03 UTC
After my demo on Wednesday I can give you access to the cluster where this is happening.

I'm trying to get another cluster running, if I do, I'll see if I can recreate it there.

Comment 11 Ben Parees 2018-03-12 16:55:10 UTC
Unless someone disagrees w/ my summary of the current theory in comment 7, I don't know that we need an env for reproducing it, though I am curious why I'm not seeing it in my cluster.

Comment 12 Ben Parees 2018-03-12 17:13:01 UTC
Correction: I can recreate this w/ cluster up. You don't get an error from "oc delete project", but the project is not actually deleted.

Comment 13 Ben Parees 2018-03-12 17:48:00 UTC
So the reason the unbind is being done w/ the namespace-controller user is that the servicebinding object was updated to have the namespace-controller value in the ServiceBinding.spec.userInfo field:

$ oc get servicebindings -o yaml
apiVersion: v1
items:
- apiVersion: servicecatalog.k8s.io/v1beta1
  kind: ServiceBinding
  metadata:
    creationTimestamp: 2018-03-12T17:05:59Z
    deletionGracePeriodSeconds: 0
    deletionTimestamp: 2018-03-12T17:25:53Z
    finalizers:
    - kubernetes-incubator/service-catalog
    generateName: jenkins-persistent-fhqt9-
    generation: 2
    name: jenkins-persistent-fhqt9-ctcdv
    namespace: j1
    resourceVersion: "1228"
    selfLink: /apis/servicecatalog.k8s.io/v1beta1/namespaces/j1/servicebindings/jenkins-persistent-fhqt9-ctcdv
    uid: a30da6ca-2617-11e8-9c63-0242ac110002
  spec:
    externalID: 40554559-f241-4dff-bcaa-dfa4497d4eb3
    instanceRef:
      name: jenkins-persistent-fhqt9
    secretName: jenkins-persistent-fhqt9-credentials-8b8sn
    userInfo:
      groups:
      - system:serviceaccounts
      - system:serviceaccounts:kube-system
      - system:authenticated
      uid: ""
      username: system:serviceaccount:kube-system:namespace-controller


Clayton thinks this is because an update is performed before the delete: that update is performed by the namespace-controller, and the service catalog sets the userinfo field to the identity of whoever did the update.

This ultimately leads to the SC performing an unbind call with the binding.spec.userinfo value.

So this is an issue w/ the SC; it needs to somehow protect the userinfo field during updates.
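Clayton's theory can be illustrated by contrasting two update paths. A minimal sketch with hypothetical helpers: update_overwriting stamps spec.userInfo with whoever sends the update, as suspected here, while update_preserving is one possible way to protect the field by keeping the binder recorded at bind time. Neither is the service catalog's actual code, and the user alice is hypothetical.

```python
import copy

NAMESPACE_CONTROLLER = "system:serviceaccount:kube-system:namespace-controller"


def user_info(username, groups):
    """Build a userInfo block shaped like the one in the dump above."""
    return {"username": username, "uid": "", "groups": list(groups)}


def update_overwriting(binding, requester):
    """Suspected behavior: every update re-stamps spec.userInfo with the requester."""
    updated = copy.deepcopy(binding)
    updated["spec"]["userInfo"] = copy.deepcopy(requester)
    return updated


def update_preserving(binding, requester):
    """One possible guard: record the requester only if no binder is recorded yet."""
    updated = copy.deepcopy(binding)
    updated["spec"].setdefault("userInfo", copy.deepcopy(requester))
    return updated


if __name__ == "__main__":
    binder = user_info("alice", ["system:authenticated"])
    controller = user_info(
        NAMESPACE_CONTROLLER,
        ["system:serviceaccounts", "system:serviceaccounts:kube-system", "system:authenticated"],
    )
    binding = {"metadata": {"namespace": "j1"}, "spec": {"userInfo": binder}}
    print(update_overwriting(binding, controller)["spec"]["userInfo"]["username"])
    print(update_preserving(binding, controller)["spec"]["userInfo"]["username"])
```

With the overwriting path, the later unbind runs as the namespace controller; with the preserving path it would still run as the original binder.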

Comment 14 Ben Parees 2018-03-12 17:52:50 UTC
to recreate:

1) do a provision/bind
2) confirm the servicebinding userinfo has the info of the user who did the bind
3) delete the namespace (it won't actually go away)
4) retrieve the binding, see that userinfo now contains the id of the namespace-controller, not the user from (1).
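Steps 2 and 4 compare the same field before and after the delete. A small Python sketch, assuming each snapshot was saved as JSON (for example with oc get servicebinding <name> -n <namespace> -o json); the helper names and the user alice are hypothetical.

```python
import json


def bound_username(binding_json):
    """Pull spec.userInfo.username out of a servicebinding serialized as JSON."""
    binding = json.loads(binding_json)
    return binding.get("spec", {}).get("userInfo", {}).get("username", "")


def userinfo_overwritten(before_json, after_json):
    """True if the recorded binder differs between step 2 and step 4."""
    return bound_username(before_json) != bound_username(after_json)


if __name__ == "__main__":
    before = '{"spec": {"userInfo": {"username": "alice"}}}'
    after = (
        '{"spec": {"userInfo": {"username": '
        '"system:serviceaccount:kube-system:namespace-controller"}}}'
    )
    print(userinfo_overwritten(before, after))  # prints True
```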

Comment 15 Ben Parees 2018-03-12 19:28:11 UTC
fix/workaround just for the TSB (other brokers could have this problem too):
https://github.com/openshift/origin/pull/18948

Comment 16 Ben Parees 2018-03-12 20:21:31 UTC
taking this back to do the fix/workaround in the TSB (comment 15) but it sounds like ASB can run into this also.

Comment 17 openshift-github-bot 2018-03-12 21:31:50 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/b296e3d62ad6f61dc6e7e97ee6a72739c0856e92
require templateinstance delete, not update, on unbind

bug 1554141

https://bugzilla.redhat.com/show_bug.cgi?id=1554141
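The commit changes which verb the TSB checks on unbind. A toy Python sketch of why that unblocks namespace deletion, assuming (as the Doc Text implies) that the namespace controller can delete templateinstances but cannot update them; the permission table and the user alice are hypothetical, and this is not the origin source.

```python
NAMESPACE_CONTROLLER = "system:serviceaccount:kube-system:namespace-controller"

# Hypothetical verbs each user holds on templateinstances in the project.
TEMPLATEINSTANCE_VERBS = {
    NAMESPACE_CONTROLLER: {"get", "list", "delete"},
    "alice": {"get", "list", "update", "delete"},
}

REQUIRED_VERB_BEFORE_FIX = "update"  # the verb named in the 403 in this report
REQUIRED_VERB_AFTER_FIX = "delete"   # the verb named in the commit title


def may_unbind(user, required_verb):
    """Toy TSB authorization check for an unbind on templateinstances."""
    return required_verb in TEMPLATEINSTANCE_VERBS.get(user, set())


if __name__ == "__main__":
    for verb in (REQUIRED_VERB_BEFORE_FIX, REQUIRED_VERB_AFTER_FIX):
        print(verb, may_unbind(NAMESPACE_CONTROLLER, verb))
```

In this toy table a regular binder holds both verbs, so only the namespace-controller path changes outcome. As Comment 15 notes, this is a TSB-side fix; other brokers could still hit the same problem.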

Comment 18 Wenjing Zheng 2018-03-13 06:26:48 UTC
I can reproduce this in both v3.7.23 and v3.9.4 with the steps below:
1. Log in to the cluster as a cluster-admin user.
2. In the web UI, provision/bind the jenkins-persistent template.
3. Delete the project.
4. Describe the servicebinding; the message below appears:
Message:               Error unbinding from ServiceInstance "lasttry/jenkins-persistent-kmb58" of ClusterServiceClass (K8S: "5d2ccbdd-2599-11e8-bb2d-fa163e154d74" ExternalName: "jenkins-persistent") at ClusterServiceBroker "template-service-broker": Status: 403; ErrorMessage: <nil>; Description: templateinstances.template.openshift.io "3598d5ef-1ea3-47d8-ad8d-ec0bf068380a" is forbidden: User "system:serviceaccount:kube-system:namespace-controller" cannot update templateinstances.template.openshift.io in project "lasttry"; ResponseError: <nil>

This is the error from v3.7.23, which differs from 3.9 with the same steps:
Message:               Unbind call failed. Error unbinding from ServiceInstance "37tsb/jenkins-persistent-gwkfc" of ClusterServiceClass (K8S: "0b04c401-2669-11e8-9745-fa163ed402fb" ExternalName: "jenkins-persistent") at ClusterServiceBroker "template-service-broker": Status: 500; ErrorMessage: <nil>; Description: templateinstances.template.openshift.io "202a5160-0187-4830-961c-f2e927b5a95b" not found; ResponseError: <nil>

Comment 19 Wenjing Zheng 2018-03-13 06:55:26 UTC
A normal user can reproduce this too; the project is not actually deleted (it can be seen in Terminating state as cluster-admin).

Comment 20 Zhang Cheng 2018-03-13 08:28:31 UTC
As Ben mentioned in Comment 16 and in his mail, "However Shawn has noted that the ASB also appears to have a similar issue (as could any broker implementation)", do we need any changes on the ASB side?

Comment 21 Shawn Hurley 2018-03-13 12:47:37 UTC
The broker issue is being tracked here: https://bugzilla.redhat.com/process_bug.cgi

Comment 22 Jessica Forrester 2018-03-13 21:32:22 UTC
*** Bug 1555053 has been marked as a duplicate of this bug. ***

Comment 23 Zhang Cheng 2018-03-14 02:06:46 UTC
Shawn, 

Maybe you have a typo in Comment 21; I cannot access the bug link.

Comment 24 Shawn Hurley 2018-03-14 02:10:45 UTC
This is the ASB issue being tracked: https://bugzilla.redhat.com/show_bug.cgi?id=1554239

Comment 26 Wenjing Zheng 2018-03-15 09:50:24 UTC
Verified with the version below:
openshift v3.9.9
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

Comment 27 Jay Boyd 2018-04-13 15:50:28 UTC
*** Bug 1567047 has been marked as a duplicate of this bug. ***

Comment 31 errata-xmlrpc 2018-06-27 18:01:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2013

