Bug 1568607

Summary: template instance controller tries to use garbage collection across namespaces
Product: OpenShift Container Platform
Component: Templates
Version: 3.10.0
Target Release: 3.10.0
Target Milestone: ---
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Ben Parees <bparees>
Assignee: Ben Parees <bparees>
QA Contact: XiuJuan Wang <xiuwang>
CC: aos-bugs, bparees, ccoleman, jliggitt, jmatthew, jokerman, mmccomas, xiuwang
Flags: xiuwang: needinfo-
Doc Type: Bug Fix
Type: Bug
Last Closed: 2018-07-30 19:13:03 UTC
Attachments:
Description              Flags
crd related resources    none
FinalizerError           none

Description Ben Parees 2018-04-17 22:36:07 UTC
The template instance controller sets ownerrefs on the objects it creates, pointing back to the templateinstance object.

Since the objects being created can:

1) be cluster scoped resources
2) be created in a different namespace from where the template instance was created

this results in objects that have dangling owner references.

For case (1), it's unclear what the GC will do with the object.
For case (2), the object will likely be GCed as soon as the GC concludes that the referenced owner doesn't exist (because it doesn't exist in the object's own namespace).
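
A rough Python sketch of case (2), for illustration only. It is a toy model, not the actual Kubernetes garbage collector: it assumes the owner is looked up only in the dependent's own namespace, which is the behavior this description expects. The names `Obj`, `OwnerRef`, `owner_visible` and `would_collect` are made up for the sketch. Case (1), cluster-scoped dependents, is deliberately not modeled, since the behavior there is stated as unclear.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OwnerRef:
    # An owner reference identifies its target by kind, name and uid.
    # It carries no namespace field, so the namespace has to be inferred.
    kind: str
    name: str
    uid: str


@dataclass
class Obj:
    kind: str
    name: str
    uid: str
    namespace: str
    owners: list = field(default_factory=list)


def owner_visible(dependent, cluster):
    """True if any owner resolves inside the dependent's own namespace."""
    return any(
        candidate.uid == ref.uid and candidate.namespace == dependent.namespace
        for ref in dependent.owners
        for candidate in cluster
    )


def would_collect(dependent, cluster):
    """Toy rule: a dependent whose owners cannot be resolved is garbage."""
    return bool(dependent.owners) and not owner_visible(dependent, cluster)


ref = OwnerRef("TemplateInstance", "ti", "uid-1")
ti = Obj("TemplateInstance", "ti", "uid-1", "ns-a")
same = Obj("ConfigMap", "cm-a", "uid-2", "ns-a", [ref])
other = Obj("ConfigMap", "cm-b", "uid-3", "ns-b", [ref])
cluster = [ti, same, other]

print(would_collect(same, cluster))   # False: owner found in ns-a
print(would_collect(other, cluster))  # True: no such owner in ns-b
```

In this model the object in ns-b is treated as orphaned even though its templateinstance still exists in ns-a, which is the dangling-reference problem described above.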

We need to decide how to deal w/ these scenarios.

Comment 1 Ben Parees 2018-04-18 12:31:31 UTC
(My current thinking is to stop depending on GC and just iterate over and delete the objects when the template instance is deleted. However, that requires reconciliation logic that can determine whether we've missed a delete event, which is challenging to implement at best.)

Comment 2 Jordan Liggitt 2018-04-18 14:47:14 UTC
How do you find all the objects created from the template? Are the resources (and UIDs) captured somewhere immutable? Aren't there cases like generateName that make it non-deterministic which objects were created?

Comment 3 Ben Parees 2018-04-18 15:07:15 UTC
well, we know the objects because the templateinstance object has a full copy of the template it instantiated, but you're right about generated names, we wouldn't be able to find those.

the next option would be to apply a uniquely generated label to all the objects the templateinstance controller generates (much like the ownerref today, but as a label instead) and then go find the objects later.
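
A minimal Python sketch of the label idea, assuming objects are plain dicts shaped like manifests. The label key is the one reported later in this bug (comment 9); the label value, the sample objects, and the helper names `stamp` and `find_owned` are assumptions made for illustration, not the controller's real code.

```python
OWNER_LABEL = "template.openshift.io/template-instance-owner"


def stamp(objects, owner_id):
    """Return copies of the objects with the owner label applied."""
    stamped = []
    for obj in objects:
        meta = dict(obj.get("metadata", {}))
        labels = dict(meta.get("labels", {}))
        labels[OWNER_LABEL] = owner_id
        meta["labels"] = labels
        stamped.append({**obj, "metadata": meta})
    return stamped


def find_owned(all_objects, owner_id):
    """Select objects by label, regardless of name or namespace."""
    return [
        obj for obj in all_objects
        if obj.get("metadata", {}).get("labels", {}).get(OWNER_LABEL) == owner_id
    ]


owner_id = "ti-uid-123"  # hypothetical unique value for one templateinstance
created = stamp(
    [
        # The first name stands in for a server-generated (generateName) name.
        {"kind": "Job", "metadata": {"name": "job-x7k2p", "namespace": "ns-one"}},
        {"kind": "ConfigMap", "metadata": {"name": "cfg", "namespace": "ns-two"}},
    ],
    owner_id,
)
unrelated = [{"kind": "ConfigMap", "metadata": {"name": "other", "namespace": "ns-one"}}]

owned = find_owned(created + unrelated, owner_id)
print([(o["metadata"]["namespace"], o["metadata"]["name"]) for o in owned])
```

Because the lookup is by label rather than by name, it does not depend on knowing generated names in advance, and it is not tied to a single namespace.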

Comment 4 Ben Parees 2018-04-18 15:08:47 UTC
at which point, once we also add reconciliation to find orphaned objects that we missed the templateinstance delete event for, we've basically reinvented garbage collection, so the logical question would be: is there any chance of garbage collection being enhanced to allow cross-namespace ownerrefs and/or cluster-scoped objects to have ownerrefs to namespaced objects?

Comment 6 XiuJuan Wang 2018-05-09 08:05:02 UTC
Ben,
Could you give an example of how to create the templateinstance and related objects in different namespaces?

Comment 7 Ben Parees 2018-05-09 13:49:06 UTC
Create a template that uses parameterized namespace values like:

https://github.com/openshift/origin/blob/master/examples/prometheus/prometheus.yaml#L38

In particular, use two different parameters with two different values, so that one object gets put in the namespace based on the value of PARAM1 and the second object gets put in the namespace based on the value of PARAM2.

Register the template w/ the template service broker (put it in the openshift namespace) and then provision it using the service catalog/TSB. (Provisioning a template via the TSB will result in creating a templateinstance for the template.)

You should see:

1) object1 gets created in PARAM1's namespace
2) object2 gets created in PARAM2's namespace
3) when you de-provision the instance, object1+object2 get deleted (and the templateinstance object gets deleted)

The additional namespaces must exist and your user must have permission to create/delete objects in those namespaces.
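
A small Python sketch of how two parameters can steer two objects into different namespaces. It only mimics the `${NAME}` string substitution used by templates; it is not the real template processor, and the object kinds, names, and namespace values are placeholders. Raising KeyError on an unknown parameter is a choice of this sketch.

```python
import re

# Matches ${NAME} style parameter references in string values.
_PARAM = re.compile(r"\$\{([A-Za-z0-9_]+)\}")


def substitute(value, params):
    """Recursively replace ${NAME} references in strings, lists and dicts."""
    if isinstance(value, str):
        return _PARAM.sub(lambda m: params[m.group(1)], value)
    if isinstance(value, list):
        return [substitute(item, params) for item in value]
    if isinstance(value, dict):
        return {key: substitute(item, params) for key, item in value.items()}
    return value


template_objects = [
    {"kind": "ConfigMap", "metadata": {"name": "object1", "namespace": "${PARAM1}"}},
    {"kind": "ConfigMap", "metadata": {"name": "object2", "namespace": "${PARAM2}"}},
]

rendered = substitute(template_objects, {"PARAM1": "ns-one", "PARAM2": "ns-two"})
for obj in rendered:
    print(obj["metadata"]["name"], "->", obj["metadata"]["namespace"])
```

With two different parameter values, object1 and object2 end up targeting two different namespaces, which is the setup needed to exercise the cross-namespace cleanup.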

Comment 9 XiuJuan Wang 2018-05-18 08:01:05 UTC
Server https://ec2-***.compute-1.amazonaws.com:8443
openshift v3.10.0-0.47.0
kubernetes v1.10.0+b81c8f8


1) Cross-namespace resources are deleted when the templateinstance is deprovisioned.
 a. Provision a clusterserviceclass that includes cross-namespace resources.
 b. All resources are labeled with template.openshift.io/template-instance-owner.
 c. Deprovision the templateinstance.
 d. The cross-namespace resources are deleted.


2) A cluster-scoped resource (CRD) can be provisioned,
but deprovisioning the serviceinstance and templateinstance fails, even after the CRD has been destroyed manually.

 a. Add the cluster-admin policy:
    # oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:openshift-infra:template-instance-controller
    # oc adm policy add-cluster-role-to-user cluster-admin common_user
 b. Create a template (dummy) that includes CRD resources in the openshift project.
 c. After the clusterserviceclass syncs, provision the dummy template.
 d. The serviceinstance and templateinstance reach ready status, and the CRD resource (which has the label template.openshift.io/template-instance-owner) is created.
 e. Deprovision the serviceinstance or templateinstance. The two resources still exist after the deprovision, even after the CRD has been destroyed manually.

Comment 10 Ben Parees 2018-05-18 14:03:23 UTC
how are you "Deprovisioning the templateinstance"?  That's not a thing you can do, you can only deprovision a serviceinstance.

please provide a yaml dump of the objects that are being left behind.

is deprovision reporting any error?

Comment 11 XiuJuan Wang 2018-05-21 05:35:09 UTC
Deprovision the templateinstance via the REST API [1], or delete the templateinstance from the web console.
Neither works.

[1] https://github.com/openshift/origin/blob/master/pkg/templateservicebroker/servicebroker/test-scripts/deprovision.sh

Comment 12 XiuJuan Wang 2018-05-21 05:36:20 UTC
Created attachment 1439382 [details]
crd related resources

Comment 13 Ben Parees 2018-05-21 18:36:44 UTC
ok it looks like the CRD is not being deleted for some reason, so the templateinstance finalizer is blocking the deletion of the templateinstance.

Can you get level 5 logs from the master/controller process that cover the time period when you did the deprovision?
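
For readers unfamiliar with finalizers, a toy Python model of the blocking behavior: an object that has been asked to delete stays around until its finalizer list is empty, and the finalizer is only removed once cleanup of every owned object succeeds. The finalizer name, the dict layout, and the `can_delete` callback are all hypothetical; this is not the controller's code.

```python
FINALIZER = "example.invalid/template-instance-cleanup"  # hypothetical name


def try_finalize(owner, owned, can_delete):
    """Try to delete every owned object; return the ones that are left.

    The finalizer is removed from the owner only if nothing is left.
    """
    remaining = [obj for obj in owned if not can_delete(obj)]
    if not remaining and FINALIZER in owner["finalizers"]:
        owner["finalizers"].remove(FINALIZER)
    return remaining


def is_gone(owner):
    """A deleted object disappears only once it has no finalizers left."""
    return owner["deletionRequested"] and not owner["finalizers"]


owner = {"deletionRequested": True, "finalizers": [FINALIZER]}
owned = [{"kind": "ConfigMap"}, {"kind": "CustomResourceDefinition"}]

# First pass: deleting the CRD fails, so the finalizer stays in place.
left = try_finalize(owner, owned, lambda o: o["kind"] != "CustomResourceDefinition")
print(is_gone(owner), [o["kind"] for o in left])

# Second pass: every delete succeeds, so the finalizer is removed.
left = try_finalize(owner, left, lambda o: True)
print(is_gone(owner), left)
```

As long as one owned object cannot be deleted, the owner in this model never goes away, which matches the symptom of the templateinstance lingering after deprovision.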

Comment 14 XiuJuan Wang 2018-05-22 06:21:22 UTC
Created attachment 1439869 [details]
FinalizerError

Comment 15 XiuJuan Wang 2018-05-22 06:23:40 UTC
Before provisioning the CRD resources, I added the policy:
    # oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:openshift-infra:template-instance-controller
    # oc adm policy add-cluster-role-to-user cluster-admin common_user

But the error in the log says there is no permission to delete the CRD.

Comment 16 Ben Parees 2018-05-22 14:26:06 UTC
oh, right, there's another SA involved now.

you need to also run:
oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:openshift-infra:template-instance-finalizer-controller

to grant the finalizer permission to clean up the CRD.

sorry about that.

Comment 17 XiuJuan Wang 2018-05-23 08:38:51 UTC
After running 'oc adm policy add-cluster-role-to-user cluster-admin system:serviceaccount:openshift-infra:template-instance-finalizer-controller',
I could deprovision the CRD serviceinstance, the templateinstance, and the CRD resources.

Moving this bug to verified.

openshift v3.10.0-0.50.0
kubernetes v1.10.0+b81c8f8

Comment 19 errata-xmlrpc 2018-07-30 19:13:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816