Bug 1974654

Summary: Policy propagator propagates owner from root policy
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: Ricardo Carrillo Cruz <ricarril>
Component: GRC & Policy
Assignee: Gus Parvin <gparvin>
Status: CLOSED CURRENTRELEASE
QA Contact: Derek Ho <dho>
Severity: high
Docs Contact: Mikela Dockery <mdockery>
Priority: high
Version: rhacm-2.2
CC: dho, gparvin, juhsu, keyoung, mdockery, ycao56
Target Milestone: ---
Flags: ming: rhacm-2.2.z+
Target Release: rhacm-2.3
Hardware: Unspecified
OS: Unspecified
Clone Of: 1974648
Last Closed: 2021-11-08 20:57:00 UTC
Bug Depends On: 1974648

Description Ricardo Carrillo Cruz 2021-06-22 09:02:34 UTC
Description of the problem:

We are writing a Kubernetes operator that creates ACM secondary resources.
When the operator creates a primary resource that in turn causes an ACM policy to be created, the policy propagator pod enters an infinite loop of reconciling the policy:

<snip>

{"level":"info","ts":1624288258.3368125,"logger":"policy-propagator","msg":"Found reconciliation request from root policy...","Namespace":"default","Name":"group1-batch-2-policy-deployment-policy"}
{"level":"info","ts":1624288258.337039,"logger":"policy-propagator","msg":"Found reconciliation request from root policy...","Namespace":"default","Name":"group1-batch-2-policy-deployment-policy"}
{"level":"info","ts":1624288258.404348,"logger":"policy-propagator","msg":"Found reconciliation request from replicated policy...","Namespace":"spoke1","Name":"default.group1-batch-1-policy-deployment-policy"}
{"level":"info","ts":1624288258.447739,"logger":"policy-propagator","msg":"Reconciliation complete.","Policy-Namespace":"default","Policy-Name":"group1-batch-1-policy-deployment-policy"}
{"level":"info","ts":1624288258.4478629,"logger":"policy-propagator","msg":"Reconciling Policy...","Request.Namespace":"default","Request.Name":"group1-batch-2-policy-deployment-policy"}
{"level":"info","ts":1624288258.4709926,"logger":"policy-propagator","msg":"Found reconciliation request from root policy...","Namespace":"default","Name":"group1-batch-1-policy-deployment-policy"}
{"level":"info","ts":1624288258.4710896,"logger":"policy-propagator","msg":"Found reconciliation request from root policy...","Namespace":"default","Name":"group1-batch-1-policy-deployment-policy"}
{"level":"info","ts":1624288258.4720683,"logger":"policy-propagator","msg":"Found reconciliation request from replicated policy...","Namespace":"spoke2","Name":"default.group1-batch-2-policy-deployment-policy"}
{"level":"info","ts":1624288258.483452,"logger":"policy-propagator","msg":"Reconciliation complete.","Policy-Namespace":"default","Policy-Name":"group1-batch-2-policy-deployment-policy"}
{"level":"info","ts":1624288258.4835362,"logger":"policy-propagator","msg":"Reconciling Policy...","Request.Namespace":"default","Request.Name":"group1-batch-1-policy-deployment-policy"}
{"level":"info","ts":1624288258.4845123,"logger":"policy-propagator","msg":"Found reconciliation request from root policy...","Namespace":"default","Name":"group1-batch-2-policy-deployment-policy"}
{"level":"info","ts":1624288258.4845293,"logger":"policy-propagator","msg":"Found reconciliation request from root policy...","Namespace":"default","Name":"group1-batch-2-policy-deployment-policy"}
{"level":"info","ts":1624288258.5060313,"logger":"policy-propagator","msg":"Found reconciliation request from root policy...","Namespace":"default","Name":"group1-batch-1-policy-deployment-policy"}
{"level":"info","ts":1624288258.506073,"logger":"policy-propagator","msg":"Found reconciliation request from root policy...","Namespace":"default","Name":"group1-batch-1-policy-deployment-policy"}
{"level":"info","ts":1624288258.5080616,"logger":"policy-propagator","msg":"Reconciliation complete.","Policy-Namespace":"default","Policy-Name":"group1-batch-1-policy-deployment-policy"}
{"level":"info","ts":1624288258.5084293,"logger":"policy-propagator","msg":"Reconciling Policy...","Request.Namespace":"default","Request.Name":"group1-batch-2-policy-deployment-policy"}

</snip>
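A quick way to confirm the loop from a saved copy of the propagator log is to count how often each policy is reconciled. The following is a minimal sketch that assumes the log lines are JSON objects with the same keys as in the snippet above; the function name count_reconciles and the sample list are illustrative only.

```python
import json
from collections import Counter


def count_reconciles(log_lines):
    """Count 'Reconciling Policy...' events per (namespace, name)."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as <snip> markers
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore truncated or malformed lines
        if entry.get("msg") == "Reconciling Policy...":
            key = (entry.get("Request.Namespace"), entry.get("Request.Name"))
            counts[key] += 1
    return counts


# Three reconcile events and one unrelated line taken from the log above.
sample = [
    "<snip>",
    '{"level":"info","ts":1624288258.4478629,"logger":"policy-propagator","msg":"Reconciling Policy...","Request.Namespace":"default","Request.Name":"group1-batch-2-policy-deployment-policy"}',
    '{"level":"info","ts":1624288258.483452,"logger":"policy-propagator","msg":"Reconciliation complete.","Policy-Namespace":"default","Policy-Name":"group1-batch-2-policy-deployment-policy"}',
    '{"level":"info","ts":1624288258.4835362,"logger":"policy-propagator","msg":"Reconciling Policy...","Request.Namespace":"default","Request.Name":"group1-batch-1-policy-deployment-policy"}',
    '{"level":"info","ts":1624288258.5084293,"logger":"policy-propagator","msg":"Reconciling Policy...","Request.Namespace":"default","Request.Name":"group1-batch-2-policy-deployment-policy"}',
]

counts = count_reconciles(sample)
for (namespace, name), n in sorted(counts.items()):
    print(f"{namespace}/{name}: {n} reconciles")
```

A count that keeps growing for the same policy while nothing in the policy changes is consistent with the loop described here.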

Gus Parvin looked into the issue, and it appears to be caused by the policy propagator copying the owner reference from the root policy to the replicated policies in the managed cluster namespaces. Since our operator is the owner of the root policy, the error condition appears.
Gus fixed that behaviour in the ACM code; after that, the operator created the policy object fine and no infinite loop occurred.
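To illustrate the idea behind the fix (this is not the actual ACM code, which is not shown in this report), here is a minimal sketch that builds a replicated policy from a root policy without carrying over metadata.ownerReferences. The manifests are plain dictionaries; the function name build_replicated_policy, the owner kind "Group", its apiVersion, name, and uid are all hypothetical. The replicated name format (root namespace, a dot, root name) follows what the propagator logs show.

```python
import copy


def build_replicated_policy(root_policy, cluster_namespace):
    """Copy root_policy into cluster_namespace, leaving owner references behind."""
    root_meta = root_policy["metadata"]
    replicated = copy.deepcopy(root_policy)
    # Rebuild metadata from scratch so that ownerReferences (and any other
    # object-specific fields) of the root policy are not propagated.
    replicated["metadata"] = {
        "name": f"{root_meta['namespace']}.{root_meta['name']}",
        "namespace": cluster_namespace,
        "labels": copy.deepcopy(root_meta.get("labels", {})),
        "annotations": copy.deepcopy(root_meta.get("annotations", {})),
    }
    return replicated


# Root policy owned by the operator's custom resource (owner values are hypothetical).
root = {
    "apiVersion": "policy.open-cluster-management.io/v1",
    "kind": "Policy",
    "metadata": {
        "name": "group1-batch-1-policy-deployment-policy",
        "namespace": "default",
        "ownerReferences": [
            {
                "apiVersion": "ran.openshift.io/v1alpha1",  # assumed version
                "kind": "Group",  # assumed kind
                "name": "group1",
                "uid": "hypothetical-uid",
            }
        ],
    },
    "spec": {"disabled": False},
}

replica = build_replicated_policy(root, "spoke1")
print(replica["metadata"]["namespace"], replica["metadata"]["name"])
```

Kubernetes does not allow a namespaced object to have an owner in a different namespace, so a replicated policy in a managed cluster namespace that points at an owner in the root policy's namespace is a plausible trigger for the repeated delete and recreate cycle.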

Release version:

2.2

Operator snapshot version:

OCP version:

4.8

Browser Info:

Steps to reproduce:
1. Clone https://github.com/rcarrillocruz/cluster-group-lcm/commit/d7b5c8ff942085fe9525f40f996da5c668b902c1
2. Run 'make deploy' with a KUBECONFIG targeting a hub cluster with ACM
3. Run oc edit clusterrole open-cluster-management:grc-b5e97:clusterrole and add to the end:
- apiGroups:
  - ran.openshift.io
  resources:
  - groups/finalizers
  verbs:
  - update
4. Run oc apply -f group.yaml (group.yaml is located in the cloned repo)
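For a non-interactive alternative to the manual edit in step 3, the extra rule can be expressed as a JSON Patch that appends to the ClusterRole's rules list. This is a sketch under the assumption that appending the rule is equivalent to the manual edit; the helper name append_rule_patch is illustrative.

```python
import json

# The rule from step 3, as a Python dictionary.
rule = {
    "apiGroups": ["ran.openshift.io"],
    "resources": ["groups/finalizers"],
    "verbs": ["update"],
}


def append_rule_patch(new_rule):
    """Build a JSON Patch (RFC 6902) that appends new_rule to /rules."""
    # The "-" path segment means "append to the end of the array".
    return [{"op": "add", "path": "/rules/-", "value": new_rule}]


patch = append_rule_patch(rule)
print(json.dumps(patch))
```

The printed JSON could then be passed to oc patch clusterrole open-cluster-management:grc-b5e97:clusterrole --type=json -p '<printed JSON>'.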

Actual results:

Policy object is constantly recreated and policy propagator pod logs show an infinite loop reconciling the policy.

Expected results:

Policy object is created just once and no infinite loop of reconcile actions is shown in the propagator pod logs.

Comment 1 Gus Parvin 2021-06-22 15:45:47 UTC
Thanks for helping us improve grc!

Comment 2 Gus Parvin 2021-06-23 18:16:34 UTC
After further discussion with the customer, we decided this is not needed in ACM 2.2, only in 2.3. The fix has been delivered into the release and is being validated.

Comment 3 Ricardo Carrillo Cruz 2021-06-24 07:51:29 UTC
Thanks for this Gus.
Once this BZ moves to verified, I assume that means the fix is in some branch.
Is it possible for us to install a non-released ACM with OLM? I'd like to develop my operator against an ACM build with the fix as soon as it's available.

Thanks

Comment 4 Mike Ng 2021-06-30 20:31:31 UTC
G2Bsync 871640744 comment 
 gparvin Wed, 30 Jun 2021 18:40:40 UTC 
 G2Bsync
The instructions for installing the pre-release ACM 2.3 are available in the repository https://github.com/open-cluster-management/deploy. Make sure you follow the instructions for obtaining access to the open-cluster-management quay organization so your pull secret can pull our images.

Comment 6 juhsu 2021-11-08 20:57:00 UTC
Fixed in 2.3 GA.