Bug 2102289 - CGU resources cleanup is not consistent
Summary: CGU resources cleanup is not consistent
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Telco Edge
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.10.0
Assignee: Steven Skeard
QA Contact: yliu1
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-06-29 15:17 UTC by Steven Skeard
Modified: 2022-07-12 11:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-11 15:28:45 UTC
Target Upstream Version:
Embargoed:
sskeard: needinfo+


Links
System ID Private Priority Status Summary Last Updated
Github openshift-kni cluster-group-upgrades-operator pull 238 0 None Merged Enhancement CNF-5013: Make CGU resource cleanup more consistent 2022-06-30 13:32:07 UTC
Red Hat Product Errata RHBA-2022:5514 0 None None None 2022-07-11 15:28:49 UTC

Description Steven Skeard 2022-06-29 15:17:38 UTC
Description of problem:
Resources related to CGUs are not cleaned up consistently.
Managed policies are deleted on the next reconcile, whereas transient resources are deleted on the current reconcile. When ACM is managing a large number of nodes, the resources that linger can contribute to memory issues.

Version-Release number of selected component (if applicable):
4.10

How reproducible:
Every time

Steps to Reproduce:
1. Run a CGU with backup or precaching enabled.
2. Observe that some resources are deleted on the current reconcile while others are deleted on the next.

Actual results:
The timing for deleting resources is not consistent.

Expected results:
All resources would be cleaned up at the same time.
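
The difference between the actual and expected behaviour can be illustrated with a toy model. This is only a sketch under stated assumptions: `FakeHub`, `cleanup_inconsistent`, and `cleanup_consistent` are hypothetical names invented for illustration, and none of this is the operator's actual code (the operator itself is not written in Python).

```python
from dataclasses import dataclass, field


@dataclass
class FakeHub:
    """Hypothetical stand-in for the hub cluster; holds names of CGU-related resources."""
    managed_policies: set = field(default_factory=set)
    transient: set = field(default_factory=set)  # e.g. precache or backup views


def cleanup_inconsistent(hub, state):
    """Model of the reported behaviour: transient resources are removed in
    the current pass, managed policies only in the following pass."""
    hub.transient.clear()
    if state.get("policies_pending"):
        hub.managed_policies.clear()
        state["policies_pending"] = False
    else:
        # Defer policy deletion to the next reconcile.
        state["policies_pending"] = True


def cleanup_consistent(hub):
    """Model of the expected behaviour: everything is removed in one pass."""
    hub.transient.clear()
    hub.managed_policies.clear()
```

In this model, one call to `cleanup_inconsistent` leaves the policies behind until a second call runs, while a single call to `cleanup_consistent` leaves nothing behind.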

Additional info:

Comment 2 yliu1 2022-07-05 18:46:00 UTC
@sskeard could you please set target release? Thanks.

Comment 4 yliu1 2022-07-07 17:37:14 UTC
This issue does not seem to be fully resolved. 

CGU spec used: 
  spec:
    clusters:
    - worker-2
    enable: true
    managedPolicies:
    - common-config-policy
    - common-subscriptions-policy
    preCaching: true
    remediationStrategy:
      maxConcurrency: 1
      timeout: 240

Shortly after precaching started, the following managedclusterviews were created:
[kni ~]$ oc get managedclusterviews.view.open-cluster-management.io -A
NAMESPACE   NAME                                 AGE
worker-2    view-precache-cluster-role-binding   105s
worker-2    view-precache-job                    30s
worker-2    view-precache-namespace              2m48s
worker-2    view-precache-service-acct           109s
worker-2    view-precache-spec-configmap         114s

Shortly afterwards (probably once precaching had completed), only one remained. It was not cleaned up until after the upgrade completed.
$ oc get managedclusterviews.view.open-cluster-management.io -A
NAMESPACE   NAME                AGE
worker-2    view-precache-job   10m14s
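
A check like the one above could be scripted when verifying the fix. The following is a minimal sketch that parses the text printed by `oc get managedclusterviews.view.open-cluster-management.io -A` and lists any precache views still present. The function names and the `view-precache-` prefix filter are assumptions made for illustration, based only on the resource names shown in this comment.

```python
def parse_views(table_text):
    """Parse the three-column table printed by `oc get ... -A` into
    (namespace, name, age) tuples, skipping the header row."""
    rows = []
    for line in table_text.strip().splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] == "NAMESPACE":
            continue
        rows.append(tuple(parts))
    return rows


def leftover_precache_views(table_text, prefix="view-precache-"):
    """Return (namespace, name) pairs for views whose name starts with prefix."""
    return [
        (namespace, name)
        for namespace, name, _age in parse_views(table_text)
        if name.startswith(prefix)
    ]


sample = """
NAMESPACE   NAME                AGE
worker-2    view-precache-job   10m14s
"""
print(leftover_precache_views(sample))
```

An empty list after precaching completes would indicate that all precache views were cleaned up together.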

Comment 7 errata-xmlrpc 2022-07-11 15:28:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.22 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5514

