Bug 2074626 - Policy placement failure during ZTP SNO scale test
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: GRC & Policy
Version: rhacm-2.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: rhacm-2.5
Assignee: Gus Parvin
QA Contact: Derek Ho
Docs Contact: Mikela Dockery
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-12 16:54 UTC by jun
Modified: 2022-06-09 02:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-09 02:10:53 UTC
Target Upstream Version:
Embargoed:
bot-tracker-sync: rhacm-2.5+
jun: needinfo-


Attachments
must gather sno00367 part1 (15.00 MB, application/gzip)
2022-04-12 16:54 UTC, jun


Links
System ID Private Priority Status Summary Last Updated
Github stolostron backlog issues 21640 0 None None None 2022-04-12 21:31:57 UTC
Red Hat Product Errata RHSA-2022:4956 0 None None None 2022-06-09 02:10:59 UTC

Description jun 2022-04-12 16:54:05 UTC
Created attachment 1871999
must gather sno00367 part1

Description of the problem:
During a 1000-cluster SNO ZTP deployment test, about 1% of the clusters get stuck on the policy that contains the configuration policy template, with this error:

- eventName: ztp-install.sno00367-common-config-policy.16e4f8fd8984682b
  lastTimestamp: "2022-04-11T22:45:44Z"
  message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
    "policy.open-cluster-management.io/v1", please check if you have CRD deployed.

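For triage across a large run, the affected policies could be picked out by scanning each policy's status history for this message. The following is a minimal sketch in Python, not an ACM tool: the function name `is_missing_crd_failure` is hypothetical, and it assumes the policy status has already been loaded into a dictionary with the same `details[].history[]` layout that appears in the status output in this report.

```python
# Marker text taken from the event message quoted in this report.
MISSING_CRD_MARKER = 'no matches for kind "ConfigurationPolicy"'


def is_missing_crd_failure(policy_status: dict) -> bool:
    """Return True if any history entry reports the missing-CRD error.

    Hypothetical helper for illustration; not part of ACM.
    """
    for detail in policy_status.get("details", []):
        for event in detail.get("history", []):
            # YAML output wraps long messages, so normalize whitespace first.
            message = " ".join(event.get("message", "").split())
            if MISSING_CRD_MARKER in message:
                return True
    return False


# Status shaped like the excerpt in this report (values copied from it).
example_status = {
    "compliant": "NonCompliant",
    "details": [
        {
            "compliant": "NonCompliant",
            "history": [
                {
                    "eventName": "ztp-install.sno00367-common-config-policy.16e4f8fd8984682b",
                    "lastTimestamp": "2022-04-11T22:45:44Z",
                    "message": 'NonCompliant; no matches for kind "ConfigurationPolicy" in version '
                    '"policy.open-cluster-management.io/v1", please check if you have CRD deployed.',
                }
            ],
        }
    ],
}

print(is_missing_crd_failure(example_status))
```

Matching on the message text is brittle if the wording changes between releases, so this is only suitable as a quick filter during a scale run.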


Release version:

Operator snapshot version:

OCP version:

Browser Info:

Steps to reproduce:
1. Deploy SNO clusters with the DU profile at scale (50 or 100 clusters per hour).

Actual results:
The configuration policy fails to be placed and enforced on the SNO cluster.

Expected results:
The configuration policy should be placed and enforced, and should become compliant.

Additional info:

Comment 2 Gus Parvin 2022-04-12 18:55:35 UTC
Do you have an ACM must-gather instead of just the OpenShift must-gather?

Comment 4 Gus Parvin 2022-04-12 19:27:29 UTC
The hub must-gather is also an OpenShift (OCP) must-gather, not an ACM one. Waiting for this to be reproduced in a new run. Thanks!

Comment 5 jun 2022-04-14 13:41:43 UTC
Reproduced in the 1700 cluster run:

[root@e24-h01-000-r640 ~]# oc --kubeconfig bm/kubeconfig get policy -n sno00205
NAME                                           REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-install.sno00205-common-config-policy      enforce              NonCompliant       17h

status:
  compliant: NonCompliant
  details:
  - compliant: NonCompliant
    history:
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T22:00:30Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T22:00:28Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T22:00:25Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T22:00:10Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:59:54Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:59:39Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:59:24Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:59:08Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:58:53Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:58:38Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    templateMeta:
      creationTimestamp: null
      name: sno00205-common-config-policy-config

[root@e24-h01-000-r640 ~]# oc --kubeconfig hv-sno/manifests/sno00205/kubeconfig get crd|grep configurationpolicies
configurationpolicies.policy.open-cluster-management.io           2022-04-12T22:01:16Z

ACM must-gather from the hub and must-gather from sno00205:
https://drive.google.com/drive/folders/1GsrH1uUpiBJ0nVFkcDCDoq8p_p-jQNZe?usp=sharing

Note that something happened before the must-gathers were taken, and the policy was recreated:

[root@e24-h01-000-r640 ~]# oc --kubeconfig bm/kubeconfig get policy -n sno00205
NAME                                           REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-install.sno00205-common-config-policy      enforce              Compliant          5m6s
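
One observation from the sno00205 output in this comment: every listed NonCompliant event predates the `creationTimestamp` of the `configurationpolicies` CRD on the managed cluster. That would be consistent with the template being synced before the CRD existed, although nothing in this thread confirms it as the root cause. A small Python sketch of the comparison, using only timestamps copied from the output (the helper name `parse_k8s_time` is made up for illustration):

```python
from datetime import datetime, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse a UTC timestamp such as 2022-04-12T22:00:30Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Values copied from the sno00205 output in this comment.
last_failure = parse_k8s_time("2022-04-12T22:00:30Z")  # newest history entry
crd_created = parse_k8s_time("2022-04-12T22:01:16Z")   # CRD creation time on the SNO

# True when the newest failure event is older than the CRD.
failures_predate_crd = last_failure < crd_created
gap_seconds = (crd_created - last_failure).total_seconds()

print(failures_predate_crd)
print(gap_seconds)
```

Note that the policy history only retains the most recent events, so this comparison says nothing about when the first failure occurred.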

Comment 6 bot-tracker-sync 2022-04-14 14:37:02 UTC
gparvin, Thu, 14 Apr 2022 13:53:41 UTC:
We are investigating a fix to the policy framework, specifically an issue @JustinKuli has identified in the template sync controller that could cause the described behavior.  Thanks!
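
This thread does not describe the actual change made to the template sync controller. Purely as an illustration of one general way a controller can tolerate a kind that is not registered yet, the Python sketch below retries a create call with backoff. Every name in it (`KindNotFoundError`, `create_with_retry`, `fake_create`) is hypothetical and is not taken from the policy framework code.

```python
import time
from typing import Callable


class KindNotFoundError(Exception):
    """Hypothetical error standing in for 'no matches for kind ...'."""


def create_with_retry(
    create: Callable[[], None], attempts: int = 5, delay_seconds: float = 0.0
) -> int:
    """Call create() until it succeeds; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        try:
            create()
            return attempt
        except KindNotFoundError:
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay_seconds * attempt)  # linear backoff between tries
    raise ValueError("attempts must be at least 1")


# Simulate a CRD that only becomes available on the third call.
calls = {"count": 0}


def fake_create() -> None:
    calls["count"] += 1
    if calls["count"] < 3:
        raise KindNotFoundError('no matches for kind "ConfigurationPolicy"')


attempts_used = create_with_retry(fake_create)
print(attempts_used)
```

A real controller would normally requeue the reconcile request rather than sleep inline; the loop here only keeps the example self-contained.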

Comment 12 errata-xmlrpc 2022-06-09 02:10:53 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat Advanced Cluster Management 2.5 security updates, images, and bug fixes), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4956

