Created attachment 1871999
must gather sno00367 part1

Description of the problem:
During a 1000-SNO ZTP deployment test, about 1% of the clusters get stuck on the policy with the configuration policy template due to this error:

  - eventName: ztp-install.sno00367-common-config-policy.16e4f8fd8984682b
    lastTimestamp: "2022-04-11T22:45:44Z"
    message: NonCompliant; no matches for kind "ConfigurationPolicy" in version "policy.open-cluster-management.io/v1", please check if you have CRD deployed.

Release version:
Operator snapshot version:
OCP version:
Browser Info:

Steps to reproduce:
1. SNO deployment with DU profile at scale (50 clusters or 100 clusters per hour)
2.
3.

Actual results:
Config policy fails to be placed and enforced on the SNO

Expected results:
Config policy should be placed/enforced and become compliant

Additional info:
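Because only about 1% of clusters hit this, a small filter can help pick the affected policies out of a large run. The following is a minimal sketch, not part of any ACM tooling. It assumes a policy object has already been loaded into a Python dict (for example from `oc get policy -o json`) and that its status uses the `status.details[].history[].message` layout. The function name `has_missing_crd_error` and the `sample` dict are hypothetical.

```python
# Hypothetical triage helper: flags a policy whose status history contains
# the "no matches for kind" error quoted in this report.
ERROR_FRAGMENT = 'no matches for kind "ConfigurationPolicy"'


def has_missing_crd_error(policy: dict) -> bool:
    """Return True if any history message in the policy status has the error."""
    status = policy.get("status") or {}
    for detail in status.get("details") or []:
        for event in detail.get("history") or []:
            if ERROR_FRAGMENT in event.get("message", ""):
                return True
    return False


# Sample built from the event quoted in the description (assumed layout).
sample = {
    "status": {
        "compliant": "NonCompliant",
        "details": [
            {
                "compliant": "NonCompliant",
                "history": [
                    {
                        "eventName": "ztp-install.sno00367-common-config-policy.16e4f8fd8984682b",
                        "lastTimestamp": "2022-04-11T22:45:44Z",
                        "message": 'NonCompliant; no matches for kind "ConfigurationPolicy" '
                                   'in version "policy.open-cluster-management.io/v1", '
                                   "please check if you have CRD deployed.",
                    }
                ],
            }
        ],
    }
}

print(has_missing_crd_error(sample))
```

A policy that is NonCompliant for some other reason would not be flagged, which keeps the filter specific to this symptom.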
Do you have an ACM must-gather instead of just the OpenShift must-gather?
Must gather from the hub: https://drive.google.com/file/d/1GrE2zg1QG58pJFazKlqXr0-UNeV7AmS6/view?usp=sharing
The hub must-gather is also an OCP must-gather, not an ACM one. Waiting for this to be reproduced in a new run. Thanks!
Reproduced in the 1700 cluster run:

[root@e24-h01-000-r640 ~]# oc --kubeconfig bm/kubeconfig get policy -n sno00205
NAME                                        REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-install.sno00205-common-config-policy   enforce              NonCompliant       17h

status:
  compliant: NonCompliant
  details:
  - compliant: NonCompliant
    history:
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T22:00:30Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T22:00:28Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T22:00:25Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T22:00:10Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:59:54Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:59:39Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:59:24Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:59:08Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:58:53Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:58:38Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    templateMeta:
      creationTimestamp: null
      name: sno00205-common-config-policy-config

[root@e24-h01-000-r640 ~]# oc --kubeconfig hv-sno/manifests/sno00205/kubeconfig get crd|grep configurationpolicies
configurationpolicies.policy.open-cluster-management.io   2022-04-12T22:01:16Z

ACM must gather from the hub and must gather from sno00205:
https://drive.google.com/drive/folders/1GsrH1uUpiBJ0nVFkcDCDoq8p_p-jQNZe?usp=sharing

Note that something happened before the must gathers were taken and the policy got recreated:

[root@e24-h01-000-r640 ~]# oc --kubeconfig bm/kubeconfig get policy -n sno00205
NAME                                        REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-install.sno00205-common-config-policy   enforce              Compliant          5m6s
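The timestamps in this comment can be compared directly. The sketch below only does arithmetic on values copied from the output above: the newest and oldest NonCompliant events in the listed history, and the creation timestamp reported for the ConfigurationPolicy CRD on sno00205. It shows that every listed event predates the CRD, but it does not establish a root cause by itself. The helper name `parse_utc` is hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse a Kubernetes-style UTC timestamp such as 2022-04-12T22:00:30Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Values copied from the policy status history and the `oc get crd` output.
newest_event = parse_utc("2022-04-12T22:00:30Z")
oldest_event = parse_utc("2022-04-12T21:58:38Z")
crd_created = parse_utc("2022-04-12T22:01:16Z")

gap_newest = (crd_created - newest_event).total_seconds()
gap_oldest = (crd_created - oldest_event).total_seconds()

print(f"CRD created {gap_newest:.0f}s after the newest listed event")
print(f"CRD created {gap_oldest:.0f}s after the oldest listed event")
```

With these inputs the gaps are 46 and 158 seconds, so all ten listed NonCompliant events were recorded before the CRD's creation timestamp on the managed cluster.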
G2Bsync 1099211258 comment gparvin Thu, 14 Apr 2022 13:53:41 UTC

We are investigating a fix to the policy framework, specifically an issue @JustinKuli has identified in the template sync controller that could cause the described behavior. Thanks!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Advanced Cluster Management 2.5 security updates, images, and bug fixes), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4956