Bug 2074626

Summary: Policy placement failure during ZTP SNO scale test
Product: Red Hat Advanced Cluster Management for Kubernetes
Component: GRC & Policy
Version: rhacm-2.5
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Target Milestone: ---
Target Release: rhacm-2.5
Hardware: Unspecified
OS: Unspecified
Reporter: jun
Assignee: Gus Parvin <gparvin>
QA Contact: Derek Ho <dho>
Docs Contact: Mikela Dockery <mdockery>
CC: akrzos, gparvin, imiller, jkulikau
Flags: bot-tracker-sync: rhacm-2.5+, jun: needinfo-
Last Closed: 2022-06-09 02:10:53 UTC
Type: Bug
Attachments:
must gather sno00367 part1 (flags: none)

Description jun 2022-04-12 16:54:05 UTC
Created attachment 1871999
must gather sno00367 part1

Description of the problem:
During a 1000-cluster SNO ZTP deployment test, about 1% of the clusters get stuck on the policy that contains a configuration policy template, with this error:

- eventName: ztp-install.sno00367-common-config-policy.16e4f8fd8984682b
      lastTimestamp: "2022-04-11T22:45:44Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.



Release version:

Operator snapshot version:

OCP version:

Browser Info:

Steps to reproduce:
1. Deploy SNO clusters with the DU profile at scale (50 or 100 clusters per hour)

Actual results:
The config policy fails to be placed and enforced on the SNO

Expected results:
Config policy should be placed/enforced and become compliant
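
To find affected clusters in a large run, the policy status of each cluster namespace can be scanned for the error quoted in the problem description. The following is a minimal sketch, not part of the original report: it assumes the policy status has already been loaded into Python dictionaries shaped like the `status` output of `oc get policy -o yaml`, and the helper names and the `sno99999` entry are hypothetical.

```python
from typing import Dict, List

# Substring of the error message quoted in this report.
MISSING_CRD_MARKER = 'no matches for kind "ConfigurationPolicy"'


def has_missing_crd_error(policy_status: dict) -> bool:
    """Return True if the policy is NonCompliant and its history shows the missing-CRD error."""
    if policy_status.get("compliant") != "NonCompliant":
        return False
    for detail in policy_status.get("details", []):
        for event in detail.get("history", []):
            # Collapse whitespace, since YAML may fold the message across lines.
            message = " ".join(event.get("message", "").split())
            if MISSING_CRD_MARKER in message:
                return True
    return False


def stuck_clusters(statuses: Dict[str, dict]) -> List[str]:
    """Return the sorted cluster namespaces whose policy shows the missing-CRD error."""
    return sorted(ns for ns, status in statuses.items() if has_missing_crd_error(status))


# Sample input: sno00367 mirrors the event in this report; sno99999 is a made-up healthy cluster.
sample_statuses = {
    "sno00367": {
        "compliant": "NonCompliant",
        "details": [{
            "compliant": "NonCompliant",
            "history": [{
                "eventName": "ztp-install.sno00367-common-config-policy.16e4f8fd8984682b",
                "lastTimestamp": "2022-04-11T22:45:44Z",
                "message": 'NonCompliant; no matches for kind "ConfigurationPolicy" in version '
                           '"policy.open-cluster-management.io/v1", please check if you have CRD deployed.',
            }],
        }],
    },
    "sno99999": {"compliant": "Compliant", "details": []},
}

print(stuck_clusters(sample_statuses))
```

Checking the top-level compliance state first avoids flagging policies that hit the error earlier but have since become compliant.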

Additional info:

Comment 2 Gus Parvin 2022-04-12 18:55:35 UTC
Do you have an ACM must-gather instead of just the OpenShift must-gather?

Comment 4 Gus Parvin 2022-04-12 19:27:29 UTC
The hub must-gather is also an OpenShift (OCP) must-gather.  Waiting for this to be reproduced in a new run.  Thanks!

Comment 5 jun 2022-04-14 13:41:43 UTC
Reproduced in the 1700 cluster run:

[root@e24-h01-000-r640 ~]# oc --kubeconfig bm/kubeconfig get policy -n sno00205
NAME                                           REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-install.sno00205-common-config-policy      enforce              NonCompliant       17h

status:
  compliant: NonCompliant
  details:
  - compliant: NonCompliant
    history:
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T22:00:30Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T22:00:28Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T22:00:25Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T22:00:10Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:59:54Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:59:39Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:59:24Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:59:08Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:58:53Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    - eventName: ztp-install.sno00205-common-config-policy.16e544fbef0fbeb2
      lastTimestamp: "2022-04-12T21:58:38Z"
      message: NonCompliant; no matches for kind "ConfigurationPolicy" in version
        "policy.open-cluster-management.io/v1", please check if you have CRD deployed.
    templateMeta:
      creationTimestamp: null
      name: sno00205-common-config-policy-config

[root@e24-h01-000-r640 ~]# oc --kubeconfig hv-sno/manifests/sno00205/kubeconfig get crd|grep configurationpolicies
configurationpolicies.policy.open-cluster-management.io           2022-04-12T22:01:16Z
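
The timestamps in the two outputs above can be compared directly. The sketch below is an illustration added for clarity, not part of the original comment: it copies the ten `lastTimestamp` values and the CRD creation time from the outputs above and checks whether every NonCompliant event was recorded before the CRD existed on the SNO. Treating that ordering as the cause of the failure is an assumption; the report itself does not state it.

```python
from datetime import datetime, timezone


def parse_utc(ts):
    """Parse a Kubernetes-style UTC timestamp such as 2022-04-12T22:01:16Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Creation time of the configurationpolicies CRD on sno00205 (from `oc get crd`).
crd_created = parse_utc("2022-04-12T22:01:16Z")

# lastTimestamp values from the policy status history on the hub, newest first.
event_times = [parse_utc(ts) for ts in (
    "2022-04-12T22:00:30Z",
    "2022-04-12T22:00:28Z",
    "2022-04-12T22:00:25Z",
    "2022-04-12T22:00:10Z",
    "2022-04-12T21:59:54Z",
    "2022-04-12T21:59:39Z",
    "2022-04-12T21:59:24Z",
    "2022-04-12T21:59:08Z",
    "2022-04-12T21:58:53Z",
    "2022-04-12T21:58:38Z",
)]

# True if every recorded failure predates the CRD.
all_before_crd = all(t < crd_created for t in event_times)

# Seconds between the last recorded failure and the CRD creation.
gap_seconds = (crd_created - max(event_times)).total_seconds()

print(f"all events before CRD creation: {all_before_crd}")
print(f"gap between last event and CRD creation: {gap_seconds:.0f}s")
```

With these values, all ten events precede the CRD creation, and the last one is 46 seconds earlier, after which no further events were recorded even though the CRD was present.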

ACM must gather from the hub and must gather from sno00205:
https://drive.google.com/drive/folders/1GsrH1uUpiBJ0nVFkcDCDoq8p_p-jQNZe?usp=sharing

Note that something happened before the must-gathers were taken, and the policy was recreated:

[root@e24-h01-000-r640 ~]# oc --kubeconfig bm/kubeconfig get policy -n sno00205
NAME                                           REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-install.sno00205-common-config-policy      enforce              Compliant          5m6s

Comment 6 bot-tracker-sync 2022-04-14 14:37:02 UTC
G2Bsync 1099211258 comment by gparvin, Thu, 14 Apr 2022 13:53:41 UTC
We are investigating a fix to the policy framework, specifically an issue @JustinKuli has identified in the template sync controller that could cause the described behavior.  Thanks!

Comment 12 errata-xmlrpc 2022-06-09 02:10:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Advanced Cluster Management 2.5 security updates, images, and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4956