Bug 2009233 - ACM policy object generated by PolicyGen conflicting with OLM Operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Telco Edge
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Ian Miller
QA Contact: yliu1
URL:
Whiteboard:
Depends On:
Blocks: 2025082
 
Reported: 2021-09-30 08:35 UTC by Juan Manuel Parrilla Madrid
Modified: 2022-03-10 16:15 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The generated policy had complianceType "mustonlyhave", so metadata updates made by OLM were reverted when the policy engine restored the "desired" state of the CR.
Consequence: OLM and the policy engine continuously overwrote the metadata of the CR, resulting in high CPU use.
Fix: Change the default complianceType to "musthave".
Result: OLM and the policy engine no longer conflict, and CPU use returns to baseline.
Clone Of:
Environment:
Last Closed: 2022-03-10 16:14:44 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift-kni cnf-features-deploy pull 804 0 None open Bug 2009233: ztp: Override compliance type for Namespace CRs 2021-11-15 21:10:52 UTC
Github openshift-kni cnf-features-deploy pull 815 0 None open Bug 2009233: ztp: Support complianceType for full PGT 2021-11-24 13:35:29 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:15:18 UTC

Description Juan Manuel Parrilla Madrid 2021-09-30 08:35:30 UTC
Description of problem:

Using the ZTP flow and the repo https://github.com/openshift-kni/cnf-features-deploy, we deploy an SNO cluster and the policies associated with the environment. The problem is that when the ACM policies created on the hub cluster have the "MustOnlyHave" behaviour, this causes an issue with labels on namespaces:

- The policy creates a namespace with specific labels (e.g. monitoring)
- OLM tries to patch that namespace with a label (olm.operatorgroup.uid/bb373fcd-1a63-4b7a-83cd-011226dc71ad: ""), automatically generated by the OLM operator
- The policy enters the NonCompliant state
- The policy gets applied again, reverting the label
- The loop goes on
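The conflict above comes down to the complianceType on the generated object-templates. A minimal sketch of what such a generated ConfigurationPolicy fragment looks like (the policy name, namespace, and labels here are illustrative, not taken from the actual generated output):

```yaml
# Hypothetical fragment of a PolicyGen-generated ConfigurationPolicy.
# With complianceType: mustonlyhave, any field NOT listed here (such as
# the olm.operatorgroup.uid/... label that OLM adds) is treated as a
# violation and removed on the next enforcement cycle.
apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
  name: example-ns-policy
spec:
  remediationAction: enforce
  object-templates:
    - complianceType: mustonlyhave   # "musthave" would tolerate extra labels
      objectDefinition:
        apiVersion: v1
        kind: Namespace
        metadata:
          name: openshift-ptp
          labels:
            openshift.io/cluster-monitoring: "true"
```

With "mustonlyhave", the policy engine and OLM each consider the other's version of the namespace wrong, which produces the enforcement loop described above.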

I'm using the Hooks for PolicyGen

Version-Release number of selected component (if applicable):

ACM 2.3.3
Hub 4.8.5
SNO 4.8.11

How reproducible:

Always


Steps to Reproduce:
1. Deploy ACM and the gitops-operator
2. Populate the git repo as described in the cnf-features-deploy repo
3. git push to the repo and then let the hooks deploy the SNO and the ACM Policies
4. Wait until it starts flapping

Actual results:

- Policy flapping between NonCompliant and Compliant states
- Many errors in the OLM operator logs

Expected results:

No errors
Additional info:

- Logs from the OLM operator:
time="2021-09-29T10:30:12Z" level=info msg="checking ptp-operator.4.8.0-202108312109"
time="2021-09-29T10:30:12Z" level=info msg="checking performance-addon-operator.v4.8.1"
{"level":"error","ts":1632911412.328675,"logger":"controllers.operator","msg":"Could not update Operator status","request":"/ptp-operator.openshift-ptp","error":"Operation cannot be fulfilled on operators.operators.coreos.com \"ptp-operator.openshift-ptp\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:293\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:248\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99"}
time="2021-09-29T10:30:12Z" level=info msg="checking ptp-operator.4.8.0-202108312109"
{"level":"error","ts":1632911412.4046497,"logger":"controllers.operator","msg":"Could not update Operator status","request":"/local-storage-operator.openshift-local-storage","error":"Operation cannot be fulfilled on operators.operators.coreos.com \"local-storage-operator.openshift-local-storage\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:293\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:248\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99"}
{"level":"error","ts":1632911412.4187012,"logger":"controllers.operator","msg":"Could not update Operator status","request":"/local-storage-operator.openshift-local-storage","error":"Operation cannot be fulfilled on operators.operators.coreos.com \"local-storage-operator.openshift-local-storage\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:293\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:248\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99"}
time="2021-09-29T10:30:12Z" level=info msg="checking sriov-fec.v1.3.0"
E0929 10:30:12.680671       1 queueinformer_operator.go:290] sync {"update" "openshift-performance-addon-operator"} failed: Operation cannot be fulfilled on namespaces "openshift-performance-addon-operator": the object has been modified; please apply your changes to the latest version and try again
E0929 10:30:12.730499       1 queueinformer_operator.go:290] sync {"update" "openshift-sriov-network-operator"} failed: Operation cannot be fulfilled on namespaces "openshift-sriov-network-operator": the object has been modified; please apply your changes to the latest version and try again
time="2021-09-29T10:30:13Z" level=info msg="checking ptp-operator.4.8.0-202108312109"
time="2021-09-29T10:30:13Z" level=info msg="checking performance-addon-operator.v4.8.1"
time="2021-09-29T10:30:14Z" level=info msg="checking performance-addon-operator.v4.8.1"

Comment 1 Juan Manuel Parrilla Madrid 2021-09-30 12:10:32 UTC
Patching the ACM policy with "complianceType: musthave" is a temporary workaround you can apply, but if you modify the repo, the hooks will override it again.
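For reference, the workaround amounts to flipping a single field in the generated policy on the hub. A minimal, hypothetical fragment (the surrounding policy name and the namespace being enforced are illustrative):

```yaml
# Hypothetical: the edited object-templates entry after applying the
# workaround. "musthave" means extra metadata added by other controllers
# (such as OLM's operatorgroup label) no longer triggers a violation.
spec:
  remediationAction: enforce
  object-templates:
    - complianceType: musthave       # was: mustonlyhave
      objectDefinition:
        apiVersion: v1
        kind: Namespace
        metadata:
          name: openshift-ptp
```

As noted, any regeneration of the policies from the git repo reverts this edit, which is why the real fix has to land in the generator itself.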

Comment 2 Juan Manuel Parrilla Madrid 2021-10-07 07:58:37 UTC
Hey folks, this is also happening with PVCs:


Error on the policy:

    - eventName: vz-wc-lab-policies.vz-wc-lab-image-registry-policy.16ab6a5510dab516
      lastTimestamp: "2021-10-07T05:40:23Z"
      message: 'NonCompliant; violation - Error updating the object `registry-storage`,
        the error is `Operation cannot be fulfilled on persistentvolumeclaims "registry-storage":
        the object has been modified; please apply your changes to the latest version
        and try again`; notification - configs [cluster] found as specified, therefore
        this Object template is compliant'
    - eventName: vz-wc-lab-policies.vz-wc-lab-image-registry-policy.16aba20db2e6167c
      lastTimestamp: "2021-10-07T05:35:06Z"
      message: "NonCompliant; violation - Error updating the object `registry-storage`,
        the error is `PersistentVolumeClaim \"registry-storage\" is invalid: spec:
        Forbidden: spec is immutable after creation except resources.requests for
        bound claims\n  core.PersistentVolumeClaimSpec{\n  \tAccessModes:      {\"ReadWriteOnce\"},\n
        \ \tSelector:         nil,\n  \tResources:        {Requests: {s\"storage\":
        {i: {...}, s: \"100Gi\", Format: \"BinarySI\"}}},\n- \tVolumeName:       \"\",\n+
        \tVolumeName:       \"local-pv-b908200e\",\n  \tStorageClassName: nil,\n  \tVolumeMode:
        \      &\"Filesystem\",\n  \tDataSource:       nil,\n  }\n`; notification
        - configs [cluster] found as specified, therefore this Object template is
        compliant"
    - eventName: vz-wc-lab-policies.vz-wc-lab-image-registry-policy.16ab6a5510dab516
      lastTimestamp: "2021-10-07T05:10:08Z"
      message: 'NonCompliant; violation - Error updating the object `registry-storage`,
        the error is `Operation cannot be fulfilled on persistentvolumeclaims "registry-storage":
        the object has been modified; please apply your changes to the latest version
        and try again`; notification - configs [cluster] found as specified, therefore
        this Object template is compliant'



This is the object that the policy wants to enforce:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-class: fs-lso
  creationTimestamp: "2021-10-06T10:04:09Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: registry-storage
  namespace: openshift-image-registry
  resourceVersion: "1623757"
  uid: 36578f14-2c57-4e46-b116-8aabedf759ed
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  volumeMode: Filesystem
  volumeName: local-pv-b908200e
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  phase: Bound


This is the object that another operator wants to apply (note the additional pv.kubernetes.io/bind-completed annotation):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    volume.beta.kubernetes.io/storage-class: fs-lso
  creationTimestamp: "2021-10-06T10:04:09Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: registry-storage
  namespace: openshift-image-registry
  resourceVersion: "1623757"
  uid: 36578f14-2c57-4e46-b116-8aabedf759ed
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  volumeMode: Filesystem
  volumeName: local-pv-b908200e
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  phase: Bound
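The linked PR 815 ("Support complianceType for full PGT") addresses cases like this at the generator level. As a hedged sketch of how a PolicyGenTemplate could then request the more tolerant compliance type for a source file (the file name, policy name, and namespace below are illustrative, and the exact placement of the field follows the PGT schema as I understand it from the PR title):

```yaml
# Hypothetical PolicyGenTemplate fragment: setting complianceType on a
# source-file entry so the generated policy uses "musthave" instead of
# the old "mustonlyhave" default, tolerating fields set by other
# controllers (e.g. the PVC's volumeName filled in at bind time).
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: "vz-wc-lab"
  namespace: "ztp-policies"
spec:
  bindingRules:
    sites: "vz-wc-lab"
  sourceFiles:
    - fileName: ImageRegistryPVC.yaml        # illustrative source file
      policyName: "image-registry-policy"
      complianceType: musthave               # don't fight the PV binder over volumeName
```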

Comment 5 yliu1 2021-11-19 20:28:17 UTC
We don't currently have a formal test env for ZTP on 4.10 nightly. Marking this as verified to unblock the merge to 4.9; we will verify this change in 4.9.

Comment 6 Ian Miller 2021-11-24 12:36:07 UTC
Reopening. Further testing showed there is still excess CPU use.

Comment 9 Tony Mulqueen 2022-01-10 15:39:51 UTC
Doc Text would be helpful in documenting this in the 4.10 release notes. Please supply.

Comment 12 errata-xmlrpc 2022-03-10 16:14:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

