Description of problem:
Kubernetes limits object names to 63 characters. If a policy name defined in a PolicyGenTemplate approaches this limit, the Topology Aware Lifecycle Operator (TALO) cannot create child policies. When this occurs, the parent policy remains in a "NonCompliant" state.

Version-Release number of selected component (if applicable):
4.10

How reproducible:
100%

Steps to Reproduce:
1. Install OCP with the TALO and GitOps operators.
2. Create a PolicyGenTemplate with a policy name and cluster name near the 63-character limit.
3. Install a cluster via ZTP using GitOps and TALO.
4. Verify that the parent policy remains in the NonCompliant state and the child policy is never created.

Actual results:
Child policy is not created.

Expected results:
TALO creates the child policy, which eventually goes into the "Compliant" state.

Additional info:
Kubernetes character limit documented here: https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.3/html/governance/governance
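To make the limit concrete, here is a minimal sketch that checks whether a generated name would exceed 63 characters. The helper names (`combined_name`, `exceeds_limit`) are hypothetical, and joining the ClusterGroupUpgrade name and the policy name with a hyphen is an assumption for illustration, not a statement of how the operator builds names internally.

```python
K8S_NAME_LIMIT = 63  # limit cited in this report


def combined_name(cgu_name: str, policy_name: str) -> str:
    """Join two name parts with a hyphen (assumed format, for illustration only)."""
    return f"{cgu_name}-{policy_name}"


def exceeds_limit(name: str, limit: int = K8S_NAME_LIMIT) -> bool:
    """Return True when the name is longer than the allowed limit."""
    return len(name) > limit


if __name__ == "__main__":
    name = combined_name(
        "cnfdf18-new-with-super-long-name",
        "common-cnfdf18-subscriptions-policy",
    )
    # The combined name is 68 characters long, so it is over the limit.
    print(len(name), exceeds_limit(name))
```

A check like this could be run against PolicyGenTemplate policy names before committing them to the GitOps repository.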
https://github.com/openshift-kni/cluster-group-upgrades-operator/pull/145
Verified with the latest upstream image:
quay.io/openshift-kni/cluster-group-upgrades-operator@sha256:71180138852a342c9d55b0c730eaea5c7c708fdf87001543e44e0d31511eaf6d

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  resourceVersion: '204296253'
  name: cnfdf18-new-with-super-long-name
  uid: 0f5752af-2bc1-42ba-9bd4-d3e142e1efac
  creationTimestamp: '2022-06-09T14:17:06Z'
  generation: 1
  namespace: ztp-install
  ownerReferences:
    - apiVersion: cluster.open-cluster-management.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: ManagedCluster
      name: cnfdf18
      uid: 76f7eaec-3321-4858-93be-a49ea9db290d
  finalizers:
    - ran.openshift.io/cleanup-finalizer
spec:
  actions:
    afterCompletion:
      addClusterLabels:
        ztp-done: ''
      deleteClusterLabels:
        ztp-running: ''
      deleteObjects: true
    beforeEnable:
      addClusterLabels:
        ztp-running: ''
  backup: false
  clusters:
    - cnfdf18
  enable: true
  managedPolicies:
    - common-cnfdf18-subscriptions-policy
  preCaching: false
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
status:
  computedMaxConcurrency: 1
  conditions:
    - lastTransitionTime: '2022-06-09T14:20:38Z'
      message: >-
        The ClusterGroupUpgrade CR has all clusters compliant with all the managed policies
      reason: UpgradeCompleted
      status: 'True'
      type: Ready
  managedPoliciesContent:
    common-cnfdf18-subscriptions-policy: >-
      [{"kind":"Subscription","name":"sriov-network-operator-subscription","namespace":"openshift-sriov-network-operator"},{"kind":"Subscription","name":"ptp-operator-subscription","namespace":"openshift-ptp"},{"kind":"Subscription","name":"performance-addon-operator","namespace":"openshift-performance-addon-operator"}]
  managedPoliciesForUpgrade:
    - name: common-cnfdf18-subscriptions-policy
      namespace: ztp-common-cnfdf18
  managedPoliciesNs:
    common-cnfdf18-subscriptions-policy: ztp-common-cnfdf18
  remediationPlan:
    - - cnfdf18
  safeResourceNames:
    cnfdf18-new-with-super-long-name-common-cnfdf18-subscriptions-policy: cnfdf18-new-with-super-long-name-common-cnfdf-tqq9m
    cnfdf18-new-with-super-long-name-common-cnfdf18-subscriptions-policy-config: cnfdf18-new-with-super-long-name-common-cnfdf18-subscript-ccqzs
    cnfdf18-new-with-super-long-name-common-cnfdf18-subscriptions-policy-placement: >-
      cnfdf18-new-with-super-long-name-common-cnfdf18-subscriptions-policy-placement-nc67f
    cnfdf18-new-with-super-long-name-ztp-install-installplan-install-g4t9k: >-
      cnfdf18-new-with-super-long-name-ztp-install-installplan-install-g4t9k-dbtxj
    cnfdf18-new-with-super-long-name-ztp-install-subscription-performance-addon-operator: >-
      cnfdf18-new-with-super-long-name-ztp-install-subscription-performance-addon-operator-drhnj
    cnfdf18-new-with-super-long-name-ztp-install-subscription-ptp-operator-subscription: >-
      cnfdf18-new-with-super-long-name-ztp-install-subscription-ptp-operator-subscription-562h7
    cnfdf18-new-with-super-long-name-ztp-install-subscription-sriov-network-operator-subscription: >-
      cnfdf18-new-with-super-long-name-ztp-install-subscription-sriov-network-operator-subscription-b9rgs
  status:
    completedAt: '2022-06-09T14:20:39Z'
    currentBatchRemediationProgress:
      cnfdf18:
        state: Completed
    startedAt: '2022-06-09T14:17:06Z'
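The safeResourceNames entries in this dump pair each desired name with a generated name that ends in a five-character suffix, and some of the generated names are visibly truncated. The following is a rough sketch of one way such a name could be derived. It is not the operator's actual code: the function `safe_name`, its `max_len` parameter, and the choice of suffix alphabet are all assumptions made for illustration.

```python
import random
import string
from typing import Optional

SUFFIX_LEN = 5  # generated names in the dump end in a 5-character suffix


def safe_name(desired: str, max_len: int = 63,
              rng: Optional[random.Random] = None) -> str:
    """Truncate `desired` so that '<base>-<suffix>' fits within max_len.

    Hypothetical helper: the real operator may pick the length budget and
    the suffix differently.
    """
    rng = rng or random.Random()
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(rng.choice(alphabet) for _ in range(SUFFIX_LEN))
    # Reserve room for the hyphen and the suffix, then drop any trailing
    # hyphen left behind by the cut so the result has no double hyphen.
    base = desired[: max_len - SUFFIX_LEN - 1].rstrip("-")
    return f"{base}-{suffix}"


if __name__ == "__main__":
    desired = ("cnfdf18-new-with-super-long-name-"
               "common-cnfdf18-subscriptions-policy-config")
    print(safe_name(desired))
```

With the default budget of 63, the `-config` name is cut to the same prefix seen in the dump (`...-common-cnfdf18-subscript`). Passing `max_len=51` yields the prefix of the policy entry (`...-common-cnfdf`), which suggests, but does not prove, that the operator reserves extra room for that resource type.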
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.11 CNF vRAN extras update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6110
We ran into this issue setting up a POC for a customer in their lab. We had to truncate the initial names:

```
2022-09-07T18:22:59.679Z ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "cgu-platform-upgrade-prep", "namespace": "default", "error": "Operation cannot be fulfilled on clustergroupupgrades.ran.openshift.io \"cgu-platform-upgrade-prep\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
```

The child policy was not being created.
@dphillip Can you provide more info? Either a CGU dump or the TALM pod log. The "object modified" reconcile error is a very generic one and it can happen in many cases. Usually it resolves by itself on the next reconcile.
Hey Jun, we talked today and decided to upgrade the TALM operator. Thanks for taking a look at it.