Bug 2089093 - CVO hotloops on OperatorGroup due to the diff of "upgradeStrategy": string("Default")
Summary: CVO hotloops on OperatorGroup due to the diff of "upgradeStrategy": string("...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.13.0
Assignee: David Hurta
QA Contact: Yang Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-23 02:45 UTC by Yang Yang
Modified: 2023-05-17 22:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-17 22:46:32 UTC
Target Upstream Version:
Embargoed:


Attachments
CVO log file (11.11 MB, text/plain)
2022-05-23 02:45 UTC, Yang Yang


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 862 0 None open Bug 2089093: CVO hotloops on OperatorGroup due to the diff of "upgradeStrategy" 2022-11-09 17:50:14 UTC
Red Hat Product Errata RHSA-2023:1326 0 None None None 2023-05-17 22:46:45 UTC

Description Yang Yang 2022-05-23 02:45:50 UTC
Created attachment 1882157
CVO log file

Description of problem:

In a cluster, the CVO hotloops on the following resources:
$ grep -o 'Updating .*due to diff' cvo.log | sort | uniq -c
     32 Updating CRD performanceprofiles.performance.openshift.io due to diff
     32 Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff
     32 Updating OperatorGroup openshift-monitoring/openshift-cluster-monitoring due to diff
     32 Updating OperatorGroup openshift-operator-lifecycle-manager/olm-operators due to diff
     32 Updating OperatorGroup openshift-operators/global-operators due to diff
     32 Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff
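
The same tally can be scripted where grep, sort and uniq are not convenient. The following is a minimal Python sketch and not part of the original report: the function name count_diff_updates is a hypothetical helper, and the sample lines are abbreviated from the CVO log quoted in this description (the repeated line is there only to illustrate counting). Only the "Updating <kind> <name> due to diff" message format is taken from the log.

```python
import re
from collections import Counter

# Matches the CVO message format seen in the log, e.g.
# "Updating OperatorGroup openshift-operators/global-operators due to diff".
UPDATE_RE = re.compile(r"Updating (\S+) (\S+) due to diff")


def count_diff_updates(lines):
    """Count 'Updating <kind> <name> due to diff' messages per (kind, name)."""
    counts = Counter()
    for line in lines:
        match = UPDATE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Illustrative sample; a real run would iterate over open("cvo.log").
sample = [
    "generic.go:109] Updating OperatorGroup openshift-operator-lifecycle-manager/olm-operators due to diff:   &unstructured.Unstructured{",
    'sync_worker.go:945] Done syncing for operatorgroup "openshift-operator-lifecycle-manager/olm-operators" (612 of 802)',
    "generic.go:109] Updating OperatorGroup openshift-monitoring/openshift-cluster-monitoring due to diff:   &unstructured.Unstructured{",
    "generic.go:109] Updating OperatorGroup openshift-operator-lifecycle-manager/olm-operators due to diff:   &unstructured.Unstructured{",
]

for (kind, name), n in sorted(count_diff_updates(sample).items()):
    print(n, kind, name)
```

A resource whose count keeps growing by one per sync cycle is a candidate hotloop; a one-off update during install or upgrade is not.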


$ grep "Updating OperatorGroup openshift-operator-lifecycle-manager/olm-operators due to diff" -A20 cvo.log

I0519 06:23:50.004734       1 generic.go:109] Updating OperatorGroup openshift-operator-lifecycle-manager/olm-operators due to diff:   &unstructured.Unstructured{
  	Object: map[string]interface{}{
  		"apiVersion": string("operators.coreos.com/v1"),
  		"kind":       string("OperatorGroup"),
  		"metadata":   map[string]interface{}{"annotations": map[string]interface{}{"include.release.openshift.io/ibm-cloud-managed": string("true"), "include.release.openshift.io/self-managed-high-availability": string("true"), "olm.providedAPIs": string("PackageManifest.v1.packages.operators.coreos.com")}, "creationTimestamp": string("2022-05-19T03:44:11Z"), "generation": int64(1), "managedFields": []interface{}{map[string]interface{}{"apiVersion": string("operators.coreos.com/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:metadata": map[string]interface{}{"f:annotations": map[string]interface{}{".": map[string]interface{}{}, "f:include.release.openshift.io/ibm-cloud-managed": map[string]interface{}{}, "f:include.release.openshift.io/self-managed-high-availability": map[string]interface{}{}}, "f:ownerReferences": map[string]interface{}{".": map[string]interface{}{}, `k:{"uid":"5d28ef1d-78f0-4a88-9c2c-1f0fb01e3c57"}`: map[string]interface{}{}}}, "f:spec": map[string]interface{}{".": map[string]interface{}{}, "f:targetNamespaces": map[string]interface{}{".": map[string]interface{}{}, `v:"openshift-operator-lifecycle-manager"`: map[string]interface{}{}}, "f:upgradeStrategy": map[string]interface{}{}}}, "manager": string("cluster-version-operator"), ...}, map[string]interface{}{"apiVersion": string("operators.coreos.com/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:status": map[string]interface{}{".": map[string]interface{}{}, "f:lastUpdated": map[string]interface{}{}, "f:namespaces": map[string]interface{}{".": map[string]interface{}{}, `v:"openshift-operator-lifecycle-manager"`: map[string]interface{}{}}}}, "manager": string("Go-http-client"), ...}, map[string]interface{}{"apiVersion": string("operators.coreos.com/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:metadata": map[string]interface{}{"f:annotations": map[string]interface{}{"f:olm.providedAPIs": 
map[string]interface{}{}}}}, "manager": string("Go-http-client"), ...}}, ...},
  		"spec": map[string]interface{}{
  			"targetNamespaces": []interface{}{string("openshift-operator-lifecycle-manager")},
+ 			"upgradeStrategy":  string("Default"),
  		},
  		"status": map[string]interface{}{"lastUpdated": string("2022-05-19T03:46:59Z"), "namespaces": []interface{}{string("openshift-operator-lifecycle-manager")}},
  	},
  }
I0519 06:23:50.015887       1 sync_worker.go:945] Done syncing for operatorgroup "openshift-operator-lifecycle-manager/olm-operators" (612 of 802)
I0519 06:23:50.016102       1 sync_worker.go:930] Running sync for subscription "openshift-operator-lifecycle-manager/packageserver" (613 of 802)
I0519 06:23:50.041965       1 sync_worker.go:945] Done syncing for subscription "openshift-operator-lifecycle-manager/packageserver" (613 of 802)
I0519 06:23:50.042477       1 sync_worker.go:930] Running sync for clusteroperator "operator-lifecycle-manager" (614 of 802)
I0519 06:23:50.045187       1 sync_worker.go:945] Done syncing for clusteroperator "operator-lifecycle-manager" (614 of 802)
I0519 06:23:50.045368       1 sync_worker.go:930] Running sync for clusteroperator "operator-lifecycle-manager-catalog" (615 of 802)
I0519 06:23:50.046775       1 sync_worker.go:945] Done syncing for clusteroperator "operator-lifecycle-manager-catalog" (615 of 802)
I0519 06:23:50.046878       1 sync_worker.go:930] Running sync for clusteroperator "operator-lifecycle-manager-packageserver" (616 of 802)
I0519 06:23:50.047819       1 sync_worker.go:945] Done syncing for clusteroperator "operator-lifecycle-manager-packageserver" (616 of 802)

OLM has introduced an upgradeStrategy field to OperatorGroups [1], and the field is present in the OperatorGroup CRD [2].

[1] https://github.com/openshift/operator-framework-olm/commit/d895574f25098093ccde2108c7cb8e288972f1d7
[2] https://github.com/openshift/operator-framework-olm/blob/master/manifests/0000_50_olm_00-operatorgroups.crd.yaml#L43
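
To make the symptom concrete, the sketch below shows how a key that exists on only one side of a comparison keeps a strict equality check failing. It is an illustration in Python, not the CVO's Go code: the helper one_sided_keys and the two spec dictionaries are hypothetical, modelled on the olm-operators diff in the log. The report does not state which side of the comparison (release manifest or in-cluster object) carries upgradeStrategy, and the sketch does not assume either.

```python
def one_sided_keys(left, right):
    """Return (keys only in right, keys only in left), each sorted."""
    only_right = sorted(set(right) - set(left))
    only_left = sorted(set(left) - set(right))
    return only_right, only_left


# Hypothetical spec maps modelled on the olm-operators diff in the log.
spec_a = {"targetNamespaces": ["openshift-operator-lifecycle-manager"]}
spec_b = {
    "targetNamespaces": ["openshift-operator-lifecycle-manager"],
    "upgradeStrategy": "Default",
}

only_b, only_a = one_sided_keys(spec_a, spec_b)
print("only in b:", only_b)
print("only in a:", only_a)

# Unless something reconciles the two sides, every comparison finds the same
# one-sided key, so each sync pass would report a diff and issue an update.
print("equal:", spec_a == spec_b)
```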

masters=$(oc get no -l node-role.kubernetes.io/master | sed '1d' | awk '{print $1}')

# oc adm node-logs $masters --path=kube-apiserver/audit.log --raw | zgrep -h '"resource":"operatorgroups"' 2>/dev/null | jq -r '.user.username + " " + (.objectRef | .resource + " " + .namespace + " " + .name + " " + .apiGroup) + " " + .stageTimestamp + " " + (.responseStatus | tostring)' | grep openshift-operator-lifecycle-manager
system:serviceaccount:openshift-cluster-version:default operatorgroups openshift-operator-lifecycle-manager olm-operators operators.coreos.com 2022-05-19T09:29:46.675922Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default operatorgroups openshift-operator-lifecycle-manager olm-operators operators.coreos.com 2022-05-19T09:29:46.696079Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager olm-operators operators.coreos.com 2022-05-19T09:29:46.706094Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:29:46.712102Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:29:46.749286Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:30:52.879022Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:31:09.981192Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:31:10.182751Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:32:08.247971Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:32:52.914636Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:33:35.610683Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups   operators.coreos.com 2022-05-19T09:33:56.912355Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups   operators.coreos.com 2022-05-19T09:33:56.918403Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default operatorgroups openshift-operator-lifecycle-manager olm-operators operators.coreos.com 2022-05-19T09:34:12.946139Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default operatorgroups openshift-operator-lifecycle-manager olm-operators operators.coreos.com 2022-05-19T09:34:12.961555Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager olm-operators operators.coreos.com 2022-05-19T09:34:12.973970Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:34:12.980258Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:34:35.562636Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:34:35.687918Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:34:35.938816Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:34:35.997805Z {"metadata":{},"code":200}

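The audit log shows that both OLM (olm-operator-serviceaccount) and the CVO (openshift-cluster-version:default) are trying to update the operatorgroup.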
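
A quick way to see who touches the resource, and how often, is to tally the formatted audit lines by username. The following is a minimal Python sketch under stated assumptions: requests_per_user is a hypothetical helper, and it expects the space-separated format produced by the jq filter above, where the first field is .user.username. Because that filter does not print the request verb, the tally counts requests of any kind, not updates specifically.

```python
from collections import Counter


def requests_per_user(lines):
    """Tally formatted audit lines by their first field (the username)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[line.split(" ", 1)[0]] += 1
    return counts


# Four lines copied from the audit output above.
sample = [
    'system:serviceaccount:openshift-cluster-version:default operatorgroups openshift-operator-lifecycle-manager olm-operators operators.coreos.com 2022-05-19T09:29:46.675922Z {"metadata":{},"code":200}',
    'system:serviceaccount:openshift-cluster-version:default operatorgroups openshift-operator-lifecycle-manager olm-operators operators.coreos.com 2022-05-19T09:29:46.696079Z {"metadata":{},"code":200}',
    'system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager olm-operators operators.coreos.com 2022-05-19T09:29:46.706094Z {"metadata":{},"code":200}',
    'system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount operatorgroups openshift-operator-lifecycle-manager  operators.coreos.com 2022-05-19T09:29:46.712102Z {"metadata":{},"code":200}',
]

for user, n in requests_per_user(sample).most_common():
    print(n, user)
```

Adding .verb to the jq output would allow the same tally to separate reads from writes.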


The other two OperatorGroup resources show the same upgradeStrategy diff.

$ grep "Updating OperatorGroup openshift-monitoring/openshift-cluster-monitoring due to diff" -A20 cvo.log

I0519 06:24:11.493893       1 generic.go:109] Updating OperatorGroup openshift-monitoring/openshift-cluster-monitoring due to diff:   &unstructured.Unstructured{
  	Object: map[string]interface{}{
  		"apiVersion": string("operators.coreos.com/v1"),
  		"kind":       string("OperatorGroup"),
  		"metadata":   map[string]interface{}{"annotations": map[string]interface{}{"include.release.openshift.io/ibm-cloud-managed": string("true"), "include.release.openshift.io/self-managed-high-availability": string("true"), "include.release.openshift.io/single-node-developer": string("true"), "olm.providedAPIs": string("Alertmanager.v1.monitoring.coreos.com,PodMonitor.v1.monitoring.c"...)}, "creationTimestamp": string("2022-05-19T03:43:48Z"), "generation": int64(1), "managedFields": []interface{}{map[string]interface{}{"apiVersion": string("operators.coreos.com/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:metadata": map[string]interface{}{"f:annotations": map[string]interface{}{".": map[string]interface{}{}, "f:include.release.openshift.io/ibm-cloud-managed": map[string]interface{}{}, "f:include.release.openshift.io/self-managed-high-availability": map[string]interface{}{}, "f:include.release.openshift.io/single-node-developer": map[string]interface{}{}, ...}, "f:ownerReferences": map[string]interface{}{".": map[string]interface{}{}, `k:{"uid":"5d28ef1d-78f0-4a88-9c2c-1f0fb01e3c57"}`: map[string]interface{}{}}}, "f:spec": map[string]interface{}{".": map[string]interface{}{}, "f:selector": map[string]interface{}{".": map[string]interface{}{}, "f:matchLabels": map[string]interface{}{".": map[string]interface{}{}, "f:openshift.io/cluster-monitoring": map[string]interface{}{}}}, "f:staticProvidedAPIs": map[string]interface{}{}, "f:upgradeStrategy": map[string]interface{}{}}}, "manager": string("cluster-version-operator"), ...}, map[string]interface{}{"apiVersion": string("operators.coreos.com/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:status": map[string]interface{}{".": map[string]interface{}{}, "f:lastUpdated": map[string]interface{}{}, "f:namespaces": map[string]interface{}{".": map[string]interface{}{}, `v:"openshift-apiserver"`: map[string]interface{}{}, `v:"openshift-apiserver-operator"`: 
map[string]interface{}{}, `v:"openshift-authentication"`: map[string]interface{}{}, ...}}}, "manager": string("Go-http-client"), ...}}, ...},
  		"spec": map[string]interface{}{
  			"selector":           map[string]interface{}{"matchLabels": map[string]interface{}{"openshift.io/cluster-monitoring": string("true")}},
  			"staticProvidedAPIs": bool(true),
+ 			"upgradeStrategy":    string("Default"),
  		},
  		"status": map[string]interface{}{"lastUpdated": string("2022-05-19T03:56:07Z"), "namespaces": []interface{}{string("openshift-kube-controller-manager-operator"), string("openshift-cluster-csi-drivers"), string("openshift-cloud-network-config-controller"), string("openshift-cloud-credential-operator"), ...}},
  	},
  }


$ grep "Updating OperatorGroup openshift-operators/global-operators due to diff" -A20 cvo.log

I0519 04:50:51.993279       1 generic.go:109] Updating OperatorGroup openshift-operators/global-operators due to diff:   &unstructured.Unstructured{
  	Object: map[string]interface{}{
  		"apiVersion": string("operators.coreos.com/v1"),
  		"kind":       string("OperatorGroup"),
  		"metadata":   map[string]interface{}{"annotations": map[string]interface{}{"include.release.openshift.io/ibm-cloud-managed": string("true"), "include.release.openshift.io/self-managed-high-availability": string("true")}, "creationTimestamp": string("2022-05-19T03:44:10Z"), "generation": int64(1), "managedFields": []interface{}{map[string]interface{}{"apiVersion": string("operators.coreos.com/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:metadata": map[string]interface{}{"f:annotations": map[string]interface{}{".": map[string]interface{}{}, "f:include.release.openshift.io/ibm-cloud-managed": map[string]interface{}{}, "f:include.release.openshift.io/self-managed-high-availability": map[string]interface{}{}}, "f:ownerReferences": map[string]interface{}{".": map[string]interface{}{}, `k:{"uid":"5d28ef1d-78f0-4a88-9c2c-1f0fb01e3c57"}`: map[string]interface{}{}}}, "f:spec": map[string]interface{}{".": map[string]interface{}{}, "f:upgradeStrategy": map[string]interface{}{}}}, "manager": string("cluster-version-operator"), ...}, map[string]interface{}{"apiVersion": string("operators.coreos.com/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:status": map[string]interface{}{".": map[string]interface{}{}, "f:lastUpdated": map[string]interface{}{}, "f:namespaces": map[string]interface{}{".": map[string]interface{}{}, `v:""`: map[string]interface{}{}}}}, "manager": string("Go-http-client"), ...}}, ...},
+ 		"spec":       map[string]interface{}{"upgradeStrategy": string("Default")},
  		"status":     map[string]interface{}{"lastUpdated": string("2022-05-19T03:46:59Z"), "namespaces": []interface{}{string("")}},
  	},
  }

Version-Release number of the following components:
4.11.0-0.nightly-2022-05-18-053037


How reproducible:
1/1

Steps to Reproduce:
1. Install a cluster
2. Check the CVO log for repeated "Updating OperatorGroup ... due to diff" messages

Actual results:
CVO hotloops on OperatorGroup

Expected results:
CVO doesn't hotloop on OperatorGroup



Comment 1 Yang Yang 2022-05-23 03:00:48 UTC
Audit log is available at https://drive.google.com/file/d/1fbga1w21iuB4jHSRiQ6Fjwt78YMUMb8t/view?usp=sharing.

Comment 4 Yang Yang 2022-11-23 08:11:18 UTC
Verified on 4.13.0-0.nightly-2022-11-22-205408

# oc logs pod/cluster-version-operator-68b9bdd8d8-k9sd2 -n openshift-cluster-version | grep -o 'Updating OperatorGroup'

There is no hotlooping on OperatorGroup. Moving it to verified state.

Comment 7 errata-xmlrpc 2023-05-17 22:46:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:1326

