metadata.managedFields[0].time shows that the CVO is updating this continuously:
{"count":91,"path":"/apis/operators.coreos.com/v1alpha1/namespaces/openshift-operator-lifecycle-manager/clusterserviceversions/packageserver"}
Looks like it races with OLM:
managedFields:
- apiVersion: operators.coreos.com/v1alpha1
  fieldsType: FieldsV1
  fieldsV1:
    f:metadata:
      f:labels:
        .: {}
        f:olm.clusteroperator.name: {}
        f:olm.version: {}
    f:spec:
      .: {}
      f:apiservicedefinitions:
        .: {}
        f:owned: {}
      f:description: {}
      f:displayName: {}
      f:install:
        .: {}
        f:spec:
          .: {}
          f:clusterPermissions: {}
        f:strategy: {}
      f:installModes: {}
      f:keywords: {}
      f:links: {}
      f:maintainers: {}
      f:maturity: {}
      f:minKubeVersion: {}
      f:provider:
        .: {}
        f:name: {}
      f:version: {}
  manager: cluster-version-operator
  operation: Update
  time: "2020-09-22T15:18:01Z"
- apiVersion: operators.coreos.com/v1alpha1
  fieldsType: FieldsV1
  fieldsV1:
    f:metadata:
      f:annotations:
        .: {}
        f:olm.operatorGroup: {}
        f:olm.operatorNamespace: {}
        f:olm.targetNamespaces: {}
      f:labels:
        f:olm.api.4bca9f23e412d79d: {}
    f:spec:
      f:customresourcedefinitions: {}
      f:install:
        f:spec:
          f:deployments: {}
    f:status:
      .: {}
      f:certsLastUpdated: {}
      f:certsRotateAt: {}
      f:conditions: {}
      f:lastTransitionTime: {}
      f:lastUpdateTime: {}
      f:message: {}
      f:phase: {}
      f:reason: {}
      f:requirementStatus: {}
  manager: olm
  operation: Update
  time: "2020-09-22T15:18:16Z"
Stefan suggests possibly waiting until API-server support for server-side apply [1] goes GA and rerolling the CVO's apply logic to use that instead of client-side merging, which might help here. And bug 1879184 might end up with a [Late] CI guard based on the audit logs. But whatever is going on here is unlikely to be new in 4.6, so punting to 4.7.
[1]: https://kubernetes.io/blog/2020/04/01/kubernetes-1.18-feature-server-side-apply-beta-2/
I don't think https://bugzilla.redhat.com/show_bug.cgi?id=1881522#c1 reflects what I meant. I meant that writing perfect client-side merging funcs for all types is an infinite game with lots of chances for mistakes and failures. Instead, the right solution is to triage these bugs, fix the manifests for now, and add an e2e test that uncovers the issues before new manifests merge.
> ... fix the manifests for now...
If that's what this bug is about, it should be assigned to the samples team, right?
> ... and add an e2e test that uncovers the issues before new manifests merge.
This is bug 1879184, right?
Punting back to 4.7, because I don't see any new-in-4.6 regressions here, and it's really late in the 4.6 cycle to make new 4.6 blockers unless we have a solid story around why this is a critical issue.
It's end of sprint, and this is not going to get fixed in the next few hours. Hopefully we will at least get the Late audit guard from bug 1879184 in next sprint, and then we'll see which team should fix this issue.
Adding UpcomingSprint as we have reached the end of the current sprint and pushing this bug to the next sprint.
Reproducing with 4.8.0-fc.3
# masters=$(oc get no -l node-role.kubernetes.io/master | sed '1d' | awk '{print $1}')
# oc adm node-logs $masters --path=kube-apiserver/audit.log --raw | zgrep -h '"verb":"update".*"resource":".*"packageserver"' 2>/dev/null | jq -r '.user.username + " " + (.objectRef | .resource + " " + .namespace + " " + .name + " " + .apiGroup) + " " + .stageTimestamp + " " + (.responseStatus | tostring)' | sort
system:serviceaccount:openshift-cluster-version:default clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T00:37:50.673758Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T00:46:41.857536Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T00:37:50.691220Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T00:37:50.706568Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T00:37:50.719282Z {"metadata":{},"status":"Failure","reason":"Conflict","code":409}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T00:37:50.719576Z {"metadata":{},"status":"Failure","reason":"Conflict","code":409}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T00:46:41.876512Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T00:46:41.894444Z {"metadata":{},"code":200}
CVO updates the packageserver and races with OLM.
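To make the race easier to see at a glance, the sorted summary lines can be tallied per service account and HTTP code. The following is a minimal Python sketch, assuming the seven space-separated fields produced by the jq filter above; the helper name tally_updates and the SAMPLE list (three lines copied from the output above) are illustrative only and not part of any OpenShift tooling.

```python
import json
from collections import Counter


def tally_updates(lines):
    """Count update responses per (username, HTTP code) from summary lines."""
    counts = Counter()
    for line in lines:
        # Fields: user, resource, namespace, name, apiGroup, timestamp, status JSON.
        parts = line.strip().split(" ", 6)
        if len(parts) != 7:
            continue  # skip blank or incomplete records
        user = parts[0]
        status = json.loads(parts[6])
        counts[(user, status.get("code"))] += 1
    return counts


# Three records copied from the audit-log summary in this comment.
SAMPLE = [
    'system:serviceaccount:openshift-cluster-version:default clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T00:37:50.673758Z {"metadata":{},"code":200}',
    'system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T00:37:50.691220Z {"metadata":{},"code":200}',
    'system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T00:37:50.719282Z {"metadata":{},"status":"Failure","reason":"Conflict","code":409}',
]

if __name__ == "__main__":
    for (user, code), n in sorted(tally_updates(SAMPLE).items()):
        print(n, code, user)
```

A nonzero count of 200 responses for the openshift-cluster-version service account that keeps growing between checks, together with 409 conflicts for the OLM service account at nearly the same timestamps, is the pattern described in this comment.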
Attempting to verify it with 4.8.0-0.nightly-2021-06-07-180258
# oc adm release info --commits registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-06-07-180258 | grep -i olm
operator-lifecycle-manager https://github.com/openshift/operator-framework-olm 1adb4495ae3cec2189e74bd354af348ed5ec7b9b
$ git --no-pager log --first-parent --oneline -3 origin/release-4.8
1adb4495a (HEAD -> master, origin/release-4.9, origin/release-4.8, origin/master, origin/HEAD) Merge pull request #84 from vrutkovs/cvo-hotlooping
ca1f0b69c Merge pull request #83 from hasbro17/fix-ssa-error
0e9f3bffa Merge pull request #82 from joelanford/bz-1961472
The nightly build includes the fix.
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-06-07-180258   True        False         2m      Cluster version is 4.8.0-0.nightly-2021-06-07-180258
# masters=$(oc get no -l node-role.kubernetes.io/master | sed '1d' | awk '{print $1}')
# oc adm node-logs $masters --path=kube-apiserver/audit.log --raw | zgrep -h '"verb":"update".*"resource":".*"packageserver"' 2>/dev/null | jq -r '.user.username + " " + (.objectRef | .resource + " " + .namespace + " " + .name + " " + .apiGroup) + " " + .stageTimestamp + " " + (.responseStatus | tostring)' | sort
system:serviceaccount:openshift-cluster-version:default clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:39:54.025056Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:43:07.894875Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:46:50.647544Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:51:12.037391Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:54:24.973357Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:40:25.235770Z {"metadata":{},"status":"Failure","reason":"Conflict","code":409}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:40:25.577762Z {"metadata":{},"status":"Failure","reason":"Conflict","code":409}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:40:39.919881Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:40:40.303295Z {"metadata":{},"status":"Failure","reason":"Conflict","code":409}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:40:40.320423Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:40:40.803135Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:43:07.921885Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:43:07.944801Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:43:07.961416Z {"metadata":{},"status":"Failure","reason":"Conflict","code":409}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:46:50.679361Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:46:50.712323Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:46:50.726868Z {"metadata":{},"status":"Failure","reason":"Conflict","code":409}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:46:50.732949Z {"metadata":{},"status":"Failure","reason":"Conflict","code":409}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:51:12.058207Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:51:12.075421Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:51:12.086471Z {"metadata":{},"status":"Failure","reason":"Conflict","code":409}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:51:12.086550Z {"metadata":{},"status":"Failure","reason":"Conflict","code":409}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:54:24.994938Z {"metadata":{},"code":200}
system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount clusterserviceversions openshift-operator-lifecycle-manager packageserver operators.coreos.com 2021-06-08T02:54:25.017649Z {"metadata":{},"code":200}
The CVO still updates the packageserver constantly and races with OLM. Re-opening it. The fix landed only in the OLM component, but this bug targets the CVO component. Is the bz component selected correctly?
Right, this also needs https://github.com/openshift/cluster-version-operator/pull/561 to do a proper comparison on unspecified resources. I'll move the bug to ON_QA again once it merges and a new nightly is available.
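As a rough illustration of why the comparison matters for hot-looping: if the live object is compared against the manifest as a whole, any field written by another controller (for example the annotations and status that the olm manager owns in the managedFields dump) looks like drift and triggers another update. The Python sketch below compares only the fields the manifest sets. It is an assumption about the general idea, not the code in the pull request above; the function names and the MANIFEST and LIVE values are hypothetical.

```python
def specified_fields_match(required, existing):
    """Return True if every field set in `required` has the same value in `existing`.

    Fields that appear only in `existing` (for example, ones written by
    another controller) are ignored. Lists and scalars are compared whole.
    """
    if isinstance(required, dict):
        if not isinstance(existing, dict):
            return False
        return all(
            key in existing and specified_fields_match(value, existing[key])
            for key, value in required.items()
        )
    return required == existing


def needs_update(required, existing):
    """An update is needed only when a manifest-specified field has drifted."""
    return not specified_fields_match(required, existing)


# Hypothetical manifest content and live object, for illustration only.
MANIFEST = {
    "metadata": {"labels": {"olm.version": "0.0.0-example"}},
    "spec": {"displayName": "Package Server"},
}
LIVE = {
    "metadata": {
        "labels": {"olm.version": "0.0.0-example"},
        "annotations": {"olm.operatorGroup": "example-group"},
    },
    "spec": {"displayName": "Package Server"},
    "status": {"phase": "Succeeded"},
}

if __name__ == "__main__":
    print(needs_update(MANIFEST, LIVE))
```

With this kind of check, the extra annotations and status on the live object do not count as a difference, so no update would be issued until a field the manifest actually sets changes.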
https://github.com/openshift/cluster-version-operator/pull/561 has merged. So moving this to ON_QA
Verified with 4.8.0-0.nightly-2021-06-09-214128
# masters=$(oc get no -l node-role.kubernetes.io/master | sed '1d' | awk '{print $1}')
# oc adm node-logs $masters --path=kube-apiserver/audit.log --raw | zgrep -h '"verb":"update".*"resource":".*"packageserver"' 2>/dev/null | jq -r '.user.username + " " + (.objectRef | .resource + " " + .namespace + " " + .name + " " + .apiGroup) + " " + .stageTimestamp + " " + (.responseStatus | tostring)' | grep "cluster-version" | sort
null
CVO does not update packageserver constantly any more. Moving it to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438