Description of problem:
When a CSV is created and copied across target namespaces, the lastUpdateTime timestamp on the copied version may not match the original CSV. This triggers a runaway sync in which the copied CSV never converges to match the original.

Version-Release number of selected component (if applicable):
4.7

How reproducible:
Always

Steps to Reproduce:
1. Create a large number of namespaces.
2. Install an AllNamespaces operator.

Actual results:
Copied CSVs are constantly reconciled and never settle.

Expected results:
A small spike while the CSVs are copied, followed by no further changes.

Additional info:
This is triggered only when the CSV copy takes place at a different time than the original CSV was last updated. It is more likely to happen when there are many namespaces on the cluster.
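Step 1 of the reproduction can be scripted. The following is a minimal POSIX shell sketch, assuming you are logged in with `oc` and are allowed to create namespaces; the function name `create_namespaces`, the `csv-repro-` prefix and the default count of 200 are hypothetical choices, not values from this report.

```shell
# Minimal sketch for step 1 of the reproduction: create many namespaces.
# Assumption: the prefix "csv-repro-" and the default count of 200 are
# hypothetical values, not taken from this bug report.
create_namespaces() {
  local count="${1:-200}" prefix="${2:-csv-repro-}" i=1
  while [ "$i" -le "$count" ]; do
    # Stop at the first failure (quota, permissions, name clash)
    # instead of silently continuing.
    oc create namespace "${prefix}${i}" || return 1
    i=$((i + 1))
  done
}
```

For example, `create_namespaces 500` creates csv-repro-1 through csv-repro-500; after that, install the AllNamespaces operator as in step 2 and watch whether the copied CSVs settle.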
*** Bug 1905624 has been marked as a duplicate of this bug. ***
Cluster version is 4.7.0-0.nightly-2020-12-09-112139
[root@preserve-olm-env data]# oc -n openshift-operator-lifecycle-manager exec catalog-operator-5bff7985dc-bc764 -- olm --version
OLM version: 0.17.0
git commit: 2294bcc907c834c160c5b99fbf15988d0706853c

LGTM, marking it verified.

1. Subscribe to an operator at cluster scope, such as etcd.
[root@preserve-olm-env data]# oc get sub -A
NAMESPACE             NAME   PACKAGE   SOURCE                CHANNEL
openshift-operators   etcd   etcd      community-operators   clusterwide-alpha
[root@preserve-olm-env data]# oc get csv -n openshift-operators
NAME                              DISPLAY   VERSION             REPLACES                          PHASE
etcdoperator.v0.9.4-clusterwide   etcd      0.9.4-clusterwide   etcdoperator.v0.9.2-clusterwide   Succeeded

2. Create many namespaces.

3. Check whether the lastUpdateTime of the copied CSV is the same as that of the original CSV.
[root@preserve-olm-env data]# oc get csv -n jian4 etcdoperator.v0.9.4-clusterwide -o yaml
...
  - lastTransitionTime: "2020-12-09T08:59:52Z"
    lastUpdateTime: "2020-12-09T08:59:52Z"
    message: install strategy completed with no errors
    phase: Succeeded
    reason: InstallSucceeded
[root@preserve-olm-env data]# oc get csv -n openshift-operators etcdoperator.v0.9.4-clusterwide -o yaml
...
  - lastTransitionTime: "2020-12-09T08:59:52Z"
    lastUpdateTime: "2020-12-09T08:59:52Z"
    message: install strategy completed with no errors
    phase: Succeeded
    reason: InstallSucceeded
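Step 3 can be repeated across many namespaces with a small helper. This is a sketch, not part of OLM: the function names are hypothetical, and it reads the CSV's top-level `.status.lastUpdateTime` through `oc`'s jsonpath output rather than the condition entries shown in the YAML above.

```shell
# Sketch: compare status.lastUpdateTime of the original CSV with its copy.
# Hypothetical helper names; assumes the top-level .status.lastUpdateTime
# field of the ClusterServiceVersion.
csv_last_update() {
  # $1 = namespace, $2 = CSV name
  oc get csv -n "$1" "$2" -o jsonpath='{.status.lastUpdateTime}'
}

compare_csv_copy() {
  # $1 = namespace of the original CSV, $2 = namespace of the copy,
  # $3 = CSV name. Returns 0 on match, 1 on mismatch, 2 if oc fails.
  local orig copy
  orig="$(csv_last_update "$1" "$3")" || return 2
  copy="$(csv_last_update "$2" "$3")" || return 2
  if [ "$orig" = "$copy" ]; then
    echo "match: $orig"
  else
    echo "mismatch: original=$orig copy=$copy"
    return 1
  fi
}
```

For example, `compare_csv_copy openshift-operators jian4 etcdoperator.v0.9.4-clusterwide` prints `match: ...` when the copy carries the original's timestamp and returns a non-zero status otherwise, so it can be called in a loop over the namespaces created in step 2.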
Hi ecordell,

I've tested the CSV update frequency issue and it still seems to be present in OCP 4.6.8. See the attached OCP 4.6.8 Cluster settings page: https://bugzilla.redhat.com/attachment.cgi?id=1738461

Over 5 minutes there were 6549 PUT operations on etcd, and 3899 of those were to CSVs:
sh-4.4# cat etcd_watch.log | grep "Key" | wc -l
6549
sh-4.4# cat etcd_watch.log | grep "Key" | grep "clusterserviceversions" | wc -l
3899

In one namespace I can also see the lastUpdateTime of the CSV still incrementing, so I suspect the fix is not in place:
[ruairi@localhost ibm-apicatalog]$ oc get csv ibm-apiconnect.v2.1.0 -o yaml | grep lastUpdateTime
lastUpdateTime: "2020-12-11T16:08:55Z"
[ruairi@localhost ibm-apicatalog]$ oc get csv ibm-apiconnect.v2.1.0 -o yaml | grep lastUpdateTime
lastUpdateTime: "2020-12-11T16:09:15Z"
[ruairi@localhost ibm-apicatalog]$ oc get csv ibm-apiconnect.v2.1.0 -o yaml | grep lastUpdateTime
lastUpdateTime: "2020-12-11T16:10:13Z"
[ruairi@localhost ibm-apicatalog]$ oc get csv ibm-apiconnect.v2.1.0 -o yaml | grep lastUpdateTime
lastUpdateTime: "2020-12-11T16:10:52Z"
[ruairi@localhost ibm-apicatalog]$ oc get csv ibm-apiconnect.v2.1.0 -o yaml | grep lastUpdateTime
lastUpdateTime: "2020-12-11T16:11:28Z"

You can see the memory rising as well in the attached memory metrics graph: https://bugzilla.redhat.com/attachment.cgi?id=1738462

Can you confirm that the fix didn't make it into 4.6.8 and that it should be available in the next release?
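The two checks in this comment can be wrapped as reusable functions. This is a minimal sketch, assuming the etcd watch log has the same format as `etcd_watch.log` above (one line containing `Key` per write, with CSV keys containing `clusterserviceversions`) and that `oc` points at the affected cluster; `count_csv_writes` and `distinct_last_updates` are hypothetical names.

```shell
# Summarise an etcd watch log the same way as the grep pipelines above:
# count lines containing "Key", then those that also mention CSVs.
count_csv_writes() {
  # $1 = path to the etcd watch log
  local total csv
  # grep -c prints 0 and exits 1 when nothing matches; keep going anyway.
  total="$(grep -c 'Key' "$1" || true)"
  csv="$(grep 'Key' "$1" | grep -c 'clusterserviceversions' || true)"
  echo "total=$total csv=$csv"
}

# Sample .status.lastUpdateTime several times and print how many distinct
# non-empty values were seen. More than 1 means the CSV is still being
# rewritten; 0 means no value could be read.
distinct_last_updates() {
  # $1 = namespace, $2 = CSV name,
  # $3 = number of samples (default 5), $4 = seconds between samples (default 30)
  local ns="$1" name="$2" samples="${3:-5}" interval="${4:-30}" i=1
  while [ "$i" -le "$samples" ]; do
    oc get csv -n "$ns" "$name" -o jsonpath='{.status.lastUpdateTime}'
    echo
    if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
    i=$((i + 1))
  done | sort -u | grep -c .
}
```

For example, `distinct_last_updates <namespace> ibm-apiconnect.v2.1.0 5 30` samples over roughly two minutes. On a cluster with the fix, a copied CSV that has settled would be expected to report `1`.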
Hi Ruairi, the 4.6 backport only merged a couple of hours ago because of a test infrastructure problem. The progress of that backport is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1906416, and it is currently awaiting QE verification. Since the backport is also marked as urgent, it should be verified soon, and I'd expect it to be included in the following z-release.
Hi bluddy,

Do you have a timeline for when 4.6.9, the release that will include this fix, is scheduled?

Thanks,
Ruairi
4.6.9 is now released and includes this hotfix in its payload, so it is ready for testing.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633