This bug was initially created as a copy of Bug #2059845 I am copying this bug because: Description of problem: We have the following conformance test fails for us: "[sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]" The error: "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=93, upperbound=82, ratio=1.1341463414634145", Log: [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/framework.go:1489 [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/framework.go:1489 [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/test.go:61 [BeforeEach] [sig-arch][Late] github.com/openshift/origin/test/extended/util/client.go:140 STEP: Creating a kubernetes client [It] operators should not create watch channels very often [Suite:openshift/conformance/parallel] github.com/openshift/origin/test/extended/apiserver/api_requests.go:93 Feb 25 11:08:59.756: INFO: operator=ingress-operator, watchrequestcount=478, upperbound=626, ratio=0.7635782747603834 Feb 25 11:08:59.756: INFO: operator=authentication-operator, watchrequestcount=437, upperbound=618, ratio=0.7071197411003236 Feb 25 11:08:59.756: INFO: operator=kube-apiserver-operator, watchrequestcount=372, upperbound=456, ratio=0.8157894736842105 Feb 25 11:08:59.756: INFO: operator=openshift-apiserver-operator, watchrequestcount=331, upperbound=496, ratio=0.6673387096774194 Feb 25 11:08:59.756: INFO: operator=cluster-storage-operator, watchrequestcount=280, upperbound=378, ratio=0.7407407407407407 Feb 25 11:08:59.756: INFO: operator=openshift-controller-manager-operator, watchrequestcount=257, upperbound=340, ratio=0.7558823529411764 Feb 25 11:08:59.756: INFO: operator=kube-controller-manager-operator, watchrequestcount=233, upperbound=320, ratio=0.728125 Feb 25 11:08:59.756: INFO: operator=openshift-kube-scheduler-operator, watchrequestcount=220, upperbound=288, ratio=0.7638888888888888 Feb 25 11:08:59.756: INFO: operator=etcd-operator, watchrequestcount=203, upperbound=286, ratio=0.7097902097902098 Feb 25 11:08:59.756: INFO: operator=console-operator, watchrequestcount=185, upperbound=218, ratio=0.8486238532110092 Feb 25 11:08:59.756: INFO: Operator manila-csi-driver-operator not found in upper bounds for OpenStack Feb 25 11:08:59.756: INFO: operator=system:serviceaccount:openshift-cluster-csi-drivers:manila-csi-driver-operator, watchrequestcount=180 Feb 25 11:08:59.756: INFO: operator=cluster-image-registry-operator, watchrequestcount=155, upperbound=224, ratio=0.6919642857142857 Feb 25 11:08:59.756: INFO: operator=prometheus-operator, watchrequestcount=152, upperbound=236, ratio=0.6440677966101694 Feb 25 11:08:59.756: INFO: operator=service-ca-operator, watchrequestcount=151, upperbound=200, ratio=0.755 Feb 25 11:08:59.756: INFO: Operator openstack-cinder-csi-driver-operator not found in upper bounds for OpenStack Feb 25 11:08:59.756: INFO: operator=system:serviceaccount:openshift-cluster-csi-drivers:openstack-cinder-csi-driver-operator, watchrequestcount=149 Feb 25 11:08:59.756: INFO: operator=cluster-monitoring-operator, watchrequestcount=93, upperbound=82, ratio=1.1341463414634145 Feb 25 11:08:59.756: INFO: Operator "cluster-monitoring-operator" produces more watch requests than expected Feb 25 11:08:59.756: INFO: operator=csi-snapshot-controller-operator, watchrequestcount=83, upperbound=120, ratio=0.6916666666666667 Feb 25 11:08:59.756: INFO: operator=openshift-config-operator, watchrequestcount=70, upperbound=96, ratio=0.7291666666666666 Feb 25 11:08:59.756: INFO: operator=machine-api-operator, watchrequestcount=64, upperbound=96, ratio=0.6666666666666666 Feb 25 11:08:59.756: INFO: operator=dns-operator, watchrequestcount=63, upperbound=98, ratio=0.6428571428571429 Feb 25 11:08:59.756: INFO: operator=cluster-autoscaler-operator, watchrequestcount=58, upperbound=106, ratio=0.5471698113207547 Feb 25 11:08:59.756: INFO: operator=cluster-node-tuning-operator, watchrequestcount=56, upperbound=88, ratio=0.6363636363636364 Feb 25 11:08:59.756: INFO: operator=cloud-credential-operator, watchrequestcount=55, upperbound=80, ratio=0.6875 Feb 25 11:08:59.756: INFO: operator=kube-storage-version-migrator-operator, watchrequestcount=53, upperbound=76, ratio=0.6973684210526315 Feb 25 11:08:59.756: INFO: operator=cluster-baremetal-operator, watchrequestcount=40, upperbound=84, ratio=0.47619047619047616 Feb 25 11:08:59.756: INFO: operator=cluster-samples-operator, watchrequestcount=25, upperbound=52, ratio=0.4807692307692308 Feb 25 11:08:59.756: INFO: operator=marketplace-operator, watchrequestcount=24, upperbound=38, ratio=0.631578947368421 [AfterEach] [sig-arch][Late] github.com/openshift/origin/test/extended/util/client.go:138 [AfterEach] [sig-arch][Late] github.com/openshift/origin/test/extended/util/client.go:139 fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected <[]string | len:1, cap:1>: [ "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=93, upperbound=82, ratio=1.1341463414634145", ] to be empty Version-Release number of selected component (if applicable): 4.10.0-0.nightly-2022-02-24-141514 OpenshiftSDN How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
searched with https://search.ci.openshift.org/?search=operators+should+not+create+watch+channels+very+often&maxAge=48h&context=1&type=bug%2Bjunit&name=4.11.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job still have issue with 1. VSphere Platform https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-techpreview-serial/1501277588121718784 : [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] expand_less 1s fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected <[]string | len:1, cap:1>: [ "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=87, upperbound=84, ratio=1.0357142857142858", code: https://github.com/openshift/origin/blob/master/test/extended/apiserver/api_requests.go#L235 2. single node AWS cluster https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance/1501289618778099712 : [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] expand_less Run #0: Failed expand_less 2s fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected <[]string | len:1, cap:1>: [ "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=115, upperbound=96, ratio=1.1979166666666667", ] code: https://github.com/openshift/origin/blob/master/test/extended/apiserver/api_requests.go#L299
Created attachment 1869931 [details] periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-techpreview-serial CMO
Created attachment 1869932 [details] periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance CMO
# periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance sequence 69 68 63 61 62 64 78 60 69 53 66 52 73 64 66 57 61 59 64 68 65 67 52 58 60 68 57 64 57 57 71 65 86 Giving 64 as the median # periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-techpreview-serial sequence 101 111 94 86 101 90 109 77 115 138 105 91 102 104 93 86 90 78 65 92 88 110 110 84 100 113 89 100 99 95 99 87 107 86 89 71 108 90 124 95 62 93 96 90 94 90 87 88 80 96 85 88 105 109 107 88 85 91 130 98 102 81 93 94 There are some spikes but mostly holding around 93/94 based on the median.
searched with https://search.ci.openshift.org/?search=operators+should+not+create+watch+channels+very+often&maxAge=48h&context=1&type=bug%2Bjunit&name=4.11.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job we also need to update the upper bound for GCP platform https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-gcp-ovn-upgrade/1512265217537478656 : [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] { fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:449]: Expected <[]string | len:1, cap:1>: [ "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=97, upperbound=96, ratio=1.0104166666666667", ] code: https://github.com/openshift/origin/blob/master/test/extended/apiserver/api_requests.go#L179
Created attachment 1871841 [details] GCP CMO api WATCH request counts
we should also bump the apirequests upper bounds for aws single node, most of the time 64 is fine, sometimes would be failed, example: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance/1513613641847410688 : [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] Run #0: Failed expand_less 2s { fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:449]: Expected <[]string | len:1, cap:1>: [ "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=131, upperbound=128, ratio=1.0234375", ] code: https://github.com/openshift/origin/blob/master/test/extended/apiserver/api_requests.go#L300 same for openstack https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1505593005082415104 : [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] { fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected <[]string | len:1, cap:1>: [ "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=107, upperbound=96, ratio=1.1145833333333333", ] code: https://github.com/openshift/origin/blob/master/test/extended/apiserver/api_requests.go#L266 GCP/VSphere are fine, no issues now
This could go on forever. Let's keep it as it is change the upper bounds in another iteration.
Having flakes from time to time is acceptable. In some case the sudden increases are expected when many tests fail. Retrying and waiting can lead to higher number of watch requests due to tests/operators running longer. As long as the flakes does not appear often/regularly, it is acceptable to ignore occasional blips.
thanks, based on Comment 14 - 17, since most of time there is not issue with the upper bound, and checked for aws SNO run: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance for case "openshift-tests.[sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]", 1 in 23 case failures is related to cluster-monitoring-operator for openstack ccm, the failure ratio is 6/9, other configurations on openstack does not have such high failed ratio. https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install 6 out of 9 failures are related to cluster-monitoring-operator 1. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1504505512811565056 : [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] { fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected <[]string | len:1, cap:1>: [ "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=104, upperbound=96, ratio=1.0833333333333333", ] 2. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1505230613387415552 : [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] { fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected <[]string | len:1, cap:1>: [ "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=98, upperbound=96, ratio=1.0208333333333333", ] 3. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1505593005082415104 : [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] { fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected <[]string | len:1, cap:1>: [ "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=107, upperbound=96, ratio=1.1145833333333333", ] 4. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1508130335605395456 : [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] { fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected <[]string | len:1, cap:1>: [ "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=102, upperbound=96, ratio=1.0625", ] 5. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1510305447666520064 : [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] { fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected <[]string | len:1, cap:1>: [ "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=101, upperbound=96, ratio=1.0520833333333333", ] 6. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1511392966680252416 : [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] { fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected <[]string | len:1, cap:1>: [ "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=98, upperbound=96, ratio=1.0208333333333333", ]
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069