Bug 2059845
Summary: | Test 'operators should not create watch channels very often' fails | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Itzik Brown <itbrown>
Component: | Monitoring | Assignee: | Jan Chaloupka <jchaloup>
Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao>
Severity: | low | Docs Contact: |
Priority: | low | |
Version: | 4.10 | CC: | amuller, anpicker, aos-bugs, dgoodwin, erooth, jchaloup, spasquie
Target Milestone: | --- | |
Target Release: | 4.10.z | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-05-02 18:38:50 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 2060406 | |
Bug Blocks: | | |
Description
Itzik Brown
2022-03-02 07:52:50 UTC
I wonder if the increase in the WATCH requests is limited to DFG-osasinfra-shiftstack_ci-ocp_testing or if it can be observed in other jobs/topologies. Simon, have there been any recent changes in the monitoring operator which might introduce this increase?

@Jan I don't recall anything off the top of my head. Now looking at the upper bound value for OpenStack, it is significantly different from the other platforms:

* OpenStack is 41.0 [1]
* AWS is 124.0 [2]
* Azure is 99.0 [3]

Overall, I feel that the test is very brittle and it isn't clear to me who owns it (in particular, when it is OK to bump the limits and who is responsible for doing so).

[1] https://github.com/openshift/origin/blob/625733dd1ce7ebf40c3dd0abd693f7bb54f2d580/test/extended/apiserver/api_requests.go#L265
[2] https://github.com/openshift/origin/blob/625733dd1ce7ebf40c3dd0abd693f7bb54f2d580/test/extended/apiserver/api_requests.go#L119
[3] https://github.com/openshift/origin/blob/625733dd1ce7ebf40c3dd0abd693f7bb54f2d580/test/extended/apiserver/api_requests.go#L149

Everybody and nobody. The test is there to protect against exponential growth. The usual rule of thumb was to bump the limit when the increase is not huge, or when new code adding more WATCH requests got merged. Given the reported job runs on OpenStack, we might as well just bump the upper bound.

@jchaloup, shall I assign this bug to you as well, since you are the assignee of https://bugzilla.redhat.com/show_bug.cgi?id=2060406 ?
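To make the arithmetic in these failure reports concrete, here is a minimal, self-contained Go sketch of the kind of comparison the test performs. The real check lives in test/extended/apiserver/api_requests.go and gathers its counts from the running cluster; the function and names below are hypothetical illustrations, not the actual implementation:

```go
package main

import "fmt"

// checkWatchRequests compares each operator's observed WATCH request count
// against its per-platform upper bound and returns one failure message per
// operator that exceeds it, in the format quoted in the failures below.
func checkWatchRequests(counts, upperBounds map[string]int64) []string {
	var failures []string
	for operator, count := range counts {
		bound, ok := upperBounds[operator]
		if !ok || bound == 0 {
			continue // no bound defined for this operator on this platform
		}
		if count > bound {
			ratio := float64(count) / float64(bound)
			failures = append(failures, fmt.Sprintf(
				"Operator %q produces more watch requests than expected: watchrequestcount=%d, upperbound=%d, ratio=%v",
				operator, count, bound, ratio))
		}
	}
	return failures
}

func main() {
	// Reproduces the arithmetic of one AWS single-node failure quoted
	// below: 127 watch requests against an upper bound of 96.
	for _, f := range checkWatchRequests(
		map[string]int64{"cluster-monitoring-operator": 127},
		map[string]int64{"cluster-monitoring-operator": 96},
	) {
		fmt.Println(f)
	}
}
```

Running this prints the same message format as the quoted failures, with ratio=1.3229166666666667 for 127/96.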
Fix for OpenStack is fine; we also need to change the bounds for GCP and AWS single node. Examples:

1. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-upgrade/1517872400643919872 :

    [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
    Run #0: Failed    4s
    { fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
        <[]string | len:1, cap:1>: [
            "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=97, upperbound=96, ratio=1.0104166666666667",
        ]

2. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-upgrade/1517739083189719040 :

    [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
    Run #0: Failed    4s
    { fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
        <[]string | len:1, cap:1>: [
            "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=101, upperbound=96, ratio=1.0520833333333333",
        ]

code: https://github.com/openshift/origin/blob/release-4.10/test/extended/apiserver/api_requests.go#L178

We also need to change the AWS single-node cluster:

3. https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/3109/pull-ci-openshift-machine-config-operator-release-4.10-e2e-aws-upgrade-single-node/1517598134392328192 :

    [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]    2s
    { fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
        <[]string | len:2, cap:2>: [
            "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=127, upperbound=96, ratio=1.3229166666666667",
            "Operator \"marketplace-operator\" produces more watch requests than expected: watchrequestcount=29, upperbound=28, ratio=1.0357142857142858",
        ]
    to be empty}

code: https://github.com/openshift/origin/blob/release-4.10/test/extended/apiserver/api_requests.go#L299

I think we should make the same change to 4.10 as to 4.11 (see the PR in bug 2060406). There is no need to change vSphere; 4.10 vSphere clusters show no issue now.
Checking https://search.ci.openshift.org/?search=operators+should+not+create+watch+channels+very+often&maxAge=336h&context=1&type=junit&name=4.10.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job :

AWS single node cluster: only 3 occurrences in the last 14 days (https://prow.ci.openshift.org/job-history/origin-ci-test/pr-logs/directory/pull-ci-openshift-machine-config-operator-release-4.10-e2e-aws-upgrade-single-node):
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=103, upperbound=96, ratio=1.0729166666666667 (1518692139616178176, Apr 26)
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=127, upperbound=96, ratio=1.3229166666666667 (1517598134392328192, Apr 23)
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=104, upperbound=96, ratio=1.0833333333333333 (1516326082691731456, Apr 20)

GCP: only 4 occurrences in the last 14 days (https://prow.ci.openshift.org/job-history/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-e2e-gcp-upgrade):
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=109, upperbound=96, ratio=1.1354166666666667 (1518660295168364544, Apr 25)
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=97, upperbound=96, ratio=1.0104166666666667 (1517872400643919872, Apr 24)
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=101, upperbound=96, ratio=1.0520833333333333 (1517739083189719040, Apr 23)
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=100, upperbound=96, ratio=1.0416666666666667 (1514310871843606528, Apr 14)

OpenStack (https://prow.ci.openshift.org/job-history/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.10-e2e-openstack-ccm-install):
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=105, upperbound=96, ratio=1.09375 (Apr 25)
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=87, upperbound=82, ratio=1.0609756097560976 (Apr 19)
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=90, upperbound=82, ratio=1.0975609756097562 (Apr 15)

https://prow.ci.openshift.org/job-history/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-single-node-serial:
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=114, upperbound=96, ratio=1.1875
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=102, upperbound=96, ratio=1.0625

https://prow.ci.openshift.org/job-history/origin-ci-test/logs/periodic-ci-openshift-release-master-okd-4.10-e2e-vsphere:
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=89, upperbound=84, ratio=1.0595238095238095
- Operator "dns-operator" produces more watch requests than expected: watchrequestcount=105, upperbound=98, ratio=1.0714285714285714

https://prow.ci.openshift.org/job-history/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.10-upgrade-from-stable-4.9-e2e-openstack-upgrade:
- Operator "cluster-monitoring-operator" produces more watch requests than expected: watchrequestcount=100, upperbound=96, ratio=1.0416666666666667

===========================

AWS single node -> 104
GCP -> 110

The rest is too rare to fix.

Based on Comment 15 and Comment 16, since the case would pass on most platforms in 4.10, setting to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.10.12 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1601

We're having to repeat this process again for cluster-monitoring-operator. See https://github.com/openshift/origin/pull/27281 if you are interested.
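For illustration of what the agreed bump could amount to: the code links in this thread (#L178 for GCP, #L299 for AWS single node) point at separate per-platform tables, so each platform's bound is adjusted independently. The sketch below is hypothetical; it includes only the operators and numbers quoted in this bug, with the agreed values applied (AWS single node: 96 -> 104, GCP: 96 -> 110), whereas the real tables in api_requests.go cover many more operators:

```go
package main

import "fmt"

// watchRequestUpperBounds returns a hypothetical per-platform bound table.
// Only the operators and values quoted in this bug are shown; the numbers
// reflect the bump agreed above.
func watchRequestUpperBounds(platform string, singleNode bool) map[string]int64 {
	switch {
	case platform == "aws" && singleNode:
		return map[string]int64{
			"cluster-monitoring-operator": 104, // was 96
			"marketplace-operator":        28,
		}
	case platform == "gcp":
		return map[string]int64{
			"cluster-monitoring-operator": 110, // was 96
		}
	default:
		// Other platforms were judged "too rare to fix" in this round.
		return nil
	}
}

func main() {
	bounds := watchRequestUpperBounds("aws", true)
	fmt.Println(bounds["cluster-monitoring-operator"]) // 104
}
```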