Bug 2060406 - Test 'operators should not create watch channels very often' fails
Summary: Test 'operators should not create watch channels very often' fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.11
Hardware: Unspecified
OS: Unspecified
low
low
Target Milestone: ---
: 4.11.0
Assignee: Jan Chaloupka
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks: 2059845
TreeView+ depends on / blocked
 
Reported: 2022-03-03 13:19 UTC by Jan Chaloupka
Modified: 2022-08-10 10:52 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 10:52:11 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-techpreview-serial CMO (27.90 KB, image/png)
2022-04-01 10:51 UTC, Jan Chaloupka
no flags Details
periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance CMO (24.14 KB, image/png)
2022-04-01 10:53 UTC, Jan Chaloupka
no flags Details
GCP CMO api WATCH request counts (22.72 KB, image/png)
2022-04-11 16:00 UTC, Jan Chaloupka
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github openshift origin pull 26877 0 None Merged bug 2060406: operators should not create watch channels very often: bump OpenStack monitoring operator 2022-03-21 07:43:13 UTC
Github openshift origin pull 26962 0 None open bug 2060406: test: bump apirequests upper bounds for VSphere and AWS single node 2022-04-01 11:58:50 UTC
Github openshift origin pull 27002 0 None open bug 2060406: bump apirequests upper bounds for GCP 2022-04-11 15:58:44 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:52:41 UTC

Description Jan Chaloupka 2022-03-03 13:19:21 UTC
This bug was initially created as a copy of Bug #2059845

I am copying this bug because: 



Description of problem:
We have the following conformance test fails for us:
"[sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]"


The error:
"Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=93, upperbound=82, ratio=1.1341463414634145",


Log:
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1489
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1489
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/test.go:61
[BeforeEach] [sig-arch][Late]
  github.com/openshift/origin/test/extended/util/client.go:140
STEP: Creating a kubernetes client
[It] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/apiserver/api_requests.go:93
Feb 25 11:08:59.756: INFO: operator=ingress-operator, watchrequestcount=478, upperbound=626, ratio=0.7635782747603834
Feb 25 11:08:59.756: INFO: operator=authentication-operator, watchrequestcount=437, upperbound=618, ratio=0.7071197411003236
Feb 25 11:08:59.756: INFO: operator=kube-apiserver-operator, watchrequestcount=372, upperbound=456, ratio=0.8157894736842105
Feb 25 11:08:59.756: INFO: operator=openshift-apiserver-operator, watchrequestcount=331, upperbound=496, ratio=0.6673387096774194
Feb 25 11:08:59.756: INFO: operator=cluster-storage-operator, watchrequestcount=280, upperbound=378, ratio=0.7407407407407407
Feb 25 11:08:59.756: INFO: operator=openshift-controller-manager-operator, watchrequestcount=257, upperbound=340, ratio=0.7558823529411764
Feb 25 11:08:59.756: INFO: operator=kube-controller-manager-operator, watchrequestcount=233, upperbound=320, ratio=0.728125
Feb 25 11:08:59.756: INFO: operator=openshift-kube-scheduler-operator, watchrequestcount=220, upperbound=288, ratio=0.7638888888888888
Feb 25 11:08:59.756: INFO: operator=etcd-operator, watchrequestcount=203, upperbound=286, ratio=0.7097902097902098
Feb 25 11:08:59.756: INFO: operator=console-operator, watchrequestcount=185, upperbound=218, ratio=0.8486238532110092
Feb 25 11:08:59.756: INFO: Operator manila-csi-driver-operator not found in upper bounds for OpenStack
Feb 25 11:08:59.756: INFO: operator=system:serviceaccount:openshift-cluster-csi-drivers:manila-csi-driver-operator, watchrequestcount=180
Feb 25 11:08:59.756: INFO: operator=cluster-image-registry-operator, watchrequestcount=155, upperbound=224, ratio=0.6919642857142857
Feb 25 11:08:59.756: INFO: operator=prometheus-operator, watchrequestcount=152, upperbound=236, ratio=0.6440677966101694
Feb 25 11:08:59.756: INFO: operator=service-ca-operator, watchrequestcount=151, upperbound=200, ratio=0.755
Feb 25 11:08:59.756: INFO: Operator openstack-cinder-csi-driver-operator not found in upper bounds for OpenStack
Feb 25 11:08:59.756: INFO: operator=system:serviceaccount:openshift-cluster-csi-drivers:openstack-cinder-csi-driver-operator, watchrequestcount=149
Feb 25 11:08:59.756: INFO: operator=cluster-monitoring-operator, watchrequestcount=93, upperbound=82, ratio=1.1341463414634145
Feb 25 11:08:59.756: INFO: Operator "cluster-monitoring-operator" produces more watch requests than expected
Feb 25 11:08:59.756: INFO: operator=csi-snapshot-controller-operator, watchrequestcount=83, upperbound=120, ratio=0.6916666666666667
Feb 25 11:08:59.756: INFO: operator=openshift-config-operator, watchrequestcount=70, upperbound=96, ratio=0.7291666666666666
Feb 25 11:08:59.756: INFO: operator=machine-api-operator, watchrequestcount=64, upperbound=96, ratio=0.6666666666666666
Feb 25 11:08:59.756: INFO: operator=dns-operator, watchrequestcount=63, upperbound=98, ratio=0.6428571428571429
Feb 25 11:08:59.756: INFO: operator=cluster-autoscaler-operator, watchrequestcount=58, upperbound=106, ratio=0.5471698113207547
Feb 25 11:08:59.756: INFO: operator=cluster-node-tuning-operator, watchrequestcount=56, upperbound=88, ratio=0.6363636363636364
Feb 25 11:08:59.756: INFO: operator=cloud-credential-operator, watchrequestcount=55, upperbound=80, ratio=0.6875
Feb 25 11:08:59.756: INFO: operator=kube-storage-version-migrator-operator, watchrequestcount=53, upperbound=76, ratio=0.6973684210526315
Feb 25 11:08:59.756: INFO: operator=cluster-baremetal-operator, watchrequestcount=40, upperbound=84, ratio=0.47619047619047616
Feb 25 11:08:59.756: INFO: operator=cluster-samples-operator, watchrequestcount=25, upperbound=52, ratio=0.4807692307692308
Feb 25 11:08:59.756: INFO: operator=marketplace-operator, watchrequestcount=24, upperbound=38, ratio=0.631578947368421
[AfterEach] [sig-arch][Late]
  github.com/openshift/origin/test/extended/util/client.go:138
[AfterEach] [sig-arch][Late]
  github.com/openshift/origin/test/extended/util/client.go:139
fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=93, upperbound=82, ratio=1.1341463414634145",
    ]
to be empty

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-02-24-141514
OpenshiftSDN

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Junqi Zhao 2022-03-09 04:44:16 UTC
searched with
https://search.ci.openshift.org/?search=operators+should+not+create+watch+channels+very+often&maxAge=48h&context=1&type=bug%2Bjunit&name=4.11.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

still have issue with 
1. VSphere Platform
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-techpreview-serial/1501277588121718784
: [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] expand_less 	1s
fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=87, upperbound=84, ratio=1.0357142857142858",

code: https://github.com/openshift/origin/blob/master/test/extended/apiserver/api_requests.go#L235

2. single node AWS cluster
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance/1501289618778099712
: [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel] expand_less
Run #0: Failed expand_less 	2s
fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=115, upperbound=96, ratio=1.1979166666666667",
    ]

code: https://github.com/openshift/origin/blob/master/test/extended/apiserver/api_requests.go#L299

Comment 4 Jan Chaloupka 2022-04-01 10:51:58 UTC
Created attachment 1869931 [details]
periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-techpreview-serial CMO

Comment 5 Jan Chaloupka 2022-04-01 10:53:41 UTC
Created attachment 1869932 [details]
periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance CMO

Comment 6 Jan Chaloupka 2022-04-01 11:56:46 UTC
# periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance sequence

69 68 63 61 62 64 78 60 69 53 66 52 73 64 66 57 61 59 64 68 65 67 52 58 60 68 57 64 57 57 71 65 86

Giving 64 as the median

# periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-techpreview-serial sequence

101 111 94 86 101 90 109 77 115 138 105 91 102 104 93 86 90 78 65 92 88 110 110 84 100 113 89 100 99 95 99 87 107 86 89 71 108 90 124 95 62 93 96 90 94 90 87 88 80 96 85 88 105 109 107 88 85 91 130 98 102 81 93 94

There are some spikes but mostly holding around 93/94 based on the median.

Comment 10 Junqi Zhao 2022-04-08 08:33:53 UTC
searched with
https://search.ci.openshift.org/?search=operators+should+not+create+watch+channels+very+often&maxAge=48h&context=1&type=bug%2Bjunit&name=4.11.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

we also need to update the upper bound for GCP platform
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-gcp-ovn-upgrade/1512265217537478656
: [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:449]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=97, upperbound=96, ratio=1.0104166666666667",
    ]

code: https://github.com/openshift/origin/blob/master/test/extended/apiserver/api_requests.go#L179

Comment 11 Jan Chaloupka 2022-04-11 16:00:30 UTC
Created attachment 1871841 [details]
GCP CMO api WATCH request counts

Comment 14 Junqi Zhao 2022-04-14 03:18:25 UTC
we should also bump the apirequests upper bounds for aws single node, most of the time 64 is fine, sometimes would be failed, example:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance/1513613641847410688
: [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
Run #0: Failed expand_less 	2s
{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:449]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=131, upperbound=128, ratio=1.0234375",
    ]
code: https://github.com/openshift/origin/blob/master/test/extended/apiserver/api_requests.go#L300

same for openstack
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1505593005082415104
: [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=107, upperbound=96, ratio=1.1145833333333333",
    ]
code: https://github.com/openshift/origin/blob/master/test/extended/apiserver/api_requests.go#L266

GCP/VSphere are fine, no issues now

Comment 16 Jan Chaloupka 2022-04-14 09:00:22 UTC
This could go on forever. Let's keep it as it is change the upper bounds in another iteration.

Comment 17 Jan Chaloupka 2022-04-14 09:02:40 UTC
Having flakes from time to time is acceptable. In some case the sudden increases are expected when many tests fail. Retrying and waiting can lead to higher number of watch requests due to tests/operators running longer. As long as the flakes does not appear often/regularly, it is acceptable to ignore occasional blips.

Comment 18 Junqi Zhao 2022-04-14 10:48:40 UTC
thanks, based on Comment 14 - 17, since most of time there is not issue with the upper bound, and checked for aws SNO run:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-openshift-release-master-nightly-4.11-openshift-e2e-aws-single-node-workers-upgrade-conformance

for case "openshift-tests.[sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]", 
1 in 23 case failures is related to cluster-monitoring-operator

for openstack ccm, the failure ratio is 6/9, other configurations on openstack does not have such high failed ratio.
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.11-informing#periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install
6 out of 9 failures are related to cluster-monitoring-operator
1. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1504505512811565056
: [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=104, upperbound=96, ratio=1.0833333333333333",
    ]

2. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1505230613387415552
: [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=98, upperbound=96, ratio=1.0208333333333333",
    ]


3. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1505593005082415104
: [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=107, upperbound=96, ratio=1.1145833333333333",
    ]

4. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1508130335605395456
: [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=102, upperbound=96, ratio=1.0625",
    ]

5. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1510305447666520064
: [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=101, upperbound=96, ratio=1.0520833333333333",
    ]


6. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.11-e2e-openstack-ccm-install/1511392966680252416
: [sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
{  fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:448]: Expected
    <[]string | len:1, cap:1>: [
        "Operator \"cluster-monitoring-operator\" produces more watch requests than expected: watchrequestcount=98, upperbound=96, ratio=1.0208333333333333",
    ]

Comment 23 errata-xmlrpc 2022-08-10 10:52:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069


Note You need to log in before you can comment on or make changes to this bug.