Bug 2026802

Summary:	periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade failing with "Operator cluster-monitoring-operator produces more watch requests than expected"
Product:	OpenShift Container Platform	Reporter:	Aravindh Puthiyaparambil <aravindh>
Component:	Monitoring	Assignee:	Sunil Thaha <sthaha>
Status:	CLOSED ERRATA	QA Contact:	Junqi Zhao <juzhao>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.9	CC:	amuller, anpicker, aos-bugs, erooth, janantha, sippy, spasquie, wking
Target Milestone:	---
Target Release:	4.9.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:	job=periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade=all
Last Closed:	2022-01-31 18:22:29 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	2018222
Bug Blocks:

Description Aravindh Puthiyaparambil 2021-11-25 22:04:54 UTC

The job periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade is failing frequently in CI, see:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade

Example job failures:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade/1463913643795025920
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade/1463768020231917568

The test "[sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]" is failing.

Log snippet: 

started: (3/2746/2751) "[sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]"
started: (3/2747/2751) "[sig-arch][Late] clients should not use APIs that are removed in upcoming releases [Suite:openshift/conformance/parallel]"
started: (3/2748/2751) "[sig-etcd] etcd leader changes are not excessive [Late] [Suite:openshift/conformance/parallel]"
started: (3/2749/2751) "[sig-api-machinery][Feature:APIServer][Late] API LBs follow /readyz of kube-apiserver and don't send request early [Suite:openshift/conformance/parallel]"
started: (3/2750/2751) "[sig-instrumentation][Late] Alerts shouldn't exceed the 500 series limit of total series sent via telemetry from each cluster [Skipped:Disconnected] [Suite:openshift/conformance/parallel]"
started: (3/2751/2751) "[sig-storage][Late] Metrics should report short mount times [Skipped:Disconnected] [Suite:openshift/conformance/parallel]"
passed: (400ms) 2021-11-25T19:36:01 "[sig-api-machinery][Feature:APIServer][Late] kube-apiserver terminates within graceful termination period [Suite:openshift/conformance/parallel]"
passed: (500ms) 2021-11-25T19:36:01 "[sig-api-machinery][Feature:APIServer][Late] API LBs follow /readyz of kube-apiserver and don't send request early [Suite:openshift/conformance/parallel]"
passed: (500ms) 2021-11-25T19:36:01 "[sig-api-machinery][Feature:APIServer][Late] API LBs follow /readyz of kube-apiserver and stop sending requests [Suite:openshift/conformance/parallel]"
passed: (500ms) 2021-11-25T19:36:01 "[sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully [Suite:openshift/conformance/parallel]"
passed: (500ms) 2021-11-25T19:36:01 "[sig-etcd] etcd leader changes are not excessive [Late] [Suite:openshift/conformance/parallel]"
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1453
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1453
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/test.go:61
[BeforeEach] [sig-arch][Late]
  github.com/openshift/origin/test/extended/util/client.go:142
STEP: Creating a kubernetes client
[It] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/apiserver/api_requests.go:93
Nov 25 19:36:02.395: INFO: operator=ingress-operator, watchrequestcount=396, upperbound=742, ratio=0.5336927223719676
Nov 25 19:36:02.395: INFO: operator=authentication-operator, watchrequestcount=373, upperbound=616, ratio=0.6055194805194806
Nov 25 19:36:02.395: INFO: operator=kube-apiserver-operator, watchrequestcount=321, upperbound=520, ratio=0.6173076923076923
Nov 25 19:36:02.395: INFO: operator=openshift-apiserver-operator, watchrequestcount=310, upperbound=452, ratio=0.6858407079646017
Nov 25 19:36:02.395: INFO: operator=cluster-storage-operator, watchrequestcount=293, upperbound=310, ratio=0.9451612903225807
Nov 25 19:36:02.395: INFO: operator=openshift-kube-scheduler-operator, watchrequestcount=192, upperbound=358, ratio=0.5363128491620112
Nov 25 19:36:02.395: INFO: operator=kube-controller-manager-operator, watchrequestcount=191, upperbound=290, ratio=0.6586206896551724
Nov 25 19:36:02.395: INFO: operator=etcd-operator, watchrequestcount=182, upperbound=250, ratio=0.728
Nov 25 19:36:02.395: INFO: operator=openshift-controller-manager-operator, watchrequestcount=178, upperbound=298, ratio=0.5973154362416108
Nov 25 19:36:02.395: INFO: operator=prometheus-operator, watchrequestcount=147, upperbound=180, ratio=0.8166666666666667
Nov 25 19:36:02.395: INFO: operator=console-operator, watchrequestcount=139, upperbound=292, ratio=0.476027397260274
Nov 25 19:36:02.395: INFO: operator=aws-ebs-csi-driver-operator, watchrequestcount=100, upperbound=216, ratio=0.46296296296296297
Nov 25 19:36:02.395: INFO: operator=cluster-image-registry-operator, watchrequestcount=91, upperbound=238, ratio=0.38235294117647056
Nov 25 19:36:02.395: INFO: operator=service-ca-operator, watchrequestcount=88, upperbound=214, ratio=0.411214953271028
Nov 25 19:36:02.395: INFO: operator=cluster-monitoring-operator, watchrequestcount=72, upperbound=66, ratio=1.0909090909090908
Nov 25 19:36:02.395: INFO: Operator cluster-monitoring-operator produces more watch requests than expected
Nov 25 19:36:02.395: INFO: operator=openshift-config-operator, watchrequestcount=64, upperbound=94, ratio=0.6808510638297872
Nov 25 19:36:02.395: INFO: operator=machine-api-operator, watchrequestcount=64, upperbound=96, ratio=0.6666666666666666
Nov 25 19:36:02.395: INFO: operator=csi-snapshot-controller-operator, watchrequestcount=57, upperbound=104, ratio=0.5480769230769231
Nov 25 19:36:02.395: INFO: operator=cloud-credential-operator, watchrequestcount=54, upperbound=138, ratio=0.391304347826087
Nov 25 19:36:02.395: INFO: operator=dns-operator, watchrequestcount=54, upperbound=118, ratio=0.4576271186440678
Nov 25 19:36:02.395: INFO: operator=cluster-autoscaler-operator, watchrequestcount=35, upperbound=88, ratio=0.3977272727272727
Nov 25 19:36:02.395: INFO: operator=cluster-node-tuning-operator, watchrequestcount=32, upperbound=78, ratio=0.41025641025641024
Nov 25 19:36:02.395: INFO: operator=kube-storage-version-migrator-operator, watchrequestcount=31, upperbound=116, ratio=0.2672413793103448
Nov 25 19:36:02.395: INFO: operator=cluster-samples-operator, watchrequestcount=30, upperbound=46, ratio=0.6521739130434783
Nov 25 19:36:02.395: INFO: operator=cluster-baremetal-operator, watchrequestcount=30, upperbound=62, ratio=0.4838709677419355
Nov 25 19:36:02.395: INFO: operator=marketplace-operator, watchrequestcount=16, upperbound=30, ratio=0.5333333333333333
[AfterEach] [sig-arch][Late]
  github.com/openshift/origin/test/extended/util/client.go:140
[AfterEach] [sig-arch][Late]
  github.com/openshift/origin/test/extended/util/client.go:141
fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:437]: Expected
    <bool>: true
not to be true
failed: (1.4s) 2021-11-25T19:36:02 "[sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]"

The key log entry to note is "Nov 25 19:36:02.395: INFO: Operator cluster-monitoring-operator produces more watch requests than expected"
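
For triaging similar runs, a small script can pull the offending operators out of the test output. This is only a sketch: the regular expression is derived from the INFO lines in the snippet above, the helper name find_offenders is made up for illustration, and the strict "count > upperbound" comparison is an assumption inferred from the logged ratio (72/66 is about 1.09), not taken from api_requests.go itself.

```python
import re

# Matches the per-operator INFO lines from the log snippet, e.g.
# "... INFO: operator=etcd-operator, watchrequestcount=182, upperbound=250, ratio=0.728"
# The summary line "Operator ... produces more watch requests than expected"
# has no "operator=" field and is therefore ignored.
LINE_RE = re.compile(
    r"operator=(?P<operator>[\w-]+), "
    r"watchrequestcount=(?P<count>\d+), "
    r"upperbound=(?P<bound>\d+)"
)


def find_offenders(log_text):
    """Return (operator, count, upperbound, ratio) for operators over their bound.

    Assumption: an operator counts as an offender when count > upperbound.
    """
    offenders = []
    for match in LINE_RE.finditer(log_text):
        count = int(match["count"])
        bound = int(match["bound"])
        if count > bound:
            offenders.append((match["operator"], count, bound, count / bound))
    return offenders


# Three lines copied from the log snippet above.
sample = """\
Nov 25 19:36:02.395: INFO: operator=prometheus-operator, watchrequestcount=147, upperbound=180, ratio=0.8166666666666667
Nov 25 19:36:02.395: INFO: operator=cluster-monitoring-operator, watchrequestcount=72, upperbound=66, ratio=1.0909090909090908
Nov 25 19:36:02.395: INFO: Operator cluster-monitoring-operator produces more watch requests than expected
"""

for operator, count, bound, ratio in find_offenders(sample):
    print(f"{operator}: {count} watch requests, upper bound {bound}, ratio {ratio:.2f}")
```

Run against the full snippet, this should report only cluster-monitoring-operator, since every other operator in the log has a ratio below 1.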

Comment 1 Jayapriya Pai 2021-11-26 10:12:17 UTC

Looks like it's due to https://github.com/openshift/cluster-monitoring-operator/pull/1472, which added a few more watch requests.

The watch limits were increased in these PRs on master; backporting them to 4.9 should fix this:
https://github.com/openshift/origin/pull/26583
https://github.com/openshift/origin/pull/26601

Comment 2 Jayapriya Pai 2021-11-26 10:14:19 UTC

Related bug: https://bugzilla.redhat.com/show_bug.cgi?id=2018222

Comment 9 errata-xmlrpc 2022-01-31 18:22:29 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.18 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0279