Bug 2026802 - periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade failing with "Operator cluster-monitoring-operator produces more watch requests than expected"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.z
Assignee: Sunil Thaha
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 2018222
Blocks:
Reported: 2021-11-25 22:04 UTC by Aravindh Puthiyaparambil
Modified: 2022-01-31 18:22 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
job=periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade=all
Last Closed: 2022-01-31 18:22:29 UTC
Target Upstream Version:
Embargoed:




Links
- Github openshift/origin pull 26736 (status: open, last updated 2022-01-11 07:25:18 UTC): "Bug 2026802: copy api-request upperbound for cluster-monitoring-operator"
- Red Hat Product Errata RHBA-2022:0279 (last updated 2022-01-31 18:22:46 UTC)

Description Aravindh Puthiyaparambil 2021-11-25 22:04:54 UTC
The periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade job is failing frequently in CI; see:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.9-informing#periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade

Example job failures:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade/1463913643795025920
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-upgrade-from-stable-4.8-e2e-aws-upgrade/1463768020231917568

The test "[sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]" is failing.

Log snippet:

started: (3/2746/2751) "[sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]"
started: (3/2747/2751) "[sig-arch][Late] clients should not use APIs that are removed in upcoming releases [Suite:openshift/conformance/parallel]"
started: (3/2748/2751) "[sig-etcd] etcd leader changes are not excessive [Late] [Suite:openshift/conformance/parallel]"
started: (3/2749/2751) "[sig-api-machinery][Feature:APIServer][Late] API LBs follow /readyz of kube-apiserver and don't send request early [Suite:openshift/conformance/parallel]"
started: (3/2750/2751) "[sig-instrumentation][Late] Alerts shouldn't exceed the 500 series limit of total series sent via telemetry from each cluster [Skipped:Disconnected] [Suite:openshift/conformance/parallel]"
started: (3/2751/2751) "[sig-storage][Late] Metrics should report short mount times [Skipped:Disconnected] [Suite:openshift/conformance/parallel]"
passed: (400ms) 2021-11-25T19:36:01 "[sig-api-machinery][Feature:APIServer][Late] kube-apiserver terminates within graceful termination period [Suite:openshift/conformance/parallel]"
passed: (500ms) 2021-11-25T19:36:01 "[sig-api-machinery][Feature:APIServer][Late] API LBs follow /readyz of kube-apiserver and don't send request early [Suite:openshift/conformance/parallel]"
passed: (500ms) 2021-11-25T19:36:01 "[sig-api-machinery][Feature:APIServer][Late] API LBs follow /readyz of kube-apiserver and stop sending requests [Suite:openshift/conformance/parallel]"
passed: (500ms) 2021-11-25T19:36:01 "[sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully [Suite:openshift/conformance/parallel]"
passed: (500ms) 2021-11-25T19:36:01 "[sig-etcd] etcd leader changes are not excessive [Late] [Suite:openshift/conformance/parallel]"
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1453
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1453
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/test.go:61
[BeforeEach] [sig-arch][Late]
  github.com/openshift/origin/test/extended/util/client.go:142
STEP: Creating a kubernetes client
[It] operators should not create watch channels very often [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/apiserver/api_requests.go:93
Nov 25 19:36:02.395: INFO: operator=ingress-operator, watchrequestcount=396, upperbound=742, ratio=0.5336927223719676
Nov 25 19:36:02.395: INFO: operator=authentication-operator, watchrequestcount=373, upperbound=616, ratio=0.6055194805194806
Nov 25 19:36:02.395: INFO: operator=kube-apiserver-operator, watchrequestcount=321, upperbound=520, ratio=0.6173076923076923
Nov 25 19:36:02.395: INFO: operator=openshift-apiserver-operator, watchrequestcount=310, upperbound=452, ratio=0.6858407079646017
Nov 25 19:36:02.395: INFO: operator=cluster-storage-operator, watchrequestcount=293, upperbound=310, ratio=0.9451612903225807
Nov 25 19:36:02.395: INFO: operator=openshift-kube-scheduler-operator, watchrequestcount=192, upperbound=358, ratio=0.5363128491620112
Nov 25 19:36:02.395: INFO: operator=kube-controller-manager-operator, watchrequestcount=191, upperbound=290, ratio=0.6586206896551724
Nov 25 19:36:02.395: INFO: operator=etcd-operator, watchrequestcount=182, upperbound=250, ratio=0.728
Nov 25 19:36:02.395: INFO: operator=openshift-controller-manager-operator, watchrequestcount=178, upperbound=298, ratio=0.5973154362416108
Nov 25 19:36:02.395: INFO: operator=prometheus-operator, watchrequestcount=147, upperbound=180, ratio=0.8166666666666667
Nov 25 19:36:02.395: INFO: operator=console-operator, watchrequestcount=139, upperbound=292, ratio=0.476027397260274
Nov 25 19:36:02.395: INFO: operator=aws-ebs-csi-driver-operator, watchrequestcount=100, upperbound=216, ratio=0.46296296296296297
Nov 25 19:36:02.395: INFO: operator=cluster-image-registry-operator, watchrequestcount=91, upperbound=238, ratio=0.38235294117647056
Nov 25 19:36:02.395: INFO: operator=service-ca-operator, watchrequestcount=88, upperbound=214, ratio=0.411214953271028
Nov 25 19:36:02.395: INFO: operator=cluster-monitoring-operator, watchrequestcount=72, upperbound=66, ratio=1.0909090909090908
Nov 25 19:36:02.395: INFO: Operator cluster-monitoring-operator produces more watch requests than expected
Nov 25 19:36:02.395: INFO: operator=openshift-config-operator, watchrequestcount=64, upperbound=94, ratio=0.6808510638297872
Nov 25 19:36:02.395: INFO: operator=machine-api-operator, watchrequestcount=64, upperbound=96, ratio=0.6666666666666666
Nov 25 19:36:02.395: INFO: operator=csi-snapshot-controller-operator, watchrequestcount=57, upperbound=104, ratio=0.5480769230769231
Nov 25 19:36:02.395: INFO: operator=cloud-credential-operator, watchrequestcount=54, upperbound=138, ratio=0.391304347826087
Nov 25 19:36:02.395: INFO: operator=dns-operator, watchrequestcount=54, upperbound=118, ratio=0.4576271186440678
Nov 25 19:36:02.395: INFO: operator=cluster-autoscaler-operator, watchrequestcount=35, upperbound=88, ratio=0.3977272727272727
Nov 25 19:36:02.395: INFO: operator=cluster-node-tuning-operator, watchrequestcount=32, upperbound=78, ratio=0.41025641025641024
Nov 25 19:36:02.395: INFO: operator=kube-storage-version-migrator-operator, watchrequestcount=31, upperbound=116, ratio=0.2672413793103448
Nov 25 19:36:02.395: INFO: operator=cluster-samples-operator, watchrequestcount=30, upperbound=46, ratio=0.6521739130434783
Nov 25 19:36:02.395: INFO: operator=cluster-baremetal-operator, watchrequestcount=30, upperbound=62, ratio=0.4838709677419355
Nov 25 19:36:02.395: INFO: operator=marketplace-operator, watchrequestcount=16, upperbound=30, ratio=0.5333333333333333
[AfterEach] [sig-arch][Late]
  github.com/openshift/origin/test/extended/util/client.go:140
[AfterEach] [sig-arch][Late]
  github.com/openshift/origin/test/extended/util/client.go:141
fail [github.com/openshift/origin/test/extended/apiserver/api_requests.go:437]: Expected
    <bool>: true
not to be true
failed: (1.4s) 2021-11-25T19:36:02 "[sig-arch][Late] operators should not create watch channels very often [Suite:openshift/conformance/parallel]"

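The key log entry to note is "Nov 25 19:36:02.395: INFO: Operator cluster-monitoring-operator produces more watch requests than expected". cluster-monitoring-operator made 72 watch requests against an upper bound of 66 (ratio ≈ 1.09); it is the only operator in the run with a ratio above 1.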
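The check behind this failure can be sketched as follows. Judging from the log, the test compares each operator's watch request count against a per-operator upper bound, prints ratio = watchrequestcount / upperbound, and fails when any operator exceeds its bound. This is a minimal Go sketch, not the code in api_requests.go: the `operatorWatchStats` type and the `overBudget` helper are hypothetical, and the strict "greater than" comparison is an assumption. The numbers are copied from the log above.

```go
package main

import (
	"fmt"
	"sort"
)

// operatorWatchStats is a hypothetical record mirroring one INFO line of the
// test log: operator name, observed watch requests, and the allowed upper bound.
type operatorWatchStats struct {
	Operator          string
	WatchRequestCount int
	UpperBound        int
}

// ratio returns observed/allowed, like the "ratio=" field in the log.
func (s operatorWatchStats) ratio() float64 {
	return float64(s.WatchRequestCount) / float64(s.UpperBound)
}

// overBudget returns, sorted by name, the operators whose watch request count
// exceeds their upper bound (ratio > 1). Assumption: a strict ">" comparison.
func overBudget(stats []operatorWatchStats) []string {
	var offenders []string
	for _, s := range stats {
		if s.ratio() > 1 {
			offenders = append(offenders, s.Operator)
		}
	}
	sort.Strings(offenders)
	return offenders
}

func main() {
	// A few rows taken from the failing job's log.
	stats := []operatorWatchStats{
		{"cluster-storage-operator", 293, 310},
		{"prometheus-operator", 147, 180},
		{"cluster-monitoring-operator", 72, 66},
	}
	for _, s := range stats {
		fmt.Printf("operator=%s, watchrequestcount=%d, upperbound=%d, ratio=%v\n",
			s.Operator, s.WatchRequestCount, s.UpperBound, s.ratio())
	}
	offenders := overBudget(stats)
	for _, op := range offenders {
		fmt.Printf("Operator %s produces more watch requests than expected\n", op)
	}
	// The real test asserts on a boolean like this one ("Expected true not to be true").
	fmt.Println("failed:", len(offenders) > 0)
}
```

With these rows only cluster-monitoring-operator is flagged, while cluster-storage-operator (ratio about 0.95) still passes. Raising cluster-monitoring-operator's bound, which is what origin pull 26736 ("copy api-request upperbound for cluster-monitoring-operator") does for 4.9, clears the flag; the new bound value is not stated in this report.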

Comment 1 Jayapriya Pai 2021-11-26 10:12:17 UTC
Looks like it's due to https://github.com/openshift/cluster-monitoring-operator/pull/1472, which added a few more watch requests.

The watch limits were increased on master in these PRs; backporting them to 4.9 should fix this:
https://github.com/openshift/origin/pull/26583
https://github.com/openshift/origin/pull/26601

Comment 2 Jayapriya Pai 2021-11-26 10:14:19 UTC
Related bug https://bugzilla.redhat.com/show_bug.cgi?id=2018222

Comment 9 errata-xmlrpc 2022-01-31 18:22:29 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.18 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0279

