Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1989487

Summary:	CI jobs are failing by firing etcdHighNumberOfFailedGRPCRequests alert
Product:	OpenShift Container Platform	Reporter:	Arda Guclu <aguclu>
Component:	Etcd	Assignee:	Wally <wlewis>
Status:	CLOSED DUPLICATE	QA Contact:	ge liu <geliu>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	4.9	CC:	wking
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:	tag-ci
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-08-03 15:47:50 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Arda Guclu 2021-08-03 10:40:56 UTC

Description of problem:
CI jobs in several variants are failing on the test "Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured".

The test fails because the etcdHighNumberOfFailedGRPCRequests alert is firing.
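
For illustration only, the condition this test enforces can be sketched roughly as below. This is not the actual e2e test code: the function name, the input shape (dicts carrying the `alertname` and `alertstate` labels found on Prometheus's `ALERTS` series), and the sample data are all assumptions made for the sketch.

```python
# Hypothetical sketch of the check; not taken from the origin e2e suite.
# Alerts the test tolerates in the firing state, per the test name.
ALLOWED_ALERTS = {"Watchdog", "AlertmanagerReceiversNotConfigured"}


def unexpected_firing_alerts(alerts):
    """Return the sorted names of firing alerts outside the allowed set.

    `alerts` is an assumed shape: a list of dicts with "alertname" and
    "alertstate" keys, mirroring the labels on the ALERTS series.
    """
    return sorted({
        a["alertname"]
        for a in alerts
        if a.get("alertstate") == "firing"
        and a["alertname"] not in ALLOWED_ALERTS
    })


# Made-up sample data: one allowed alert, the etcd alert, one pending alert.
sample = [
    {"alertname": "Watchdog", "alertstate": "firing"},
    {"alertname": "etcdHighNumberOfFailedGRPCRequests", "alertstate": "firing"},
    {"alertname": "SomePendingAlert", "alertstate": "pending"},
]

# A non-empty result corresponds to the test failure described above.
print(unexpected_firing_alerts(sample))
```

In this sketch, any non-empty result means the test would fail, which is what happens in the jobs here once etcdHighNumberOfFailedGRPCRequests fires.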

example job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi-serial-ipv4/1422424202547302400

Version-Release number of selected component (if applicable):
4.9

How reproducible:


Steps to Reproduce:
1. Run e2e-metal-ipi job

Actual results:
Test is failing.

Expected results:
Test passes.

Additional info:

https://search.ci.openshift.org/?search=etcdHighNumberOfFailedGRPCRequests&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 1 Michal Fojtik 2021-08-03 10:47:08 UTC

** A NOTE ABOUT USING URGENT **

This BZ has been set to urgent severity and priority. When a BZ is marked urgent priority, engineers are asked to stop whatever they are doing and put everything else on hold.
Please be prepared to have reasonable justification ready to discuss, and ensure your own and engineering management are aware and agree this BZ is urgent. Keep in mind, urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

Comment 2 W. Trevor King 2021-08-03 15:47:50 UTC

I've moved bug 1701154 back to get a revert in to unblock nightlies while folks figure out what went wrong.  So we can use that bug to track "fixed without breaking things", and don't need this separate one.

*** This bug has been marked as a duplicate of bug 1701154 ***