Bug 1989487 - CI jobs are failing by firing etcdHighNumberOfFailedGRPCRequests alert
Summary: CI jobs are failing by firing etcdHighNumberOfFailedGRPCRequests alert
Keywords:
Status: CLOSED DUPLICATE of bug 1701154
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Wally
QA Contact: ge liu
URL:
Whiteboard: tag-ci
Depends On:
Blocks:
Reported: 2021-08-03 10:40 UTC by Arda Guclu
Modified: 2021-08-03 15:47 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-03 15:47:50 UTC
Target Upstream Version:
Embargoed:



Description Arda Guclu 2021-08-03 10:40:56 UTC
Description of problem:
CI jobs in several variants are failing on the test "Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured".

The test fails because the etcdHighNumberOfFailedGRPCRequests alert is firing.
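
For readers unfamiliar with the check, here is a minimal sketch of the logic the test name describes: any firing alert outside a two-entry allow-list counts as a failure. This is not the actual origin test code. The function name, the sample data, and the dict shape (modelled on the alertname and alertstate labels of the Prometheus ALERTS series) are assumptions made for illustration.

```python
# Hypothetical sketch; names and sample data are illustrative only
# and are not taken from the real e2e test implementation.
ALLOWED_ALERTS = {"Watchdog", "AlertmanagerReceiversNotConfigured"}


def unexpected_firing_alerts(alerts):
    """Return the sorted names of firing alerts that are not allow-listed."""
    return sorted({
        alert["alertname"]
        for alert in alerts
        if alert.get("alertstate") == "firing"
        and alert["alertname"] not in ALLOWED_ALERTS
    })


# Made-up sample resembling the failure reported in this bug.
sample = [
    {"alertname": "Watchdog", "alertstate": "firing"},
    {"alertname": "etcdHighNumberOfFailedGRPCRequests", "alertstate": "firing"},
    {"alertname": "SomePendingAlert", "alertstate": "pending"},
]

print(unexpected_firing_alerts(sample))
```

Under this sketch, a non-empty result corresponds to a test failure, which is what the etcd alert causes in the affected jobs.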

example job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi-serial-ipv4/1422424202547302400

Version-Release number of selected component (if applicable):
4.9

How reproducible:


Steps to Reproduce:
1. Run e2e-metal-ipi job

Actual results:
Test is failing.

Expected results:
Test passes.

Additional info:

https://search.ci.openshift.org/?search=etcdHighNumberOfFailedGRPCRequests&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 1 Michal Fojtik 2021-08-03 10:47:08 UTC
** A NOTE ABOUT USING URGENT **

This BZ has been set to urgent severity and priority. When a BZ is marked urgent priority Engineers are asked to stop whatever they are doing, putting everything else on hold.
Please be prepared to have reasonable justification ready to discuss, and ensure your own and engineering management are aware and agree this BZ is urgent. Keep in mind, urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

Comment 2 W. Trevor King 2021-08-03 15:47:50 UTC
I've moved bug 1701154 back to get a revert in to unblock nightlies while folks figure out what went wrong.  So we can use that bug to track "fixed without breaking things", and don't need this separate one.

*** This bug has been marked as a duplicate of bug 1701154 ***

