Bug 2072219

Summary: The alert "etcdGRPCRequestsSlow" fired during upgrade
Product: OpenShift Container Platform Reporter: Hongkai Liu <hongkliu>
Component: Etcd Assignee: Emily Moss <emoss>
Status: CLOSED DEFERRED QA Contact: ge liu <geliu>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.10 CC: emoss, kgarriso, tjungblu, wking
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-09-12 09:29:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Hongkai Liu 2022-04-05 19:44:53 UTC
Description of problem:
The alert fired on build02 during the upgrade from 4.10.6 to 4.10.8.
https://coreos.slack.com/archives/CHY2E1BL4/p1649150672833739

Everything went back to normal shortly after.

Following the alert's runbook, I found nothing unusual.
https://github.com/openshift/runbooks/blob/master/alerts/cluster-etcd-operator/etcdGRPCRequestsSlow.md

My questions are:
1. Are slow etcd requests expected during an upgrade?
In any case, here is the must-gather:
https://coreos.slack.com/archives/CHY2E1BL4/p1649168331222259?thread_ts=1649150672.833739&cid=CHY2E1BL4


2. The alert's condition never held for more than 10m, yet the alert fired. Why?
https://coreos.slack.com/archives/CHY2E1BL4/p1649184857585639?thread_ts=1649150672.833739&cid=CHY2E1BL4
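
For question 2, it may help to recall how the `for` clause of a Prometheus alerting rule behaves. The sketch below is a simplified model, not Prometheus source code: at each rule evaluation, if the expression returns a result the alert becomes pending and records when it first became active; it moves to firing once it has stayed active for the `for` duration, and a single evaluation with no result resets it. The 10m duration is taken from the question above; the 30s evaluation interval and the sample sequences are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertState:
    active_at: Optional[float] = None  # time (s) the condition was first seen true
    firing: bool = False


def evaluate(state: AlertState, now: float, condition_true: bool, for_seconds: float) -> None:
    """One rule evaluation under a simplified model of `for` semantics."""
    if not condition_true:
        # Expression returned no result: any pending/firing alert is dropped.
        state.active_at = None
        state.firing = False
        return
    if state.active_at is None:
        state.active_at = now  # alert becomes pending
    if now - state.active_at >= for_seconds:
        state.firing = True


def simulate(samples: List[bool], interval: float, for_seconds: float) -> List[int]:
    """samples holds one boolean per evaluation tick; returns ticks where the alert fires."""
    state = AlertState()
    firing_ticks = []
    for i, cond in enumerate(samples):
        evaluate(state, i * interval, cond, for_seconds)
        if state.firing:
            firing_ticks.append(i)
    return firing_ticks


# Hypothetical data: 21 consecutive true evaluations span exactly 600s.
print(simulate([True] * 21 + [False] * 5, interval=30, for_seconds=600))  # [20]
# 20 true evaluations span only 570s, so the alert never fires.
print(simulate([True] * 20 + [False] * 5, interval=30, for_seconds=600))  # []
# One false evaluation resets the pending timer.
print(simulate([True] * 15 + [False] + [True] * 15, interval=30, for_seconds=600))  # []
```

Under this model, comparing the alert with a graph is only meaningful if the graph evaluates the same expression at the rule's evaluation timestamps; a different query step or range window can make the condition appear shorter or longer than what the rule actually saw.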

Comment 8 Thomas Jungblut 2022-05-11 06:47:22 UTC
It went into 4.11 yesterday. @emoss, shall we backport this to 4.10?