Bug 2072219

Summary:	The alert "etcdGRPCRequestsSlow" fired during upgrade
Product:	OpenShift Container Platform	Reporter:	Hongkai Liu <hongkliu>
Component:	Etcd	Assignee:	Emily Moss <emoss>
Status:	CLOSED DEFERRED	QA Contact:	ge liu <geliu>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.10	CC:	emoss, kgarriso, tjungblu, wking
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-09-12 09:29:45 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Hongkai Liu 2022-04-05 19:44:53 UTC

Description of problem:
The alert fired on build02 during the upgrade from 4.10.6 to 4.10.8.
https://coreos.slack.com/archives/CHY2E1BL4/p1649150672833739

Everything went back to normal shortly after.

Following the alert's runbook, I found nothing out of the ordinary.
https://github.com/openshift/runbooks/blob/master/alerts/cluster-etcd-operator/etcdGRPCRequestsSlow.md

My questions are:
1. Are slow etcd requests expected during an upgrade?
In any case, here is the must-gather:
https://coreos.slack.com/archives/CHY2E1BL4/p1649168331222259?thread_ts=1649150672.833739&cid=CHY2E1BL4


2. The alert condition never lasted longer than 10m, yet the alert fired. Why?
https://coreos.slack.com/archives/CHY2E1BL4/p1649184857585639?thread_ts=1649150672.833739&cid=CHY2E1BL4
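
For reference on question 2, here is a minimal sketch of how a Prometheus `for:` clause is generally evaluated. It assumes the rule uses `for: 10m`, as the question implies. The function name `alert_states`, the threshold and the 30s evaluation interval are illustrative assumptions and are not taken from the actual etcdGRPCRequestsSlow rule. The sketch only models the pending/firing state machine; it does not explain what happened on build02.

```python
from typing import List, Optional, Sequence, Tuple


def alert_states(
    samples: Sequence[Tuple[int, Optional[float]]],
    threshold: float,
    for_seconds: int,
) -> List[str]:
    """Return the alert state after each rule evaluation.

    samples: (timestamp_seconds, value) pairs, one per rule evaluation,
             in ascending time order. A value of None means the
             expression returned no result at that evaluation.
    threshold: hypothetical value the expression must exceed.
    for_seconds: the `for:` duration of the rule, in seconds.
    """
    states: List[str] = []
    active_since: Optional[int] = None  # when the condition first became true

    for ts, value in samples:
        if value is None or value <= threshold:
            # Condition is false at this evaluation: the pending timer resets.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        # The alert fires once the condition has held at every evaluation
        # for at least the `for:` duration; until then it is pending.
        held = ts - active_since
        states.append("firing" if held >= for_seconds else "pending")

    return states
```

In this model, a single evaluation below the threshold resets the timer, so the alert reaches "firing" only if the rule expression itself stayed above the threshold at every evaluation for the full 10 minutes. What counts is the value of the full rule expression at each evaluation, which can differ from what a graph of the underlying metric suggests.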




Comment 8 Thomas Jungblut 2022-05-11 06:47:22 UTC

It went into 4.11 yesterday. @emoss, shall we backport this to 4.10?