Description of problem: We are seeing many failures in CI, e.g. [1], for "error retrieving resource lock openshift-kube-scheduler/kube-scheduler: etcdserver: request timed out", specifically in the shared-VPC + UPI tests. Per the discussion, this is caused by slow disk performance in Azure, but we need to communicate the issue to the cluster admin; hence this bug.

[1] https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-shared-vpc-4.3/495/build-log.txt

Version-Release number of selected component (if applicable):

How reproducible: Intermittent

Expected results:

Additional info:
Error (scheduler container log):

leaderelection.go:330] error retrieving resource lock openshift-kube-scheduler/kube-scheduler: etcdserver: request timed out
I0120 14:48:43.591672 1 leaderelection.go:287] failed to renew lease openshift-kube-scheduler/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
F0120 14:48:43.591702 1 server.go:264] leaderelection lost

The operator Degraded message then changed from "NodeControllerDegraded: All master node(s) are ready" to:

StaticPodsDegraded: nodes/ci-op-i3stlhv5-fccef-gcdl8-master-0 pods/openshift-kube-scheduler-ci-op-i3stlhv5-fccef-gcdl8-master-0 container="scheduler" is not ready
NodeControllerDegraded: All master node(s) are ready
Setting target release to current development branch (4.4). Fixes, if any, requested/required on previous releases will result in cloned BZs targeting the z-stream releases where appropriate.
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-compact-4.3/113
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity. If you have further information on the current state of the bug, please update it, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.
I think this deserves a Jira RFE and not only a Bugzilla bug, as a "hw requirement alert" is a pretty vague definition... We already added some events based on etcd leader-change frequency that report disk metrics (frequent leader changes usually mean the disks are slow). Please file a Jira RFE, Issue, or Epic for this; it might cross multiple teams.
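The heuristic described above (flagging slow disks from etcd leader-change frequency plus disk metrics) can be sketched roughly as follows. This is an illustrative sketch only, not the actual operator code; the function name and thresholds are assumptions, though the 10 ms p99 WAL-fsync target comes from etcd's tuning guidance.

```python
# Illustrative sketch: decide whether to warn the cluster admin about slow
# etcd disks, based on leader-change frequency and WAL fsync latency.
# Thresholds are assumptions, not the values used by any OpenShift operator.

def should_warn_slow_disk(leader_changes_per_hour: float,
                          wal_fsync_p99_seconds: float) -> bool:
    """Return True if metrics suggest disk latency is destabilizing etcd.

    etcd's tuning guidance is that the 99th percentile of WAL fsync
    duration should stay below ~10 ms; frequent leader changes are a
    common symptom of disks missing that target.
    """
    FSYNC_P99_LIMIT = 0.010   # 10 ms, per etcd tuning guidance
    LEADER_CHANGE_LIMIT = 3   # changes per hour; illustrative threshold

    return (leader_changes_per_hour > LEADER_CHANGE_LIMIT
            and wal_fsync_p99_seconds > FSYNC_P99_LIMIT)

# Example: 5 leader changes/hour with a 25 ms p99 fsync should warn;
# a quiet cluster with fast disks should not.
print(should_warn_slow_disk(5, 0.025))    # True
print(should_warn_slow_disk(0.5, 0.004))  # False
```

Requiring both signals together, rather than either alone, avoids paging the admin for a single benign leader election (e.g. during an upgrade) while still catching the slow-disk pattern seen in these CI runs.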
I have not seen this issue for some time now, so I am OK with closing it.