Bug 1946306 - Endurance cluster has NotReady and SchedulingDisabled nodes after upgrade
Summary: Endurance cluster has NotReady and SchedulingDisabled nodes after upgrade
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Jan Safranek
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On: 1929463 1945739 1952224
Blocks:
 
Reported: 2021-04-05 18:41 UTC by Ryan Phillips
Modified: 2023-09-15 01:04 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1945739
Environment:
Last Closed: 2021-05-10 14:03:28 UTC
Target Upstream Version:
Embargoed:



Comment 3 Jan Safranek 2021-04-16 13:02:44 UTC
I don't see anything obviously wrong with CSI in the cluster. What I did notice is that nodes in the cluster are drained while the e2e tests run. This is quite dangerous: the e2e tests install CSI drivers as plain Pods, not as a DaemonSet, so a CSI driver pod may be evicted before the pods that use the driver, leaving volumes that cannot be unmounted and pods that cannot be deleted.

In fact, none of the e2e tests I can think of expect nodes to be drained underneath them.
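
For illustration only, a minimal sketch (using the Python kubernetes client; the node name "worker-0" and the "app=csi-driver" label are made-up placeholders, not anything from this bug) of how one could list CSI driver pods on a node that a drain would evict, i.e. driver pods not managed by a DaemonSet:

from kubernetes import client, config

def unprotected_csi_pods(node_name, label_selector="app=csi-driver"):
    """Return CSI driver pods on the node that a drain would evict
    (anything not managed by a DaemonSet)."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    pods = v1.list_pod_for_all_namespaces(
        label_selector=label_selector,
        field_selector="spec.nodeName=" + node_name,
    )
    at_risk = []
    for pod in pods.items:
        owners = pod.metadata.owner_references or []
        # DaemonSet-managed pods are left alone by "kubectl drain
        # --ignore-daemonsets"; plain Pods are evicted like any workload.
        if not any(o.kind == "DaemonSet" for o in owners):
            at_risk.append(pod.metadata.namespace + "/" + pod.metadata.name)
    return at_risk

if __name__ == "__main__":
    for name in unprotected_csi_pods("worker-0"):
        print("CSI driver pod at risk during drain:", name)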

Comment 4 Jan Safranek 2021-04-16 13:06:51 UTC
Maybe I closed it too early... Is there any mechanism that would make pod A (the application) drain before pod B (the CSI driver)? We could add labels, annotations, or a priority class if needed.

Comment 5 Jan Safranek 2021-05-10 14:03:28 UTC
I checked with the node team: we can't make Pods drain in a specific order. To sum it up: do not drain nodes while the tests are running!
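
A minimal pre-flight check a test harness could run to honor this advice, sketched with the Python kubernetes client (the function name and behavior are assumptions, not part of the e2e suite): refuse to start while any node is cordoned (SchedulingDisabled) or NotReady.

from kubernetes import client, config

def cluster_ready_for_tests():
    """Return True only if no node is cordoned (SchedulingDisabled) or NotReady."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    for node in v1.list_node().items:
        if node.spec.unschedulable:
            print(node.metadata.name, "is cordoned (SchedulingDisabled)")
            return False
        ready = next((c for c in node.status.conditions if c.type == "Ready"), None)
        if ready is None or ready.status != "True":
            print(node.metadata.name, "is NotReady")
            return False
    return True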

Comment 6 Red Hat Bugzilla 2023-09-15 01:04:38 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

