Bug 1946306 - Endurance cluster has NotReady and SchedulingDisabled nodes after upgrade
Summary: Endurance cluster has NotReady and SchedulingDisabled nodes after upgrade
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Jan Safranek
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On: 1929463 1945739 1952224
Blocks:
 
Reported: 2021-04-05 18:41 UTC by Ryan Phillips
Modified: 2023-09-15 01:04 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1945739
Environment:
Last Closed: 2021-05-10 14:03:28 UTC
Target Upstream Version:
Embargoed:



Comment 3 Jan Safranek 2021-04-16 13:02:44 UTC
I don't see anything obviously wrong with CSI in the cluster. What I did notice is that nodes in the cluster are drained while the e2e tests run. This is quite dangerous: the e2e tests install CSI drivers as plain Pods, not as a DaemonSet, so a CSI driver pod may be evicted before the pods that use the driver, leaving volumes that cannot be unmounted and pods that cannot be deleted.

In fact, none of the e2e tests I can think of expect nodes to be drained underneath them.
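
For illustration only, a minimal sketch (using the Python kubernetes client; the node name "worker-0" and the "app=csi-driver" label are made-up placeholders, not anything from this bug) of how one could list CSI driver pods on a node that a drain would evict, i.e. driver pods not managed by a DaemonSet:

from kubernetes import client, config

def unprotected_csi_pods(node_name, label_selector="app=csi-driver"):
    """Return CSI driver pods on the node that a drain would evict
    (anything not managed by a DaemonSet)."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    pods = v1.list_pod_for_all_namespaces(
        label_selector=label_selector,
        field_selector="spec.nodeName=" + node_name,
    )
    at_risk = []
    for pod in pods.items:
        owners = pod.metadata.owner_references or []
        # DaemonSet-managed pods are left alone by "kubectl drain
        # --ignore-daemonsets"; plain Pods are evicted like any workload.
        if not any(o.kind == "DaemonSet" for o in owners):
            at_risk.append(pod.metadata.namespace + "/" + pod.metadata.name)
    return at_risk

if __name__ == "__main__":
    for name in unprotected_csi_pods("worker-0"):
        print("CSI driver pod at risk during drain:", name)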

Comment 4 Jan Safranek 2021-04-16 13:06:51 UTC
Maybe I closed it too early... Is there any mechanism that would make pod A (the application) drain before pod B (the CSI driver)? We could add labels, annotations, or a priority class if needed.

Comment 5 Jan Safranek 2021-05-10 14:03:28 UTC
I checked with the node team: we can't make Pods drain in a specific order. To sum it up: do not drain nodes while the tests are running!
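
A minimal pre-flight check a test harness could run to honor this advice, sketched with the Python kubernetes client (the function name and behavior are assumptions, not part of the e2e suite): refuse to start while any node is cordoned (SchedulingDisabled) or NotReady.

from kubernetes import client, config

def cluster_ready_for_tests():
    """Return True only if no node is cordoned (SchedulingDisabled) or NotReady."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    for node in v1.list_node().items:
        if node.spec.unschedulable:
            print(node.metadata.name, "is cordoned (SchedulingDisabled)")
            return False
        ready = next((c for c in node.status.conditions if c.type == "Ready"), None)
        if ready is None or ready.status != "True":
            print(node.metadata.name, "is NotReady")
            return False
    return True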

Comment 6 Red Hat Bugzilla 2023-09-15 01:04:38 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

