Bug 2232582

Summary: CSI pods and customer workloads both have 'priority=0' and race for resources
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Florian Bergmann <fbergman>
Component: odf-managed-service
Assignee: Ohad <omitrani>
Status: CLOSED WONTFIX
QA Contact: Neha Berry <nberry>
Severity: unspecified
Priority: unspecified
Version: 4.11
CC: apahim, ndevos, odf-bz-bot
Hardware: x86_64   
OS: Linux   
Last Closed: 2024-07-11 10:23:21 UTC
Type: Bug

Description Florian Bergmann 2023-08-17 13:21:15 UTC
Description of problem:

CSI pods have priority 0 instead of using OpenShift PriorityClasses (https://docs.openshift.com/container-platform/4.13/nodes/pods/nodes-pods-priority.html). Customer workloads also have priority 0 by default, so the two compete on equal terms, which leads to a race for scheduling and node resources.

Version-Release number of selected component (if applicable):
Not sure

How reproducible:
This was triggered by a DR (disaster recovery) testing scenario. It requires workloads that exhaust a node's resources, so that CSI pods scheduled afterwards can no longer be placed on that node.

Steps to Reproduce:
1. Apply CSI pods and customer workloads at the same time.
2. Repeat until the race is hit and the CSI pods cannot be scheduled.

Actual results:
CSI pods cannot be scheduled when customer workloads claim the node's resources first.

Expected results:
CSI pods are scheduled before customer workloads

Additional info:
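A toy model of the race, as a Python sketch. It is not the kube-scheduler's actual preemption algorithm, which also considers PodDisruptionBudgets and chooses among candidate nodes and victim sets. The model uses a node whose CPU is already fully requested by priority-0 customer pods. A CSI pod that also has priority 0 cannot preempt anything and stays Pending. The same pod at the value of the built-in `system-node-critical` class (2000001000) evicts a lower-priority workload instead. The `Node`, `Pod`, `try_schedule` and `fill_with_workloads` names and the CPU figures are hypothetical.

```python
from dataclasses import dataclass, field

# Value of the built-in Kubernetes PriorityClass "system-node-critical".
SYSTEM_NODE_CRITICAL = 2_000_001_000


@dataclass
class Pod:
    name: str
    cpu_m: int         # CPU request in millicores (hypothetical figures)
    priority: int = 0  # no priorityClassName and no globalDefault class -> 0


@dataclass
class Node:
    capacity_m: int
    pods: list = field(default_factory=list)

    def free_m(self) -> int:
        return self.capacity_m - sum(p.cpu_m for p in self.pods)


def try_schedule(node: Node, pod: Pod) -> list:
    """Place pod on node, preempting strictly lower-priority pods if needed.

    Returns the evicted pods; raises RuntimeError if the pod cannot fit.
    """
    if pod.cpu_m <= node.free_m():
        node.pods.append(pod)
        return []
    # Only strictly lower-priority pods are victims: equal priority never preempts.
    candidates = sorted((p for p in node.pods if p.priority < pod.priority),
                        key=lambda p: p.priority)
    victims, freed = [], node.free_m()
    for victim in candidates:
        if freed >= pod.cpu_m:
            break
        victims.append(victim)
        freed += victim.cpu_m
    if freed < pod.cpu_m:
        raise RuntimeError(f"{pod.name} is unschedulable (Pending)")
    for victim in victims:
        node.pods.remove(victim)
    node.pods.append(pod)
    return victims


def fill_with_workloads() -> Node:
    """A node whose 1000m CPU is fully requested by two priority-0 workloads."""
    node = Node(capacity_m=1000)
    for i in range(2):
        try_schedule(node, Pod(f"customer-{i}", 500))
    return node


if __name__ == "__main__":
    node = fill_with_workloads()
    try:
        try_schedule(node, Pod("csi-rbdplugin", 200))  # priority 0, like the workloads
    except RuntimeError as err:
        print(err)  # csi-rbdplugin is unschedulable (Pending)

    node = fill_with_workloads()
    evicted = try_schedule(node, Pod("csi-rbdplugin", 200, SYSTEM_NODE_CRITICAL))
    print([p.name for p in evicted])  # ['customer-0']
```

In this model the order of arrival decides the outcome only when priorities are equal. A higher priority lets the CSI pod win regardless of which pods reached the node first.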

Comment 1 Niels de Vos 2023-11-21 16:20:26 UTC
To prevent any confusion: on a non-managed ODF-4.14 deployment, priority classes are already used:

$ oc -n openshift-storage get daemonset/csi-rbdplugin -o yaml | grep priority
      priorityClassName: system-node-critical

$ oc -n openshift-storage get deployment/csi-rbdplugin-provisioner -o yaml | grep priority
      priorityClassName: system-cluster-critical


This BZ is really for odf-managed-service only.

Comment 2 Niels de Vos 2023-11-21 16:24:21 UTC
ODF-4.14 seems to be the first version that uses priority classes; see bug #2232464.

Comment 3 Ohad 2024-07-11 10:23:21 UTC
The ODF Managed Service product was canceled and is now considered obsolete.