Bug 2124213

Summary: [RFE] The TopoLVM scheduler should put Pods on Nodes where the PV is already created
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Chris Blum <cblum>
Component: topolvm
Assignee: N Balachandran <nibalach>
Status: CLOSED NOTABUG
QA Contact: Shay Rozen <srozen>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: unspecified
CC: jolmomar, lgangava, mmuench, muagarwa, nigoyal, ocs-bugs, odf-bz-bot, rsinghal, sapillai
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-11-01 12:37:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Chris Blum 2022-09-05 09:50:34 UTC
When deploying a Pod that uses an already existing PV (e.g. when we redeploy an application or share a PV between multiple Pods), we need to ensure that the Pod is scheduled on the Node that holds that storage locally.

Currently we use a storage-focused scheduling extension that selects nodes based on available storage capacity. We should extend it: if the PV already exists, the extension should pass only the one node that holds the PV locally on to the scheduler. That way we force the Pod to be scheduled there.
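
For illustration only, the requested behaviour could be sketched as a filter function like the one below. This is a minimal sketch, not the TopoLVM extender code: `filter_nodes`, its parameters, and the node names in the example are hypothetical, and the capacity check is only a stand-in for whatever the existing extension does today.

```python
from typing import Dict, List, Optional


def filter_nodes(
    candidates: List[str],
    free_bytes: Dict[str, int],
    requested_bytes: int,
    pv_node: Optional[str],
) -> List[str]:
    """Return the nodes a Pod may be scheduled on (hypothetical helper).

    candidates:      node names offered by the scheduler
    free_bytes:      free local storage per node (assumed to be known)
    requested_bytes: storage the Pod's volume needs
    pv_node:         node that already holds the PV, or None if no PV exists yet
    """
    if pv_node is not None:
        # Requested extension: the PV already exists, so forward only the
        # node that has it locally (or nothing if that node is not offered).
        return [node for node in candidates if node == pv_node]
    # Stand-in for the current behaviour: keep nodes with enough capacity.
    return [node for node in candidates if free_bytes.get(node, 0) >= requested_bytes]


if __name__ == "__main__":
    capacity = {"node-a": 10, "node-b": 100}
    # No PV yet: only node-b has enough free space.
    print(filter_nodes(["node-a", "node-b"], capacity, 50, None))
    # PV already lives on node-a: capacity no longer matters.
    print(filter_nodes(["node-a", "node-b"], capacity, 50, "node-a"))
```

Returning an empty list when the PV's node is not among the candidates leaves the Pod unschedulable instead of placing it on a node without the data, which matches the intent of the request.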

Comment 3 Chris Blum 2022-09-05 09:51:22 UTC
This will become important for SNO (single-node OpenShift) clusters with additional workers.

Comment 4 N Balachandran 2022-11-01 12:08:33 UTC
It looks like the kube-scheduler already takes care of this.


From the kube-scheduler logs:

I1026 08:35:44.304414       1 eventhandlers.go:116] "Add event for unscheduled pod" pod="openshift-storage/lvmpod-2"
I1026 08:35:44.304530       1 scheduling_queue.go:956] "About to try and schedule pod" pod="openshift-storage/lvmpod-2"
I1026 08:35:44.304552       1 schedule_one.go:85] "Attempting to schedule pod" pod="openshift-storage/lvmpod-2"
I1026 08:35:44.304912       1 binder.go:808] "All bound volumes for pod match with node" pod="openshift-storage/lvmpod-2" node="rhocs-bm5.lab.eng.blr.redhat.com"
I1026 08:35:44.305030       1 binder.go:365] "AssumePodVolumes" pod="openshift-storage/lvmpod-2" node="rhocs-bm5.lab.eng.blr.redhat.com"
I1026 08:35:44.305062       1 binder.go:373] "AssumePodVolumes: all PVCs bound and nothing to do" pod="openshift-storage/lvmpod-2" node="rhocs-bm5.lab.eng.blr.redhat.com"
I1026 08:35:44.305172       1 default_binder.go:52] "Attempting to bind pod to node" pod="openshift-storage/lvmpod-2" node="rhocs-bm5.lab.eng.blr.redhat.com"
I1026 08:35:44.311899       1 schedule_one.go:264] "Successfully bound pod to node" pod="openshift-storage/lvmpod-2" node="rhocs-bm5.lab.eng.blr.redhat.com" evaluatedNodes=1 feasibleNodes=1
I1026 08:35:44.312211       1 eventhandlers.go:159] "Delete event for unscheduled pod" pod="openshift-storage/lvmpod-2"
I1026 08:35:44.312240       1 eventhandlers.go:184] "Add event for scheduled pod" pod="openshift-storage/lvmpod-2"
I1026 08:35:44.331550       1 eventhandlers.go:204] "Update event for scheduled pod" pod="openshift-storage/lvmpod-2"
I1026 08:35:44.340409       1 eventhandlers.go:204] "Update event for scheduled pod" pod="openshift-storage/lvmpod-2"

Comment 5 N Balachandran 2022-11-01 12:37:37 UTC
Confirmed this with jsafrane:

"This CSI CreateVolume response causes the PV.spec.nodeAffinity to be set:
https://github.com/topolvm/topolvm/blob/a712ce0cafe8f762297c9329da74fc814f91a95d/driver/controller.go#L211-L215

CSI CreateVolume response provides info what node labels the PV needs, external-provisioner translates it into created PV as node affinity
CSI NodeGetInfo response tells kubelet how to label the node where it runs, https://github.com/topolvm/topolvm/blob/a712ce0cafe8f762297c9329da74fc814f91a95d/driver/node.go#L535-L538

Kubernetes scheduler puts it all together and runs Pods with the PV on nodes with the right labels."


I'm closing this BZ based on the above.
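
For readers unfamiliar with the mechanism quoted above, the matching step can be approximated as follows. This is a simplified sketch, not the kube-scheduler implementation: the label key `topology.topolvm.io/node` is an assumption used for illustration (the real key is whatever the driver reports via NodeGetInfo), `worker-1.example.com` is a made-up node name, and only the `In` operator is handled. The data shape follows the PV field `spec.nodeAffinity.required.nodeSelectorTerms`, where terms are ORed and the expressions inside one term are ANDed.

```python
from typing import Dict, List

# Assumed topology label key, for illustration only.
TOPOLOGY_KEY = "topology.topolvm.io/node"


def term_matches(term: dict, node_labels: Dict[str, str]) -> bool:
    """All matchExpressions of one nodeSelectorTerm must hold (AND)."""
    for expr in term.get("matchExpressions", []):
        if expr["operator"] != "In":
            raise ValueError("this sketch only handles the 'In' operator")
        if node_labels.get(expr["key"]) not in expr["values"]:
            return False
    return True


def node_fits_pv(selector_terms: List[dict], node_labels: Dict[str, str]) -> bool:
    """A node fits if at least one nodeSelectorTerm matches (OR)."""
    return any(term_matches(term, node_labels) for term in selector_terms)


if __name__ == "__main__":
    # Node affinity as it might appear on a PV created on rhocs-bm5.
    pv_terms = [{
        "matchExpressions": [{
            "key": TOPOLOGY_KEY,
            "operator": "In",
            "values": ["rhocs-bm5.lab.eng.blr.redhat.com"],
        }]
    }]
    # Node labels as they might be set from the NodeGetInfo topology.
    nodes = {
        "rhocs-bm5.lab.eng.blr.redhat.com": {TOPOLOGY_KEY: "rhocs-bm5.lab.eng.blr.redhat.com"},
        "worker-1.example.com": {TOPOLOGY_KEY: "worker-1.example.com"},
    }
    feasible = [name for name, labels in nodes.items() if node_fits_pv(pv_terms, labels)]
    print(feasible)
```

Under these assumptions only the node whose label matches the PV's node affinity remains feasible, which is why no change to the TopoLVM scheduler extension is needed.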