Bug 2229670

Summary: All the Controller Operations should reach the one Controller (active) not multiple Controllers
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Madhu Rajanna <mrajanna>
Component: csi-addons
Assignee: Niels de Vos <ndevos>
Status: ASSIGNED
QA Contact: krishnaram Karthick <kramdoss>
Severity: high
Priority: unspecified
Version: 4.14
CC: odf-bz-bot, srangana
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Type: Bug

Description Madhu Rajanna 2023-08-07 09:41:53 UTC
Currently, kubernetes-csi-addons connects to a random controller among those that are registered and sends its RPC calls to that controller. This is a problem if the CSI driver implements an internal locking mechanism or keeps a local cache for the lifetime of each instance.

For example:

CephCSI runs deployments for Replication, ReclaimSpace, etc., with two instances running. Internally, CephCSI takes a lock and processes one request at a time, based on its internal logic. With the current Kubernetes sidecars this is not a problem, because each sidecar runs with leader election and only one instance can process a request. With kubernetes-csi-addons, however, it becomes a problem, because there is no such mechanism to reach the same controller/deployment that is already processing the requests.
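To make the locking problem concrete, here is a minimal sketch in Python. It assumes a driver that tracks in-progress volumes only in the memory of each instance; the `DriverInstance` class and its methods are hypothetical and are not CephCSI code.

```python
class DriverInstance:
    """Hypothetical CSI driver replica with a per-instance, in-memory volume lock."""

    def __init__(self, name):
        self.name = name
        self.in_progress = set()  # volume IDs this instance is currently working on

    def begin(self, volume_id):
        """Try to start an operation; refuse if this instance already holds the volume."""
        if volume_id in self.in_progress:
            return False
        self.in_progress.add(volume_id)
        return True

    def end(self, volume_id):
        """Mark the operation on the volume as finished."""
        self.in_progress.discard(volume_id)


a = DriverInstance("instance-a")
b = DriverInstance("instance-b")

first = a.begin("vol-1")  # accepted: nothing is running for vol-1 on instance-a
same = a.begin("vol-1")   # rejected: instance-a already holds vol-1
other = b.begin("vol-1")  # accepted: instance-b knows nothing about instance-a's lock
```

The second call on the same instance is refused, but the call that lands on the other instance is accepted, so two operations on `vol-1` run at the same time. A lock of this kind only protects the volume if every request for it reaches the same instance.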

The request is to provide this functionality, so that CSI drivers with this requirement are supported and do not have to run in an active/active model, which can lead to many different problems.
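One possible shape of the requested behaviour, sketched in Python under stated assumptions: each registered connection carries an identity, and the identity of the currently active controller is known from somewhere (for instance a leader-election record such as the holder identity of a Kubernetes Lease). The `Connection` type, `pick_active` function, driver name, identities and addresses below are illustrative only and are not the kubernetes-csi-addons API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Connection:
    """Hypothetical record of one registered controller sidecar."""
    driver_name: str
    identity: str  # e.g. the pod name of the sidecar
    address: str


def pick_active(connections: List[Connection], driver_name: str,
                leader_identity: str) -> Optional[Connection]:
    """Return the connection of the active controller, or None if it is not registered.

    There is deliberately no fallback to another instance: sending the call to a
    non-active controller is exactly what this sketch tries to avoid.
    """
    for conn in connections:
        if conn.driver_name == driver_name and conn.identity == leader_identity:
            return conn
    return None


conns = [
    Connection("rbd.csi.ceph.com", "ctrl-a", "10.0.0.1:9070"),
    Connection("rbd.csi.ceph.com", "ctrl-b", "10.0.0.2:9070"),
]

# Every operation for the driver resolves to the same (active) controller.
ops = ["enable", "promote", "demote", "get"]
targets = [pick_active(conns, "rbd.csi.ceph.com", "ctrl-b").address for _ in ops]
missing = pick_active(conns, "rbd.csi.ceph.com", "ctrl-c")
```

Returning `None` when the active controller is not registered lets the caller retry later instead of silently falling back to a random instance.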


Examples:

In a ControllerReclaimSpace operation, while space is being reclaimed by one CSI driver instance, any CR update might cause one more call for the same volume to be sent to another CSI driver instance.

For VolumeReplication operations in particular, there are different RPC calls for enable, disable, promote, demote, get, etc., so we might end up issuing different RPC calls for the same volume to different CSI driver instances.