Description of problem (please be as detailed as possible and provide log snippets):
=======================================================================
With a recent bug 1960066, it was asked in docs to mention that provisioner pods can come up on any worker node and not just on OCS storage nodes, because the deployment has an empty nodeAffinity spec.

There was a bug 1883828#c30 and 1883828#c31 where it was mentioned that all CSI pods can run on any worker node, but IMO only the CSI node-plugin pods should be allowed to run on non-OCS nodes, not the provisioner pods.

Raising this bug to see if we can fix the provisioner pods' node affinity.

It was confirmed in chat [1] that the OCS operator only sets the tolerations and a couple of other settings, but not CSI_PROVISIONER_NODE_AFFINITY:

spec:
  affinity:
    nodeAffinity: {}

Logs copied in [2]

[1] - https://chat.google.com/room/AAAAREGEba8/GHjFcIPSjSk
[2] - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-1959793/bz_1959793.tar.gz

Version of all relevant components (if applicable):
===================================================
This issue is seen in all OCS versions; attaching logs from OCS 4.8.
We need to decide if the fix will be backported; otherwise the doc BZ needs to be addressed.

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
===================================================
No

Is there any workaround available to the best of your knowledge?
============================================================
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
======================================
3

Can this issue be reproduced?
==============================
Yes, in all versions of OCS

Can this issue reproduce from the UI?
=====================================
Yes, for both CLI and UI based installs

If this is a regression, please provide more details to justify this:
====================================================================
Not sure if this is prevalent since OCS 4.2 or was introduced later

Steps to Reproduce:
====================
1. Install OCP and OCS > v4.3 on a setup with >3 worker nodes, with 3 worker nodes as storage nodes
2. Post installation, check the following:
   a) the location of the provisioner pods; they may or may not be running on OCS nodes
   b) the provisioner pod spec for the nodeAffinity section
   c) the configmap rook-ceph-operator-config

Actual results:
==================
Node affinity is not set for the provisioner pods to run only on OCS nodes, unlike the other noobaa and ceph pods.

Expected results:
================
IMO provisioner pods should run only on OCS nodes, to maintain symmetry and to not use up resources of other worker nodes.

Additional info:
========================
Doc BZs: Bug 1960066 and Bug 1959967
OCS 4.6 bug for the same issue (where the provisioner pod was not fixed) - Bug 1883828

apiVersion: v1
data:
  CSI_LOG_LEVEL: "5"
  CSI_PLUGIN_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
  CSI_PROVISIONER_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
kind: ConfigMap
metadata:
  creationTimestamp: "2021-05-11T06:12:29Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:CSI_LOG_LEVEL: {}
        f:CSI_PLUGIN_TOLERATIONS: {}
        f:CSI_PROVISIONER_TOLERATIONS: {}
    manager: ocs-operator
    operation: Update
    time: "2021-05-11T06:12:29Z"
  name: rook-ceph-operator-config
  namespace: openshift-storage
  resourceVersion: "27518"
  uid: 007843c8-dfc2-48c1-9d99-a040f7e2cb1c
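The post-install checks in the reproduction steps can be run with standard `oc` commands. A rough sketch (the deployment name `csi-rbdplugin-provisioner` is what a default OCS install typically uses; adjust for your cluster):

```shell
# Step 2a: which nodes are the provisioner pods actually scheduled on?
oc -n openshift-storage get pods -o wide | grep provisioner

# Step 2b: inspect the affinity section of a provisioner deployment
oc -n openshift-storage get deployment csi-rbdplugin-provisioner \
  -o jsonpath='{.spec.template.spec.affinity}{"\n"}'

# Step 2c: dump the operator configmap and check which CSI_* keys are set
oc -n openshift-storage get configmap rook-ceph-operator-config -o yaml
```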
Agreed, we should have the CSI provisioners run only on OCS nodes. This setting just needs to be added to the configmap rook-ceph-operator-config:

CSI_PROVISIONER_NODE_AFFINITY: "cluster.ocs.openshift.io/openshift-storage"
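For illustration, the resulting configmap entry might look like the following sketch (unverified against a live cluster; the toleration key shown is copied from the existing configmap dump above, and any other existing `data` keys would be preserved):

```yaml
# Sketch: proposed addition to the rook-ceph-operator-config ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: openshift-storage
data:
  # New key: pin CSI provisioner pods to nodes carrying the OCS label
  CSI_PROVISIONER_NODE_AFFINITY: "cluster.ocs.openshift.io/openshift-storage"
  # Already set by ocs-operator today
  CSI_PROVISIONER_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
```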
I completely disagree with this, as I disagree with constraining anything that's not OSD Pods in general. It really limits our ability to take advantage of additional resources in the rest of the cluster. Besides that, I see no reason why an admin *should* care what nodes they're running on; as best I know, they only care because *we tell them to care*.

That said, we're already doing it, so we might as well be consistent. Hopefully I can get this reverted or at least made optional in the future. :P

Not sure if this should go into 4.8 yet, but giving devel_ack+.
If we don't already have a PR for this, then I would suggest moving this out of 4.8.
Even with no PR, I don't think it's valid for us to try and resolve this, for a few reasons:

1. It's not quite as simple as just adding NodeAffinity, because we have to consider the scenario where we have no OCS nodes (e.g. external RHGS)

2. This does nothing to actually conserve resources. The number of provisioner pods does not change. If anything, this restricts our ability to make use of available resources in OCP, leading to a guaranteed reduction in resources on our actual storage nodes

3. This reduces the resiliency of the CSI driver. In the event where the two OCS nodes hosting the provisioner Pods go down, they may be entirely unavailable for several minutes, in which case several CSI functions will be entirely unavailable. From a pure statistical point of view, the more nodes we have available for scheduling, the better our chances of surviving node failures.

While we still need consensus with QE, since it is not a regression it is not dire enough to warrant attention for OCS 4.8. Moving to ODF 4.9.
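The statistical point in (3) can be made concrete with a quick back-of-the-envelope calculation (a hypothetical model, assuming node failures hit schedulable nodes uniformly at random and the two provisioner replicas land on distinct nodes; the node counts are illustrative, not from this report):

```python
from math import comb

def p_all_provisioners_down(n_schedulable, n_failed, n_provisioners=2):
    """Probability that every provisioner-hosting node is among the
    failed nodes, with failures distributed uniformly at random over
    the schedulable nodes and each replica on a distinct node."""
    if n_failed < n_provisioners:
        return 0.0
    return (comb(n_schedulable - n_provisioners, n_failed - n_provisioners)
            / comb(n_schedulable, n_failed))

# Two simultaneous node failures: provisioners pinned to 3 OCS nodes
# vs. free to land on any of, say, 9 worker nodes.
print(p_all_provisioners_down(3, 2))  # 1/3 ~= 0.333
print(p_all_provisioners_down(9, 2))  # 1/36 ~= 0.028
```

Pinning the provisioners to the 3 OCS nodes makes a double node failure roughly 12x more likely to take out both replicas than spreading them across 9 workers, which is the resiliency argument in numbers.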
I understand this particular issue is still leading to repeat customer confusion, but I still stand by my statements. This is not a technical problem; it's one of perception and documentation. As such, I think the doc BZ https://bugzilla.redhat.com/show_bug.cgi?id=1960066 is more appropriate.

On a more technical note, we just had an ocs-operator BZ triage meeting with QE. We are in general consensus that it makes no sense to constrain *any* OCS Pods that aren't Ceph OSD Pods by default. With the current subscription model of charging for OCS SKUs on a per-core basis, it really doesn't matter what *specific nodes* most of our Pods are running on. The idea that isolating them as such would provide more stability for either us or other applications is not really valid, as it only constrains the potential compute resources we could use to recover from failure scenarios. OSD Pods are an exception, since they are by far our most cumbersome resource and in many cases have to be limited to the reach of their associated PVs anyway. All other Pods should be fairly lightweight by comparison and should not affect the stability of other applications or nodes. *If* we were to set any default NodeAffinity for non-OSD Pods, it should really be to the master nodes, because we really are infrastructure even if OCP doesn't recognize us as such.

Of course, we should absolutely allow admins to *optionally* constrain any and all Pods to arbitrary nodes. They can do this today; it just needs to be better documented if we want to officially support it.

Given the need for further discussion, and the long-standing fact that we're still not blocked by this, moving this to ODF 4.10 and giving devel_ack-.
Oops, too far!
It's been well over 6 months since this was raised. I have continued to strongly oppose it, and thus far no one has made any substantial arguments against that position. :) As such, closing this as WONTFIX.