Bug 1960671

Summary:	Add nodeAffinity to CSI provisioner pods to bring them up only on OCS labelled nodes
Product:	[Red Hat Storage] Red Hat OpenShift Data Foundation	Reporter:	Neha Berry <nberry>
Component:	ocs-operator	Assignee:	Jose A. Rivera <jarrpa>
Status:	CLOSED WONTFIX	QA Contact:	Elad <ebenahar>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	4.8	CC:	jarrpa, madam, muagarwa, ocs-bugs, odf-bz-bot, sostapov, tnielsen
Target Milestone:	---	Keywords:	AutomationBackLog
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-01-20 16:00:22 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Neha Berry 2021-05-14 14:43:34 UTC

Description of problem (please be detailed as possible and provide log
snippests):
=======================================================================
With a recent bug 1960066, it was asked in docs to mention that provisioner pods can come up on any Worker node and not just on COS storage nodes, because the deployment has nodeAffinity spec empty.

There was a bug 1883828#c30 and 1883828#c31 where it was mentioned that all CSI pods can run on any Worker node, but IMO, only csi node-plugin pods should be allowed to run on non-OCS nodes, not the provisioner pods

Raising this bug to see if we can fix the provisioner pod's node affinity.

It was Confirmed in chat [1] that the OCS operator is only setting the tolerations and a couple other settings, but not the CSI_PROVISIONER_NODE_AFFINITY.

  spec:
        affinity:
          nodeAffinity: {}

Logs copied in [2]

[1] - https://chat.google.com/room/AAAAREGEba8/GHjFcIPSjSk

[2[ - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-1959793/bz_1959793.tar.gz


Version of all relevant components (if applicable):
===================================================
This issue is seen in all OCS versions, attaching logs from OCS 4.8
We need to decide if the fix will be backported, else the doc BZ needs to be addressed.

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
===================================================

No

Is there any workaround available to the best of your knowledge?
============================================================
Not sure


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
======================================
3

Can this issue reproducible?
==============================
yes in all versions of OCS

Can this issue reproduce from the UI?
=====================================
yes for both CLI and UI based installs


If this is a regression, please provide more details to justify this:
====================================================================
Not sure if this is prevalent since OCS 4.2 or was introduced later
Steps to Reproduce:
====================
1. Install OCP and OCS > v4.3 on a >3 W node setup, with 3W nodes as Storage nodes
2. Post installation check  the following:

a) pod location of provisioner pod, they may or may not be running on OCS nodes
b)the provisioner pod spec for the nodeAffinity section
c) and the configmap rook-ceph-operator-config



Actual results:
==================
Node-affinity is not set for Provisioner pods to run only on OCS nodes. like the other noobaa and ceph pods


Expected results:
================
IMO provisioner pods should be running only on OCS nodes to maintain symmetry and to not use up resources of other Worker nodes

Additional info:
========================

Doc BZs : Bug 1960066 and Bug 1959967

OCS 4.6 bug for the same issue(where provisioner pod was not fixed) - Bug 1883828



- apiVersion: v1
  data:
    CSI_LOG_LEVEL: "5"
    CSI_PLUGIN_TOLERATIONS: |2-

      - key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
        effect: NoSchedule
    CSI_PROVISIONER_TOLERATIONS: |2-

      - key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
        effect: NoSchedule
  kind: ConfigMap
  metadata:
    creationTimestamp: "2021-05-11T06:12:29Z"
    managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:data:
          .: {}
          f:CSI_LOG_LEVEL: {}
          f:CSI_PLUGIN_TOLERATIONS: {}
          f:CSI_PROVISIONER_TOLERATIONS: {}
      manager: ocs-operator
      operation: Update
      time: "2021-05-11T06:12:29Z"
    name: rook-ceph-operator-config
    namespace: openshift-storage
    resourceVersion: "27518"
    uid: 007843c8-dfc2-48c1-9d99-a040f7e2cb1c

Comment 2 Travis Nielsen 2021-05-14 19:21:17 UTC

Agreed we should have the CSI provisioners only run on OCS nodes. This setting just needs to be added to the configmap rook-ceph-operator-config

CSI_PROVISIONER_NODE_AFFINITY: "cluster.ocs.openshift.io/openshift-storage"

Comment 3 Jose A. Rivera 2021-06-01 15:27:09 UTC

I completely disagree with this, as I disagree with constraining anything that's not OSD Pods in general. It really limits our ability to take advantage of additional resources in the rest of the cluster. Besides that, I see no reason why an admin *should* care what nodes they're running on, best I know they only care because *we tell them to care*.

That said, we're already doing it, so might as well be consistent. Hopefully I can get this reverted or at least made optional in the future. :P

Not sure if this should go into 4.8 yet, but giving devel_ack+.

Comment 4 Mudit Agarwal 2021-06-04 16:39:02 UTC

If we already don't have a PR for this then I would suggest to move this out of 4.8

Comment 5 Jose A. Rivera 2021-06-09 16:02:02 UTC

Even with no PR, I don't think it's valid for us to try and resolve this for a few reasons:

1. It's not quite as simple as just adding NodeAffinity, because we have to consider the scenario where we have no OCS nodes (e.g. external RHGS)
2. This does nothing to actually conserve resources. The number of provisioner pods does not change. If anything, this restricts our ability to make use of available resources in OCP, leading to a guaranteed reduction in resources on our actual storage nodes
3. This reduces the resiliency of the CSI driver. In the event where the two OCS nodes hosting the provisioner Pods go down, they may be entirely unavailable for several minutes, in which case several CSI functions will be entirely unavailable. From a pure statistical point of view, it increases our chances to survive node failures the more nodes we have available for scheduling.

While we still need consensus with QE, since it is not a regression it is not dire enough to warrant attention for OCS 4.8. Moving to ODF 4.9.

Comment 7 Jose A. Rivera 2021-10-07 16:37:58 UTC

I understand this particular issue is still leading to repeat customer confusion, but I still stand by my statements. This is not a technical problem, it's one of perception and documentation. As such, I think the doc BZ https://bugzilla.redhat.com/show_bug.cgi?id=1960066 is more appropriate.

On a more technical note, we just had an ocs-operator BZ triage meeting with QE. We are in general consensus that it makes no sense to constraint *any* OCS Pods that aren't Ceph OSD Pods by default. With the current subscription model of charging for OCS SKUs on a per-core basis, it really doesn't matter what *specific nodes* most of our Pods are running on. The idea that isolating them as such would provide more stability for either us or other applications is not really valid, as it only constraints the potential compute resources we could use to recover from failure scenarios. OSD Pods are an exception since they are by far our most cumbersome resource, and in many cases have to be limited to the reach of their associated PVs anyway. All other Pods should be fairly lightweight by comparison and not mess with the stability of other applications or nodes.

*If* we were to be doing any default NodeAffinity for non-OSD Pods it should really be to the master nodes, because we really are infrastructure even if OCP doesn't recognize us as such. Of course, we should absolutely allow admins to *optionally* constraint any and all Pods to arbitrary nodes. They can do this today, it just needs to be better documented if we want to officially support it.

Given the need for further discussion, and the long-standing fact we're still not blocked by this, moving this to ODF 4.10 and giving devel_ack-.

Comment 8 Jose A. Rivera 2021-10-07 16:50:29 UTC

Oops, too far!

Comment 9 Jose A. Rivera 2022-01-20 16:00:22 UTC

It's bee well over 6 months since this has been raised. I have continued to strongly oppose it and thus far no one has had any substantial arguments against it. :)

As such, closing this as WONTFIX.