Bug 1963040
| Summary: | [RFE] Master node tolerations for OCS | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | James Force <jforce> |
| Component: | ocs-operator | Assignee: | Eran Tamir <etamir> |
| Status: | CLOSED WONTFIX | QA Contact: | Elad <ebenahar> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.6 | CC: | aivaras.laimikis, assingh, etamir, hnallurv, jrivera, madam, mben, muagarwa, nicolas.marcq, nravinas, ocs-bugs, odf-bz-bot, sostapov |
| Target Milestone: | --- | Keywords: | FutureFeature |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-02-14 15:29:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
I see no reason why the CSI driver Pods shouldn't exist on master nodes by default. It makes total sense to support PVs for core OCP components running on master nodes. However, I'm not certain why the proposed workaround includes additional Tolerations for the provisioner Pods, since those don't need to run on the master nodes to enable use of OCS PVs. If you can verify that just adding the tolerations to the driver/plugin Pods is sufficient, then the workaround makes sense. Otherwise, we may have a bug in Rook-Ceph.

That said, we do not currently have an official support statement for running our CSI Pods on master nodes. That would be up to PM and QE to decide. Maybe creating a KCS article would be sufficient. Additionally, since a workaround has been defined, I'm moving this to ODF 4.9.

Hi, just a comment here. I had the same issue on OCP 4.10. Having the plugin pods running on master nodes is actually mandatory when Ceph is the only storage available and you want to run the Compliance Operator. It does seem that the Compliance Operator (0.1.44) now actually provides a workaround for this: https://docs.openshift.com/container-platform/4.10/security/compliance_operator/compliance-operator-release-notes.html#compliance-operator-release-notes-0-1-44

"You can now customize the node that is used to schedule the result server workload by configuring the nodeSelector and tolerations attributes of the ScanSetting object. These attributes are used to place the ResultServer pod, the pod that is used to mount a PV storage volume and store the raw Asset Reporting Format (ARF) results. Previously, the nodeSelector and the tolerations parameters defaulted to selecting one of the control plane nodes and tolerating the node-role.kubernetes.io/master taint. This did not work in environments where control plane nodes are not permitted to mount PVs. This feature provides a way for you to select the node and tolerate a different taint in those environments."

@nicolas.marcq - Do you want to see if the above works okay and report back? I think that's certainly a better approach than running the plugin pods on the master nodes.

Thanks,
James

Hi James. Thanks for the pointer. Indeed it works.
Edited the default ScanSetting:
```
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSetting
metadata:
  name: default
  namespace: openshift-compliance
rawResultStorage:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  pvAccessModes:
    - ReadWriteOnce
  rotation: 3
  size: 1Gi
  tolerations:
    - operator: Exists
roles:
  - master
  - worker
scanTolerations:
  - operator: Exists
schedule: 0 1 * * *
showNotApplicable: false
strictNodeScan: true
```
Then deployed back my ScanSettingBinding.
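For context, a ScanSettingBinding that consumes the edited default ScanSetting might look like the minimal sketch below. The ocp4-cis profile name and the binding name are illustrative assumptions only, not the profiles actually used in this report:

```
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: cis-compliance        # example name, an assumption
  namespace: openshift-compliance
profiles:
  # Example profile; substitute whichever profiles your binding uses.
  - name: ocp4-cis
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  # Bind to the edited "default" ScanSetting shown above.
  name: default
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
```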
The OpenShift Compliance Operator cannot run on OpenShift Container Storage: the Compliance Operator requires persistent storage on master nodes, and the compliance scan fails to run because the csi-cephfsplugin-* and csi-rbdplugin-* pods do not run on master nodes by default.

After some testing I was able to get the csi-cephfsplugin-* and csi-rbdplugin-* pods to run on all nodes by modifying the rook-ceph-operator-config ConfigMap to look like the below:

```
apiVersion: v1
data:
  CSI_LOG_LEVEL: "5"
  CSI_PLUGIN_TOLERATIONS: |2-
    - operator: Exists
  CSI_PROVISIONER_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: openshift-storage
```

This change modified the csi-rbdplugin and csi-cephfsplugin DaemonSets, allowing the relevant pods to run on all nodes in the cluster.

My environment is as below:
OpenShift 4.6.27
OpenShift Container Storage 4.6.4
Compliance Operator 0.1.3.2

So the above modification technically works. However, is this a supported configuration? If so, can it be documented?

Additional info:
For more information please see the below GitHub issues, which were originally raised by another user:
https://github.com/openshift/ocs-operator/issues/1180
https://github.com/openshift/compliance-operator/issues/642
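As a side note, one comment above questions whether the CSI_PROVISIONER_TOLERATIONS part of this workaround is needed at all, since only the plugin DaemonSets must reach the master nodes. A minimal, untested variant that changes only the plugin tolerations might look like the sketch below (same ConfigMap name and namespace as above); whether this alone is sufficient is exactly the open question raised in the comments:

```
apiVersion: v1
data:
  # Untested sketch: tolerate every taint for the CSI plugin DaemonSets
  # only, leaving the provisioner pods on their default storage nodes.
  CSI_PLUGIN_TOLERATIONS: |2-
    - operator: Exists
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: openshift-storage
```

After the rook-ceph operator reconciles the change, something like `oc -n openshift-storage get pods -o wide | grep csi-` should show csi-rbdplugin-* and csi-cephfsplugin-* pods scheduled on the master nodes as well.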