Description of problem (please be as detailed as possible and provide log snippets):
We have a specific taint on the worker and master nodes, and NooBaa/ODF pods must tolerate that taint in order to run. To achieve that, we are trying to add a toleration to the NooBaa pods via the StorageCluster CR, but it doesn't seem to work. The StorageCluster CR has the following tolerations:
=========================================================
spec:
  placement:
    all:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
    mds:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
    noobaa-core:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
=========================================================
Neither "all" nor "noobaa-core" works.

Version of all relevant components (if applicable):
All versions. My testing was with ODF 4.9.0-164.ci.

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes

Steps to Reproduce:
1. Install ODF 4.9.
2. Add a random taint to the worker nodes.
3. Add the toleration to the StorageCluster CR.
4. Respin the NooBaa pods.

Actual results:
NooBaa pods get stuck in the Pending state, complaining about the toleration.

Expected results:
NooBaa pods should tolerate the taint and come up fine.

Additional info:
See the comments below.
$ oc get noobaa noobaa -o yaml | grep -A 10 "tolerations:"
  tolerations:
  - effect: NoSchedule
    key: xyz
    operator: Equal
    value: "true"
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
status:
  accounts:

$ oc get statefulset noobaa-core -oyaml | grep -A 10 "tolerations:"
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      volumes:
      - emptyDir: {}

$ oc get pod noobaa-core-0 -oyaml | grep -A 10 "tolerations:"
  tolerations:
  - effect: NoSchedule
    key: xyz
    operator: Equal
    value: "true"
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
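To see why the tolerations in the pod output are enough for scheduling, here is a minimal Python sketch of the taint/toleration matching rules described in the Kubernetes documentation. It is an illustrative model, not the kube-scheduler's code; the `Taint`/`Toleration` classes and the `tolerates()`/`schedulable()` functions are hypothetical names. The node taints in the example are an assumption: `xyz=true:NoSchedule` from this bug plus `node.ocs.openshift.io/storage=true:NoSchedule`.

```python
"""Illustrative model of Kubernetes taint/toleration matching (not scheduler code)."""
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key + Exists tolerates every key
    operator: str = "Equal"  # Kubernetes defaults the operator to Equal
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if one toleration matches one taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value  # Equal: values must match exactly
    return tol.operator == "Exists"      # Exists: value is ignored


def schedulable(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A pod fits a node only if every hard (NoSchedule/NoExecute) taint
    is matched by at least one toleration; PreferNoSchedule is soft."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


# Hypothetical node taints for this example.
node_taints = [
    Taint("xyz", "true", "NoSchedule"),
    Taint("node.ocs.openshift.io/storage", "true", "NoSchedule"),
]
# The two NoSchedule tolerations shown on noobaa-core-0.
pod_tolerations = [
    Toleration("xyz", "Equal", "true", "NoSchedule"),
    Toleration("node.ocs.openshift.io/storage", "Equal", "true", "NoSchedule"),
]

print(schedulable(pod_tolerations, node_taints))      # True
print(schedulable(pod_tolerations[:1], node_taints))  # False: storage taint not tolerated
```

The second call mirrors the problem described in the next comment: a pod that only carries the `xyz` toleration cannot land on a node that also has the storage taint.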
It looks like this works for the NooBaa pods as well. All three NooBaa pods (noobaa-core, noobaa-db-pg and noobaa-endpoint) are running. The only problem is that adding a toleration for noobaa-core overrides the default *node.ocs.openshift.io/storage* toleration for the NooBaa pods, so the *node.ocs.openshift.io/storage* toleration must be added explicitly in the StorageCluster CR, like this:
========================
noobaa-core:
  tolerations:
  - effect: NoSchedule
    key: xyz
    operator: Equal
    value: "true"
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
========================
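The difference between overriding and merging the default toleration can be sketched as follows. This is a hypothetical Python model, not ocs-operator code: `DEFAULT_TOLERATIONS`, `override_tolerations()` and `merge_tolerations()` are illustrative names, and the default toleration's `Equal`/`"true"` fields are assumed from the CR snippets in this bug.

```python
"""Hypothetical contrast of override vs. merge for placement tolerations.

Neither function is taken from ocs-operator.
"""
from typing import Dict, List

Toleration = Dict[str, str]

# Assumed default toleration carried by the NooBaa pods.
DEFAULT_TOLERATIONS: List[Toleration] = [
    {"effect": "NoSchedule", "key": "node.ocs.openshift.io/storage",
     "operator": "Equal", "value": "true"},
]


def override_tolerations(user: List[Toleration]) -> List[Toleration]:
    """Observed behaviour: any user-supplied list replaces the defaults."""
    return list(user) if user else list(DEFAULT_TOLERATIONS)


def merge_tolerations(user: List[Toleration]) -> List[Toleration]:
    """Requested behaviour: keep the defaults, then append user entries,
    skipping exact duplicates."""
    merged = list(DEFAULT_TOLERATIONS)
    for tol in user:
        if tol not in merged:
            merged.append(tol)
    return merged


xyz = {"effect": "NoSchedule", "key": "xyz", "operator": "Equal", "value": "true"}

print([t["key"] for t in override_tolerations([xyz])])  # ['xyz']
print([t["key"] for t in merge_tolerations([xyz])])     # ['node.ocs.openshift.io/storage', 'xyz']
```

With override semantics, the workaround is to repeat the default entry in the CR; with merge semantics, listing only `xyz` would be enough, and listing both would still yield no duplicate.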
I am closing this bug for now. I will open a new bug to ensure that passing additional tolerations in the StorageCluster CR doesn't override the default or existing tolerations.