Bug 2010200

Summary: Not able to add toleration for NooBaa pods
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Bipin Kunal <bkunal>
Component: Multi-Cloud Object Gateway Assignee: Nimrod Becker <nbecker>
Status: CLOSED WORKSFORME QA Contact: Raz Tamir <ratamir>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.9 CC: etamir, ocs-bugs, odf-bz-bot
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-04 10:34:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Bipin Kunal 2021-10-04 07:45:32 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

We have a specific taint on the worker and master nodes, so the NooBaa/ODF pods must tolerate that taint in order to run. To achieve this, we are trying to add a toleration to the NooBaa pods via the StorageCluster CR, but it does not seem to work.

In the StorageCluster CR we have the following tolerations:

=========================================================
spec:
  placement:
    all:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
    mds:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
    noobaa-core:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"

=========================================================

Neither "all" nor "noobaa-core" works.

Version of all relevant components (if applicable):
All versions, but my testing was done with ODF 4.9.0-164.ci.

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes


Steps to Reproduce:
1. Install ODF 4.9.
2. Add an arbitrary taint to the worker nodes.
3. Add a toleration for that taint to the StorageCluster CR.
4. Respin the NooBaa pods.


Actual results:

NooBaa pods get stuck in the Pending state, complaining about tolerations.


Expected results:

NooBaa pods should come up fine, tolerating the taint.
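
For reference, the condition behind the expected result can be sketched in Python. This is a simplified model of the Kubernetes taint and toleration matching rule, not the scheduler's actual code. The names `Taint`, `Toleration`, `tolerates` and `schedulable` are illustrative only, and the node in the usage example carrying both the xyz taint and the node.ocs.openshift.io/storage taint is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"
    value: str = ""
    effect: str = ""


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    if tol.effect and tol.effect != taint.effect:
        return False
    # An empty key on the toleration matches every key.
    if tol.key and tol.key != taint.key:
        return False
    # "Exists" ignores the value; "Equal" requires the values to match.
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def schedulable(tolerations, taints) -> bool:
    """A pod fits a node only if every hard taint is tolerated."""
    # PreferNoSchedule is a soft preference and does not block scheduling.
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


# Assumed node taints for illustration.
node_taints = [
    Taint("xyz", "true", "NoSchedule"),
    Taint("node.ocs.openshift.io/storage", "true", "NoSchedule"),
]
only_xyz = [Toleration("xyz", "Equal", "true", "NoSchedule")]
both = only_xyz + [
    Toleration("node.ocs.openshift.io/storage", "Equal", "true", "NoSchedule")
]
```

Under these assumptions, `schedulable(only_xyz, node_taints)` is false and `schedulable(both, node_taints)` is true: a pod stays Pending as long as any one hard taint on the node has no matching toleration.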


Additional info: Check comments below.

Comment 2 Bipin Kunal 2021-10-04 10:20:40 UTC
$ oc get noobaa noobaa -o yaml | grep -A 10 "tolerations:" 
  tolerations:
  - effect: NoSchedule
    key: xyz
    operator: Equal
    value: "true"
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
status:
  accounts:


$ oc get statefulset noobaa-core -oyaml | grep -A 10  "tolerations:" 
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      volumes:
      - emptyDir: {}


$ oc get pod noobaa-core-0 -oyaml |grep -A 10  "tolerations:"
  tolerations:
  - effect: NoSchedule
    key: xyz
    operator: Equal
    value: "true"
  - effect: NoSchedule
    key: node.ocs.openshift.io/storage
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/not-ready



Comment 3 Bipin Kunal 2021-10-04 10:30:15 UTC
It looks like tolerations are working for the NooBaa pods as well: all three NooBaa pods (noobaa-core, noobaa-db-pg and noobaa-endpoint) are working. The only problem is that adding a toleration for noobaa-core overrides the default *node.ocs.openshift.io/storage* toleration for the NooBaa pods, so the toleration for *node.ocs.openshift.io/storage* needs to be added explicitly in the StorageCluster CR, like this:

========================

    noobaa-core:
      tolerations:
      - effect: NoSchedule
        key: xyz
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"

========================
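
The difference between the observed override and a merge that would keep the default can be sketched in Python. This is a hypothetical illustration only: `override_tolerations` and `merge_tolerations` are made-up names, not functions from the ODF or NooBaa operators, and the single-entry default list is assumed from the behaviour described above.

```python
# Assumed default toleration, as described in the comment above.
DEFAULT_TOLERATIONS = [
    {
        "effect": "NoSchedule",
        "key": "node.ocs.openshift.io/storage",
        "operator": "Equal",
        "value": "true",
    }
]


def override_tolerations(defaults, custom):
    """Observed behaviour: a non-empty custom list replaces the defaults."""
    return list(custom) if custom else list(defaults)


def merge_tolerations(defaults, custom):
    """Alternative behaviour: keep the defaults and append new entries."""
    merged = list(defaults)
    for tol in custom:
        # Skip exact duplicates so repeated merges stay stable.
        if tol not in merged:
            merged.append(tol)
    return merged


custom = [
    {"effect": "NoSchedule", "key": "xyz", "operator": "Equal", "value": "true"}
]

overridden = override_tolerations(DEFAULT_TOLERATIONS, custom)
merged = merge_tolerations(DEFAULT_TOLERATIONS, custom)
```

With override semantics the result contains only the xyz toleration, which is why the storage toleration has to be listed explicitly. With merge semantics both entries are present without repeating the default in the CR.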

Comment 4 Bipin Kunal 2021-10-04 10:34:40 UTC
I am closing this bug for now. I will open a new bug to ensure that passing additional tolerations in the StorageCluster CR doesn't override the default or existing tolerations.