Bug 2131220

Summary: setting tolerations for non-ocs taint on toolbox pod is not working
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Bipin Kunal <bkunal>
Component: ocs-operator
Assignee: Malay Kumar parida <mparida>
Status: CLOSED CURRENTRELEASE
QA Contact: Vishakha Kathole <vkathole>
Severity: high
Docs Contact:
Priority: high
Version: 4.11
CC: kramdoss, mparida, muagarwa, nigoyal, ocs-bugs, odf-bz-bot, sostapov, tdesala
Target Milestone: ---
Keywords: Regression
Target Release: ODF 4.12.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.12.0-79
Doc Type: Bug Fix
Doc Text:
Previously, in 4.11, the handling of the Ceph toolbox was moved from `ocsinitialization` to the `storagecluster` controller. The old way of enabling the toolbox from the `ocsinitialization cr` was kept as-is, but it did not account for tolerations added to the toolbox through the `ocsinitialization cr`, so those tolerations did not take effect. With this update, tolerations can be added from the `ocsinitialization cr` as well as the `storagecluster cr`, as intended. For example, with this fix, when there is a taint on the node you can add tolerations through the `ocsinitialization cr` without any issue, even if the toolbox pod is not running.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-02-08 14:06:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2132693

Comment 3 Malay Kumar parida 2022-09-30 11:50:08 UTC
Hi @bkunal, in 4.11 we copied the rook-ceph toolbox logic (enable, tolerations, etc.) to the storagecluster cr; see https://github.com/red-hat-storage/ocs-operator/pull/1602. But since it's not possible to remove fields from a public API, we had to keep those fields in the ocsinitialization cr. I think we should have added a deprecation warning, or made the implementation also consider edits to the ocsinitialization cr while reconciling the toolbox pod. I think we missed that.

So instead of the way mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2012084#c11, can you do it like this:

oc edit storagecluster -n openshift-storage <storage-cluster-name>

In this, look for the Spec.ManagedResources.CephToolbox field:
```
spec:
  arbiter: {}
  encryption:
    kms: {}
  externalStorage: {}
  managedResources:
    cephBlockPools: {}
    cephCluster: {}
    cephConfig: {}
    cephDashboard: {}
    cephFilesystems: {}
    cephNonResilientPools: {}
    cephObjectStoreUsers: {}
    cephObjectStores: {}
    cephToolbox: {}
```
In this, you have to add the tolerations to the cephToolbox field, like below:
cephToolbox:
    tolerations:
        <non-ocs-tolerations>
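
As a rough illustration of what the placeholder above could expand to, here is a minimal sketch of the same fragment with one toleration filled in. The taint `nodetype=infra:NoSchedule` is a made-up example, not taken from this bug; substitute the key, value, and effect of the actual non-ocs taint on your nodes. The toleration fields themselves (`key`, `operator`, `value`, `effect`) are the standard Kubernetes ones.

```yaml
spec:
  managedResources:
    cephToolbox:
      tolerations:
        # Hypothetical non-ocs taint: nodetype=infra:NoSchedule
        - key: nodetype
          operator: Equal
          value: infra
          effect: NoSchedule
```

After saving the edit, the toolbox pod should be rescheduled with the toleration if the storagecluster controller picks up the change.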

Comment 5 Malay Kumar parida 2022-09-30 13:35:02 UTC
Ack, agreed, there is too much variation. I will keep it in mind. If possible, in 4.13 I will try to align the tolerations of all the resources to a similar structure.

Comment 7 Malay Kumar parida 2022-09-30 14:18:35 UTC
It can be in 4.12, but I think it will require a good amount of effort, so it's difficult to commit to 4.12.
I will keep this in mind for 4.12 for now, though.

Comment 10 Malay Kumar parida 2022-10-11 04:55:43 UTC
Hi Bipin, working on it. The PR will be up today.

Comment 11 Malay Kumar parida 2022-10-11 07:42:04 UTC
The PR is up. I have made it so that the old way of adding tolerations via the ocsinitialization still works, and the new way of adding them via the storagecluster is now structured as Spec->Placement->Tolerations.
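
For illustration only, the Spec->Placement->Tolerations structure on the storagecluster might look something like the sketch below. The `toolbox` key under `placement` is an assumption (this comment names only the path Spec->Placement->Tolerations, not the key), and the taint `nodetype=infra:NoSchedule` is a made-up example. Check the merged PR or the StorageCluster CRD for the exact field names.

```yaml
spec:
  placement:
    # Assumption: the toolbox has its own placement key named "toolbox"
    toolbox:
      tolerations:
        # Hypothetical non-ocs taint: nodetype=infra:NoSchedule
        - key: nodetype
          operator: Equal
          value: infra
          effect: NoSchedule
```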

Comment 13 Malay Kumar parida 2022-10-11 11:22:29 UTC
In 4.11, moving the toolbox logic from ocsinitialization to storagecluster was a design decision. The toolbox should have been associated with the storagecluster itself from the beginning, as it's unique to each cephcluster.
So we are now moving all the enable/disable and reconcile logic to the storagecluster while still keeping backward compatibility. The plan is that, at some point in the future, the compatibility with ocsinitialization will be removed and storagecluster will be the only place for anything related to the ceph toolbox.