Description of problem (please be as detailed as possible and provide log snippets):

When applying non-ocs taints in ODF 4.10, some pods cannot schedule.

# oc adm taint nodes -l cluster.ocs.openshift.io/openshift-storage= nodename=true:NoSchedule
# oc delete $(oc get pods -o name)
# oc get pods | grep -v Running
NAME                                                              READY   STATUS     RESTARTS   AGE
csi-addons-controller-manager-7656cbcf45-n2chw                    0/2     Pending    0          21m
noobaa-db-pg-0                                                    0/1     Init:0/2   0          47s
noobaa-operator-764c8b74dc-8q65h                                  0/1     Pending    0          21m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8467d498h2xmq   0/2     Pending    0          21m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7767c896nzc4b   0/2     Pending    0          21m
rook-ceph-mgr-a-5dfb7b8979-p6fxv                                  0/2     Pending    0          21m
rook-ceph-mon-a-6ff6cfd6b7-qknq2                                  0/2     Pending    0          21m
rook-ceph-mon-b-7876495dc4-4fxkt                                  0/2     Pending    0          21m
rook-ceph-osd-0-675558d4d7-d7r2b                                  0/2     Pending    0          21m
rook-ceph-osd-1-5d7bbfbcbf-jqg8v                                  0/2     Pending    0          21m
rook-ceph-osd-2-857f6579b7-8827r                                  0/2     Pending    0          21m
rook-ceph-tools-787676bdbd-l9xs8                                  0/1     Pending    0          21m

# oc get subs odf-operator -o yaml | grep -A7 config
  config:
    tolerations:
    - effect: NoSchedule
      key: nodename
      operator: Equal
      value: "true"
  installPlanApproval: Manual
  name: odf-operator

# oc get cm rook-ceph-operator-config -o yaml
apiVersion: v1
data:
  CSI_ENABLE_CSIADDONS: "true"
  CSI_LOG_LEVEL: "5"
  CSI_PLUGIN_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
    config:
    - effect: NoSchedule
      key: nodename
      operator: Equal
      value: "true"
  CSI_PROVISIONER_TOLERATIONS: |2-
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
    - effect: NoSchedule
      key: nodename
      operator: Equal
      value: "true"

# oc get storagecluster ocs-storagecluster -o yaml
  placement:
    all:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"
    mds:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"
    mgr:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"
    mon:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"
    noobaa-core:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"
    noobaa-operator:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"
    osd:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"

Version of all relevant components (if applicable):
mcg-operator.v4.10.5              NooBaa Operator               4.10.5   mcg-operator.v4.10.4              Installing
ocs-operator.v4.10.5              OpenShift Container Storage   4.10.5   ocs-operator.v4.10.4              Succeeded
odf-csi-addons-operator.v4.10.5   CSI Addons                    4.10.5   odf-csi-addons-operator.v4.10.4   Installing
odf-operator.v4.10.5              OpenShift Data Foundation     4.10.5   odf-operator.v4.10.4              Succeeded

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
This is a blocker for any customer using non-ocs node taints and upgrading to 4.10.
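A quick way to confirm whether the tolerations above actually propagate to the rook-managed workloads is to dump the toleration keys carried by each deployment and daemonset before tainting. This is only a diagnostic sketch and assumes the default openshift-storage namespace:

# oc -n openshift-storage get deploy -o custom-columns=NAME:.metadata.name,TOLERATION_KEYS:.spec.template.spec.tolerations[*].key
# oc -n openshift-storage get ds -o custom-columns=NAME:.metadata.name,TOLERATION_KEYS:.spec.template.spec.tolerations[*].key

Any workload that does not list the custom key (nodename) will go Pending once the taint is applied.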
Is there any workaround available to the best of your knowledge?
Edit the deployment to add the toleration, but this will not persist.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
5

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Add the tolerations to the odf-operator subscription, the rook-ceph-operator-config ConfigMap, and the StorageCluster placement.
2. Taint the storage nodes with a non-ocs taint (nodename=true:NoSchedule).
3. Delete the ODF/rook pods to simulate a pod failure.

Actual results:
The recreated pods stay Pending because the tolerations were not applied to them.

Expected results:
The tolerations are propagated to all ODF/rook pods and the pods reschedule onto the tainted nodes.

Additional info:
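For reference, the non-persistent workaround above (editing a deployment directly) can also be expressed as a patch. This is a sketch that uses rook-ceph-mon-a as an example deployment name and repeats the existing ocs toleration, because a merge patch replaces the whole tolerations list:

# oc -n openshift-storage patch deploy rook-ceph-mon-a --type merge -p '{"spec":{"template":{"spec":{"tolerations":[{"key":"node.ocs.openshift.io/storage","operator":"Equal","value":"true","effect":"NoSchedule"},{"key":"nodename","operator":"Equal","value":"true","effect":"NoSchedule"}]}}}}'

The rook and ocs operators overwrite the deployment spec on their next reconcile, which is why this does not persist.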
IIRC, this label `cluster.ocs.openshift.io/openshift-storage=` is not applied to operator pods (it is only applied to the pods that the storage cluster creates, like the mon, osd, mgr, and mds pods). So I think the `oc delete $(oc get pods -o name)` command also deletes the operator pods such as `ocs-operator` and `rook-ceph-operator`; with those operators gone, the remaining resources are not being reconciled.
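To exercise daemon failures without also taking down the operators, the delete can be scoped by the rook app labels instead of deleting every pod in the namespace. A rough sketch; the label values below are the usual rook ones and may need adjusting:

# oc -n openshift-storage delete pod -l 'app in (rook-ceph-mon,rook-ceph-osd,rook-ceph-mgr,rook-ceph-mds)'

That leaves ocs-operator, rook-ceph-operator, and the noobaa/odf operators running so reconciliation can continue.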
If you're adding taints to a running cluster, you'll first want to add the tolerations and make sure rook pods are updated with those tolerations before you add the taints. Otherwise, the pods will not be able to start and the operator won't be able to reconcile while the mons are stuck pending and out of quorum.
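If the taints went on before the tolerations propagated, one way back to a reconcilable state is to remove the taint again so the mons can reschedule and regain quorum, then redo the ordering above. A minimal sketch, using the taint from the description (the trailing "-" removes it):

# oc adm taint nodes -l cluster.ocs.openshift.io/openshift-storage= nodename=true:NoSchedule-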
(In reply to Travis Nielsen from comment #6)
> If you're adding taints to a running cluster, you'll first want to add the
> tolerations and make sure rook pods are updated with those tolerations
> before you add the taints. Otherwise, the pods will not be able to start and
> the operator won't be able to reconcile while the mons are stuck pending and
> out of quorum.

That is the blocker: rook pods are not updated with the tolerations prior to adding the taints.

Workflow from my lab:
1. Tolerations added to the odf-operator subscription, the rook cm, and the ocs-storagecluster yaml (placement: all and per individual rook pod).
2. Taint added to nodes.
3. Deleted all pods to simulate any ODF/rook pod failure.

# oc get pods | grep -v Running
NAME                                                              READY   STATUS     RESTARTS   AGE
csi-addons-controller-manager-7656cbcf45-n2chw                    0/2     Pending    0          21m
noobaa-db-pg-0                                                    0/1     Init:0/2   0          47s
noobaa-operator-764c8b74dc-8q65h                                  0/1     Pending    0          21m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8467d498h2xmq   0/2     Pending    0          21m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7767c896nzc4b   0/2     Pending    0          21m
rook-ceph-mgr-a-5dfb7b8979-p6fxv                                  0/2     Pending    0          21m
rook-ceph-mon-a-6ff6cfd6b7-qknq2                                  0/2     Pending    0          21m
rook-ceph-mon-b-7876495dc4-4fxkt                                  0/2     Pending    0          21m
rook-ceph-osd-0-675558d4d7-d7r2b                                  0/2     Pending    0          21m
rook-ceph-osd-1-5d7bbfbcbf-jqg8v                                  0/2     Pending    0          21m
rook-ceph-osd-2-857f6579b7-8827r                                  0/2     Pending    0          21m
rook-ceph-tools-787676bdbd-l9xs8                                  0/1     Pending    0          21m
Did you wait for some time between steps 1 and 2? It can take at least a few minutes for all the tolerations to be applied to the rook pods even in a healthy cluster.
(In reply to Travis Nielsen from comment #8)
> Did you wait for some time between steps 1 and 2? It can take at least a few
> minutes for all the tolerations to be applied to the rook pods even in a
> healthy cluster.

Yes, we waited; the cluster was stuck in this state for hours. We can reproduce it every time.
You waited hours between steps 1 and 2? What was stuck between steps 1 and 2? If the cluster was healthy, the tolerations should have been applied. Hopefully the rook operator log would show why they weren't applied.
(In reply to Travis Nielsen from comment #10)
> You waited hours between steps 1 and 2? What was stuck between steps 1 and
> 2? If the cluster was healthy, the tolerations should have been applied.
> Hopefully the rook operator log would show why they weren't applied.

Hours after step 3 (deleting the pods to replicate a pod failure), the tolerations were still not applied to the Pending pods. The only workaround is to apply the tolerations to the deployments, which is not persistent.

Randy and I worked on this most of the day on 8/25. His opinion is that this is a blocker, which triggered this BZ. If you wish, we can reproduce it via Google Meet so you can observe the behavior or check the logs.
You need to delay between steps 1 and 2, otherwise the rook operator has no chance to update the ceph pod specs with the tolerations before the taints are added. Step 3 is too late for the operator to smoothly upgrade the cluster.
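One way to make the delay between steps 1 and 2 deterministic is to poll a representative rook deployment until the new toleration key shows up in its pod template. A rough sketch, using rook-ceph-mon-a as an example deployment name and the nodename key from this report:

# until oc -n openshift-storage get deploy rook-ceph-mon-a -o jsonpath='{.spec.template.spec.tolerations[*].key}' | grep -q nodename; do sleep 30; done

Repeating the check for the mon/osd/mgr/mds deployments before running the oc adm taint command avoids guessing at how long the operator needs.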
(In reply to Travis Nielsen from comment #12)
> You need to delay between steps 1 and 2, otherwise the rook operator has no
> chance to update the ceph pod specs with the tolerations before the taints
> are added. Step 3 is too late for the operator to smoothly upgrade the
> cluster.

I see your point now; it's possible we did not wait long enough. My test cluster has had all the tolerations outlined in the description in place since 8/25.

Added:
# oc adm taint nodes -l cluster.ocs.openshift.io/openshift-storage= nodename=true:NoSchedule

I started deleting pods from the original pending list and ended up with only 2 pods still Pending, which the toleration did not get passed down to.

NAME                                                              READY   STATUS      RESTARTS     AGE
csi-addons-controller-manager-7656cbcf45-gzqjm                    0/2     Pending     0            12m
csi-cephfsplugin-9znfq                                            3/3     Running     0            11m
csi-cephfsplugin-nzbjk                                            3/3     Running     0            4d1h
csi-cephfsplugin-provisioner-6596b9c55f-pgq77                     6/6     Running     0            4d1h
csi-cephfsplugin-provisioner-6596b9c55f-w7tqc                     6/6     Running     0            4d1h
csi-cephfsplugin-sm865                                            3/3     Running     0            4d1h
csi-rbdplugin-d5mdf                                               4/4     Running     0            4d1h
csi-rbdplugin-provisioner-76494fb89-87qf5                         7/7     Running     0            4d1h
csi-rbdplugin-provisioner-76494fb89-zztpp                         7/7     Running     0            4d1h
csi-rbdplugin-sgrv9                                               4/4     Running     0            4d1h
csi-rbdplugin-v92nf                                               4/4     Running     0            4d1h
noobaa-core-0                                                     1/1     Running     0            4d1h
noobaa-db-pg-0                                                    1/1     Running     0            4m26s
noobaa-endpoint-6ff8bb4df-n7pf7                                   1/1     Running     0            3d22h
noobaa-operator-764c8b74dc-s2ngc                                  1/1     Running     2 (8h ago)   3d22h
ocs-metrics-exporter-5d94446c7b-8kpbz                             1/1     Running     0            4d1h
ocs-operator-5f677b5ddd-rn6gs                                     1/1     Running     0            99s
odf-console-585d5b45b-5mh6q                                       1/1     Running     0            4d1h
odf-operator-controller-manager-c58dd5864-pbtl2                   2/2     Running     0            4d1h
rook-ceph-crashcollector-ip-10-0-138-41.ec2.internal-5dc4c4t8kb   1/1     Running     0            3d22h
rook-ceph-crashcollector-ip-10-0-153-37.ec2.internal-658f6sq6p7   1/1     Running     0            3d22h
rook-ceph-crashcollector-ip-10-0-175-79.ec2.internal-799bd5w6tb   1/1     Running     0            2d20h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5cbcc48fq4mvh   2/2     Running     0            3d22h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-78b6c9887mzrv   2/2     Running     0            3d22h
rook-ceph-mgr-a-55958bcfdb-75mhj                                  2/2     Running     0            3d22h
rook-ceph-mon-a-6f475bb5fc-fq9ch                                  2/2     Running     0            15m
rook-ceph-mon-b-7bfcf555dd-p2xn4                                  2/2     Running     0            3d22h
rook-ceph-mon-c-55fb95c8cc-kpbwc                                  2/2     Running     0            4d1h
rook-ceph-operator-64c6bc6cfd-blm5b                               1/1     Running     0            87s
rook-ceph-osd-0-f8ccfddd4-cwvlb                                   2/2     Running     0            2d20h
rook-ceph-osd-1-58ccbfcc4b-m6kxg                                  2/2     Running     0            3d22h
rook-ceph-osd-2-6dfdb6dd68-8vrvj                                  2/2     Running     0            2m27s
rook-ceph-osd-prepare-73ac7e92a014913461027c97fb1d7aa6-gw7ng      0/1     Completed   0            2m32s
rook-ceph-tools-787676bdbd-m6xzc                                  0/1     Pending     0            10m

# oc get pod rook-ceph-tools-787676bdbd-m6xzc -o yaml
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-08-30T20:39:22Z"
    message: '0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: },
      that the pod didn''t tolerate, 3 node(s) had taint {nodename: true}, that the
      pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled

# oc get pods csi-addons-controller-manager-7656cbcf45-gzqjm -o yaml
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-08-30T20:37:35Z"
    message: '0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: },
      that the pod didn''t tolerate, 3 node(s) had taint {nodename: true}, that the
      pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending

# oc delete OCSInitialization ocsinit
ocsinitialization.ocs.openshift.io "ocsinit" deleted

[root@vm255-30 ~]# oc patch OCSInitialization ocsinit -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'

rook-ceph-tools-787676bdbd-549jp                                  0/1     Pending     0            12s

I will open a separate BZ for csi-addons. We are still stuck on the tools pod Pending. Are these tolerations being injected by the ODF subscription edit now, or by the ocs-storagecluster?
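For the two pods that stayed Pending, comparing the pod tolerations against the node taints shows which side is missing the custom key. A short sketch, using the pod names from the output above:

# oc -n openshift-storage get pod rook-ceph-tools-787676bdbd-m6xzc -o jsonpath='{.spec.tolerations}{"\n"}'
# oc -n openshift-storage get pod csi-addons-controller-manager-7656cbcf45-gzqjm -o jsonpath='{.spec.tolerations}{"\n"}'
# oc get nodes -l cluster.ocs.openshift.io/openshift-storage= -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key

Both pods lack the nodename toleration entirely, which points at the components that template them rather than at the rook daemons.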
The tools pod is created by the ocs operator, so it looks like at least rook has updated all its expected tolerations now.
Can we close this now, or what is remaining?
(In reply to Travis Nielsen from comment #15)
> Can we close this now, or what is remaining?

The only outstanding action item would be the tools pod. Can we append a component, or should I open a separate BZ against the ocs operator?
There was a Rook fix that Madhu found necessary: the tolerations weren't being properly applied until the rook operator was restarted. This will be fixed by the Rook change linked in the next comment.
Fixed by https://github.com/rook/rook/pull/10906
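Until a build with that fix lands, the restart behavior described above suggests forcing a fresh reconcile by restarting the rook operator as a stop-gap. A minimal sketch, assuming the default namespace and deployment name:

# oc -n openshift-storage rollout restart deploy/rook-ceph-operator
# oc -n openshift-storage rollout status deploy/rook-ceph-operator

This only forces a fresh reconcile so the tolerations get re-applied; it is not a substitute for the fix itself.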
Merged downstream to 4.12 now with https://github.com/red-hat-storage/rook/pull/409