Description of problem (please be as detailed as possible and provide log snippets):

When applying non-OCS taints in ODF 4.10, the csi-addons-controller-manager pod cannot be scheduled.

# oc get subs odf-operator -o yaml | grep -A7 config
  config:
    tolerations:
    - effect: NoSchedule
      key: nodename
      operator: Equal
      value: "true"
  installPlanApproval: Manual
  name: odf-operator

# oc adm taint nodes -l cluster.ocs.openshift.io/openshift-storage= nodename=true:NoSchedule

# oc delete pod csi-addons-controller-manager-7656cbcf45-gzqjm

# oc get pods | grep -v Running
NAME                                             READY   STATUS    RESTARTS   AGE
csi-addons-controller-manager-7656cbcf45-45r7f   0/2     Pending   0          7s

# oc get pods csi-addons-controller-manager-7656cbcf45-45r7f -o yaml | grep -A8 status
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-08-31T12:55:37Z"
    message: '0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: },
      that the pod didn''t tolerate, 3 node(s) had taint {nodename: true}, that the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable

# oc get csv odf-csi-addons-operator.v4.10.5 -o yaml | grep -A6 conditions
  conditions:
  - lastTransitionTime: "2022-08-31T13:08:25Z"
    lastUpdateTime: "2022-08-31T13:08:25Z"
    message: 'installing: waiting for deployment csi-addons-controller-manager to become ready:
      deployment "csi-addons-controller-manager" not available: Deployment does not have minimum availability.'
    phase: Pending

Version of all relevant components (if applicable):
ocs-operator.v4.10.5              OpenShift Container Storage   4.10.5   ocs-operator.v4.10.4              Succeeded
odf-csi-addons-operator.v4.10.5   CSI Addons                    4.10.5   odf-csi-addons-operator.v4.10.4   Installing
odf-operator.v4.10.5              OpenShift Data Foundation     4.10.5   odf-operator.v4.10.4              Succeeded

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
This may be a blocker for any customer using non-OCS node taints and upgrading to 4.10.

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
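For context, the toleration shown above is set through the Subscription's spec.config, which is the standard OLM mechanism for passing scheduling settings to the Deployments installed by that Subscription's CSV. A minimal sketch of applying it with a merge patch, assuming the openshift-storage namespace (the nodename=true key/value mirrors the taint applied above):

# oc patch subs odf-operator -n openshift-storage --type merge \
    -p '{"spec":{"config":{"tolerations":[{"effect":"NoSchedule","key":"nodename","operator":"Equal","value":"true"}]}}}'

The csi-addons-controller-manager Deployment, however, is created from the odf-csi-addons-operator CSV (see the CSV condition above), which has its own Subscription, so the toleration configured on the odf-operator Subscription presumably never reaches it.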
ODF QE is looking into the issue. Let us know if we can help you in any way. Thanks.
Additional pods are affected; their ReplicaSets, Deployments, and CSVs/Subscriptions are all owned by odf-operator.

The 4.9+ section of this solution does not work:
https://access.redhat.com/articles/6408481

# oc get subs odf-operator -o yaml | grep -A7 config
  config:
    tolerations:
    - effect: NoSchedule
      key: nodename
      operator: Equal
      value: "true"
  installPlanApproval: Manual
  name: odf-operator

noobaa-operator-79d9fd4599-5pxss       0/1   Pending   0   26m
ocs-metrics-exporter-587db684f-jwgn5   0/1   Pending   0   26m
ocs-operator-6659bd4695-kbjkr          0/1   Pending   0   13m
rook-ceph-operator-66cf469cff-bgv9b    0/1   Pending   0   11m

# oc get pod noobaa-operator-79d9fd4599-5pxss -o yaml | grep -A8 status:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-09-02T19:23:02Z"
    message: '0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: },
      that the pod didn''t tolerate, 3 node(s) had taint {nodename: true}, that the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending

# oc get pod ocs-metrics-exporter-587db684f-jwgn5 -o yaml | grep -A8 status:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-09-02T19:23:03Z"
    message: '0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: },
      that the pod didn''t tolerate, 3 node(s) had taint {nodename: true}, that the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending

# oc get pod ocs-operator-6659bd4695-kbjkr -o yaml | grep -A8 status:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-09-02T19:36:06Z"
    message: '0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: },
      that the pod didn''t tolerate, 3 node(s) had taint {nodename: true}, that the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending

# oc get pod rook-ceph-operator-66cf469cff-bgv9b -o yaml | grep -A8 status:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-09-02T19:38:29Z"
    message: '0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: },
      that the pod didn''t tolerate, 3 node(s) had taint {nodename: true}, that the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
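A quick way to check which Subscriptions in the namespace already carry a toleration under spec.config and which do not (custom-columns prints <none> where the field is missing); the namespace and field paths are assumed to match the Subscription spec shown above:

# oc get subs -n openshift-storage -o custom-columns=NAME:.metadata.name,TOLERATIONS:.spec.config.tolerations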
(In reply to khover from comment #7)
> Additional pods are affected; their ReplicaSets, Deployments, and
> CSVs/Subscriptions are all owned by odf-operator.
>
> The 4.9+ section of this solution does not work:
>
> https://access.redhat.com/articles/6408481
>
> # oc get subs odf-operator -o yaml | grep -A7 config
>   config:
>     tolerations:
>     - effect: NoSchedule
>       key: nodename
>       operator: Equal
>       value: "true"
>   installPlanApproval: Manual
>   name: odf-operator

This toleration should also be applied to the other Subscriptions (mcg-operator-* and ocs-operator-*) in the openshift-storage namespace. It looks like it wasn't applied there. Can you check that?
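A sketch of that suggestion: the exact Subscription names vary per cluster (hence the prefix match), and the patch payload simply reuses the odf-operator config quoted above.

# Patch every mcg-operator-* and ocs-operator-* Subscription with the nodename toleration.
for sub in $(oc get subs -n openshift-storage -o name | grep -E 'mcg-operator|ocs-operator'); do
  oc patch "$sub" -n openshift-storage --type merge \
    -p '{"spec":{"config":{"tolerations":[{"effect":"NoSchedule","key":"nodename","operator":"Equal","value":"true"}]}}}'
done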
When I apply the workaround step by step and delete the pods, I get the following result. As Bipin also observed, the odf-operator-controller-manager pod is unlikely to be restarted manually because we are setting its replicas to zero. If it does get restarted for any reason, we will have to fix all the Subscriptions again. I also observed that restarting odf-console sets the replica count back to 1 and brings the odf-operator-controller-manager pod back.

** Dependency: odf-operator-controller-manager kept in the replicas=0 state

NAME                                                              READY   STATUS    RESTARTS   AGE
csi-addons-controller-manager-98759dfbb-gw5k9                     2/2     Running   0          20s
csi-cephfsplugin-dw9z6                                            3/3     Running   0          76m
csi-cephfsplugin-provisioner-6596b9c55f-2gll7                     6/6     Running   0          76m
csi-cephfsplugin-provisioner-6596b9c55f-dk79h                     6/6     Running   0          76m
csi-cephfsplugin-qzds2                                            3/3     Running   0          76m
csi-cephfsplugin-x4sch                                            3/3     Running   0          76m
csi-rbdplugin-2j8dh                                               4/4     Running   0          76m
csi-rbdplugin-6flhc                                               4/4     Running   0          76m
csi-rbdplugin-9r6jx                                               4/4     Running   0          76m
csi-rbdplugin-provisioner-76494fb89-5dd8p                         7/7     Running   0          76m
csi-rbdplugin-provisioner-76494fb89-lcn9b                         7/7     Running   0          76m
noobaa-core-0                                                     1/1     Running   0          75m
noobaa-db-pg-0                                                    1/1     Running   0          75m
noobaa-endpoint-5744c75459-dvph7                                  1/1     Running   0          76m
noobaa-operator-5cfc45d674-hnhns                                  1/1     Running   0          23m
ocs-metrics-exporter-55b94f5d76-qxcb5                             1/1     Running   0          76m
ocs-operator-65b67f8674-9cs94                                     1/1     Running   0          25m
odf-console-5d4c666646-wxh7g                                      1/1     Running   0          76m
rook-ceph-crashcollector-ip-10-0-141-133.ec2.internal-74c8djl2m   1/1     Running   0          76m
rook-ceph-crashcollector-ip-10-0-153-201.ec2.internal-65c6hk8f9   1/1     Running   0          76m
rook-ceph-crashcollector-ip-10-0-170-200.ec2.internal-66b49gxrv   1/1     Running   0          76m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7b464946m4559   2/2     Running   0          76m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6986c8b56ggpx   2/2     Running   0          76m
rook-ceph-mgr-a-7786fb48d6-kwpgd                                  2/2     Running   0          76m
rook-ceph-mon-a-85f67c488-j74wv                                   2/2     Running   0          76m
rook-ceph-mon-b-5b7fb6b994-4stph                                  2/2     Running   0          76m
rook-ceph-mon-c-55794bb4dd-drjlr                                  2/2     Running   0          76m
rook-ceph-operator-845fc866fd-8glr2                               1/1     Running   0          76m
rook-ceph-osd-0-7d676996b8-bjj7r                                  2/2     Running   0          76m
rook-ceph-osd-1-cf59c5655-p97mk                                   2/2     Running   0          76m
rook-ceph-osd-2-fc8bf9c74-6z89m                                   2/2     Running   0          76m
rook-ceph-tools-5b87f59449-xq2bx                                  1/1     Running   0          76m

# oc get deployments
NAME                                                    READY   UP-TO-DATE   AVAILABLE   AGE
csi-addons-controller-manager                           1/1     1            1           6d
csi-cephfsplugin-provisioner                            2/2     2            2           6d
csi-rbdplugin-provisioner                               2/2     2            2           6d
noobaa-endpoint                                         1/1     1            1           6d
noobaa-operator                                         1/1     1            1           6d
ocs-metrics-exporter                                    1/1     1            1           6d
ocs-operator                                            1/1     1            1           6d
odf-console                                             1/1     1            1           6d
odf-operator-controller-manager                         0/0     0            0           6d
rook-ceph-crashcollector-ip-10-0-141-133.ec2.internal   1/1     1            1           6d
rook-ceph-crashcollector-ip-10-0-153-201.ec2.internal   1/1     1            1           6d
rook-ceph-crashcollector-ip-10-0-170-200.ec2.internal   1/1     1            1           6d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a       1/1     1            1           6d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b       1/1     1            1           6d
rook-ceph-mgr-a                                         1/1     1            1           6d
rook-ceph-mon-a                                         1/1     1            1           6d
rook-ceph-mon-b                                         1/1     1            1           6d
rook-ceph-mon-c                                         1/1     1            1           6d
rook-ceph-operator                                      1/1     1            1           6d
rook-ceph-osd-0                                         1/1     1            1           6d
rook-ceph-osd-1                                         1/1     1            1           6d
rook-ceph-osd-2                                         1/1     1            1           6d
rook-ceph-tools                                         1/1     1            1           5d12h
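For reference, the replicas=0 state above corresponds to scaling the odf-operator-controller-manager Deployment down, presumably so that it does not reconcile the per-operator Subscriptions back and undo the toleration changes, roughly:

# oc scale deployment odf-operator-controller-manager -n openshift-storage --replicas=0

As noted above, anything that restarts odf-console can push the replica count back to 1, after which the Subscription changes have to be reapplied.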