I think the change will have to be in rook.
This is important. Customers who have taints on some nodes currently cannot use OCS PVCs on those nodes. We need to fix this for 4.3 and should discuss whether we also need it in 4.2.x.
Fix posted on Rook.
Patch merged and resynced downstream.
@umanga Is the fix complete with this rook patch, or do we need an additional patch in ocs-operator? If I read the fix correctly, it does not add a blanket toleration to the CSI node plugin DaemonSet pods. What additional steps are required to set this up and test it?
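For context, the scenario I have in mind is an arbitrary customer taint such as nodetype=infra:NoSchedule. For the CSI plugin DaemonSet pods to land on such nodes, their pod spec would need a matching toleration along these lines (this is only a sketch of the generic Kubernetes toleration shape, not what the patch itself generates):

spec:
  template:
    spec:
      tolerations:
      - key: nodetype
        operator: Equal
        value: infra
        effect: NoSchedule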
https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OCS%20Build%20Pipeline%204.3/82/ contains the fix
On AWS with OCS build 4.3.0-377.ci, performed the following steps and observed the csi plugins coming up on the tainted nodes (i.e. infra nodes), so moving this BZ to verified.

Version:
---------
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-03-20-053743   True        False         3h48m   Cluster version is 4.3.0-0.nightly-2020-03-20-053743

$ oc get csv -n openshift-storage
NAME                            DISPLAY                       VERSION        REPLACES   PHASE
lib-bucket-provisioner.v1.0.0   lib-bucket-provisioner        1.0.0                     Succeeded
ocs-operator.v4.3.0-377.ci      OpenShift Container Storage   4.3.0-377.ci              Succeeded

Steps performed:
----------------
1. Created OCP cluster with 6 worker and 3 master nodes.

2. Tainted 3 worker nodes as infra.

$ oc adm taint nodes ip-10-0-132-176.us-east-2.compute.internal ip-10-0-150-252.us-east-2.compute.internal ip-10-0-171-193.us-east-2.compute.internal nodetype=infra:NoSchedule
node/ip-10-0-132-176.us-east-2.compute.internal tainted
node/ip-10-0-150-252.us-east-2.compute.internal tainted
node/ip-10-0-171-193.us-east-2.compute.internal tainted

$ for i in $(oc get nodes | grep worker | awk '{print$1}'); do echo $i; echo =======; oc describe node $i | grep -i taint; done
ip-10-0-130-44.us-east-2.compute.internal
==========================================
Taints:             <none>
ip-10-0-132-176.us-east-2.compute.internal
==========================================
Taints:             nodetype=infra:NoSchedule
ip-10-0-150-252.us-east-2.compute.internal
==========================================
Taints:             nodetype=infra:NoSchedule
ip-10-0-156-176.us-east-2.compute.internal
==========================================
Taints:             <none>
ip-10-0-169-101.us-east-2.compute.internal
==========================================
Taints:             <none>
ip-10-0-171-193.us-east-2.compute.internal
==========================================
Taints:             nodetype=infra:NoSchedule

3. Deployed the OCS cluster. No pods came up on the non-OCS (infra) nodes.
$ date;oc get pods -n openshift-storage -o wide -w
Mon Mar 23 16:57:48 IST 2020
NAME  READY  STATUS  RESTARTS  AGE  IP  NODE  NOMINATED NODE  READINESS GATES
csi-cephfsplugin-7sl99  3/3  Running  0  72m  10.0.156.176  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
csi-cephfsplugin-clj2p  3/3  Running  0  72m  10.0.169.101  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
csi-cephfsplugin-phh5d  3/3  Running  0  72m  10.0.130.44  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
csi-cephfsplugin-provisioner-6b89fb458c-drmqj  5/5  Running  0  106m  10.128.2.15  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
csi-cephfsplugin-provisioner-6b89fb458c-rfk6r  5/5  Running  0  106m  10.129.2.14  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
csi-rbdplugin-d8hlw  3/3  Running  0  72m  10.0.156.176  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
csi-rbdplugin-pdb88  3/3  Running  0  72m  10.0.130.44  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
csi-rbdplugin-provisioner-589578c4f4-9qc2r  5/5  Running  0  106m  10.129.2.13  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
csi-rbdplugin-provisioner-589578c4f4-k9gvr  5/5  Running  0  106m  10.131.0.19  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
csi-rbdplugin-z8v2j  3/3  Running  0  72m  10.0.169.101  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
lib-bucket-provisioner-55f74d96f6-5rrbb  1/1  Running  0  127m  10.129.2.10  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
noobaa-core-0  1/1  Running  0  102m  10.129.2.23  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
noobaa-db-0  1/1  Running  0  102m  10.129.2.25  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
noobaa-endpoint-cf74c5d5f-vkclh  1/1  Running  0  100m  10.131.0.29  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
noobaa-operator-867f8f5c4b-djh44  1/1  Running  0  126m  10.131.0.18  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
ocs-operator-66977dc7fc-hz254  1/1  Running  0  126m  10.129.2.11  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
rook-ceph-crashcollector-ip-10-0-130-44-9b45cfbcd-fjjkv  1/1  Running  0  104m  10.129.2.18  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
rook-ceph-crashcollector-ip-10-0-156-176-77f96fc7f6-hcm8d  1/1  Running  0  105m  10.131.0.21  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
rook-ceph-crashcollector-ip-10-0-169-101-6c58667fcb-n828d  1/1  Running  0  103m  10.128.2.17  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-drain-canary-4acff26df560a70f9c864c3a043a2ba1-fdb689c  1/1  Running  0  102m  10.129.2.22  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
rook-ceph-drain-canary-9bb4a3506ed2ba7324556f40161f74da-d5zkgqp  1/1  Running  0  102m  10.131.0.26  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
rook-ceph-drain-canary-d5f09db205014ce5ede0f355c29cf767-7dv8qpb  1/1  Running  0  102m  10.128.2.21  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8674b9ddhjszz  1/1  Running  0  101m  10.131.0.28  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5fd967785497v  1/1  Running  0  101m  10.128.2.23  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-mgr-a-6875585985-tm4b4  1/1  Running  0  103m  10.128.2.19  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-mon-a-f498b986-22f6r  1/1  Running  0  71m  10.131.0.30  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
rook-ceph-mon-b-bc95f9c5d-qcp7q  1/1  Running  0  71m  10.129.2.26  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
rook-ceph-mon-c-84cf9c6bd7-qkr4f  1/1  Running  0  72m  10.128.2.26  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-operator-577cb7dfd9-l6hkw  1/1  Running  0  46m  10.129.2.27  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
rook-ceph-osd-0-99d6b964b-w2pzs  1/1  Running  0  102m  10.128.2.22  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-osd-1-79c88689d9-tbrwh  1/1  Running  0  102m  10.131.0.27  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
rook-ceph-osd-2-f486c86dc-4pv7c  1/1  Running  0  102m  10.129.2.24  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
rook-ceph-osd-prepare-ocs-deviceset-0-0-qddrd-q86b5  0/1  Completed  0  102m  10.128.2.20  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-osd-prepare-ocs-deviceset-1-0-2bx96-dqhsp  0/1  Completed  0  102m  10.131.0.25  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
rook-ceph-osd-prepare-ocs-deviceset-2-0-gwfst-45w9p  0/1  Completed  0  102m  10.129.2.21  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>

4. Created a configmap rook-ceph-operator-config.

$ oc create -f rook-ceph-operator-config.yaml
configmap/rook-ceph-operator-config created

$ oc get configmap rook-ceph-operator-config -n openshift-storage -o yaml
apiVersion: v1
data:
  CSI_PLUGIN_TOLERATIONS: |
    - effect: NoSchedule
      key: nodetype
      operator: Equal
      value: infra
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Exists
kind: ConfigMap
metadata:
  creationTimestamp: "2020-03-23T11:49:27Z"
  name: rook-ceph-operator-config
  namespace: openshift-storage
  resourceVersion: "114879"
  selfLink: /api/v1/namespaces/openshift-storage/configmaps/rook-ceph-operator-config
  uid: ac22e63a-8df1-4650-a57f-89bf7a2ce06a

5. Restarted rook-ceph-operator. Observed csi plugins on the non-OCS (infra) nodes.
$ date; oc delete pod rook-ceph-operator-577cb7dfd9-qpbj4 -n openshift-storage
Mon Mar 23 17:20:08 IST 2020
pod "rook-ceph-operator-577cb7dfd9-qpbj4" deleted

$ date;oc get pods -n openshift-storage -o wide -w
Mon Mar 23 17:23:48 IST 2020
NAME  READY  STATUS  RESTARTS  AGE  IP  NODE  NOMINATED NODE  READINESS GATES
csi-cephfsplugin-246nv  3/3  Running  0  2m38s  10.0.156.176  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
csi-cephfsplugin-25dtz  3/3  Running  0  3m42s  10.0.132.176  ip-10-0-132-176.us-east-2.compute.internal  <none>  <none>   --- non-ocs(infra)
csi-cephfsplugin-5htc8  3/3  Running  0  2m51s  10.0.130.44  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
csi-cephfsplugin-9s798  3/3  Running  0  3m42s  10.0.150.252  ip-10-0-150-252.us-east-2.compute.internal  <none>  <none>   --- non-ocs(infra)
csi-cephfsplugin-ccrfb  3/3  Running  0  3m42s  10.0.171.193  ip-10-0-171-193.us-east-2.compute.internal  <none>  <none>   --- non-ocs(infra)
csi-cephfsplugin-lrt9z  3/3  Running  0  3m4s  10.0.169.101  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
csi-cephfsplugin-provisioner-6b89fb458c-drmqj  5/5  Running  0  132m  10.128.2.15  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
csi-cephfsplugin-provisioner-6b89fb458c-rfk6r  5/5  Running  0  132m  10.129.2.14  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
csi-rbdplugin-5w26c  3/3  Running  0  3m5s  10.0.130.44  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
csi-rbdplugin-7rnsp  3/3  Running  0  2m54s  10.0.169.101  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
csi-rbdplugin-7tl26  3/3  Running  0  3m42s  10.0.150.252  ip-10-0-150-252.us-east-2.compute.internal  <none>  <none>   --- non-ocs(infra)
csi-rbdplugin-8wlxr  3/3  Running  0  3m8s  10.0.156.176  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
csi-rbdplugin-f5c2b  3/3  Running  0  3m42s  10.0.171.193  ip-10-0-171-193.us-east-2.compute.internal  <none>  <none>   --- non-ocs(infra)
csi-rbdplugin-h7gqc  3/3  Running  0  3m42s  10.0.132.176  ip-10-0-132-176.us-east-2.compute.internal  <none>  <none>   --- non-ocs(infra)
csi-rbdplugin-provisioner-589578c4f4-9qc2r  5/5  Running  0  132m  10.129.2.13  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
csi-rbdplugin-provisioner-589578c4f4-k9gvr  5/5  Running  0  132m  10.131.0.19  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
lib-bucket-provisioner-55f74d96f6-5rrbb  1/1  Running  0  153m  10.129.2.10  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
noobaa-core-0  1/1  Running  0  128m  10.129.2.23  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
noobaa-db-0  1/1  Running  0  128m  10.129.2.25  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
noobaa-endpoint-cf74c5d5f-vkclh  1/1  Running  0  126m  10.131.0.29  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
noobaa-operator-867f8f5c4b-djh44  1/1  Running  0  152m  10.131.0.18  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
ocs-operator-66977dc7fc-hz254  1/1  Running  0  152m  10.129.2.11  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
rook-ceph-crashcollector-ip-10-0-130-44-9b45cfbcd-fjjkv  1/1  Running  0  130m  10.129.2.18  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
rook-ceph-crashcollector-ip-10-0-156-176-77f96fc7f6-hcm8d  1/1  Running  0  131m  10.131.0.21  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
rook-ceph-crashcollector-ip-10-0-169-101-6c58667fcb-n828d  1/1  Running  0  129m  10.128.2.17  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-drain-canary-4acff26df560a70f9c864c3a043a2ba1-fdb689c  1/1  Running  0  128m  10.129.2.22  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
rook-ceph-drain-canary-9bb4a3506ed2ba7324556f40161f74da-d5zkgqp  1/1  Running  0  128m  10.131.0.26  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
rook-ceph-drain-canary-d5f09db205014ce5ede0f355c29cf767-7dv8qpb  1/1  Running  0  128m  10.128.2.21  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8674b9ddhjszz  1/1  Running  0  128m  10.131.0.28  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5fd967785497v  1/1  Running  0  128m  10.128.2.23  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-mgr-a-6875585985-tm4b4  1/1  Running  0  129m  10.128.2.19  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-mon-a-7d58886f5c-k8bft  1/1  Running  0  24m  10.131.0.31  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
rook-ceph-mon-b-65b94b4896-hkxgx  1/1  Running  0  25m  10.129.2.28  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
rook-ceph-mon-c-6cf6cdd9d-l7bps  1/1  Running  0  24m  10.128.2.31  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-operator-577cb7dfd9-tmrhf  1/1  Running  0  3m45s  10.128.2.32  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-osd-0-99d6b964b-w2pzs  1/1  Running  0  128m  10.128.2.22  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-osd-1-79c88689d9-tbrwh  1/1  Running  0  128m  10.131.0.27  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
rook-ceph-osd-2-f486c86dc-4pv7c  1/1  Running  0  128m  10.129.2.24  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
rook-ceph-osd-prepare-ocs-deviceset-0-0-qddrd-q86b5  0/1  Completed  0  128m  10.128.2.20  ip-10-0-169-101.us-east-2.compute.internal  <none>  <none>
rook-ceph-osd-prepare-ocs-deviceset-1-0-2bx96-dqhsp  0/1  Completed  0  128m  10.131.0.25  ip-10-0-156-176.us-east-2.compute.internal  <none>  <none>
rook-ceph-osd-prepare-ocs-deviceset-2-0-gwfst-45w9p  0/1  Completed  0  128m  10.129.2.21  ip-10-0-130-44.us-east-2.compute.internal  <none>  <none>
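For anyone reproducing step 4 above: the rook-ceph-operator-config.yaml that was applied is not pasted here, but judging from the oc get output it would have been essentially the following (a sketch reconstructed from that output, not the exact file used):

apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: openshift-storage
data:
  CSI_PLUGIN_TOLERATIONS: |
    - effect: NoSchedule
      key: nodetype
      operator: Equal
      value: infra
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Exists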
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1437