Bug 1794389

Summary: [GSS] Provide additional Toleration for Ceph CSI driver DS
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Component: rook
Version: 4.2
Target Release: OCS 4.3.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Levy Sant'Anna <lsantann>
Assignee: umanga <uchapaga>
QA Contact: akarsha <akrai>
CC: akrai, alchan, assingh, bkunal, kramdoss, madam, ocs-bugs, shan, sostapov, tdesala, uchapaga
Flags: kramdoss: needinfo+
Target Milestone: ---
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2020-04-14 09:45:28 UTC

Comment 2 Michael Adam 2020-01-23 14:44:53 UTC
I think the change will have to be in rook.

Comment 3 Michael Adam 2020-01-23 14:46:05 UTC
This is important.
Customers who have taints on some nodes currently cannot use OCS PVCs on those nodes, because the CSI plugin DaemonSet pods do not tolerate the taints and are never scheduled there.
We need to fix this for 4.3 and should discuss whether to also backport it to 4.2.x.
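
For context, the scheduler only places a pod on a tainted node if the pod spec carries a matching toleration. A minimal sketch of what the CSI plugin pods would need for a hypothetical nodetype=infra:NoSchedule taint (key and value are illustrative):

  tolerations:
  # Must match the taint on the node, otherwise the scheduler skips that node.
  - key: nodetype
    operator: Equal
    value: infra
    effect: NoSchedule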

Comment 5 umanga 2020-02-04 11:20:41 UTC
Fix posted on Rook.

Comment 9 Sébastien Han 2020-02-18 08:14:57 UTC
Patch merged and resynced downstream.

Comment 11 Michael Adam 2020-02-20 08:47:45 UTC
@umanga Is the fix complete with this rook patch? Or do we need an additional patch in ocs-operator?

If I read the fix right, it does not add a blanket toleration to the csi node plugin daemonset pods. So what additional steps are required for setting it up and testing it?

Comment 17 akarsha 2020-03-23 12:20:46 UTC
On AWS with OCS build 4.3.0-377.ci, I performed the following steps and observed the csi-plugin pods coming up on the tainted nodes (i.e., the infra nodes), so I am moving this BZ to VERIFIED.


Version:
---------

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-03-20-053743   True        False         3h48m   Cluster version is 4.3.0-0.nightly-2020-03-20-053743

$ oc get csv -n openshift-storage
NAME                            DISPLAY                       VERSION        REPLACES   PHASE
lib-bucket-provisioner.v1.0.0   lib-bucket-provisioner        1.0.0                     Succeeded
ocs-operator.v4.3.0-377.ci      OpenShift Container Storage   4.3.0-377.ci              Succeeded


Steps performed:
----------------

1. Created an OCP cluster with 6 worker (W) and 3 master (M) nodes.

2. Tainted 3 of the worker nodes as infra.

$ oc adm taint nodes ip-10-0-132-176.us-east-2.compute.internal ip-10-0-150-252.us-east-2.compute.internal ip-10-0-171-193.us-east-2.compute.internal nodetype=infra:NoSchedule
node/ip-10-0-132-176.us-east-2.compute.internal tainted
node/ip-10-0-150-252.us-east-2.compute.internal tainted
node/ip-10-0-171-193.us-east-2.compute.internal tainted
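
For reference, the same command with a trailing dash removes a taint again, e.g.:

$ oc adm taint nodes ip-10-0-132-176.us-east-2.compute.internal nodetype=infra:NoSchedule-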

$ for i in $(oc get nodes | grep worker | awk '{print$1}'); do echo $i; echo =======; oc describe node $i | grep -i taint; done

ip-10-0-130-44.us-east-2.compute.internal
=========================================
Taints:             <none>

ip-10-0-132-176.us-east-2.compute.internal
==========================================
Taints:             nodetype=infra:NoSchedule

ip-10-0-150-252.us-east-2.compute.internal
==========================================
Taints:             nodetype=infra:NoSchedule

ip-10-0-156-176.us-east-2.compute.internal
==========================================
Taints:             <none>

ip-10-0-169-101.us-east-2.compute.internal
==========================================
Taints:             <none>

ip-10-0-171-193.us-east-2.compute.internal
==========================================
Taints:             nodetype=infra:NoSchedule

3. Deployed the OCS cluster. No pods came up on the non-OCS (tainted infra) nodes.

$ date;oc get pods -n openshift-storage -o wide -w
Mon Mar 23 16:57:48 IST 2020
NAME                                                              READY   STATUS      RESTARTS   AGE    IP             NODE                                         NOMINATED NODE   READINESS GATES
csi-cephfsplugin-7sl99                                            3/3     Running     0          72m    10.0.156.176   ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-clj2p                                            3/3     Running     0          72m    10.0.169.101   ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-phh5d                                            3/3     Running     0          72m    10.0.130.44    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-cephfsplugin-provisioner-6b89fb458c-drmqj                     5/5     Running     0          106m   10.128.2.15    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-provisioner-6b89fb458c-rfk6r                     5/5     Running     0          106m   10.129.2.14    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-rbdplugin-d8hlw                                               3/3     Running     0          72m    10.0.156.176   ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-pdb88                                               3/3     Running     0          72m    10.0.130.44    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-rbdplugin-provisioner-589578c4f4-9qc2r                        5/5     Running     0          106m   10.129.2.13    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-rbdplugin-provisioner-589578c4f4-k9gvr                        5/5     Running     0          106m   10.131.0.19    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-z8v2j                                               3/3     Running     0          72m    10.0.169.101   ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
lib-bucket-provisioner-55f74d96f6-5rrbb                           1/1     Running     0          127m   10.129.2.10    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
noobaa-core-0                                                     1/1     Running     0          102m   10.129.2.23    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
noobaa-db-0                                                       1/1     Running     0          102m   10.129.2.25    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
noobaa-endpoint-cf74c5d5f-vkclh                                   1/1     Running     0          100m   10.131.0.29    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
noobaa-operator-867f8f5c4b-djh44                                  1/1     Running     0          126m   10.131.0.18    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
ocs-operator-66977dc7fc-hz254                                     1/1     Running     0          126m   10.129.2.11    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-crashcollector-ip-10-0-130-44-9b45cfbcd-fjjkv           1/1     Running     0          104m   10.129.2.18    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-crashcollector-ip-10-0-156-176-77f96fc7f6-hcm8d         1/1     Running     0          105m   10.131.0.21    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-crashcollector-ip-10-0-169-101-6c58667fcb-n828d         1/1     Running     0          103m   10.128.2.17    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-drain-canary-4acff26df560a70f9c864c3a043a2ba1-fdb689c   1/1     Running     0          102m   10.129.2.22    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-drain-canary-9bb4a3506ed2ba7324556f40161f74da-d5zkgqp   1/1     Running     0          102m   10.131.0.26    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-drain-canary-d5f09db205014ce5ede0f355c29cf767-7dv8qpb   1/1     Running     0          102m   10.128.2.21    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8674b9ddhjszz   1/1     Running     0          101m   10.131.0.28    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5fd967785497v   1/1     Running     0          101m   10.128.2.23    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-mgr-a-6875585985-tm4b4                                  1/1     Running     0          103m   10.128.2.19    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-a-f498b986-22f6r                                    1/1     Running     0          71m    10.131.0.30    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-b-bc95f9c5d-qcp7q                                   1/1     Running     0          71m    10.129.2.26    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-mon-c-84cf9c6bd7-qkr4f                                  1/1     Running     0          72m    10.128.2.26    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-operator-577cb7dfd9-l6hkw                               1/1     Running     0          46m    10.129.2.27    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-0-99d6b964b-w2pzs                                   1/1     Running     0          102m   10.128.2.22    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-1-79c88689d9-tbrwh                                  1/1     Running     0          102m   10.131.0.27    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-2-f486c86dc-4pv7c                                   1/1     Running     0          102m   10.129.2.24    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-0-0-qddrd-q86b5               0/1     Completed   0          102m   10.128.2.20    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-1-0-2bx96-dqhsp               0/1     Completed   0          102m   10.131.0.25    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-2-0-gwfst-45w9p               0/1     Completed   0          102m   10.129.2.21    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>


4. Created a ConfigMap rook-ceph-operator-config with the additional CSI plugin tolerations.

$ oc create -f rook-ceph-operator-config.yaml 
configmap/rook-ceph-operator-config created
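
For reference, the applied rook-ceph-operator-config.yaml presumably contained just the name, namespace, and the data section shown by the oc get output below, roughly:

apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: openshift-storage
data:
  # YAML string of tolerations that the Rook operator injects into the
  # CSI plugin DaemonSet pod specs.
  CSI_PLUGIN_TOLERATIONS: |
    - effect: NoSchedule
      key: nodetype
      operator: Equal
      value: infra
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Exists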

$ oc get configmap rook-ceph-operator-config -n openshift-storage -o yaml
apiVersion: v1
data:
  CSI_PLUGIN_TOLERATIONS: |
    - effect: NoSchedule
      key: nodetype
      operator: Equal
      value: infra
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Exists
kind: ConfigMap
metadata:
  creationTimestamp: "2020-03-23T11:49:27Z"
  name: rook-ceph-operator-config
  namespace: openshift-storage
  resourceVersion: "114879"
  selfLink: /api/v1/namespaces/openshift-storage/configmaps/rook-ceph-operator-config
  uid: ac22e63a-8df1-4650-a57f-89bf7a2ce06a

5. Restarted the rook-ceph-operator pod. Observed csi-plugin pods coming up on the non-OCS (infra) nodes as well.

$ date; oc delete pod rook-ceph-operator-577cb7dfd9-qpbj4 -n openshift-storage
Mon Mar 23 17:20:08 IST 2020
pod "rook-ceph-operator-577cb7dfd9-qpbj4" deleted

$ date;oc get pods -n openshift-storage -o wide -w
Mon Mar 23 17:23:48 IST 2020
NAME                                                              READY   STATUS      RESTARTS   AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
csi-cephfsplugin-246nv                                            3/3     Running     0          2m38s   10.0.156.176   ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-25dtz                                            3/3     Running     0          3m42s   10.0.132.176   ip-10-0-132-176.us-east-2.compute.internal   <none>           <none> --- non-ocs(infra)
csi-cephfsplugin-5htc8                                            3/3     Running     0          2m51s   10.0.130.44    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-cephfsplugin-9s798                                            3/3     Running     0          3m42s   10.0.150.252   ip-10-0-150-252.us-east-2.compute.internal   <none>           <none> --- non-ocs(infra)
csi-cephfsplugin-ccrfb                                            3/3     Running     0          3m42s   10.0.171.193   ip-10-0-171-193.us-east-2.compute.internal   <none>           <none> --- non-ocs(infra)
csi-cephfsplugin-lrt9z                                            3/3     Running     0          3m4s    10.0.169.101   ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-provisioner-6b89fb458c-drmqj                     5/5     Running     0          132m    10.128.2.15    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-provisioner-6b89fb458c-rfk6r                     5/5     Running     0          132m    10.129.2.14    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-rbdplugin-5w26c                                               3/3     Running     0          3m5s    10.0.130.44    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-rbdplugin-7rnsp                                               3/3     Running     0          2m54s   10.0.169.101   ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-7tl26                                               3/3     Running     0          3m42s   10.0.150.252   ip-10-0-150-252.us-east-2.compute.internal   <none>           <none> --- non-ocs(infra)
csi-rbdplugin-8wlxr                                               3/3     Running     0          3m8s    10.0.156.176   ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-f5c2b                                               3/3     Running     0          3m42s   10.0.171.193   ip-10-0-171-193.us-east-2.compute.internal   <none>           <none> --- non-ocs(infra)
csi-rbdplugin-h7gqc                                               3/3     Running     0          3m42s   10.0.132.176   ip-10-0-132-176.us-east-2.compute.internal   <none>           <none> --- non-ocs(infra)
csi-rbdplugin-provisioner-589578c4f4-9qc2r                        5/5     Running     0          132m    10.129.2.13    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-rbdplugin-provisioner-589578c4f4-k9gvr                        5/5     Running     0          132m    10.131.0.19    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
lib-bucket-provisioner-55f74d96f6-5rrbb                           1/1     Running     0          153m    10.129.2.10    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
noobaa-core-0                                                     1/1     Running     0          128m    10.129.2.23    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
noobaa-db-0                                                       1/1     Running     0          128m    10.129.2.25    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
noobaa-endpoint-cf74c5d5f-vkclh                                   1/1     Running     0          126m    10.131.0.29    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
noobaa-operator-867f8f5c4b-djh44                                  1/1     Running     0          152m    10.131.0.18    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
ocs-operator-66977dc7fc-hz254                                     1/1     Running     0          152m    10.129.2.11    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-crashcollector-ip-10-0-130-44-9b45cfbcd-fjjkv           1/1     Running     0          130m    10.129.2.18    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-crashcollector-ip-10-0-156-176-77f96fc7f6-hcm8d         1/1     Running     0          131m    10.131.0.21    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-crashcollector-ip-10-0-169-101-6c58667fcb-n828d         1/1     Running     0          129m    10.128.2.17    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-drain-canary-4acff26df560a70f9c864c3a043a2ba1-fdb689c   1/1     Running     0          128m    10.129.2.22    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-drain-canary-9bb4a3506ed2ba7324556f40161f74da-d5zkgqp   1/1     Running     0          128m    10.131.0.26    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-drain-canary-d5f09db205014ce5ede0f355c29cf767-7dv8qpb   1/1     Running     0          128m    10.128.2.21    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8674b9ddhjszz   1/1     Running     0          128m    10.131.0.28    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5fd967785497v   1/1     Running     0          128m    10.128.2.23    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-mgr-a-6875585985-tm4b4                                  1/1     Running     0          129m    10.128.2.19    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-a-7d58886f5c-k8bft                                  1/1     Running     0          24m     10.131.0.31    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-b-65b94b4896-hkxgx                                  1/1     Running     0          25m     10.129.2.28    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-mon-c-6cf6cdd9d-l7bps                                   1/1     Running     0          24m     10.128.2.31    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-operator-577cb7dfd9-tmrhf                               1/1     Running     0          3m45s   10.128.2.32    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-0-99d6b964b-w2pzs                                   1/1     Running     0          128m    10.128.2.22    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-1-79c88689d9-tbrwh                                  1/1     Running     0          128m    10.131.0.27    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-2-f486c86dc-4pv7c                                   1/1     Running     0          128m    10.129.2.24    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-0-0-qddrd-q86b5               0/1     Completed   0          128m    10.128.2.20    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-1-0-2bx96-dqhsp               0/1     Completed   0          128m    10.131.0.25    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-2-0-gwfst-45w9p               0/1     Completed   0          128m    10.129.2.21    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
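
A quick way to confirm that the tolerations actually landed on the node plugin DaemonSets (DaemonSet names assumed from the pod name prefixes above):

$ oc -n openshift-storage get ds csi-cephfsplugin -o jsonpath='{.spec.template.spec.tolerations}'
$ oc -n openshift-storage get ds csi-rbdplugin -o jsonpath='{.spec.template.spec.tolerations}'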

Comment 19 errata-xmlrpc 2020-04-14 09:45:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1437