Bug 1794389 - [GSS] Provide additional Toleration for Ceph CSI driver DS
Summary: [GSS] Provide additional Toleration for Ceph CSI driver DS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: rook
Version: 4.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: OCS 4.3.0
Assignee: umanga
QA Contact: akarsha
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-23 13:36 UTC by Levy Sant'Anna
Modified: 2023-09-07 21:34 UTC (History)
CC: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-14 09:45:28 UTC
Embargoed:
kramdoss: needinfo+


Attachments: None


Links
- GitHub red-hat-storage/ocs-ci pull 3790 (closed): Automates BZ-1794387 Provide additional Toleration for Ceph CSI driver DS (last updated 2021-03-03 11:28:23 UTC)
- GitHub rook/rook pull 4817 (closed): Ceph: adds "rook-operator-config" configmap (last updated 2021-02-19 11:24:08 UTC)
- Red Hat Product Errata RHBA-2020:1437 (last updated 2020-04-14 09:45:46 UTC)

Comment 2 Michael Adam 2020-01-23 14:44:53 UTC
I think the change will have to be in rook.

Comment 3 Michael Adam 2020-01-23 14:46:05 UTC
This is important.
It means that customers who have taints on some nodes cannot use OCS PVCs on those nodes...
We need to fix it for 4.3, and we need to discuss whether we should actually do a 4.2.x fix as well.

Comment 5 umanga 2020-02-04 11:20:41 UTC
Fix posted on Rook.

Comment 9 Sébastien Han 2020-02-18 08:14:57 UTC
Patch merged and resynced downstream.

Comment 11 Michael Adam 2020-02-20 08:47:45 UTC
@umanga Is the fix complete with this rook patch? Or do we need an additional patch in ocs-operator?

If I read the fix right, it does not add a blanket toleration to the csi node plugin daemonset pods. So what additional steps are required for setting it up and testing it?
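
Judging by the linked rook PR and the verification in comment 17 below, the extra tolerations are supplied through a rook-ceph-operator-config ConfigMap rather than a blanket toleration baked into the CSI node-plugin DaemonSets. A minimal sketch of such a ConfigMap is shown here; the nodetype=infra entry is only an example and has to match the taint actually applied to the nodes, and note that the verified ConfigMap in comment 17 also keeps the default node.ocs.openshift.io/storage toleration in the same list:

apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: openshift-storage
data:
  # Extra tolerations applied by the rook operator to the CSI plugin DaemonSet pods.
  # The nodetype=infra toleration below is an example taint only.
  CSI_PLUGIN_TOLERATIONS: |
    - effect: NoSchedule
      key: nodetype
      operator: Equal
      value: infra

In the verification below, the rook-ceph-operator pod is restarted after creating the ConfigMap so that the new value is picked up.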

Comment 17 akarsha 2020-03-23 12:20:46 UTC
On AWS with OCS build 4.3.0-377.ci, performed the following steps and observed csi-plugin pods coming up on the tainted nodes (i.e., the infra nodes), so moving this BZ to verified.


Version:
---------

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-03-20-053743   True        False         3h48m   Cluster version is 4.3.0-0.nightly-2020-03-20-053743

$ oc get csv -n openshift-storage
NAME                            DISPLAY                       VERSION        REPLACES   PHASE
lib-bucket-provisioner.v1.0.0   lib-bucket-provisioner        1.0.0                     Succeeded
ocs-operator.v4.3.0-377.ci      OpenShift Container Storage   4.3.0-377.ci              Succeeded


Steps performed:
----------------

1. Created an OCP cluster with 6 worker (W) and 3 master (M) nodes.

2. Tainted 3 worker nodes as infra.

$ oc adm taint nodes ip-10-0-132-176.us-east-2.compute.internal ip-10-0-150-252.us-east-2.compute.internal ip-10-0-171-193.us-east-2.compute.internal nodetype=infra:NoSchedule
node/ip-10-0-132-176.us-east-2.compute.internal tainted
node/ip-10-0-150-252.us-east-2.compute.internal tainted
node/ip-10-0-171-193.us-east-2.compute.internal tainted

$ for i in $(oc get nodes | grep worker | awk '{print$1}'); do echo $i; echo =======; oc describe node $i | grep -i taint; done

ip-10-0-130-44.us-east-2.compute.internal
=========================================
Taints:             <none>

ip-10-0-132-176.us-east-2.compute.internal
==========================================
Taints:             nodetype=infra:NoSchedule

ip-10-0-150-252.us-east-2.compute.internal
==========================================
Taints:             nodetype=infra:NoSchedule

ip-10-0-156-176.us-east-2.compute.internal
==========================================
Taints:             <none>

ip-10-0-169-101.us-east-2.compute.internal
==========================================
Taints:             <none>

ip-10-0-171-193.us-east-2.compute.internal
==========================================
Taints:             nodetype=infra:NoSchedule

3. Deployed the OCS cluster. No OCS pods came up on the non-OCS (tainted infra) nodes.

$ date;oc get pods -n openshift-storage -o wide -w
Mon Mar 23 16:57:48 IST 2020
NAME                                                              READY   STATUS      RESTARTS   AGE    IP             NODE                                         NOMINATED NODE   READINESS GATES
csi-cephfsplugin-7sl99                                            3/3     Running     0          72m    10.0.156.176   ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-clj2p                                            3/3     Running     0          72m    10.0.169.101   ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-phh5d                                            3/3     Running     0          72m    10.0.130.44    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-cephfsplugin-provisioner-6b89fb458c-drmqj                     5/5     Running     0          106m   10.128.2.15    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-provisioner-6b89fb458c-rfk6r                     5/5     Running     0          106m   10.129.2.14    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-rbdplugin-d8hlw                                               3/3     Running     0          72m    10.0.156.176   ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-pdb88                                               3/3     Running     0          72m    10.0.130.44    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-rbdplugin-provisioner-589578c4f4-9qc2r                        5/5     Running     0          106m   10.129.2.13    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-rbdplugin-provisioner-589578c4f4-k9gvr                        5/5     Running     0          106m   10.131.0.19    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-z8v2j                                               3/3     Running     0          72m    10.0.169.101   ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
lib-bucket-provisioner-55f74d96f6-5rrbb                           1/1     Running     0          127m   10.129.2.10    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
noobaa-core-0                                                     1/1     Running     0          102m   10.129.2.23    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
noobaa-db-0                                                       1/1     Running     0          102m   10.129.2.25    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
noobaa-endpoint-cf74c5d5f-vkclh                                   1/1     Running     0          100m   10.131.0.29    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
noobaa-operator-867f8f5c4b-djh44                                  1/1     Running     0          126m   10.131.0.18    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
ocs-operator-66977dc7fc-hz254                                     1/1     Running     0          126m   10.129.2.11    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-crashcollector-ip-10-0-130-44-9b45cfbcd-fjjkv           1/1     Running     0          104m   10.129.2.18    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-crashcollector-ip-10-0-156-176-77f96fc7f6-hcm8d         1/1     Running     0          105m   10.131.0.21    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-crashcollector-ip-10-0-169-101-6c58667fcb-n828d         1/1     Running     0          103m   10.128.2.17    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-drain-canary-4acff26df560a70f9c864c3a043a2ba1-fdb689c   1/1     Running     0          102m   10.129.2.22    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-drain-canary-9bb4a3506ed2ba7324556f40161f74da-d5zkgqp   1/1     Running     0          102m   10.131.0.26    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-drain-canary-d5f09db205014ce5ede0f355c29cf767-7dv8qpb   1/1     Running     0          102m   10.128.2.21    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8674b9ddhjszz   1/1     Running     0          101m   10.131.0.28    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5fd967785497v   1/1     Running     0          101m   10.128.2.23    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-mgr-a-6875585985-tm4b4                                  1/1     Running     0          103m   10.128.2.19    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-a-f498b986-22f6r                                    1/1     Running     0          71m    10.131.0.30    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-b-bc95f9c5d-qcp7q                                   1/1     Running     0          71m    10.129.2.26    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-mon-c-84cf9c6bd7-qkr4f                                  1/1     Running     0          72m    10.128.2.26    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-operator-577cb7dfd9-l6hkw                               1/1     Running     0          46m    10.129.2.27    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-0-99d6b964b-w2pzs                                   1/1     Running     0          102m   10.128.2.22    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-1-79c88689d9-tbrwh                                  1/1     Running     0          102m   10.131.0.27    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-2-f486c86dc-4pv7c                                   1/1     Running     0          102m   10.129.2.24    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-0-0-qddrd-q86b5               0/1     Completed   0          102m   10.128.2.20    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-1-0-2bx96-dqhsp               0/1     Completed   0          102m   10.131.0.25    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-2-0-gwfst-45w9p               0/1     Completed   0          102m   10.129.2.21    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>


4. Created a ConfigMap named rook-ceph-operator-config.

$ oc create -f rook-ceph-operator-config.yaml 
configmap/rook-ceph-operator-config created

$ oc get configmap rook-ceph-operator-config -n openshift-storage -o yaml
apiVersion: v1
data:
  CSI_PLUGIN_TOLERATIONS: |
    - effect: NoSchedule
      key: nodetype
      operator: Equal
      value: infra
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Exists
kind: ConfigMap
metadata:
  creationTimestamp: "2020-03-23T11:49:27Z"
  name: rook-ceph-operator-config
  namespace: openshift-storage
  resourceVersion: "114879"
  selfLink: /api/v1/namespaces/openshift-storage/configmaps/rook-ceph-operator-config
  uid: ac22e63a-8df1-4650-a57f-89bf7a2ce06a

5. Restarted the rook-ceph-operator pod. Observed csi-plugin pods coming up on the non-OCS (infra) nodes.

$ date; oc delete pod rook-ceph-operator-577cb7dfd9-qpbj4 -n openshift-storage
Mon Mar 23 17:20:08 IST 2020
pod "rook-ceph-operator-577cb7dfd9-qpbj4" deleted

$ date;oc get pods -n openshift-storage -o wide -w
Mon Mar 23 17:23:48 IST 2020
NAME                                                              READY   STATUS      RESTARTS   AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
csi-cephfsplugin-246nv                                            3/3     Running     0          2m38s   10.0.156.176   ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-25dtz                                            3/3     Running     0          3m42s   10.0.132.176   ip-10-0-132-176.us-east-2.compute.internal   <none>           <none> --- non-ocs(infra)
csi-cephfsplugin-5htc8                                            3/3     Running     0          2m51s   10.0.130.44    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-cephfsplugin-9s798                                            3/3     Running     0          3m42s   10.0.150.252   ip-10-0-150-252.us-east-2.compute.internal   <none>           <none> --- non-ocs(infra)
csi-cephfsplugin-ccrfb                                            3/3     Running     0          3m42s   10.0.171.193   ip-10-0-171-193.us-east-2.compute.internal   <none>           <none> --- non-ocs(infra)
csi-cephfsplugin-lrt9z                                            3/3     Running     0          3m4s    10.0.169.101   ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-provisioner-6b89fb458c-drmqj                     5/5     Running     0          132m    10.128.2.15    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-provisioner-6b89fb458c-rfk6r                     5/5     Running     0          132m    10.129.2.14    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-rbdplugin-5w26c                                               3/3     Running     0          3m5s    10.0.130.44    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-rbdplugin-7rnsp                                               3/3     Running     0          2m54s   10.0.169.101   ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-7tl26                                               3/3     Running     0          3m42s   10.0.150.252   ip-10-0-150-252.us-east-2.compute.internal   <none>           <none> --- non-ocs(infra)
csi-rbdplugin-8wlxr                                               3/3     Running     0          3m8s    10.0.156.176   ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-f5c2b                                               3/3     Running     0          3m42s   10.0.171.193   ip-10-0-171-193.us-east-2.compute.internal   <none>           <none> --- non-ocs(infra)
csi-rbdplugin-h7gqc                                               3/3     Running     0          3m42s   10.0.132.176   ip-10-0-132-176.us-east-2.compute.internal   <none>           <none> --- non-ocs(infra)
csi-rbdplugin-provisioner-589578c4f4-9qc2r                        5/5     Running     0          132m    10.129.2.13    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
csi-rbdplugin-provisioner-589578c4f4-k9gvr                        5/5     Running     0          132m    10.131.0.19    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
lib-bucket-provisioner-55f74d96f6-5rrbb                           1/1     Running     0          153m    10.129.2.10    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
noobaa-core-0                                                     1/1     Running     0          128m    10.129.2.23    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
noobaa-db-0                                                       1/1     Running     0          128m    10.129.2.25    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
noobaa-endpoint-cf74c5d5f-vkclh                                   1/1     Running     0          126m    10.131.0.29    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
noobaa-operator-867f8f5c4b-djh44                                  1/1     Running     0          152m    10.131.0.18    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
ocs-operator-66977dc7fc-hz254                                     1/1     Running     0          152m    10.129.2.11    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-crashcollector-ip-10-0-130-44-9b45cfbcd-fjjkv           1/1     Running     0          130m    10.129.2.18    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-crashcollector-ip-10-0-156-176-77f96fc7f6-hcm8d         1/1     Running     0          131m    10.131.0.21    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-crashcollector-ip-10-0-169-101-6c58667fcb-n828d         1/1     Running     0          129m    10.128.2.17    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-drain-canary-4acff26df560a70f9c864c3a043a2ba1-fdb689c   1/1     Running     0          128m    10.129.2.22    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-drain-canary-9bb4a3506ed2ba7324556f40161f74da-d5zkgqp   1/1     Running     0          128m    10.131.0.26    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-drain-canary-d5f09db205014ce5ede0f355c29cf767-7dv8qpb   1/1     Running     0          128m    10.128.2.21    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8674b9ddhjszz   1/1     Running     0          128m    10.131.0.28    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5fd967785497v   1/1     Running     0          128m    10.128.2.23    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-mgr-a-6875585985-tm4b4                                  1/1     Running     0          129m    10.128.2.19    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-a-7d58886f5c-k8bft                                  1/1     Running     0          24m     10.131.0.31    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-b-65b94b4896-hkxgx                                  1/1     Running     0          25m     10.129.2.28    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-mon-c-6cf6cdd9d-l7bps                                   1/1     Running     0          24m     10.128.2.31    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-operator-577cb7dfd9-tmrhf                               1/1     Running     0          3m45s   10.128.2.32    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-0-99d6b964b-w2pzs                                   1/1     Running     0          128m    10.128.2.22    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-1-79c88689d9-tbrwh                                  1/1     Running     0          128m    10.131.0.27    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-2-f486c86dc-4pv7c                                   1/1     Running     0          128m    10.129.2.24    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-0-0-qddrd-q86b5               0/1     Completed   0          128m    10.128.2.20    ip-10-0-169-101.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-1-0-2bx96-dqhsp               0/1     Completed   0          128m    10.131.0.25    ip-10-0-156-176.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-2-0-gwfst-45w9p               0/1     Completed   0          128m    10.129.2.21    ip-10-0-130-44.us-east-2.compute.internal    <none>           <none>
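
As an extra check not captured in the output above, the rendered tolerations could also be inspected directly on the node-plugin DaemonSets (assuming they are named csi-rbdplugin and csi-cephfsplugin, as the pod names suggest):

$ oc -n openshift-storage get ds csi-rbdplugin -o jsonpath='{.spec.template.spec.tolerations}'
$ oc -n openshift-storage get ds csi-cephfsplugin -o jsonpath='{.spec.template.spec.tolerations}'

Both would be expected to include the nodetype=infra toleration supplied via the rook-ceph-operator-config ConfigMap.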

Comment 19 errata-xmlrpc 2020-04-14 09:45:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1437

