Bug 1794389
| Summary: | [GSS] Provide additional Toleration for Ceph CSI driver DS | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Levy Sant'Anna <lsantann> |
| Component: | rook | Assignee: | umanga <uchapaga> |
| Status: | CLOSED ERRATA | QA Contact: | akarsha <akrai> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.2 | CC: | akrai, alchan, assingh, bkunal, kramdoss, madam, ocs-bugs, shan, sostapov, tdesala, uchapaga |
| Target Milestone: | --- | Flags: | kramdoss: needinfo+ |
| Target Release: | OCS 4.3.0 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-04-14 09:45:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Comment 2
Michael Adam
2020-01-23 14:44:53 UTC
This is important: it means that customers who have taints on some nodes cannot use OCS PVCs on those nodes. We need to fix it for 4.3, and we need to discuss whether we should actually do it for 4.2.x as well.

Fix posted on Rook. Patch merged and resynced downstream.

@umanga Is the fix complete with this rook patch, or do we need an additional patch in ocs-operator? If I read the fix right, it does not add a blanket toleration to the csi node plugin daemonset pods. So what additional steps are required for setting it up and testing it?

https://ceph-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/OCS%20Build%20Pipeline%204.3/82/ contains the fix.

On AWS with OCS build 4.3.0-377.ci, performed the following steps and observed the csi plugin pods coming up on the tainted (i.e., infra) nodes, so moving this BZ to VERIFIED.
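For context on the mechanism: with the fix, the operator reads extra tolerations from a ConfigMap (see step 4 below) and renders them into the csi plugin DaemonSets' pod spec. A minimal sketch of the resulting fragment, assuming the csi-cephfsplugin DaemonSet in openshift-storage; the surrounding field layout here is illustrative, not copied from the patch:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-cephfsplugin          # operator-managed plugin DaemonSet
  namespace: openshift-storage
spec:
  template:
    spec:
      tolerations:
      # tolerate the custom taint applied to the infra nodes in step 2
      - key: nodetype
        operator: Equal
        value: infra
        effect: NoSchedule
      # keep tolerating the default OCS storage taint
      - key: node.ocs.openshift.io/storage
        operator: Exists
        effect: NoSchedule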
Version:
---------
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.3.0-0.nightly-2020-03-20-053743 True False 3h48m Cluster version is 4.3.0-0.nightly-2020-03-20-053743
$ oc get csv -n openshift-storage
NAME DISPLAY VERSION REPLACES PHASE
lib-bucket-provisioner.v1.0.0 lib-bucket-provisioner 1.0.0 Succeeded
ocs-operator.v4.3.0-377.ci OpenShift Container Storage 4.3.0-377.ci Succeeded
Steps performed:
----------------
1. Created an OCP cluster with 6 worker and 3 master nodes.
2. Tainted 3 worker nodes as infra.
$ oc adm taint nodes ip-10-0-132-176.us-east-2.compute.internal ip-10-0-150-252.us-east-2.compute.internal ip-10-0-171-193.us-east-2.compute.internal nodetype=infra:NoSchedule
node/ip-10-0-132-176.us-east-2.compute.internal tainted
node/ip-10-0-150-252.us-east-2.compute.internal tainted
node/ip-10-0-171-193.us-east-2.compute.internal tainted
$ for i in $(oc get nodes | grep worker | awk '{print$1}'); do echo $i; echo =======; oc describe node $i | grep -i taint; done
ip-10-0-130-44.us-east-2.compute.internal
=========================================
Taints: <none>
ip-10-0-132-176.us-east-2.compute.internal
==========================================
Taints: nodetype=infra:NoSchedule
ip-10-0-150-252.us-east-2.compute.internal
==========================================
Taints: nodetype=infra:NoSchedule
ip-10-0-156-176.us-east-2.compute.internal
==========================================
Taints: <none>
ip-10-0-169-101.us-east-2.compute.internal
==========================================
Taints: <none>
ip-10-0-171-193.us-east-2.compute.internal
==========================================
Taints: nodetype=infra:NoSchedule
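For reference, the oc adm taint invocation in step 2 is equivalent to adding an entry to spec.taints on each Node object. A minimal sketch of the resulting fragment (standard Kubernetes Node API, shown for illustration):

apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-132-176.us-east-2.compute.internal
spec:
  taints:
  # NoSchedule: pods without a matching toleration are not scheduled here
  - key: nodetype
    value: infra
    effect: NoSchedule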
3. Deployed the OCS cluster. No openshift-storage pods were scheduled on the tainted (non-OCS) nodes.
$ date;oc get pods -n openshift-storage -o wide -w
Mon Mar 23 16:57:48 IST 2020
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-cephfsplugin-7sl99 3/3 Running 0 72m 10.0.156.176 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
csi-cephfsplugin-clj2p 3/3 Running 0 72m 10.0.169.101 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
csi-cephfsplugin-phh5d 3/3 Running 0 72m 10.0.130.44 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
csi-cephfsplugin-provisioner-6b89fb458c-drmqj 5/5 Running 0 106m 10.128.2.15 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
csi-cephfsplugin-provisioner-6b89fb458c-rfk6r 5/5 Running 0 106m 10.129.2.14 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
csi-rbdplugin-d8hlw 3/3 Running 0 72m 10.0.156.176 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
csi-rbdplugin-pdb88 3/3 Running 0 72m 10.0.130.44 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
csi-rbdplugin-provisioner-589578c4f4-9qc2r 5/5 Running 0 106m 10.129.2.13 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
csi-rbdplugin-provisioner-589578c4f4-k9gvr 5/5 Running 0 106m 10.131.0.19 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
csi-rbdplugin-z8v2j 3/3 Running 0 72m 10.0.169.101 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
lib-bucket-provisioner-55f74d96f6-5rrbb 1/1 Running 0 127m 10.129.2.10 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
noobaa-core-0 1/1 Running 0 102m 10.129.2.23 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
noobaa-db-0 1/1 Running 0 102m 10.129.2.25 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
noobaa-endpoint-cf74c5d5f-vkclh 1/1 Running 0 100m 10.131.0.29 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
noobaa-operator-867f8f5c4b-djh44 1/1 Running 0 126m 10.131.0.18 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
ocs-operator-66977dc7fc-hz254 1/1 Running 0 126m 10.129.2.11 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
rook-ceph-crashcollector-ip-10-0-130-44-9b45cfbcd-fjjkv 1/1 Running 0 104m 10.129.2.18 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
rook-ceph-crashcollector-ip-10-0-156-176-77f96fc7f6-hcm8d 1/1 Running 0 105m 10.131.0.21 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
rook-ceph-crashcollector-ip-10-0-169-101-6c58667fcb-n828d 1/1 Running 0 103m 10.128.2.17 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-drain-canary-4acff26df560a70f9c864c3a043a2ba1-fdb689c 1/1 Running 0 102m 10.129.2.22 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
rook-ceph-drain-canary-9bb4a3506ed2ba7324556f40161f74da-d5zkgqp 1/1 Running 0 102m 10.131.0.26 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
rook-ceph-drain-canary-d5f09db205014ce5ede0f355c29cf767-7dv8qpb 1/1 Running 0 102m 10.128.2.21 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8674b9ddhjszz 1/1 Running 0 101m 10.131.0.28 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5fd967785497v 1/1 Running 0 101m 10.128.2.23 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-mgr-a-6875585985-tm4b4 1/1 Running 0 103m 10.128.2.19 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-mon-a-f498b986-22f6r 1/1 Running 0 71m 10.131.0.30 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
rook-ceph-mon-b-bc95f9c5d-qcp7q 1/1 Running 0 71m 10.129.2.26 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
rook-ceph-mon-c-84cf9c6bd7-qkr4f 1/1 Running 0 72m 10.128.2.26 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-operator-577cb7dfd9-l6hkw 1/1 Running 0 46m 10.129.2.27 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
rook-ceph-osd-0-99d6b964b-w2pzs 1/1 Running 0 102m 10.128.2.22 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-osd-1-79c88689d9-tbrwh 1/1 Running 0 102m 10.131.0.27 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
rook-ceph-osd-2-f486c86dc-4pv7c 1/1 Running 0 102m 10.129.2.24 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
rook-ceph-osd-prepare-ocs-deviceset-0-0-qddrd-q86b5 0/1 Completed 0 102m 10.128.2.20 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-osd-prepare-ocs-deviceset-1-0-2bx96-dqhsp 0/1 Completed 0 102m 10.131.0.25 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
rook-ceph-osd-prepare-ocs-deviceset-2-0-gwfst-45w9p 0/1 Completed 0 102m 10.129.2.21 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
4. Created a ConfigMap named rook-ceph-operator-config carrying the CSI plugin tolerations.
$ oc create -f rook-ceph-operator-config.yaml
configmap/rook-ceph-operator-config created
$ oc get configmap rook-ceph-operator-config -n openshift-storage -o yaml
apiVersion: v1
data:
  CSI_PLUGIN_TOLERATIONS: |
    - effect: NoSchedule
      key: nodetype
      operator: Equal
      value: infra
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Exists
kind: ConfigMap
metadata:
  creationTimestamp: "2020-03-23T11:49:27Z"
  name: rook-ceph-operator-config
  namespace: openshift-storage
  resourceVersion: "114879"
  selfLink: /api/v1/namespaces/openshift-storage/configmaps/rook-ceph-operator-config
  uid: ac22e63a-8df1-4650-a57f-89bf7a2ce06a
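The rook-ceph-operator-config.yaml file applied above is not included in the report, but judging from the oc get output it was presumably equivalent to the following manifest, minus the server-populated metadata fields:

apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: openshift-storage
data:
  CSI_PLUGIN_TOLERATIONS: |
    # tolerate the custom infra taint applied in step 2
    - effect: NoSchedule
      key: nodetype
      operator: Equal
      value: infra
    # keep tolerating the default OCS storage taint
    - effect: NoSchedule
      key: node.ocs.openshift.io/storage
      operator: Exists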
5. Restarted the rook-ceph-operator pod. Observed csi plugin pods coming up on the tainted (infra) nodes.
$ date; oc delete pod rook-ceph-operator-577cb7dfd9-qpbj4 -n openshift-storage
Mon Mar 23 17:20:08 IST 2020
pod "rook-ceph-operator-577cb7dfd9-qpbj4" deleted
$ date;oc get pods -n openshift-storage -o wide -w
Mon Mar 23 17:23:48 IST 2020
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-cephfsplugin-246nv 3/3 Running 0 2m38s 10.0.156.176 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
csi-cephfsplugin-25dtz 3/3 Running 0 3m42s 10.0.132.176 ip-10-0-132-176.us-east-2.compute.internal <none> <none> --- non-ocs(infra)
csi-cephfsplugin-5htc8 3/3 Running 0 2m51s 10.0.130.44 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
csi-cephfsplugin-9s798 3/3 Running 0 3m42s 10.0.150.252 ip-10-0-150-252.us-east-2.compute.internal <none> <none> --- non-ocs(infra)
csi-cephfsplugin-ccrfb 3/3 Running 0 3m42s 10.0.171.193 ip-10-0-171-193.us-east-2.compute.internal <none> <none> --- non-ocs(infra)
csi-cephfsplugin-lrt9z 3/3 Running 0 3m4s 10.0.169.101 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
csi-cephfsplugin-provisioner-6b89fb458c-drmqj 5/5 Running 0 132m 10.128.2.15 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
csi-cephfsplugin-provisioner-6b89fb458c-rfk6r 5/5 Running 0 132m 10.129.2.14 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
csi-rbdplugin-5w26c 3/3 Running 0 3m5s 10.0.130.44 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
csi-rbdplugin-7rnsp 3/3 Running 0 2m54s 10.0.169.101 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
csi-rbdplugin-7tl26 3/3 Running 0 3m42s 10.0.150.252 ip-10-0-150-252.us-east-2.compute.internal <none> <none> --- non-ocs(infra)
csi-rbdplugin-8wlxr 3/3 Running 0 3m8s 10.0.156.176 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
csi-rbdplugin-f5c2b 3/3 Running 0 3m42s 10.0.171.193 ip-10-0-171-193.us-east-2.compute.internal <none> <none> --- non-ocs(infra)
csi-rbdplugin-h7gqc 3/3 Running 0 3m42s 10.0.132.176 ip-10-0-132-176.us-east-2.compute.internal <none> <none> --- non-ocs(infra)
csi-rbdplugin-provisioner-589578c4f4-9qc2r 5/5 Running 0 132m 10.129.2.13 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
csi-rbdplugin-provisioner-589578c4f4-k9gvr 5/5 Running 0 132m 10.131.0.19 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
lib-bucket-provisioner-55f74d96f6-5rrbb 1/1 Running 0 153m 10.129.2.10 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
noobaa-core-0 1/1 Running 0 128m 10.129.2.23 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
noobaa-db-0 1/1 Running 0 128m 10.129.2.25 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
noobaa-endpoint-cf74c5d5f-vkclh 1/1 Running 0 126m 10.131.0.29 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
noobaa-operator-867f8f5c4b-djh44 1/1 Running 0 152m 10.131.0.18 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
ocs-operator-66977dc7fc-hz254 1/1 Running 0 152m 10.129.2.11 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
rook-ceph-crashcollector-ip-10-0-130-44-9b45cfbcd-fjjkv 1/1 Running 0 130m 10.129.2.18 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
rook-ceph-crashcollector-ip-10-0-156-176-77f96fc7f6-hcm8d 1/1 Running 0 131m 10.131.0.21 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
rook-ceph-crashcollector-ip-10-0-169-101-6c58667fcb-n828d 1/1 Running 0 129m 10.128.2.17 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-drain-canary-4acff26df560a70f9c864c3a043a2ba1-fdb689c 1/1 Running 0 128m 10.129.2.22 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
rook-ceph-drain-canary-9bb4a3506ed2ba7324556f40161f74da-d5zkgqp 1/1 Running 0 128m 10.131.0.26 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
rook-ceph-drain-canary-d5f09db205014ce5ede0f355c29cf767-7dv8qpb 1/1 Running 0 128m 10.128.2.21 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8674b9ddhjszz 1/1 Running 0 128m 10.131.0.28 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5fd967785497v 1/1 Running 0 128m 10.128.2.23 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-mgr-a-6875585985-tm4b4 1/1 Running 0 129m 10.128.2.19 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-mon-a-7d58886f5c-k8bft 1/1 Running 0 24m 10.131.0.31 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
rook-ceph-mon-b-65b94b4896-hkxgx 1/1 Running 0 25m 10.129.2.28 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
rook-ceph-mon-c-6cf6cdd9d-l7bps 1/1 Running 0 24m 10.128.2.31 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-operator-577cb7dfd9-tmrhf 1/1 Running 0 3m45s 10.128.2.32 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-osd-0-99d6b964b-w2pzs 1/1 Running 0 128m 10.128.2.22 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-osd-1-79c88689d9-tbrwh 1/1 Running 0 128m 10.131.0.27 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
rook-ceph-osd-2-f486c86dc-4pv7c 1/1 Running 0 128m 10.129.2.24 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
rook-ceph-osd-prepare-ocs-deviceset-0-0-qddrd-q86b5 0/1 Completed 0 128m 10.128.2.20 ip-10-0-169-101.us-east-2.compute.internal <none> <none>
rook-ceph-osd-prepare-ocs-deviceset-1-0-2bx96-dqhsp 0/1 Completed 0 128m 10.131.0.25 ip-10-0-156-176.us-east-2.compute.internal <none> <none>
rook-ceph-osd-prepare-ocs-deviceset-2-0-gwfst-45w9p 0/1 Completed 0 128m 10.129.2.21 ip-10-0-130-44.us-east-2.compute.internal <none> <none>
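A quicker cross-check, assuming the plugin DaemonSets are named csi-cephfsplugin and csi-rbdplugin as the pod names above suggest, is to look at the DaemonSet counts directly; with the extra toleration in place they should report DESIRED and READY of 6 (all worker nodes) instead of 3:

$ oc get daemonset csi-cephfsplugin csi-rbdplugin -n openshift-storage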
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1437