Bug 2121842

Summary: [GSS] toleration for "non-ocs" taints on OpenShift Data Foundation pods
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: khover
Component: rook
Assignee: Madhu Rajanna <mrajanna>
Status: CLOSED CURRENTRELEASE
QA Contact: Vishakha Kathole <vkathole>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.10
CC: bkunal, mrajanna, muagarwa, mwade, nigoyal, ocs-bugs, odf-bz-bot, srai, tdesala, tnielsen
Target Milestone: ---
Keywords: Regression
Target Release: ODF 4.12.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the Rook CSI controller watched the configmap for changes to keys starting with ROOK_CSI or CSI_ using a regex that only matched single-line values. If a value was multiline, the regex could not detect the change, so the configuration was not applied to the CSI driver. With this update, the regex check is removed so that any change to the configmap is detected, and the CSI driver is reconciled whenever the configmap is updated.
Story Points: ---
Clone Of:
Clones: 2122215, 2130032
Environment:
Last Closed: 2023-02-08 14:06:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2122215, 2130032

Description khover 2022-08-26 19:42:18 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

When applying non-ocs taints in ODF 4.10, some pods cannot be scheduled.

# oc adm taint nodes -l cluster.ocs.openshift.io/openshift-storage= nodename=true:NoSchedule

# oc delete $(oc get pods -o name)

# oc get pods | grep -v Running
NAME                                                              READY   STATUS     RESTARTS   AGE
csi-addons-controller-manager-7656cbcf45-n2chw                    0/2     Pending    0          21m
noobaa-db-pg-0                                                    0/1     Init:0/2   0          47s
noobaa-operator-764c8b74dc-8q65h                                  0/1     Pending    0          21m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8467d498h2xmq   0/2     Pending    0          21m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7767c896nzc4b   0/2     Pending    0          21m
rook-ceph-mgr-a-5dfb7b8979-p6fxv                                  0/2     Pending    0          21m
rook-ceph-mon-a-6ff6cfd6b7-qknq2                                  0/2     Pending    0          21m
rook-ceph-mon-b-7876495dc4-4fxkt                                  0/2     Pending    0          21m
rook-ceph-osd-0-675558d4d7-d7r2b                                  0/2     Pending    0          21m
rook-ceph-osd-1-5d7bbfbcbf-jqg8v                                  0/2     Pending    0          21m
rook-ceph-osd-2-857f6579b7-8827r                                  0/2     Pending    0          21m
rook-ceph-tools-787676bdbd-l9xs8                                  0/1     Pending    0          21m


# oc get subs odf-operator -o yaml | grep -A7 config
  config:
    tolerations:
    - effect: NoSchedule
      key: nodename
      operator: Equal
      value: "true"
  installPlanApproval: Manual
  name: odf-operator

# oc get cm rook-ceph-operator-config -o yaml
apiVersion: v1
data:
  CSI_ENABLE_CSIADDONS: "true"
  CSI_LOG_LEVEL: "5"
  CSI_PLUGIN_TOLERATIONS: |2-

    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
      config:
    - effect: NoSchedule
      key: nodename
      operator: Equal
      value: "true"
  CSI_PROVISIONER_TOLERATIONS: |2-

    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
    - effect: NoSchedule
      key: nodename
      operator: Equal
      value: "true"

# oc get storagecluster ocs-storagecluster -o yaml

  placement:
    all:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"
    mds:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"
    mgr:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"
    mon:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"
    noobaa-core:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"
    noobaa-operator:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"
    osd:
      tolerations:
      - effect: NoSchedule
        key: node.ocs.openshift.io/storage
        operator: Equal
        value: "true"
      - effect: NoSchedule
        key: nodename
        operator: Equal
        value: "true"


Version of all relevant components (if applicable):

mcg-operator.v4.10.5              NooBaa Operator               4.10.5    mcg-operator.v4.10.4              Installing
ocs-operator.v4.10.5              OpenShift Container Storage   4.10.5    ocs-operator.v4.10.4              Succeeded
odf-csi-addons-operator.v4.10.5   CSI Addons                    4.10.5    odf-csi-addons-operator.v4.10.4   Installing
odf-operator.v4.10.5              OpenShift Data Foundation     4.10.5    odf-operator.v4.10.4              Succeeded


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

This is a blocker for any customer using non-ocs node taints and upgrading to 4.10.

Is there any workaround available to the best of your knowledge?

Edit the deployment to add the toleration, but this will not persist.
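For illustration only, the workaround could look like the sketch below. The deployment name and toleration values are taken from this BZ and are assumptions about a specific cluster; the operators revert such edits on their next reconcile, which is why the change does not persist.

# Non-persistent workaround sketch: append the custom toleration to one
# deployment's pod template. Assumes the tolerations array already exists
# in the pod template (it does for the ODF pods shown above).
oc -n openshift-storage patch deployment rook-ceph-mgr-a --type=json -p '[
  {"op": "add", "path": "/spec/template/spec/tolerations/-",
   "value": {"key": "nodename", "operator": "Equal", "value": "true", "effect": "NoSchedule"}}
]'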

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

5

Is this issue reproducible?

Yes 

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.
2.
3.


Actual results:


Expected results:


Additional info:

Comment 4 Subham Rai 2022-08-29 12:55:29 UTC
IIRC, the label `cluster.ocs.openshift.io/openshift-storage=` is not applied to the operator pods (it is only applied to the pods that the storage cluster creates, such as mon, osd, mgr, and mds). So I think the `oc delete $(oc get pods -o name)` command also deletes the operator pods such as `ocs-operator` and `rook-ceph-operator`, and while those operators are down the remaining resources are not being reconciled.

Comment 6 Travis Nielsen 2022-08-29 21:32:07 UTC
If you're adding taints to a running cluster, you'll first want to add the tolerations and make sure rook pods are updated with those tolerations before you add the taints. Otherwise, the pods will not be able to start and the operator won't be able to reconcile while the mons are stuck pending and out of quorum.
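For illustration only, the order described above could look like the following sketch. The taint command and resource names are taken from this BZ; the `app=rook-ceph-mon` label selector and the jsonpath query are just one convenient way (an assumption, not a documented procedure) to confirm the tolerations have propagated before tainting.

# 1. Add the tolerations to the StorageCluster placement and the odf-operator
#    subscription config first, as shown in the description above.
# 2. Wait until the Rook pods have been redeployed with the new toleration,
#    e.g. by checking the pod specs:
oc -n openshift-storage get pods -l app=rook-ceph-mon \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.tolerations}{"\n"}{end}'
# 3. Only after the toleration shows up on the pods, apply the taint:
oc adm taint nodes -l cluster.ocs.openshift.io/openshift-storage= nodename=true:NoSchedule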

Comment 7 khover 2022-08-30 09:16:08 UTC
(In reply to Travis Nielsen from comment #6)
> If you're adding taints to a running cluster, you'll first want to add the
> tolerations and make sure rook pods are updated with those tolerations
> before you add the taints. Otherwise, the pods will not be able to start and
> the operator won't be able to reconcile while the mons are stuck pending and
> out of quorum.

That is the blocker: rook pods are not updated with the tolerations prior to adding the taints.

Workflow from my lab:

1. Tolerations added to the odf-operator subscription, the rook-ceph-operator-config configmap, and the ocs-storagecluster YAML (under placement: all and per individual rook component).

2. Taint added to nodes.

3. Deleted all pods to simulate an ODF/rook pod failure.


oc get pods | grep -v Running
NAME                                                              READY   STATUS     RESTARTS   AGE
csi-addons-controller-manager-7656cbcf45-n2chw                    0/2     Pending    0          21m
noobaa-db-pg-0                                                    0/1     Init:0/2   0          47s
noobaa-operator-764c8b74dc-8q65h                                  0/1     Pending    0          21m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-8467d498h2xmq   0/2     Pending    0          21m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-7767c896nzc4b   0/2     Pending    0          21m
rook-ceph-mgr-a-5dfb7b8979-p6fxv                                  0/2     Pending    0          21m
rook-ceph-mon-a-6ff6cfd6b7-qknq2                                  0/2     Pending    0          21m
rook-ceph-mon-b-7876495dc4-4fxkt                                  0/2     Pending    0          21m
rook-ceph-osd-0-675558d4d7-d7r2b                                  0/2     Pending    0          21m
rook-ceph-osd-1-5d7bbfbcbf-jqg8v                                  0/2     Pending    0          21m
rook-ceph-osd-2-857f6579b7-8827r                                  0/2     Pending    0          21m
rook-ceph-tools-787676bdbd-l9xs8                                  0/1     Pending    0          21m

Comment 8 Travis Nielsen 2022-08-30 17:14:53 UTC
Did you wait for some time between steps 1 and 2? It can take at least a few minutes for all the tolerations to be applied to the rook pods even in a healthy cluster.

Comment 9 khover 2022-08-30 19:01:01 UTC
(In reply to Travis Nielsen from comment #8)
> Did you wait for some time between steps 1 and 2? It can take at least a few
> minutes for all the tolerations to be applied to the rook pods even in a
> healthy cluster.


Yes, we waited; the cluster was stuck in this state for hours.

We can reproduce this every time.

Comment 10 Travis Nielsen 2022-08-30 19:08:39 UTC
You waited hours between steps 1 and 2? What was stuck between steps 1 and 2? If the cluster was healthy, the tolerations should have been applied. Hopefully the rook operator log would show why they weren't applied.

Comment 11 khover 2022-08-30 20:13:00 UTC
(In reply to Travis Nielsen from comment #10)
> You waited hours between steps 1 and 2? What was stuck between steps 1 and
> 2? If the cluster was healthy, the tolerations should have been applied.
> Hopefully the rook operator log would show why they weren't applied.

Hours after step 3 (deleting the pods to replicate pod failure), the tolerations were still not applied to the pending pods.

The only workaround is to apply the tolerations to the deployment, which is not persistent.

Randy and I worked on this most of the day on 8/25. His opinion is that this is a blocker, which triggered this BZ.

If you wish, we can reproduce this via Google Meet so you can observe the behavior or check the logs.

Comment 12 Travis Nielsen 2022-08-30 20:24:38 UTC
You need to delay between steps 1 and 2, otherwise the rook operator has no chance to update the ceph pod specs with the tolerations before the taints are added. Step 3 is too late for the operator to smoothly upgrade the cluster.

Comment 13 khover 2022-08-30 21:15:08 UTC
(In reply to Travis Nielsen from comment #12)
> You need to delay between steps 1 and 2, otherwise the rook operator has no
> chance to update the ceph pod specs with the tolerations before the taints
> are added. Step 3 is too late for the operator to smoothly upgrade the
> cluster.

I see your point now; it's possible we did not wait long enough.

My test cluster has had all the tolerations in place, as outlined in the description, since 8/25.

Added:

# oc adm taint nodes -l cluster.ocs.openshift.io/openshift-storage= nodename=true:NoSchedule

Started deleting pods from the original pending list and ended up with only 2 pods still pending, which the toleration did not get passed down to.

NAME                                                              READY   STATUS      RESTARTS     AGE
csi-addons-controller-manager-7656cbcf45-gzqjm                    0/2     Pending     0            12m
csi-cephfsplugin-9znfq                                            3/3     Running     0            11m
csi-cephfsplugin-nzbjk                                            3/3     Running     0            4d1h
csi-cephfsplugin-provisioner-6596b9c55f-pgq77                     6/6     Running     0            4d1h
csi-cephfsplugin-provisioner-6596b9c55f-w7tqc                     6/6     Running     0            4d1h
csi-cephfsplugin-sm865                                            3/3     Running     0            4d1h
csi-rbdplugin-d5mdf                                               4/4     Running     0            4d1h
csi-rbdplugin-provisioner-76494fb89-87qf5                         7/7     Running     0            4d1h
csi-rbdplugin-provisioner-76494fb89-zztpp                         7/7     Running     0            4d1h
csi-rbdplugin-sgrv9                                               4/4     Running     0            4d1h
csi-rbdplugin-v92nf                                               4/4     Running     0            4d1h
noobaa-core-0                                                     1/1     Running     0            4d1h
noobaa-db-pg-0                                                    1/1     Running     0            4m26s
noobaa-endpoint-6ff8bb4df-n7pf7                                   1/1     Running     0            3d22h
noobaa-operator-764c8b74dc-s2ngc                                  1/1     Running     2 (8h ago)   3d22h
ocs-metrics-exporter-5d94446c7b-8kpbz                             1/1     Running     0            4d1h
ocs-operator-5f677b5ddd-rn6gs                                     1/1     Running     0            99s
odf-console-585d5b45b-5mh6q                                       1/1     Running     0            4d1h
odf-operator-controller-manager-c58dd5864-pbtl2                   2/2     Running     0            4d1h
rook-ceph-crashcollector-ip-10-0-138-41.ec2.internal-5dc4c4t8kb   1/1     Running     0            3d22h
rook-ceph-crashcollector-ip-10-0-153-37.ec2.internal-658f6sq6p7   1/1     Running     0            3d22h
rook-ceph-crashcollector-ip-10-0-175-79.ec2.internal-799bd5w6tb   1/1     Running     0            2d20h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5cbcc48fq4mvh   2/2     Running     0            3d22h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-78b6c9887mzrv   2/2     Running     0            3d22h
rook-ceph-mgr-a-55958bcfdb-75mhj                                  2/2     Running     0            3d22h
rook-ceph-mon-a-6f475bb5fc-fq9ch                                  2/2     Running     0            15m
rook-ceph-mon-b-7bfcf555dd-p2xn4                                  2/2     Running     0            3d22h
rook-ceph-mon-c-55fb95c8cc-kpbwc                                  2/2     Running     0            4d1h
rook-ceph-operator-64c6bc6cfd-blm5b                               1/1     Running     0            87s
rook-ceph-osd-0-f8ccfddd4-cwvlb                                   2/2     Running     0            2d20h
rook-ceph-osd-1-58ccbfcc4b-m6kxg                                  2/2     Running     0            3d22h
rook-ceph-osd-2-6dfdb6dd68-8vrvj                                  2/2     Running     0            2m27s
rook-ceph-osd-prepare-73ac7e92a014913461027c97fb1d7aa6-gw7ng      0/1     Completed   0            2m32s
rook-ceph-tools-787676bdbd-m6xzc                                  0/1     Pending     0            10m

# oc get pod rook-ceph-tools-787676bdbd-m6xzc -o yaml

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-08-30T20:39:22Z"
    message: '0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master:
      }, that the pod didn''t tolerate, 3 node(s) had taint {nodename: true}, that
      the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled

# oc get pods csi-addons-controller-manager-7656cbcf45-gzqjm -o yaml

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-08-30T20:37:35Z"
    message: '0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master:
      }, that the pod didn''t tolerate, 3 node(s) had taint {nodename: true}, that
      the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending

# oc delete OCSInitialization ocsinit
ocsinitialization.ocs.openshift.io "ocsinit" deleted
[root@vm255-30 ~]# oc patch OCSInitialization ocsinit -n openshift-storage --type json --patch  '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'

rook-ceph-tools-787676bdbd-549jp                                  0/1     Pending     0            12s

I will open a separate BZ for csi-addons.

Still stuck on the tools pod pending. 

Are these tolerations being injected by the ODF subscription edit now, or by the ocs-storagecluster?

Comment 14 Travis Nielsen 2022-08-30 21:17:56 UTC
The tools pod is created by the ocs operator, so it looks like at least Rook has updated all its expected tolerations now.

Comment 15 Travis Nielsen 2022-09-06 22:51:28 UTC
Can we close this now, or what is remaining?

Comment 16 khover 2022-09-06 23:27:13 UTC
(In reply to Travis Nielsen from comment #15)
> Can we close this now, or what is remaining?

The only outstanding AI would be the tools pod.

Can we append a component to this BZ?

Or should I open a separate BZ against the ocs operator?

Comment 19 Travis Nielsen 2022-09-07 22:31:52 UTC
There was a Rook fix that Madhu found necessary: the tolerations weren't being properly applied until the rook operator was restarted. This will be fixed by the PR linked in the next comment.

Comment 20 Travis Nielsen 2022-09-07 22:42:01 UTC
Fixed by https://github.com/rook/rook/pull/10906

Comment 21 Travis Nielsen 2022-09-07 23:22:35 UTC
Merged downstream to 4.12 now with https://github.com/red-hat-storage/rook/pull/409
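As a sketch of how the fix could be verified once it lands (the namespace, configmap, and csi-rbdplugin daemonset names are taken from this BZ and are assumptions about a typical ODF install, not part of the fix itself): update a multiline value such as CSI_PLUGIN_TOLERATIONS in rook-ceph-operator-config and confirm the CSI driver is reconciled without restarting the rook-ceph-operator pod.

# Edit a multiline CSI_* value in the operator configmap:
oc -n openshift-storage edit cm rook-ceph-operator-config
# Without deleting the rook-ceph-operator pod, check that the CSI plugin
# daemonset picked up the new toleration:
oc -n openshift-storage get ds csi-rbdplugin -o jsonpath='{.spec.template.spec.tolerations}'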