+++ This bug was initially created as a clone of Bug #2249678 +++

Description of problem (please be detailed as possible and provide log snippets):

The multus network address detection job does not derive placement from the CephCluster's "all" placement, only from "osd". This bug is reported upstream here: https://github.com/rook/rook/issues/13138
It is also in the process of being fixed upstream here: https://github.com/rook/rook/pull/13206

Version of all relevant components (if applicable):
ODF v4.14.0

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No, but it might be an upgrade issue for some existing customers using Multus.

Is there any workaround available to the best of your knowledge?
A valid workaround for a user who is experiencing issues while using the 'all' placement is to manually specify cephcluster.spec.network.addressRanges for the cluster/public networks (see the sketch below). This causes Rook to skip its network address auto-detection process.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3 - somewhat complex, since this requires Multus AND a CephCluster 'all' placement config

Is this issue reproducible?
Yes.

Can this issue be reproduced from the UI?
Not sure.

If this is a regression, please provide more details to justify this:
I believe this is a regression. Customers who are currently using Multus and the 'all' placement spec might hit this issue. Not all users will hit it; that depends on whether the spec allows the detection job to run on another node in the cluster that has the requisite host networks.

Steps to Reproduce:
Taint all nodes in the OpenShift cluster, and then set the toleration for that taint only in the "all" section of the CephCluster. For example, use this taint...

  kubectl taint nodes --all node-role.kubernetes.io/storage=true:NoSchedule

...and this placement spec on the CephCluster:

  placement:
    all:
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/storage
        operator: Equal
        value: "true"

Actual results:
The rook-ceph-network-*-canary jobs remain Pending with an error event like the one below:

  Warning  FailedScheduling  12s  default-scheduler  0/3 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/storage: true}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..

Expected results:
The rook-ceph-network-*-canary jobs should be schedulable with 'all' placement settings.

--- Additional comment from RHEL Program Management on 2023-11-14 19:54:24 UTC ---

This bug previously had no release flag set; the release flag 'odf-4.15.0' is now set to '?', so the bug is being proposed to be fixed in the ODF 4.15.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any were previously set while the release flag was missing, have now been reset, since Acks are to be set against a release flag.

--- Additional comment from Blaine Gardner on 2023-11-14 20:12:17 UTC ---

As far as QE testing goes, it should be sufficient to include the 'all' placement as part of one of the existing suite of multus tests. I don't see a need to create a new test, and it should be sufficient to only test it once.
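A minimal sketch of the addressRanges workaround mentioned in the description above. The field is Rook's CephCluster spec.network.addressRanges; the resource name, NAD selectors, and CIDRs are illustrative examples and must match the cluster's actual Multus networks (in ODF the CephCluster is reconciled by ocs-operator, so treat this as an illustration of the Rook API rather than a recommended edit path):

  apiVersion: ceph.rook.io/v1
  kind: CephCluster
  metadata:
    name: ocs-storagecluster-cephcluster   # illustrative; default ODF name
    namespace: openshift-storage
  spec:
    network:
      provider: multus
      selectors:
        public: default/public-net         # illustrative NAD references
        cluster: default/cluster-net
      # With address ranges specified explicitly, Rook skips the network
      # address auto-detection (canary) jobs entirely.
      addressRanges:
        public:
        - "192.168.20.0/24"                # illustrative CIDR
        cluster:
        - "192.168.30.0/24"                # illustrative CIDR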
This is ready to be merged for 4.14.z here, whenever it is appropriate to do so: https://github.com/red-hat-storage/rook/pull/537
Hi Blaine,

Can you check my test procedure and answer the question in step 6?

Test Process:
1. Install OCP 4.14
2. Install ODF 4.14
3. Run the validation tool
4. Install the StorageCluster with Multus
5. Taint all nodes in the OpenShift cluster:
   kubectl taint nodes --all node-role.kubernetes.io/storage=true:NoSchedule
6. Set the toleration in the CephCluster.
   [Question: Do I need to add the item to the existing tolerations, or will the tolerations list have only one entry? For example:
   placement:
     all:
       tolerations:
       - effect: NoSchedule
         key: node-role.kubernetes.io/storage
         operator: Equal
         value: "true"
   OR:
   placement:
     all:
       tolerations:
       - effect: NoSchedule
         key: node-role.kubernetes.io/storage
         operator: Equal
         value: "true"
       - effect: NoSchedule
         key: node.ocs.openshift.io/storage
         operator: Equal
         value: "true"
   ]
7. Verify that the rook-ceph-network-*-canary jobs complete:
   $ oc get jobs -n openshift-storage
I have some small concerns, and I think I can answer your question as well:

2. I think this is probably a typo. ODF shouldn't be installed until after running the validation tool and tainting nodes. Instead, this is the time to apply the new taint to the nodes.
   - As a note, the important thing here is that the taint used is not the default taint/toleration built into ODF ("node.ocs.openshift.io/storage=true:NoSchedule").
3. Yes, and an additional need: the validation tool will need to be configured with a toleration for the taint. The latest tool version on the KCS supports a config file for configuring tolerations. `rook multus validation config converged` will print out a config file, documented with comments, that you can use as a starting point. Ping me if you need more help setting up the config file.
4. This is good, with one caveat: the install must use the 'cluster' Multus network. It doesn't matter whether 'public' is used or not, but 'cluster' must be used.
5. As noted, this should be step 2.
6. I have one concern in addition to trying to answer your question:
   - My concern: depending on ocs-operator's reconcile strategy, ocs-operator might override the CephCluster placement settings. Setting the toleration via the StorageCluster during the initial deployment seems like the best place to specify this. Hopefully that means there won't be any CI behavior changes based on the ocs-operator reconcile strategy.
   - It seems best to me to only specify a single toleration. It's simpler, and doing so should also ensure that the test isn't implicitly relying on the default toleration as well -- helping prevent any false positives if there were to be a regression in the future.
   - Thus, this probably becomes steps 2+4+6 all in one: "Install ODF 4.14 with a Multus cluster network and a custom 'all' placement" (see the sketch after this comment).
7. Yes, exactly. This is an important validation to check that there is no regression when upgrading from one ODF version to the next, so also make sure this test is run for upgrades if that isn't part of the current plan.
   - For upgrades, you can make sure it is the correct canary job (i.e., not an older version of the job) by ensuring the canary job is configured with the same RHCS/Ceph container image as the CephCluster spec.
8. Additionally, this environment can be used for other test suites, and it is a good idea to use the non-default environment for them to ensure there aren't other errors as well. I assume that is part of the plan, but it seemed worth mentioning.

That all would make the new procedure:
1. Install OCP
2. Apply a unique taint to all non-control-plane nodes
3. Run the multus validation tool with a toleration config (important to also ensure that there are no errors with the validation tool)
4. Install ODF and the StorageCluster with
   - the Multus 'cluster' network
   - a custom 'all' toleration for the unique taint from step 2
5. Verify the rook-ceph-network-cluster-canary job reaches "Completed" with the expected RHCS container image
6. Continue with other ODF test suites.

As an overall note: the test I've suggested assumes the whole cluster is only storage nodes with no worker nodes. This is valid, but I also understand that there could be CI automations that expect one or more worker nodes. If the test needs worker nodes, the procedure will have to factor in adding a node label and placement selector as well.
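To make the combined step 2+4+6 concrete, a minimal sketch of the relevant StorageCluster fields with a single custom 'all' toleration matching the taint from the reproduction steps. Only the placement and network portions are shown; the rest of the StorageCluster spec is omitted, and the NAD reference is an illustrative example:

  apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    name: ocs-storagecluster
    namespace: openshift-storage
  spec:
    network:
      provider: multus
      selectors:
        cluster: default/cluster-net            # the Multus 'cluster' network is the required one for this test
    placement:
      all:
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/storage   # the unique, non-default taint applied in step 2
          operator: Equal
          value: "true"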
There is no option to install ODF 4.14.1 when all worker nodes are tainted.

$ kubectl taint nodes argo005.ceph.redhat.com node-role.kubernetes.io/storage=true:NoSchedule
node/argo005.ceph.redhat.com tainted

$ kubectl describe nodes argo005.ceph.redhat.com | grep Taints
Taints:  node-role.kubernetes.io/storage=true:NoSchedule

$ oc get job 1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757 -n openshift-marketplace
NAME                                                              COMPLETIONS   DURATION   AGE
1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757   0/1           21h        21h

$ oc describe job 1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757 -n openshift-marketplace
Name:                     1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757
Namespace:                openshift-marketplace
Selector:                 batch.kubernetes.io/controller-uid=2758a2bc-bb6c-4a44-9032-ddf9930e4db6
Labels:                   batch.kubernetes.io/controller-uid=2758a2bc-bb6c-4a44-9032-ddf9930e4db6
                          batch.kubernetes.io/job-name=1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757
                          controller-uid=2758a2bc-bb6c-4a44-9032-ddf9930e4db6
                          job-name=1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757
Annotations:              batch.kubernetes.io/job-tracking:
Parallelism:              1
Completions:              1
Completion Mode:          NonIndexed
Start Time:               Wed, 29 Nov 2023 17:53:48 +0200
Active Deadline Seconds:  600s
Pods Statuses:            0 Active (0 Ready) / 0 Succeeded / 1 Failed
Pod Template:
  Labels:  batch.kubernetes.io/controller-uid=2758a2bc-bb6c-4a44-9032-ddf9930e4db6
           batch.kubernetes.io/job-name=1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757
           controller-uid=2758a2bc-bb6c-4a44-9032-ddf9930e4db6
           job-name=1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757
  Init Containers:
   util:
    Image:        quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8af0a4afdd1d4b263f8365a765bbab04fe8b271710a52b394b285dd29497143a
    Port:         <none>
    Host Port:    <none>
    Command:      /bin/cp -Rv /bin/cpb /util/cpb
    Requests:
      cpu:        10m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /util from util (rw)
   pull:
    Image:        quay.io/rhceph-dev/odf4-odf-operator-bundle@sha256:d4c5bf429fed12ff3a3330d56fcb80af3651ed5edc73f3080cbf3aa614554e6b
    Port:         <none>
    Host Port:    <none>
    Command:      /util/cpb /bundle
    Requests:
      cpu:        10m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /bundle from bundle (rw)
      /util from util (rw)
  Containers:
   extract:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2ebbbc7f05e939be5adfd0220304888d422cedf8a6807b6ac4da531d2ed6e88a
    Port:       <none>
    Host Port:  <none>
    Command:    opm alpha bundle extract -m /bundle/ -n openshift-marketplace -c 1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757 -z
    Requests:
      cpu:     10m
      memory:  50Mi
    Environment:
      CONTAINER_IMAGE:  quay.io/rhceph-dev/odf4-odf-operator-bundle@sha256:d4c5bf429fed12ff3a3330d56fcb80af3651ed5edc73f3080cbf3aa614554e6b
    Mounts:
      /bundle from bundle (rw)
  Volumes:
   bundle:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   util:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Events:         <none>
$ oc describe pod redhat-operators-4xkhr -n openshift-marketplace
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  11m (x259 over 21h)  default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/storage: true}. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..
Verification of this fix won't happen in the 4.14.1 timeline. It was agreed to move the bug to 4.14.2 for verification.
I had a chat with Oded today to help get this test running. I had suggested that the ODF document linked -> [1] <- seems like the right one to allow ODF to be deployed onto nodes that have custom taints. Oded said the procedure was not working. Oded was also unable to find anyone on the QE team who was familiar with testing that feature.

Given that, it seems worth asking whether ODF supports users supplying their own taints/tolerations, affinities, or node selectors.

@etamir, @bkunal is this supported for customers? It seems to me that it is at least partially supported, since the StorageCluster spec has a `placement` configuration that allows specifying custom placement. But the procedure for allowing the ODF/OCS operators to run on custom-placed nodes is possibly untested.

[1] https://docs.openshift.com/container-platform/4.14/nodes/scheduling/nodes-scheduler-taints-tolerations.html#nodes-scheduler-taints-tolerations-projects_nodes-scheduler-taints-tolerations

-----

In the meantime, I think Oded can continue to test this by modifying the procedure:

1. Install OCP with 4 nodes
2. Reserve 3 of the 4 nodes for the StorageCluster using a unique taint and node label (i.e., not the preferred ODF ones). On 3 nodes, apply these:
   kubectl taint nodes <node names> custom-storage=true:NoSchedule
   kubectl label nodes <node names> custom-storage=true
3. Install ODF without installing the StorageCluster yet
   - All ODF operators should schedule to the node that does not have the above taint+label
4. Configure the Network Attachment Definition(s)
5. Run the multus validation tool
6. Install the StorageCluster with the following modification to the spec (the toleration key must match the custom taint from step 2):
   placement:
     all:
       nodeAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
           nodeSelectorTerms:
           - matchExpressions:
             - key: custom-storage
               operator: In
               values:
               - "true"
       tolerations:
       - effect: NoSchedule
         key: custom-storage
         operator: Equal
         value: "true"
7. Verify the rook-ceph-network-cluster-canary job reaches "Completed" with the expected RHCS container image, using the `--watch` flag of the kubectl command (I suggest JSON output for parsing; see the sketch after this comment):
   kubectl -n openshift-storage get job rook-ceph-network-cluster-canary --watch -o json
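For the image check in step 7, a possible shorthand using kubectl jsonpath. The CephCluster name and the assumption that the Ceph image is the job's first container are illustrative and should be confirmed against the live cluster:

  # Image the canary job is configured with (assumes the Ceph image is the first container)
  kubectl -n openshift-storage get job rook-ceph-network-cluster-canary \
    -o jsonpath='{.spec.template.spec.containers[0].image}'

  # Image in the CephCluster spec, for comparison
  kubectl -n openshift-storage get cephcluster ocs-storagecluster-cephcluster \
    -o jsonpath='{.spec.cephVersion.image}'

  # Then watch the job itself until it completes
  kubectl -n openshift-storage get job rook-ceph-network-cluster-canary --watch -o json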
QE needs clarification and hence more time to verify the bug. It was decided to move the bug to 4.14.3 here: https://chat.google.com/room/AAAAREGEba8/1WifqGfpy5U
Hi, I am working with this KCS: https://access.redhat.com/articles/6408481

Procedure:

1. Install OCP with 6 worker nodes and 3 master nodes
$ oc get nodes
NAME              STATUS   ROLES                  AGE   VERSION
compute-0         Ready    worker                 8h    v1.27.8+4fab27b
compute-1         Ready    worker                 8h    v1.27.8+4fab27b
compute-2         Ready    worker                 8h    v1.27.8+4fab27b
compute-3         Ready    worker                 8h    v1.27.8+4fab27b
compute-4         Ready    worker                 8h    v1.27.8+4fab27b
compute-5         Ready    worker                 8h    v1.27.8+4fab27b
control-plane-0   Ready    control-plane,master   9h    v1.27.8+4fab27b
control-plane-1   Ready    control-plane,master   9h    v1.27.8+4fab27b
control-plane-2   Ready    control-plane,master   9h    v1.27.8+4fab27b

2. Install the ODF operator

3. Install the StorageCluster with Multus [cluster-net + public-net]

4. Add a taint and label to the compute-0 node
$ kubectl taint nodes compute-0 custom-storage=true:NoSchedule
$ kubectl label nodes compute-0 custom-storage=true

5. Edit the StorageCluster (see also the toleration check sketch after this comment):
placement:
  all:
    tolerations:
    - effect: NoSchedule
      key: custom-storage
      operator: In
      value: "true"
  mds:
    tolerations:
    - effect: NoSchedule
      key: custom-storage
      operator: In
      value: "true"
  noobaa-core:
    tolerations:
    - effect: NoSchedule
      key: custom-storage
      operator: In
      value: "true"
  rgw:
    tolerations:
    - effect: NoSchedule
      key: custom-storage
      operator: In
      value: "true"

6. Run "oc get pods -w" and "oc get jobs -w"
$ oc get jobs -w
NAME                               COMPLETIONS   DURATION   AGE
rook-ceph-network-public-canary    0/1                      0s
rook-ceph-network-cluster-canary   0/1                      0s
rook-ceph-network-cluster-canary   0/1           0s         0s
rook-ceph-network-public-canary    0/1           0s         0s
rook-ceph-network-cluster-canary   0/1           4s         4s
rook-ceph-network-cluster-canary   0/1           5s         5s
rook-ceph-network-public-canary    0/1           5s         5s
rook-ceph-network-cluster-canary   0/1           6s         6s
rook-ceph-network-public-canary    0/1           6s         6s
rook-ceph-network-public-canary    0/1           7s         7s

$ oc get pods -w
rook-ceph-network-public-canary-z7dhd    0/1   Pending           0   0s
rook-ceph-network-cluster-canary-lvhfn   0/1   Pending           0   0s
rook-ceph-network-cluster-canary-lvhfn   0/1   Pending           0   0s
rook-ceph-network-public-canary-z7dhd    0/1   Pending           0   0s
rook-ceph-network-public-canary-z7dhd    0/1   Pending           0   0s
rook-ceph-network-cluster-canary-lvhfn   0/1   Pending           0   0s
rook-ceph-network-cluster-canary-lvhfn   0/1   Init:0/1          0   0s
rook-ceph-network-public-canary-z7dhd    0/1   Init:0/1          0   0s
rook-ceph-network-cluster-canary-lvhfn   0/1   Init:0/1          0   1s
rook-ceph-network-cluster-canary-lvhfn   0/1   Init:0/1          0   2s
rook-ceph-network-public-canary-z7dhd    0/1   Init:0/1          0   2s
rook-ceph-network-public-canary-z7dhd    0/1   Init:0/1          0   3s
rook-ceph-network-cluster-canary-lvhfn   0/1   PodInitializing   0   4s
rook-ceph-network-cluster-canary-lvhfn   0/1   PodInitializing   0   4s
rook-ceph-network-cluster-canary-lvhfn   0/1   Terminating       0   4s
rook-ceph-network-cluster-canary-lvhfn   0/1   Terminating       0   5s
rook-ceph-network-public-canary-z7dhd    0/1   PodInitializing   0   5s
rook-ceph-network-public-canary-z7dhd    0/1   PodInitializing   0   5s
rook-ceph-network-public-canary-z7dhd    0/1   Terminating       0   5s
rook-ceph-network-cluster-canary-lvhfn   0/1   Terminating       0   6s
rook-ceph-network-cluster-canary-lvhfn   0/1   Terminating       0   6s
rook-ceph-network-public-canary-z7dhd    0/1   Terminating       0   6s
rook-ceph-network-cluster-canary-lvhfn   0/1   Terminating       0   6s
rook-ceph-network-cluster-canary-lvhfn   0/1   Terminating       0   6s
rook-ceph-network-public-canary-z7dhd    0/1   Terminating       0   7s
rook-ceph-network-public-canary-z7dhd    0/1   Terminating       0   7s
rook-ceph-network-public-canary-z7dhd    0/1   Terminating       0   7s
rook-ceph-network-public-canary-z7dhd    0/1   Terminating       0   7s

Blaine, can you check this procedure?
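One way to confirm that the canary jobs actually inherited the custom 'all' toleration (rather than merely scheduling onto an untainted node) is to inspect the rendered pod templates; a small sketch, using the job names from the watch output above:

  # Should include the custom 'all' toleration once the fix is in place
  oc -n openshift-storage get job rook-ceph-network-cluster-canary \
    -o jsonpath='{.spec.template.spec.tolerations}'
  oc -n openshift-storage get job rook-ceph-network-public-canary \
    -o jsonpath='{.spec.template.spec.tolerations}'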
Moving the bug to 4.14.4, as we are doing a quick 4.14.3 to include a critical fix in RGW (bug 2254303) before the shutdown.
I added the flag, please update the doc text
The rook-ceph-operator pod is in Pending state because:

  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  4m29s  default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {custom-storage: true}, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..

1. Install the ODF 4.14.4 operator:
$ oc get csv -n openshift-storage
NAME                                    DISPLAY                       VERSION        REPLACES                                PHASE
mcg-operator.v4.14.4-rhodf              NooBaa Operator               4.14.4-rhodf   mcg-operator.v4.14.3-rhodf              Succeeded
ocs-operator.v4.14.4-rhodf              OpenShift Container Storage   4.14.4-rhodf   ocs-operator.v4.14.3-rhodf              Succeeded
odf-csi-addons-operator.v4.14.4-rhodf   CSI Addons                    4.14.4-rhodf   odf-csi-addons-operator.v4.14.3-rhodf   Succeeded
odf-operator.v4.14.4-rhodf              OpenShift Data Foundation     4.14.4-rhodf   odf-operator.v4.14.3-rhodf              Succeeded

2. Create the public and cluster NetworkAttachmentDefinitions:
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: public-net
  namespace: default
  labels: {}
  annotations: {}
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "br-ex", "mode": "bridge", "ipam": { "type": "whereabouts", "range": "192.168.20.0/24" } }'
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: cluster-net
  namespace: default
  labels: {}
  annotations: {}
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "br-ex", "mode": "bridge", "ipam": { "type": "whereabouts", "range": "192.168.30.0/24" } }'

3. Taint and label the nodes:
kubectl taint nodes compute-0 custom-storage=true:NoSchedule
kubectl label nodes compute-0 custom-storage=true
kubectl taint nodes compute-1 custom-storage=true:NoSchedule
kubectl label nodes compute-1 custom-storage=true
kubectl taint nodes compute-2 custom-storage=true:NoSchedule
kubectl label nodes compute-2 custom-storage=true

4. Apply the StorageSystem:
---
apiVersion: odf.openshift.io/v1alpha1
kind: StorageSystem
metadata:
  name: ocs-storagecluster-storagesystem
  namespace: openshift-storage
spec:
  kind: storagecluster.ocs.openshift.io/v1
  name: ocs-storagecluster
  namespace: openshift-storage

5. Create a thin storage class:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  name: thin-csi-odf
parameters:
  StoragePolicyName: "vSAN Default Storage Policy"
provisioner: csi.vsphere.vmware.com
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

6. Create the StorageCluster:
---
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  resources:
    mds:
      Limits: null
      Requests: null
    mgr:
      Limits: null
      Requests: null
    mon:
      Limits: null
      Requests: null
    noobaa-core:
      Limits: null
      Requests: null
    noobaa-db:
      Limits: null
      Requests: null
    noobaa-endpoint:
      limits:
        cpu: 1
        memory: 500Mi
      requests:
        cpu: 1
        memory: 500Mi
    rgw:
      Limits: null
      Requests: null
  storageDeviceSets:
  - count: 1
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 256Gi
        storageClassName: thin-csi-odf
        volumeMode: Block
    name: ocs-deviceset
    portable: true
    replica: 3
    resources:
      Limits: null
      Requests: null
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: custom-storage
              operator: In
              values:
              - "true"
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/storage
        operator: Equal
        value: "true"
  network:
    provider: multus
    selectors:
      cluster: default/cluster-net
      public: default/public-net
7. Check the rook-ceph-operator pod:
$ oc get pod rook-ceph-operator-7b7b6b8d5c-q6kzt
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-7b7b6b8d5c-q6kzt   0/1     Pending   0          3m8s

Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  4m29s  default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {custom-storage: true}, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..
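The scheduling failure above follows from the procedure itself: the StorageCluster placement only affects the Ceph/ODF workload pods, not the operator pods, and in this cluster every worker node carries the custom taint, so the rook-ceph-operator pod has nowhere to run. If the operators are intended to run on the tainted nodes, one option (a hedged sketch, not something verified in this bug) is to pass tolerations to the operator pods through the OLM Subscription's spec.config; the name, channel, and source values below are illustrative:

  apiVersion: operators.coreos.com/v1alpha1
  kind: Subscription
  metadata:
    name: odf-operator                 # illustrative
    namespace: openshift-storage
  spec:
    channel: stable-4.14               # illustrative
    name: odf-operator
    source: redhat-operators           # illustrative
    sourceNamespace: openshift-marketplace
    config:
      # Applied by OLM to the operator deployments it manages
      tolerations:
      - effect: NoSchedule
        key: custom-storage
        operator: Equal
        value: "true"

Alternatively, keeping at least one untainted worker node for the operators, as in the procedure suggested earlier in this bug, avoids the issue.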
I ran this procedure here https://bugzilla.redhat.com/show_bug.cgi?id=2249735#c23. But Blaine thinks this is not the right process
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days