+++ This bug was initially created as a clone of Bug #2249678 +++

Description of problem (please be detailed as possible and provide log snippets):

The multus network address detection job does not derive placement from the CephCluster's "all" placement, only from "osd". This bug is reported upstream here: https://github.com/rook/rook/issues/13138
It is also in the process of being fixed upstream here: https://github.com/rook/rook/pull/13206

Version of all relevant components (if applicable):
ODF v4.14.0

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No, but it might be an upgrade issue for some existing customers using Multus.

Is there any workaround available to the best of your knowledge?
A valid workaround for a user who is experiencing issues while using the 'all' placement is to manually specify cephcluster.spec.network.addressRanges for the cluster/public networks (see the sketch below). This causes Rook to skip its network address auto-detection process.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3 - somewhat complex, since this requires Multus AND a CephCluster 'all' placement config

Is this issue reproducible?
Yes.

Can this issue be reproduced from the UI?
Not sure.

If this is a regression, please provide more details to justify this:
I believe this is a regression. Customers who are currently using Multus and the 'all' placement spec might hit this issue. Not all users will hit it; that depends on whether the spec allows the detection job to run on another node in the cluster that has the requisite host networks.

Steps to Reproduce:
Taint all nodes in the OpenShift cluster, and then set the toleration for that taint only in the "all" section of the CephCluster. For example, use this taint...

  kubectl taint nodes --all node-role.kubernetes.io/storage=true:NoSchedule

...and this placement spec on the CephCluster:

  placement:
    all:
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/storage
        operator: Equal
        value: "true"

Actual results:
The rook-ceph-network-*-canary jobs remain Pending with an error event like the one below:

  Warning  FailedScheduling  12s  default-scheduler  0/3 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/storage: true}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..

Expected results:
The rook-ceph-network-*-canary jobs should be schedulable with 'all' placement settings.

--- Additional comment from RHEL Program Management on 2023-11-14 19:54:24 UTC ---

This bug previously had no release flag set; the release flag 'odf-4.15.0' is now set to '?', so the bug is being proposed to be fixed in the ODF 4.15.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any were previously set while the release flag was missing, have now been reset, since Acks are to be set against a release flag.

--- Additional comment from Blaine Gardner on 2023-11-14 20:12:17 UTC ---

As far as QE testing goes, it should be sufficient to include the 'all' placement as part of one of the existing suite of multus tests. I don't see a need to create a new test, and it should be sufficient to only test it once.
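A minimal sketch of the addressRanges workaround mentioned in the description above. The field is Rook's CephCluster spec.network.addressRanges; the resource name, NAD selectors, and CIDRs are illustrative examples and must match the cluster's actual Multus networks (in ODF the CephCluster is reconciled by ocs-operator, so treat this as an illustration of the Rook API rather than a recommended edit path):

  apiVersion: ceph.rook.io/v1
  kind: CephCluster
  metadata:
    name: ocs-storagecluster-cephcluster   # illustrative; default ODF name
    namespace: openshift-storage
  spec:
    network:
      provider: multus
      selectors:
        public: default/public-net         # illustrative NAD references
        cluster: default/cluster-net
      # With address ranges specified explicitly, Rook skips the network
      # address auto-detection (canary) jobs entirely.
      addressRanges:
        public:
        - "192.168.20.0/24"                # illustrative CIDR
        cluster:
        - "192.168.30.0/24"                # illustrative CIDR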
This is ready to be merged for 4.14.z here, whenever it is appropriate to do so: https://github.com/red-hat-storage/rook/pull/537
Hi Blaine,

Can you check my test procedure and answer the question in step 6?

Test Process:
1. Install OCP 4.14
2. Install ODF 4.14
3. Run the validation tool
4. Install the StorageCluster with Multus
5. Taint all nodes in the OpenShift cluster:
   kubectl taint nodes --all node-role.kubernetes.io/storage=true:NoSchedule
6. Set the toleration in the CephCluster.
   [Question: Do I need to add the item to the existing tolerations, or will the tolerations list have only one entry? For example:
   placement:
     all:
       tolerations:
       - effect: NoSchedule
         key: node-role.kubernetes.io/storage
         operator: Equal
         value: "true"
   OR:
   placement:
     all:
       tolerations:
       - effect: NoSchedule
         key: node-role.kubernetes.io/storage
         operator: Equal
         value: "true"
       - effect: NoSchedule
         key: node.ocs.openshift.io/storage
         operator: Equal
         value: "true"
   ]
7. Verify that the rook-ceph-network-*-canary jobs complete:
   $ oc get jobs -n openshift-storage
I have some small concerns, and I think I can answer your question as well:

2. I think this is probably a typo. ODF shouldn't be installed until after running the validation tool and tainting nodes. Instead, this is the time to apply the new taint to the nodes.
   - As a note, the important thing here is that the taint used is not the default taint/toleration built into ODF ("node.ocs.openshift.io/storage=true:NoSchedule").
3. Yes, and an additional need: the validation tool will need to be configured with a toleration for the taint. The latest tool version on the KCS supports a config file for configuring tolerations. `rook multus validation config converged` will print out a config file, documented with comments, that you can use as a starting point. Ping me if you need more help setting up the config file.
4. This is good, with one caveat: the install must use the 'cluster' Multus network. It doesn't matter whether 'public' is used or not, but 'cluster' must be used.
5. As noted, this should be step 2.
6. I have one concern in addition to trying to answer your question:
   - My concern: depending on ocs-operator's reconcile strategy, ocs-operator might override the CephCluster placement settings. Setting the toleration via the StorageCluster during the initial deployment seems like the best place to specify this. Hopefully that means there won't be any CI behavior changes based on the ocs-operator reconcile strategy.
   - It seems best to me to only specify a single toleration. It's simpler, and doing so should also ensure that the test isn't implicitly relying on the default toleration as well -- helping prevent any false positives if there were to be a regression in the future.
   - Thus, this probably becomes steps 2+4+6 all in one: "Install ODF 4.14 with a Multus cluster network and a custom 'all' placement" (see the sketch after this comment).
7. Yes, exactly. This is an important validation to check that there is no regression when upgrading from one ODF version to the next, so also make sure this test is run for upgrades if that isn't part of the current plan.
   - For upgrades, you can make sure it is the correct canary job (i.e., not an older version of the job) by ensuring the canary job is configured with the same RHCS/Ceph container image as the CephCluster spec.
8. Additionally, this environment can be used for other test suites, and it is a good idea to use the non-default environment for them to ensure there aren't other errors as well. I assume that is part of the plan, but it seemed worth mentioning.

That all would make the new procedure:
1. Install OCP
2. Apply a unique taint to all non-control-plane nodes
3. Run the multus validation tool with a toleration config (important to also ensure that there are no errors with the validation tool)
4. Install ODF and the StorageCluster with
   - the Multus 'cluster' network
   - a custom 'all' toleration for the unique taint from step 2
5. Verify the rook-ceph-network-cluster-canary job reaches "Completed" with the expected RHCS container image
6. Continue with other ODF test suites.

As an overall note: the test I've suggested assumes the whole cluster is only storage nodes with no worker nodes. This is valid, but I also understand that there could be CI automations that expect one or more worker nodes. If the test needs worker nodes, the procedure will have to factor in adding a node label and placement selector as well.
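To make the combined step 2+4+6 concrete, a minimal sketch of the relevant StorageCluster fields with a single custom 'all' toleration matching the taint from the reproduction steps. Only the placement and network portions are shown; the rest of the StorageCluster spec is omitted, and the NAD reference is an illustrative example:

  apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    name: ocs-storagecluster
    namespace: openshift-storage
  spec:
    network:
      provider: multus
      selectors:
        cluster: default/cluster-net            # the Multus 'cluster' network is the required one for this test
    placement:
      all:
        tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/storage   # the unique, non-default taint applied in step 2
          operator: Equal
          value: "true"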
There is no option to install ODF 4.14.1 when all worker nodes are tainted.

$ kubectl taint nodes argo005.ceph.redhat.com node-role.kubernetes.io/storage=true:NoSchedule
node/argo005.ceph.redhat.com tainted

$ kubectl describe nodes argo005.ceph.redhat.com | grep Taints
Taints:  node-role.kubernetes.io/storage=true:NoSchedule

$ oc get job 1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757 -n openshift-marketplace
NAME                                                              COMPLETIONS   DURATION   AGE
1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757   0/1           21h        21h

$ oc describe job 1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757 -n openshift-marketplace
Name:                     1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757
Namespace:                openshift-marketplace
Selector:                 batch.kubernetes.io/controller-uid=2758a2bc-bb6c-4a44-9032-ddf9930e4db6
Labels:                   batch.kubernetes.io/controller-uid=2758a2bc-bb6c-4a44-9032-ddf9930e4db6
                          batch.kubernetes.io/job-name=1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757
                          controller-uid=2758a2bc-bb6c-4a44-9032-ddf9930e4db6
                          job-name=1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757
Annotations:              batch.kubernetes.io/job-tracking:
Parallelism:              1
Completions:              1
Completion Mode:          NonIndexed
Start Time:               Wed, 29 Nov 2023 17:53:48 +0200
Active Deadline Seconds:  600s
Pods Statuses:            0 Active (0 Ready) / 0 Succeeded / 1 Failed
Pod Template:
  Labels:  batch.kubernetes.io/controller-uid=2758a2bc-bb6c-4a44-9032-ddf9930e4db6
           batch.kubernetes.io/job-name=1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757
           controller-uid=2758a2bc-bb6c-4a44-9032-ddf9930e4db6
           job-name=1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757
  Init Containers:
   util:
    Image:        quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8af0a4afdd1d4b263f8365a765bbab04fe8b271710a52b394b285dd29497143a
    Port:         <none>
    Host Port:    <none>
    Command:      /bin/cp -Rv /bin/cpb /util/cpb
    Requests:
      cpu:        10m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /util from util (rw)
   pull:
    Image:        quay.io/rhceph-dev/odf4-odf-operator-bundle@sha256:d4c5bf429fed12ff3a3330d56fcb80af3651ed5edc73f3080cbf3aa614554e6b
    Port:         <none>
    Host Port:    <none>
    Command:      /util/cpb /bundle
    Requests:
      cpu:        10m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /bundle from bundle (rw)
      /util from util (rw)
  Containers:
   extract:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2ebbbc7f05e939be5adfd0220304888d422cedf8a6807b6ac4da531d2ed6e88a
    Port:       <none>
    Host Port:  <none>
    Command:    opm alpha bundle extract -m /bundle/ -n openshift-marketplace -c 1bd180a90a1d205118da2402a530a9c94838fd0a90283339b7e5c68602f3757 -z
    Requests:
      cpu:     10m
      memory:  50Mi
    Environment:
      CONTAINER_IMAGE:  quay.io/rhceph-dev/odf4-odf-operator-bundle@sha256:d4c5bf429fed12ff3a3330d56fcb80af3651ed5edc73f3080cbf3aa614554e6b
    Mounts:
      /bundle from bundle (rw)
  Volumes:
   bundle:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   util:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Events:         <none>
$ oc describe pod redhat-operators-4xkhr -n openshift-marketplace
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  11m (x259 over 21h)  default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/storage: true}. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..
Verification of this fix won't happen in the 4.14.1 timeline. It was agreed to move the bug to 4.14.2 for verification.
I had a chat with Oded today to help get this test running. I had suggested that the ODF document linked -> [1] <- seems like the right one to allow ODF to be deployed onto nodes that have custom taints. Oded said the procedure was not working. Oded was also unable to find anyone on the QE team who was familiar with testing that feature.

Given that, it seems worth asking whether ODF supports users supplying their own taints/tolerations, affinities, or node selectors.

@etamir, @bkunal is this supported for customers? It seems to me that it is at least partially supported, since the StorageCluster spec has a `placement` configuration that allows specifying custom placement. But the procedure for allowing the ODF/OCS operators to run on custom-placed nodes is possibly untested.

[1] https://docs.openshift.com/container-platform/4.14/nodes/scheduling/nodes-scheduler-taints-tolerations.html#nodes-scheduler-taints-tolerations-projects_nodes-scheduler-taints-tolerations

-----

In the meantime, I think Oded can continue to test this by modifying the procedure:

1. Install OCP with 4 nodes
2. Reserve 3 of the 4 nodes for the StorageCluster using a unique taint and node label (i.e., not the preferred ODF ones). On 3 nodes, apply these:
   kubectl taint nodes <node names> custom-storage=true:NoSchedule
   kubectl label nodes <node names> custom-storage=true
3. Install ODF without installing the StorageCluster yet
   - All ODF operators should schedule to the node that does not have the above taint+label
4. Configure the Network Attachment Definition(s)
5. Run the multus validation tool
6. Install the StorageCluster with the following modification to the spec (the toleration key must match the custom taint from step 2):
   placement:
     all:
       nodeAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
           nodeSelectorTerms:
           - matchExpressions:
             - key: custom-storage
               operator: In
               values:
               - "true"
       tolerations:
       - effect: NoSchedule
         key: custom-storage
         operator: Equal
         value: "true"
7. Verify the rook-ceph-network-cluster-canary job reaches "Completed" with the expected RHCS container image, using the `--watch` flag of the kubectl command (I suggest JSON output for parsing; see the sketch after this comment):
   kubectl -n openshift-storage get job rook-ceph-network-cluster-canary --watch -o json
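For the image check in step 7, a possible shorthand using kubectl jsonpath. The CephCluster name and the assumption that the Ceph image is the job's first container are illustrative and should be confirmed against the live cluster:

  # Image the canary job is configured with (assumes the Ceph image is the first container)
  kubectl -n openshift-storage get job rook-ceph-network-cluster-canary \
    -o jsonpath='{.spec.template.spec.containers[0].image}'

  # Image in the CephCluster spec, for comparison
  kubectl -n openshift-storage get cephcluster ocs-storagecluster-cephcluster \
    -o jsonpath='{.spec.cephVersion.image}'

  # Then watch the job itself until it completes
  kubectl -n openshift-storage get job rook-ceph-network-cluster-canary --watch -o json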
QE needs clarification and hence more time to verify the bug. It was decided to move the bug to 4.14.3 here: https://chat.google.com/room/AAAAREGEba8/1WifqGfpy5U
Hi, I am working with this KCS: https://access.redhat.com/articles/6408481

Procedure:

1. Install OCP with 6 worker nodes and 3 master nodes
$ oc get nodes
NAME              STATUS   ROLES                  AGE   VERSION
compute-0         Ready    worker                 8h    v1.27.8+4fab27b
compute-1         Ready    worker                 8h    v1.27.8+4fab27b
compute-2         Ready    worker                 8h    v1.27.8+4fab27b
compute-3         Ready    worker                 8h    v1.27.8+4fab27b
compute-4         Ready    worker                 8h    v1.27.8+4fab27b
compute-5         Ready    worker                 8h    v1.27.8+4fab27b
control-plane-0   Ready    control-plane,master   9h    v1.27.8+4fab27b
control-plane-1   Ready    control-plane,master   9h    v1.27.8+4fab27b
control-plane-2   Ready    control-plane,master   9h    v1.27.8+4fab27b

2. Install the ODF operator

3. Install the StorageCluster with Multus [cluster-net + public-net]

4. Add a taint and label to the compute-0 node
$ kubectl taint nodes compute-0 custom-storage=true:NoSchedule
$ kubectl label nodes compute-0 custom-storage=true

5. Edit the StorageCluster (see also the toleration check sketch after this comment):
placement:
  all:
    tolerations:
    - effect: NoSchedule
      key: custom-storage
      operator: In
      value: "true"
  mds:
    tolerations:
    - effect: NoSchedule
      key: custom-storage
      operator: In
      value: "true"
  noobaa-core:
    tolerations:
    - effect: NoSchedule
      key: custom-storage
      operator: In
      value: "true"
  rgw:
    tolerations:
    - effect: NoSchedule
      key: custom-storage
      operator: In
      value: "true"

6. Run "oc get pods -w" and "oc get jobs -w"
$ oc get jobs -w
NAME                               COMPLETIONS   DURATION   AGE
rook-ceph-network-public-canary    0/1                      0s
rook-ceph-network-cluster-canary   0/1                      0s
rook-ceph-network-cluster-canary   0/1           0s         0s
rook-ceph-network-public-canary    0/1           0s         0s
rook-ceph-network-cluster-canary   0/1           4s         4s
rook-ceph-network-cluster-canary   0/1           5s         5s
rook-ceph-network-public-canary    0/1           5s         5s
rook-ceph-network-cluster-canary   0/1           6s         6s
rook-ceph-network-public-canary    0/1           6s         6s
rook-ceph-network-public-canary    0/1           7s         7s

$ oc get pods -w
rook-ceph-network-public-canary-z7dhd    0/1   Pending           0   0s
rook-ceph-network-cluster-canary-lvhfn   0/1   Pending           0   0s
rook-ceph-network-cluster-canary-lvhfn   0/1   Pending           0   0s
rook-ceph-network-public-canary-z7dhd    0/1   Pending           0   0s
rook-ceph-network-public-canary-z7dhd    0/1   Pending           0   0s
rook-ceph-network-cluster-canary-lvhfn   0/1   Pending           0   0s
rook-ceph-network-cluster-canary-lvhfn   0/1   Init:0/1          0   0s
rook-ceph-network-public-canary-z7dhd    0/1   Init:0/1          0   0s
rook-ceph-network-cluster-canary-lvhfn   0/1   Init:0/1          0   1s
rook-ceph-network-cluster-canary-lvhfn   0/1   Init:0/1          0   2s
rook-ceph-network-public-canary-z7dhd    0/1   Init:0/1          0   2s
rook-ceph-network-public-canary-z7dhd    0/1   Init:0/1          0   3s
rook-ceph-network-cluster-canary-lvhfn   0/1   PodInitializing   0   4s
rook-ceph-network-cluster-canary-lvhfn   0/1   PodInitializing   0   4s
rook-ceph-network-cluster-canary-lvhfn   0/1   Terminating       0   4s
rook-ceph-network-cluster-canary-lvhfn   0/1   Terminating       0   5s
rook-ceph-network-public-canary-z7dhd    0/1   PodInitializing   0   5s
rook-ceph-network-public-canary-z7dhd    0/1   PodInitializing   0   5s
rook-ceph-network-public-canary-z7dhd    0/1   Terminating       0   5s
rook-ceph-network-cluster-canary-lvhfn   0/1   Terminating       0   6s
rook-ceph-network-cluster-canary-lvhfn   0/1   Terminating       0   6s
rook-ceph-network-public-canary-z7dhd    0/1   Terminating       0   6s
rook-ceph-network-cluster-canary-lvhfn   0/1   Terminating       0   6s
rook-ceph-network-cluster-canary-lvhfn   0/1   Terminating       0   6s
rook-ceph-network-public-canary-z7dhd    0/1   Terminating       0   7s
rook-ceph-network-public-canary-z7dhd    0/1   Terminating       0   7s
rook-ceph-network-public-canary-z7dhd    0/1   Terminating       0   7s
rook-ceph-network-public-canary-z7dhd    0/1   Terminating       0   7s

Blaine, can you check this procedure?
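One way to confirm that the canary jobs actually inherited the custom 'all' toleration (rather than merely scheduling onto an untainted node) is to inspect the rendered pod templates; a small sketch, using the job names from the watch output above:

  # Should include the custom 'all' toleration once the fix is in place
  oc -n openshift-storage get job rook-ceph-network-cluster-canary \
    -o jsonpath='{.spec.template.spec.tolerations}'
  oc -n openshift-storage get job rook-ceph-network-public-canary \
    -o jsonpath='{.spec.template.spec.tolerations}'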
Moving the bug to 4.14.4, as we are doing a quick 4.14.3 to include a critical fix in RGW (bug 2254303) before the shutdown.
I added the flag, please update the doc text
The rook-ceph-operator pod is in Pending state because:

  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  4m29s  default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {custom-storage: true}, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..

1. Install the ODF 4.14.4 operator:
$ oc get csv -n openshift-storage
NAME                                    DISPLAY                       VERSION        REPLACES                                PHASE
mcg-operator.v4.14.4-rhodf              NooBaa Operator               4.14.4-rhodf   mcg-operator.v4.14.3-rhodf              Succeeded
ocs-operator.v4.14.4-rhodf              OpenShift Container Storage   4.14.4-rhodf   ocs-operator.v4.14.3-rhodf              Succeeded
odf-csi-addons-operator.v4.14.4-rhodf   CSI Addons                    4.14.4-rhodf   odf-csi-addons-operator.v4.14.3-rhodf   Succeeded
odf-operator.v4.14.4-rhodf              OpenShift Data Foundation     4.14.4-rhodf   odf-operator.v4.14.3-rhodf              Succeeded

2. Create the public and cluster NetworkAttachmentDefinitions:
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: public-net
  namespace: default
  labels: {}
  annotations: {}
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "br-ex", "mode": "bridge", "ipam": { "type": "whereabouts", "range": "192.168.20.0/24" } }'
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: cluster-net
  namespace: default
  labels: {}
  annotations: {}
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "br-ex", "mode": "bridge", "ipam": { "type": "whereabouts", "range": "192.168.30.0/24" } }'

3. Taint and label the nodes:
kubectl taint nodes compute-0 custom-storage=true:NoSchedule
kubectl label nodes compute-0 custom-storage=true
kubectl taint nodes compute-1 custom-storage=true:NoSchedule
kubectl label nodes compute-1 custom-storage=true
kubectl taint nodes compute-2 custom-storage=true:NoSchedule
kubectl label nodes compute-2 custom-storage=true

4. Apply the StorageSystem:
---
apiVersion: odf.openshift.io/v1alpha1
kind: StorageSystem
metadata:
  name: ocs-storagecluster-storagesystem
  namespace: openshift-storage
spec:
  kind: storagecluster.ocs.openshift.io/v1
  name: ocs-storagecluster
  namespace: openshift-storage

5. Create a thin storage class:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  name: thin-csi-odf
parameters:
  StoragePolicyName: "vSAN Default Storage Policy"
provisioner: csi.vsphere.vmware.com
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

6. Create the StorageCluster:
---
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  resources:
    mds:
      Limits: null
      Requests: null
    mgr:
      Limits: null
      Requests: null
    mon:
      Limits: null
      Requests: null
    noobaa-core:
      Limits: null
      Requests: null
    noobaa-db:
      Limits: null
      Requests: null
    noobaa-endpoint:
      limits:
        cpu: 1
        memory: 500Mi
      requests:
        cpu: 1
        memory: 500Mi
    rgw:
      Limits: null
      Requests: null
  storageDeviceSets:
  - count: 1
    dataPVCTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 256Gi
        storageClassName: thin-csi-odf
        volumeMode: Block
    name: ocs-deviceset
    portable: true
    replica: 3
    resources:
      Limits: null
      Requests: null
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: custom-storage
              operator: In
              values:
              - "true"
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/storage
        operator: Equal
        value: "true"
  network:
    provider: multus
    selectors:
      cluster: default/cluster-net
      public: default/public-net
7. Check the rook-ceph-operator pod:
$ oc get pod rook-ceph-operator-7b7b6b8d5c-q6kzt
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-7b7b6b8d5c-q6kzt   0/1     Pending   0          3m8s

Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  4m29s  default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {custom-storage: true}, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..
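The scheduling failure above follows from the procedure itself: the StorageCluster placement only affects the Ceph/ODF workload pods, not the operator pods, and in this cluster every worker node carries the custom taint, so the rook-ceph-operator pod has nowhere to run. If the operators are intended to run on the tainted nodes, one option (a hedged sketch, not something verified in this bug) is to pass tolerations to the operator pods through the OLM Subscription's spec.config; the name, channel, and source values below are illustrative:

  apiVersion: operators.coreos.com/v1alpha1
  kind: Subscription
  metadata:
    name: odf-operator                 # illustrative
    namespace: openshift-storage
  spec:
    channel: stable-4.14               # illustrative
    name: odf-operator
    source: redhat-operators           # illustrative
    sourceNamespace: openshift-marketplace
    config:
      # Applied by OLM to the operator deployments it manages
      tolerations:
      - effect: NoSchedule
        key: custom-storage
        operator: Equal
        value: "true"

Alternatively, keeping at least one untainted worker node for the operators, as in the procedure suggested earlier in this bug, avoids the issue.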
I ran this procedure here https://bugzilla.redhat.com/show_bug.cgi?id=2249735#c23. But Blaine thinks this is not the right process
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days