Description of problem:
On the OSD cluster, the storage API pod is running on an infra node.

Version-Release number of selected component (if applicable):
OCP cluster version: 4.10.8

======== CSV ========
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.0                      NooBaa Operator               4.10.0                                                      Succeeded
ocs-operator.v4.10.0                      OpenShift Container Storage   4.10.0                                                      Succeeded
ocs-osd-deployer.v2.0.0                   OCS OSD Deployer              2.0.0                                                       Succeeded
odf-csi-addons-operator.v4.10.0           CSI Addons                    4.10.0                                                      Succeeded
odf-operator.v4.10.0                      OpenShift Data Foundation     4.10.0                                                      Succeeded
ose-prometheus-operator.4.8.0             Prometheus Operator           4.8.0                                                       Succeeded
route-monitor-operator.v0.1.408-c2256a2   Route Monitor Operator        0.1.408-c2256a2   route-monitor-operator.v0.1.406-54ff884   Succeeded

Provider add-on ID: reproducible on both ocs-provider and ocs-provider QE

How reproducible:

Steps to Reproduce:
1. Deploy a Managed Service cluster

Actual results:
The API pod is scheduled and running on an infra node.

Expected results:
On an OSD cluster, the storage cluster node selector should allow API pods to be scheduled only on worker nodes, not on infra nodes.

Additional info:
More discussion in https://chat.google.com/room/AAAASHA9vWs/evBXthCPcMQ
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-13pr1/sgatfane-13pr1_20220414T010442/openshift-cluster-dir/must-gather.local.7772965443419395352/
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-13pr1/sgatfane-13pr1_20220414T010442/logs/ocs_must_gather/
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-13pr1/sgatfane-13pr1_20220414T010442/openshift-cluster-dir/nohup.out
@sgatfane The IP provided on the storage cluster has no relation to where the API pod is scheduled, as any node can be used to access the provider IP. Can you please add logs showing the pod running on an infra node as a reproduction? In any case, this is not the correct component for this bug, so I am moving it to ocs-operator.
@omitrani Not sure why this was moved to the ocs-operator component. Wouldn't the deployer be able to take care of this by configuring an appropriate node selector in the StorageCluster Spec?
@jrivera To the best of my understanding, the deployer is already configuring the selector correctly (worker, non-infra nodes). I think this bug shows that the storage API deployment, created by ocs-operator, is not respecting the given selector, which makes this an ocs-operator bug.
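For reference, the condition being argued about (a pod landing on a node that is infra, or that does not match the configured selector) could be checked roughly as in the sketch below. This is only an illustration, not ocs-operator or deployer code: the pod and node names are hypothetical, the selector value is an assumption, and the data is hard-coded rather than read from the cluster. The only real identifiers are the standard `node-role.kubernetes.io/infra` and `node-role.kubernetes.io/worker` role label keys.

```python
# Minimal sketch: flag pods that sit on infra nodes or on nodes that do not
# match a given node selector. All names and data below are hypothetical.

INFRA_LABEL = "node-role.kubernetes.io/infra"
WORKER_LABEL = "node-role.kubernetes.io/worker"


def matches_selector(node_labels, selector):
    """Return True if every key/value pair in the selector is on the node."""
    return all(node_labels.get(key) == value for key, value in selector.items())


def is_infra(node_labels):
    """Return True if the node carries the infra role label (any value)."""
    return INFRA_LABEL in node_labels


def misplaced_pods(pod_to_node, node_labels_by_name, selector):
    """Return the sorted names of pods on infra or non-matching nodes."""
    flagged = []
    for pod, node in sorted(pod_to_node.items()):
        labels = node_labels_by_name.get(node, {})
        if is_infra(labels) or not matches_selector(labels, selector):
            flagged.append(pod)
    return flagged


# Hypothetical example: one plain worker, one node with both worker and infra roles.
nodes = {
    "node-a": {WORKER_LABEL: ""},
    "node-b": {WORKER_LABEL: "", INFRA_LABEL: ""},
}
pods = {"api-pod-1": "node-a", "api-pod-2": "node-b"}
selector = {WORKER_LABEL: ""}  # assumed selector, worker role only

print(misplaced_pods(pods, nodes, selector))
```

In this made-up data the second node carries both role labels, so a selector on the worker label alone would still match it. That is why the sketch tests for the infra label separately instead of relying on the selector match.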
(In reply to Ohad from comment #4)
> @jrivera To the best of my understanding, the deployer is already
> configuring the selector correctly (worker, non-infra nodes)
> I think this bug shows that the storage API deployment, created by
> ocs-operator, is not respecting the given selector, which makes this an
> ocs-operator bug

I think we are talking about the provider server here. We will need a few API changes in the storage cluster to get the node selector from the user. Then we can deploy the provider server using the node selectors. Moving it out for now.
We no longer support MS deployments, so this cannot be verified directly on an MS setup. However, I have verified this BZ indirectly with the two types of verification below:
1. Using provider-client setups
2. Verification of the RHSTOR-5098 story

-----------------------------------------

1. Provider-client setups:
These setups have master, control-plane, and worker nodes. Verified on 2 different setups, with ocs-client-operator.v4.16.0-94.stable and ocs-client-operator.v4.16.0-96.stable. In the first setup the storageProviderEndpoint uses the IP of a "control-plane,master,worker" node; in the second setup it uses the IP of a "worker" node.

2. Verification of the RHSTOR-5098 story:
Happy-path validation of this story is already complete and no issues have been found so far. This implies that OpenShift Data Foundation is now allowed to use a custom taint.
https://docs.google.com/document/d/1n7gICNT6m8MrYD54WK9E55NkbO0uNYM25R2_iXEOKac/edit?usp=sharing

Conclusion:
Based on comment #4, the deployer is already configuring the selector correctly (worker, non-infra nodes). In the provider-client deployment, only the IP of a non-infra worker node is selected for storageProviderEndpoint. Happy-path verification of RHSTOR-5098 indicates that ocs-operator now supports custom taints. Hence moving this BZ to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days