Bug 2078270 - In OSD cluster, storage cluster node selector allowing scheduling storage api pods on infra node [NEEDINFO]
Keywords:
Status: POST
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.14.0
Assignee: Nitin Goyal
QA Contact: suchita
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-25 02:11 UTC by suchita
Modified: 2023-08-17 06:19 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
omitrani: needinfo? (sgatfane)
omitrani: needinfo? (jrivera)




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | red-hat-storage ocs-operator pull 2133 | 0 | None | open | provider: add mechanism to constrain the ocs-provider-server | 2023-08-11 09:45:05 UTC

Description suchita 2022-04-25 02:11:32 UTC
Description of problem:
On the OSD cluster, the storage API pod is running on an infra node.





Version-Release number of selected component (if applicable):
OCP cluster version: 4.10.8
========CSV ======
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.0                      NooBaa Operator               4.10.0                                                      Succeeded
ocs-operator.v4.10.0                      OpenShift Container Storage   4.10.0                                                      Succeeded
ocs-osd-deployer.v2.0.0                   OCS OSD Deployer              2.0.0                                                       Succeeded
odf-csi-addons-operator.v4.10.0           CSI Addons                    4.10.0                                                      Succeeded
odf-operator.v4.10.0                      OpenShift Data Foundation     4.10.0                                                      Succeeded
ose-prometheus-operator.4.8.0             Prometheus Operator           4.8.0                                                       Succeeded
route-monitor-operator.v0.1.408-c2256a2   Route Monitor Operator        0.1.408-c2256a2   route-monitor-operator.v0.1.406-54ff884   Succeeded

Provider add-on ID: reproducible on both ocs-provider and ocs-provider QE

How reproducible:


Steps to Reproduce:
1. Deploy a managed service cluster.

Actual results:
The API pod is scheduled and running on an infra node.
Expected results:
In an OSD cluster, the storage cluster node selector should allow API pods to be scheduled only on worker nodes, not on infra nodes.



Additional info:

More discussion in https://chat.google.com/room/AAAASHA9vWs/evBXthCPcMQ
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-13pr1/sgatfane-13pr1_20220414T010442/openshift-cluster-dir/must-gather.local.7772965443419395352/
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-13pr1/sgatfane-13pr1_20220414T010442/logs/ocs_must_gather/
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-13pr1/sgatfane-13pr1_20220414T010442/openshift-cluster-dir/nohup.out

Comment 1 Ohad 2022-04-25 05:20:45 UTC
@sgatfane 
The IP provided on the storage cluster has no relation to where the API pod is scheduled, as any node can be used to access the provider IP.
Can you please add logs showing the pod running on an infra node as a reproduction?

In any case, this is not the correct component for this bug, moving it to ocs-operator

Comment 3 Jose A. Rivera 2022-06-17 14:42:54 UTC
@omitrani Not sure why this was moved to the ocs-operator component. Wouldn't the deployer be able to take care of this by configuring an appropriate node selector in the StorageCluster Spec?

Comment 4 Ohad 2022-06-22 08:01:46 UTC
@jrivera To the best of my understanding, the deployer is already configuring the selector correctly (worker, non-infra nodes)
I think this bug shows that the storage API deployment, created by ocs-operator, is not respecting the given selector, which makes this an ocs-operator bug
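
For readers unfamiliar with the mechanism under discussion: a Kubernetes nodeSelector admits a node only when every key/value pair in the selector is present in the node's labels. The Python sketch below models that rule. The label sets are hypothetical and are not taken from the cluster in this report. It also illustrates one possible way an infra node could still match a worker-only selector (if it happened to carry the worker label as well), which is an assumption and not a confirmed cause of this bug.

```python
from typing import Mapping


def node_matches_selector(
    node_labels: Mapping[str, str], node_selector: Mapping[str, str]
) -> bool:
    """Return True when every selector key/value pair is present in the node's labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical label sets, for illustration only.
worker_node = {"node-role.kubernetes.io/worker": ""}
infra_node = {
    "node-role.kubernetes.io/infra": "",
    "node-role.kubernetes.io/worker": "",  # assumption: infra node also labeled as worker
}
selector = {"node-role.kubernetes.io/worker": ""}

for name, labels in (("worker", worker_node), ("infra", infra_node)):
    print(name, node_matches_selector(labels, selector))
```

An empty selector matches every node, so a Deployment whose pod template carries no nodeSelector at all can land on any schedulable node regardless of what the StorageCluster specifies.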

Comment 13 Nitin Goyal 2023-03-10 06:22:13 UTC
(In reply to Ohad from comment #4)
> @jrivera To the best of my understanding, the deployer is already
> configuring the selector correctly (worker, non-infra nodes)
> I think this bug shows that the storage API deployment, created by
> ocs-operator, is not respecting the given selector, which makes this an
> ocs-operator bug

I think we are talking about the provider server here. We will need a few API changes in the storage cluster to get the node selector from the user. Then we can deploy the provider server using those node selectors.

Moving it out for now.
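
As a rough illustration of the idea in comment 13, the sketch below copies a user-supplied node selector into a Deployment's pod template at `spec.template.spec.nodeSelector`, which is the standard Kubernetes field for this. It is written in Python for brevity and is not the ocs-operator implementation. The helper name `with_node_selector`, the manifest contents, and the example label are hypothetical.

```python
import copy
from typing import Any, Dict


def with_node_selector(
    deployment: Dict[str, Any], node_selector: Dict[str, str]
) -> Dict[str, Any]:
    """Return a copy of the Deployment manifest with nodeSelector set on its pod template."""
    patched = copy.deepcopy(deployment)
    pod_spec = (
        patched.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    pod_spec["nodeSelector"] = dict(node_selector)
    return patched


# Minimal, hypothetical manifest for the provider server Deployment.
provider_server = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "ocs-provider-server"},
    "spec": {"template": {"spec": {"containers": []}}},
}

# Hypothetical label that a user might supply through the storage cluster API.
patched = with_node_selector(provider_server, {"example.com/storage-node": "true"})
print(patched["spec"]["template"]["spec"]["nodeSelector"])
```

The helper returns a modified copy and leaves the input manifest untouched, which keeps the sketch easy to reason about. A real operator would reconcile the live object instead.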

