Bug 2078270

Summary: In an OSD cluster, the storage cluster node selector allows storage API pods to be scheduled on infra nodes
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: suchita <sgatfane>
Component: ocs-operator
Assignee: Nitin Goyal <nigoyal>
Status: POST ---
QA Contact: suchita <sgatfane>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.10
CC: aeyal, kramdoss, nigoyal, odf-bz-bot, sgatfane, sostapov
Target Milestone: ---
Flags: omitrani: needinfo? (sgatfane), omitrani: needinfo? (jrivera)
Target Release: ODF 4.14.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description suchita 2022-04-25 02:11:32 UTC
Description of problem:
On the OSD cluster, the storage API pod is running on an infra node.





Version-Release number of selected component (if applicable):
OCP cluster version: 4.10.8
========CSV ======
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.0                      NooBaa Operator               4.10.0                                                      Succeeded
ocs-operator.v4.10.0                      OpenShift Container Storage   4.10.0                                                      Succeeded
ocs-osd-deployer.v2.0.0                   OCS OSD Deployer              2.0.0                                                       Succeeded
odf-csi-addons-operator.v4.10.0           CSI Addons                    4.10.0                                                      Succeeded
odf-operator.v4.10.0                      OpenShift Data Foundation     4.10.0                                                      Succeeded
ose-prometheus-operator.4.8.0             Prometheus Operator           4.8.0                                                       Succeeded
route-monitor-operator.v0.1.408-c2256a2   Route Monitor Operator        0.1.408-c2256a2   route-monitor-operator.v0.1.406-54ff884   Succeeded

Provider add-on ID: reproducible on both ocs-provider and ocs-provider QE

How reproducible:


Steps to Reproduce:
1. Deploy a managed service cluster

Actual results:
The API pod is scheduled and running on an infra node.
Expected results:
In an OSD cluster, the storage cluster node selector should allow API pods to be scheduled only on worker nodes, not on infra nodes.



Additional info:

More discussion in https://chat.google.com/room/AAAASHA9vWs/evBXthCPcMQ
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-13pr1/sgatfane-13pr1_20220414T010442/openshift-cluster-dir/must-gather.local.7772965443419395352/
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-13pr1/sgatfane-13pr1_20220414T010442/logs/ocs_must_gather/
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-13pr1/sgatfane-13pr1_20220414T010442/openshift-cluster-dir/nohup.out

Comment 1 Ohad 2022-04-25 05:20:45 UTC
@sgatfane 
The IP provided on the storage cluster has no relation to where the API pod is scheduled, because any node can be used to access the provider IP.
Can you please add logs showing the pod running on an infra node as a reproduction?

In any case, this is not the correct component for this bug; moving it to ocs-operator.
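
The reproduction requested in this comment could be gathered by cross-referencing node labels with pod placement. The following is a minimal sketch, not an official tool: it assumes the input has the shape of `oc get nodes -o json` and `oc get pods -n <namespace> -o json` output, and that infra nodes carry the `node-role.kubernetes.io/infra` label. The node and pod names in the sample data are hypothetical.

```python
# Assumption: infra nodes are identified by this role label key.
INFRA_LABEL = "node-role.kubernetes.io/infra"


def infra_node_names(node_list):
    """Return the names of nodes whose labels include the infra role label."""
    return {
        node["metadata"]["name"]
        for node in node_list.get("items", [])
        if INFRA_LABEL in node["metadata"].get("labels", {})
    }


def pods_on_infra_nodes(pod_list, node_list):
    """Return sorted (pod name, node name) pairs for pods bound to infra nodes."""
    infra = infra_node_names(node_list)
    return sorted(
        (pod["metadata"]["name"], pod["spec"]["nodeName"])
        for pod in pod_list.get("items", [])
        # Pods that are not yet scheduled have no nodeName and are skipped.
        if pod.get("spec", {}).get("nodeName") in infra
    )


# Hypothetical sample data shaped like `oc get ... -o json` output.
SAMPLE_NODES = {
    "items": [
        {"metadata": {"name": "node-worker-1",
                      "labels": {"node-role.kubernetes.io/worker": ""}}},
        {"metadata": {"name": "node-infra-1",
                      "labels": {"node-role.kubernetes.io/infra": ""}}},
    ]
}
SAMPLE_PODS = {
    "items": [
        {"metadata": {"name": "example-api-pod"},
         "spec": {"nodeName": "node-infra-1"}},
        {"metadata": {"name": "example-other-pod"},
         "spec": {"nodeName": "node-worker-1"}},
        {"metadata": {"name": "example-pending-pod"}, "spec": {}},
    ]
}

if __name__ == "__main__":
    for pod, node in pods_on_infra_nodes(SAMPLE_PODS, SAMPLE_NODES):
        print(f"{pod} is scheduled on infra node {node}")
```

To run it against a real cluster, the two sample dictionaries would be replaced with `json.load` of the saved command output.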

Comment 3 Jose A. Rivera 2022-06-17 14:42:54 UTC
@omitrani Not sure why this was moved to the ocs-operator component. Wouldn't the deployer be able to take care of this by configuring an appropriate node selector in the StorageCluster Spec?

Comment 4 Ohad 2022-06-22 08:01:46 UTC
@jrivera To the best of my understanding, the deployer is already configuring the selector correctly (worker, non-infra nodes).
I think this bug shows that the storage API deployment, created by ocs-operator, is not respecting the given selector, which makes this an ocs-operator bug.
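
To illustrate why a Deployment that does not carry the selector can land on an infra node, here is a minimal sketch of the Kubernetes `nodeSelector` rule: a node is eligible only if every key/value pair in the selector is present in its labels, and an empty or missing selector matches every node. This is not ocs-operator code. The node names and the worker-only selector are assumptions for the example, and real scheduling also takes taints, tolerations, and affinity into account.

```python
def matches_node_selector(node_labels, node_selector):
    """True if every selector key/value pair is present in the node's labels."""
    return all(
        node_labels.get(key) == value
        for key, value in (node_selector or {}).items()
    )


def schedulable_nodes(nodes, node_selector):
    """Return the names of nodes that satisfy the selector, in input order."""
    return [
        name
        for name, labels in nodes.items()
        if matches_node_selector(labels, node_selector)
    ]


# Hypothetical cluster: one worker node and one infra node.
NODES = {
    "worker-1": {"node-role.kubernetes.io/worker": ""},
    "infra-1": {"node-role.kubernetes.io/infra": ""},
}

# Assumed selector of the kind the deployer is said to configure.
WORKER_ONLY = {"node-role.kubernetes.io/worker": ""}

if __name__ == "__main__":
    # Pod template that carries the selector: only the worker is eligible.
    print(schedulable_nodes(NODES, WORKER_ONLY))
    # Pod template without a selector: every node is eligible.
    print(schedulable_nodes(NODES, None))
```

If the pod template created for the storage API deployment omits the selector, the second case applies, which would be consistent with the behaviour reported here.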

Comment 13 Nitin Goyal 2023-03-10 06:22:13 UTC
(In reply to Ohad from comment #4)
> @jrivera To the best of my understanding, the deployer is already
> configuring the selector correctly (worker, non-infra nodes)
> I think this bug shows that the storage API deployment, created by
> ocs-operator, is not respecting the given selector, which makes this an
> ocs-operator bug

I think we are talking about the provider server here. We will need a few API changes in the storage cluster to get the node selector from the user; then we can deploy the provider server using those node selectors.

Moving it out for now.
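
The change proposed in this comment, passing a user-supplied node selector down to the provider server Deployment, might look roughly like the sketch below. It is an illustration only and not the operator's actual code: the function name, deployment name, image, and labels are all hypothetical, and the StorageCluster field that would carry the selector is not defined in this report. Only the `apps/v1` Deployment structure and the pod-level `nodeSelector` field are standard Kubernetes.

```python
def build_provider_deployment(name, image, node_selector=None):
    """Build a Deployment manifest, copying the selector into the pod template."""
    pod_spec = {"containers": [{"name": name, "image": image}]}
    if node_selector:
        # Copy so later changes to the caller's dict do not leak into the manifest.
        pod_spec["nodeSelector"] = dict(node_selector)
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }


if __name__ == "__main__":
    # Hypothetical values; the selector would come from the new API field.
    manifest = build_provider_deployment(
        "example-provider-server",
        "example.invalid/provider-server:latest",
        {"node-role.kubernetes.io/worker": ""},
    )
    print(manifest["spec"]["template"]["spec"]["nodeSelector"])
```

Leaving `nodeSelector` out entirely when no selector is given keeps the default behaviour unchanged for clusters that do not set the new field.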