2078270 – In OSD cluster, storage cluster node selector allowing scheduling storage api pods on infra node

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	ocs-operator
Sub Component:
Version:	4.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	ODF 4.16.0
Assignee:	Nitin Goyal
QA Contact:	suchita
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-04-25 02:11 UTC by suchita
Modified:	2024-11-15 04:25 UTC
CC List:	7 users
Fixed In Version:	4.16.0-92
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-07-17 13:10:26 UTC
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	red-hat-storage ocs-operator pull 2133	None	open	provider: add mechanism to constrain the ocs-provider-server	2023-08-11 09:45:05 UTC
Github	red-hat-storage ocs-operator pull 2502	None	Merged	Apply custom taints via the storagecluster CR	2024-04-30 06:41:25 UTC
Red Hat Product Errata	RHSA-2024:4591	None	None	None	2024-07-17 13:10:30 UTC

Description suchita 2022-04-25 02:11:32 UTC

Description of problem:
On the OSD cluster, the storage API pod is running on an infra node.





Version-Release number of selected component (if applicable):
OCP cluster version: 4.10.8
========CSV ======
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.0                      NooBaa Operator               4.10.0                                                      Succeeded
ocs-operator.v4.10.0                      OpenShift Container Storage   4.10.0                                                      Succeeded
ocs-osd-deployer.v2.0.0                   OCS OSD Deployer              2.0.0                                                       Succeeded
odf-csi-addons-operator.v4.10.0           CSI Addons                    4.10.0                                                      Succeeded
odf-operator.v4.10.0                      OpenShift Data Foundation     4.10.0                                                      Succeeded
ose-prometheus-operator.4.8.0             Prometheus Operator           4.8.0                                                       Succeeded
route-monitor-operator.v0.1.408-c2256a2   Route Monitor Operator        0.1.408-c2256a2   route-monitor-operator.v0.1.406-54ff884   Succeeded

Provider add-on ID: reproducible on both ocs-provider and ocs-provider QE

How reproducible:


Steps to Reproduce:
1. Deploy a Managed Service cluster.

Actual results:
The API pod is scheduled and running on an infra node.
Expected results:
In an OSD cluster, the storage cluster node selector should allow API pods to be scheduled only on worker nodes, not on infra nodes.
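
The expected behavior above can be illustrated with a minimal sketch of Kubernetes-style match expressions evaluated against node labels. This is not the ocs-operator or deployer code. It assumes, for illustration only, that an infra node carries both the node-role.kubernetes.io/worker and node-role.kubernetes.io/infra labels, in which case a selector that only requires the worker label would also admit infra nodes.

```python
# Sketch only: evaluates match expressions (In, NotIn, Exists, DoesNotExist)
# against a node's labels. The label layout of the infra node is an assumption.
WORKER = "node-role.kubernetes.io/worker"
INFRA = "node-role.kubernetes.io/infra"


def node_matches(labels, expressions):
    """Return True if every expression holds for the given node labels."""
    for expr in expressions:
        key = expr["key"]
        op = expr["operator"]
        values = expr.get("values", [])
        if op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        elif op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            # A node without the label also satisfies NotIn.
            ok = key not in labels or labels[key] not in values
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


# Hypothetical nodes: a plain worker, and an infra node that is also a worker.
worker_node = {WORKER: ""}
infra_node = {WORKER: "", INFRA: ""}

# Requiring only the worker label admits both nodes.
worker_only = [{"key": WORKER, "operator": "Exists"}]

# Requiring the worker label and the absence of the infra label
# admits only the plain worker node.
worker_not_infra = [
    {"key": WORKER, "operator": "Exists"},
    {"key": INFRA, "operator": "DoesNotExist"},
]
```

Under these assumptions, a plain nodeSelector (equality on labels) cannot express "not infra"; an affinity-style expression such as DoesNotExist is needed for that.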



Additional info:

more discussion in https://chat.google.com/room/AAAASHA9vWs/evBXthCPcMQ
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-13pr1/sgatfane-13pr1_20220414T010442/openshift-cluster-dir/must-gather.local.7772965443419395352/
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-13pr1/sgatfane-13pr1_20220414T010442/logs/ocs_must_gather/
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-13pr1/sgatfane-13pr1_20220414T010442/openshift-cluster-dir/nohup.out

Comment 1 Ohad 2022-04-25 05:20:45 UTC

@sgatfane 
The IP provided on the storage cluster has no relation to where the API pod is scheduled, as we can use any node to access the provider IP.
Can you please add logs showing the pod running on an infra node as a reproduction?

In any case, this is not the correct component for this bug, so I am moving it to ocs-operator.
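
One possible way to produce the reproduction requested in this comment (a pod shown running on an infra node) is sketched below. It assumes the pod and node lists have already been loaded from the "items" arrays of `oc get pods -o json` and `oc get nodes -o json`; the node and pod names in the sample data are hypothetical.

```python
# Sketch only: report pods whose spec.nodeName points at a node that
# carries the infra role label. Input dicts mirror the JSON shape of
# Pod and Node objects; the sample names below are made up.
INFRA_LABEL = "node-role.kubernetes.io/infra"


def pods_on_infra_nodes(pods, nodes):
    """Return (pod name, node name) pairs for pods placed on infra nodes."""
    infra_nodes = {
        node["metadata"]["name"]
        for node in nodes
        if INFRA_LABEL in node["metadata"].get("labels", {})
    }
    return [
        (pod["metadata"]["name"], pod["spec"]["nodeName"])
        for pod in pods
        # Pending pods have no nodeName yet and are skipped.
        if pod["spec"].get("nodeName") in infra_nodes
    ]


sample_nodes = [
    {"metadata": {"name": "node-a",
                  "labels": {"node-role.kubernetes.io/worker": ""}}},
    {"metadata": {"name": "node-b",
                  "labels": {"node-role.kubernetes.io/worker": "",
                             INFRA_LABEL: ""}}},
]
sample_pods = [
    {"metadata": {"name": "api-pod-1"}, "spec": {"nodeName": "node-b"}},
    {"metadata": {"name": "api-pod-2"}, "spec": {"nodeName": "node-a"}},
    {"metadata": {"name": "api-pod-3"}, "spec": {}},
]

offenders = pods_on_infra_nodes(sample_pods, sample_nodes)
```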

Comment 3 Jose A. Rivera 2022-06-17 14:42:54 UTC

@omitrani Not sure why this was moved to the ocs-operator component. Wouldn't the deployer be able to take care of this by configuring an appropriate node selector in the StorageCluster Spec?

Comment 4 Ohad 2022-06-22 08:01:46 UTC

@jrivera To the best of my understanding, the deployer is already configuring the selector correctly (worker, non-infra nodes)
I think this bug shows that the storage API deployment, created by ocs-operator, is not respecting the given selector, which makes this an ocs-operator bug

Comment 13 Nitin Goyal 2023-03-10 06:22:13 UTC

(In reply to Ohad from comment #4)
> @jrivera To the best of my understanding, the deployer is already
> configuring the selector correctly (worker, non-infra nodes)
> I think this bug shows that the storage API deployment, created by
> ocs-operator, is not respecting the given selector, which makes this an
> ocs-operator bug

I think we are talking about the provider server here. We will need a few API changes in the storage cluster to get the node selector from the user. Then we can deploy the provider server using those node selectors.

Moving it out for now.
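
The idea in this comment (take a node selector from the user, then deploy the provider server with it) can be sketched roughly as follows. This is not the ocs-operator implementation and does not reflect the actual StorageCluster API; the function name and the manifest contents are hypothetical. The only real field used is the Deployment's spec.template.spec.nodeSelector.

```python
# Sketch only: copy a user-supplied node selector into a Deployment
# manifest represented as a plain dict. Names here are illustrative.
import copy


def with_node_selector(deployment, node_selector):
    """Return a copy of the manifest with nodeSelector set on its pod template."""
    result = copy.deepcopy(deployment)
    pod_spec = result["spec"]["template"].setdefault("spec", {})
    pod_spec["nodeSelector"] = dict(node_selector)
    return result


# Hypothetical, heavily trimmed manifest for the provider server.
provider_server = {
    "kind": "Deployment",
    "metadata": {"name": "ocs-provider-server"},
    "spec": {"template": {"spec": {"containers": []}}},
}

# Hypothetical selector as a user might supply it.
user_selector = {"node-role.kubernetes.io/worker": ""}

constrained = with_node_selector(provider_server, user_selector)
```

The input manifest is deep-copied so the caller's object is left unchanged, which makes the helper easy to reason about when the same base manifest is reused.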

Comment 26 suchita 2024-05-15 06:09:26 UTC

We do not support MS deployments any more, hence this cannot be verified directly on an MS setup.

However, I have verified this BZ indirectly with the following 2 types of verification:

1. Using Provider Client Setups and 
2. Verification of RHSTOR-5098 story

-----------------------------------------
1. Provider client setups.
This setup has master, control-plane, and worker nodes.
Verified on 2 different setups with provider ocs-client-operator.v4.16.0-94.stable and ocs-client-operator.v4.16.0-96.stable.
The "StorageproviderEndpoint" used the IP of a "control-plane,master,worker" node in the first setup and the IP of a "worker" node in the second setup.

2. Verification of RHSTOR-5098 story:
Happy-path validation of this story is already complete, and no issues have been found so far.
This implies that OpenShift Data Foundation can now use a custom taint.
https://docs.google.com/document/d/1n7gICNT6m8MrYD54WK9E55NkbO0uNYM25R2_iXEOKac/edit?usp=sharing


Conclusion: 
Based on comment #4, the deployer is already configuring the selector correctly (worker, non-infra nodes). In the Provider client deployment, only the IPs of non-infra worker nodes are selected for storageProviderEndpoint.
Happy-path verification of RHSTOR-5098 indicates that ocs-operator now supports custom taints.

Hence moving this BZ to verified.
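
For context on the custom taint verification above, the following is a minimal sketch of how a toleration is matched against a node taint under the standard Kubernetes rules (Equal and Exists operators). It is not taken from ocs-operator; the taint key "example.com/storage" is hypothetical.

```python
# Sketch only: standard toleration/taint matching, simplified to the
# Equal and Exists operators. The taint key below is made up.


def tolerates(toleration, taint):
    """Return True if the toleration matches the taint."""
    effect = toleration.get("effect", "")
    # An empty effect on the toleration matches any taint effect.
    if effect and effect != taint.get("effect"):
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if not key:
        # An empty key is only meaningful with Exists and matches every key.
        return operator == "Exists"
    if key != taint.get("key"):
        return False
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def can_schedule(tolerations, taints):
    """Return True if every hard taint is matched by some toleration."""
    hard = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in hard)


custom_taint = {"key": "example.com/storage", "value": "true", "effect": "NoSchedule"}
matching = {"key": "example.com/storage", "operator": "Equal",
            "value": "true", "effect": "NoSchedule"}
```

A pod without a matching toleration is kept off the tainted node, while a pod carrying one may be placed there; node selectors or affinity still decide where it actually lands.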

Comment 29 errata-xmlrpc 2024-07-17 13:10:26 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591

Comment 30 Red Hat Bugzilla 2024-11-15 04:25:02 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
