2255232 – rook-ceph-rgw-ocs-storagecluster-cephobjectstore warning when deployed on control-plane

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	ocs-operator
Sub Component:
Version:	4.14
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	ODF 4.15.0
Assignee:	Jiffin
QA Contact:	Parikshith
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2023-12-19 12:31 UTC by Ramon Gordillo
Modified:	2024-03-29 14:32 UTC
CC List:	9 users
Fixed In Version:	4.15.0-123
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-03-19 15:25:55 UTC
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	red-hat-storage ocs-operator pull 2398	None	open	[Bug 2255232]: revert of f5ba884b012bf0a7acfe612578f1f0091b2d1fd2	2024-01-21 14:36:51 UTC
Github	red-hat-storage ocs-operator pull 2411	None	open	Bug 2255232: [release-4.15] Revert "add topology mode annotation rgw service"	2024-01-22 14:56:15 UTC
Red Hat Knowledge Base (Solution)	7062284	None	None	None	2024-03-29 14:32:20 UTC
Red Hat Product Errata	RHSA-2024:1383	None	None	None	2024-03-19 15:25:58 UTC

Description Ramon Gordillo 2023-12-19 12:31:55 UTC

Description of problem (please be as detailed as possible and provide log
snippets):

In OpenShift 4.14, warning events about Topology Aware Routing are raised periodically:

Service: rook-ceph-rgw-ocs-storagecluster-cephobjectstore
Namespace: openshift-storage
Generated from endpoint-slice-controller
22 times in the last 1 hour
Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4

The ODF cluster is deployed on the control-plane nodes, and there are some limitations in this regard:

https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/#constraints

The EndpointSlice controller ignores nodes with the node-role.kubernetes.io/control-plane or node-role.kubernetes.io/master label set. This could be problematic if workloads are also running on those nodes.
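
The constraint quoted above can be illustrated with a small model. This is a minimal sketch, not the EndpointSlice controller's actual code: it assumes nodes are represented as plain dictionaries of labels, the node names are hypothetical, and only the label-based exclusion described in the documentation is reproduced. When every node carries a control-plane label, no node is left for the controller to compute hints from.

```python
# Labels that, per the Kubernetes documentation quoted above, cause the
# EndpointSlice controller to ignore a node for Topology Aware Routing.
EXCLUDED_LABELS = (
    "node-role.kubernetes.io/control-plane",
    "node-role.kubernetes.io/master",
)


def eligible_nodes(nodes):
    """Return the names of nodes that carry none of the excluded labels.

    `nodes` maps a node name to its label dictionary. Simplified model
    for illustration only; readiness, CPU and zone checks are not modeled.
    """
    return [
        name
        for name, labels in nodes.items()
        if not any(label in labels for label in EXCLUDED_LABELS)
    ]


# Hypothetical cluster in which all nodes are control-plane nodes that
# also run workloads (node names and zones are made up).
compact_cluster = {
    "node-0": {"node-role.kubernetes.io/control-plane": "", "topology.kubernetes.io/zone": "a"},
    "node-1": {"node-role.kubernetes.io/control-plane": "", "topology.kubernetes.io/zone": "b"},
    "node-2": {"node-role.kubernetes.io/master": "", "topology.kubernetes.io/zone": "c"},
}

print(eligible_nodes(compact_cluster))  # prints []
```

In this model the zone labels are present on every node, yet the list is still empty, which is consistent with the warning appearing even on correctly labeled control-plane nodes.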

Version of all relevant components (if applicable):

ODF 4.14.2 (and .3), OCP 4.14.2 (and .7)

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

It creates noise and false alarms when monitoring is performed on warning events.

Is there any workaround available to the best of your knowledge?

No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

1

Is this issue reproducible?

Always

Can this issue reproduce from the UI?

Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy OCP 4.14 and ODF 4.14, with ODF running on the control-plane nodes
2. Check the events with:

oc get events -n openshift-storage |grep -i TopologyAwareHintsDisabled
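
For monitoring scripts, the same check can be run on structured output instead of `grep`. The following is a minimal sketch, assuming the events were exported with `oc get events -n openshift-storage -o json` and loaded into a dictionary. The function name and the sample data are hypothetical; only the standard `type`, `reason`, `message` and `involvedObject` fields of the Event object are used.

```python
def topology_hint_warnings(event_list):
    """Return (object name, message) pairs for TopologyAwareHintsDisabled warnings.

    `event_list` is the parsed JSON of an event list, i.e. a dictionary
    with an "items" key holding individual Event objects.
    """
    return [
        (ev["involvedObject"]["name"], ev["message"])
        for ev in event_list.get("items", [])
        if ev.get("type") == "Warning"
        and ev.get("reason") == "TopologyAwareHintsDisabled"
    ]


# Hypothetical sample: the first event mirrors the one in this report,
# the second is an unrelated event that must be filtered out.
sample_events = {
    "items": [
        {
            "type": "Warning",
            "reason": "TopologyAwareHintsDisabled",
            "involvedObject": {
                "kind": "Service",
                "name": "rook-ceph-rgw-ocs-storagecluster-cephobjectstore",
            },
            "message": "Insufficient Node information: allocatable CPU or zone "
                       "not specified on one or more nodes, addressType: IPv4",
        },
        {
            "type": "Normal",
            "reason": "Pulled",
            "involvedObject": {"kind": "Pod", "name": "example-pod"},
            "message": "hypothetical unrelated event",
        },
    ]
}

for name, message in topology_hint_warnings(sample_events):
    print(f"{name}: {message}")
```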

Actual results:

Warning events

Expected results:

No warning


Additional info:

Comment 3 Ramon Gordillo 2023-12-19 13:19:10 UTC

oc get events -n openshift-storage |grep -i TopologyAwareHintsDisabled

4m14s       Warning   TopologyAwareHintsDisabled   service/rook-ceph-rgw-ocs-storagecluster-cephobjectstore              Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4


ClusterID: edf2e1a1-aa36-4464-ba23-b83e29cb4aef
ClusterVersion: Stable at "4.14.6"
ClusterOperators:
	All healthy and stable

Comment 6 Subham Rai 2023-12-20 08:55:18 UTC

The only thing I see we can do here is add the annotation `service.kubernetes.io/topology-mode: auto` to the service. Is this what you are suggesting?

Comment 7 Ramon Gordillo 2023-12-20 09:09:23 UTC

Hi.

Yes.

The rook-ceph-rgw-ocs-storagecluster-cephobjectstore service already has the "service.kubernetes.io/topology-mode: Auto" annotation. That forces the use of Topology Aware Hints (TAH), which, according to the documentation, cannot be provided on master/control-plane nodes.

Please review the reasons and provide an alternative for this use case.

Thanks.
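
Whether a given Service requests topology hints can be checked from its manifest. The following is a minimal sketch, assuming the Service was exported with `oc get service <name> -n openshift-storage -o json` and loaded into a dictionary. The function names and the sample manifest are hypothetical, and treating both the `Auto` and `auto` spellings as enabled is an assumption based on the two spellings that appear in this report.

```python
TOPOLOGY_MODE_ANNOTATION = "service.kubernetes.io/topology-mode"


def topology_mode(service):
    """Return the value of the topology-mode annotation, or None if unset."""
    annotations = service.get("metadata", {}).get("annotations") or {}
    return annotations.get(TOPOLOGY_MODE_ANNOTATION)


def hints_requested(service):
    """Return True if the Service asks for Topology Aware Routing.

    Assumption of this sketch: "Auto" and "auto" both enable the feature.
    """
    return topology_mode(service) in ("Auto", "auto")


# Hypothetical manifest reduced to the fields this sketch reads.
rgw_service = {
    "metadata": {
        "name": "rook-ceph-rgw-ocs-storagecluster-cephobjectstore",
        "namespace": "openshift-storage",
        "annotations": {TOPOLOGY_MODE_ANNOTATION: "Auto"},
    }
}

print(hints_requested(rgw_service))  # prints True
```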

Comment 13 Malay Kumar parida 2024-01-18 05:30:56 UTC

Hi Jiffin, are we planning to revert the change as Blaine suggested, or do we need to move it out to 4.16?

Comment 14 Jiffin 2024-01-19 06:52:50 UTC

PR posted in https://github.com/red-hat-storage/ocs-operator/pull/2398

Comment 19 errata-xmlrpc 2024-03-19 15:25:55 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383