Bug 2255232 - rook-ceph-rgw-ocs-storagecluster-cephobjectstore warning when deployed on control-plan
Summary: rook-ceph-rgw-ocs-storagecluster-cephobjectstore warning when deployed on con...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.15.0
Assignee: Jiffin
QA Contact: Parikshith
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-12-19 12:31 UTC by Ramon Gordillo
Modified: 2024-03-29 14:32 UTC (History)

Fixed In Version: 4.15.0-123
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-03-19 15:25:55 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage ocs-operator pull 2398 0 None open [Bug 2255232]: revert of f5ba884b012bf0a7acfe612578f1f0091b2d1fd2 2024-01-21 14:36:51 UTC
Github red-hat-storage ocs-operator pull 2411 0 None open Bug 2255232: [release-4.15] Revert "add topology mode annotation rgw service" 2024-01-22 14:56:15 UTC
Red Hat Knowledge Base (Solution) 7062284 0 None None None 2024-03-29 14:32:20 UTC
Red Hat Product Errata RHSA-2024:1383 0 None None None 2024-03-19 15:25:58 UTC

Description Ramon Gordillo 2023-12-19 12:31:55 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

In OpenShift 4.14, warning events related to Topology Aware Routing are raised periodically:

Service: rook-ceph-rgw-ocs-storagecluster-cephobjectstore
Namespace: openshift-storage
Generated from endpoint-slice-controller
22 times in the last 1 hour
Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4

The ODF cluster is deployed on the control-plane nodes, and there are some limitations in this regard:

https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/#constraints

The EndpointSlice controller ignores nodes with the node-role.kubernetes.io/control-plane or node-role.kubernetes.io/master label set. This could be problematic if workloads are also running on those nodes.
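
To see which nodes fall under this constraint, a minimal shell sketch follows. It assumes the default tabular output of `oc get nodes --no-headers --show-labels`, with the node name in the first column and the comma-separated labels in the last. The function name `control_plane_nodes` is hypothetical and not part of any tool.

```shell
# Print the names of nodes carrying a control-plane or master role label.
# Reads `oc get nodes --no-headers --show-labels` output on stdin
# (assumption: name is the first column, labels are the last column).
control_plane_nodes() {
  awk '$NF ~ /(^|,)node-role\.kubernetes\.io\/(control-plane|master)(=|,|$)/ { print $1 }'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get nodes --no-headers --show-labels | control_plane_nodes
```

If the nodes hosting the RGW pods appear in this list, the constraint quoted above applies to them.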

Version of all relevant components (if applicable):

ODF 4.14.2 (and .3), OCP 4.14.2 (and .7)

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

It creates noise and false alarms when monitoring is performed on warning events.

Is there any workaround available to the best of your knowledge?

No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

1

Is this issue reproducible?

Always

Can this issue be reproduced from the UI?

Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy the products
2. Check through:

oc get events -n openshift-storage |grep -i TopologyAwareHintsDisabled

Actual results:

Warning events

Expected results:

No warning


Additional info:

Comment 3 Ramon Gordillo 2023-12-19 13:19:10 UTC
oc get events -n openshift-storage |grep -i TopologyAwareHintsDisabled

4m14s       Warning   TopologyAwareHintsDisabled   service/rook-ceph-rgw-ocs-storagecluster-cephobjectstore              Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4
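
To gauge how noisy these warnings are, the event rows can be tallied per object. This is a minimal sketch that assumes the default `oc get events` column order (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE) seen in the output above. The function name `count_tah_warnings` is hypothetical.

```shell
# Count TopologyAwareHintsDisabled warning rows per involved object.
# Reads `oc get events` tabular output on stdin
# (assumption: TYPE, REASON and OBJECT are columns 2, 3 and 4).
count_tah_warnings() {
  awk '$2 == "Warning" && $3 == "TopologyAwareHintsDisabled" { n[$4]++ }
       END { for (obj in n) print n[obj], obj }'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get events -n openshift-storage | count_tah_warnings
```

Note that this counts rows in the listing, not the per-event repeat count that the console reports (such as "22 times in the last 1 hour").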


ClusterID: edf2e1a1-aa36-4464-ba23-b83e29cb4aef
ClusterVersion: Stable at "4.14.6"
ClusterOperators:
	All healthy and stable

Comment 6 Subham Rai 2023-12-20 08:55:18 UTC
The only thing I see we can do here is add the annotation `service.kubernetes.io/topology-mode: auto` to the service. Is this what you are suggesting?

Comment 7 Ramon Gordillo 2023-12-20 09:09:23 UTC
Hi.

Yes.

The rook-ceph-rgw-ocs-storagecluster-cephobjectstore service uses the "service.kubernetes.io/topology-mode: Auto" annotation. That forces the use of Topology Aware Hints (TAH), which, according to the documentation, cannot be provided for master/control-plane nodes.

Please review the reasons and provide an alternative for this use case.

Thanks.

Comment 13 Malay Kumar parida 2024-01-18 05:30:56 UTC
Hi Jiffin, are we planning to revert the change as Blaine suggested, or do we need to move it out to 4.16?

Comment 14 Jiffin 2024-01-19 06:52:50 UTC
PR posted in https://github.com/red-hat-storage/ocs-operator/pull/2398

Comment 19 errata-xmlrpc 2024-03-19 15:25:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

