Bug 2255232

Summary: rook-ceph-rgw-ocs-storagecluster-cephobjectstore warning when deployed on control-plane
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Ramon Gordillo <ramon.gordillo>
Component: ocs-operator Assignee: Jiffin <jthottan>
Status: CLOSED ERRATA QA Contact: Parikshith <pbyregow>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.14 CC: brgardne, etamir, jansingh, jthottan, mparida, muagarwa, odf-bz-bot, srai, tnielsen
Target Milestone: ---   
Target Release: ODF 4.15.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.15.0-123 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-03-19 15:25:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ramon Gordillo 2023-12-19 12:31:55 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

In OpenShift 4.14, warning events about Topology Aware Routing are raised repeatedly:

Service: rook-ceph-rgw-ocs-storagecluster-cephobjectstore
Namespace: openshift-storage
Generated from endpoint-slice-controller
22 times in the last 1 hour
Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4

The ODF cluster is deployed on the control-plane nodes, and there are some limitations in this regard:

https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/#constraints

The EndpointSlice controller ignores nodes with the node-role.kubernetes.io/control-plane or node-role.kubernetes.io/master label set. This could be problematic if workloads are also running on those nodes.
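
To see which nodes fall under the constraint quoted above, the labels can be checked directly. The following is a minimal sketch, not part of the original report: the function name `ignored_by_endpointslice` is hypothetical, and it assumes input in the shape of `oc get nodes --show-labels` (node name in the first column, comma-separated labels in the last).

```shell
# Hypothetical helper: print the names of nodes that carry either label the
# EndpointSlice controller is documented to ignore.
# Reads `oc get nodes --show-labels` style text on stdin.
ignored_by_endpointslice() {
  awk 'NR > 1 && $NF ~ /(^|,)node-role\.kubernetes\.io\/(control-plane|master)(=|,|$)/ { print $1 }'
}

# Usage against a live cluster (assumes a logged-in `oc` session):
#   oc get nodes --show-labels | ignored_by_endpointslice
```

If every node hosting the RGW pods appears in this list, the controller has no eligible nodes to compute hints from, which would be consistent with the warning shown above.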

Version of all relevant components (if applicable):

ODF 4.14.2 (and .3), OCP 4.14.2 (and .7)

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

It creates noise and false alarms when monitoring is performed on warning events.

Is there any workaround available to the best of your knowledge?

No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

1

Is this issue reproducible?

Always

Can this issue be reproduced from the UI?

Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy the products
2. Check through:

oc get events -n openshift-storage |grep -i TopologyAwareHintsDisabled

Actual results:

Warning events

Expected results:

No warning


Additional info:

Comment 3 Ramon Gordillo 2023-12-19 13:19:10 UTC
oc get events -n openshift-storage |grep -i TopologyAwareHintsDisabled

4m14s       Warning   TopologyAwareHintsDisabled   service/rook-ceph-rgw-ocs-storagecluster-cephobjectstore              Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4
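
To gauge how noisy the warning is, output like the line above can be tallied per object. This is a rough sketch under stated assumptions: the function name `count_hint_warnings` is hypothetical, and it expects the default `oc get events` column layout (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE). It counts listed rows, not individual occurrences, since a single row may aggregate several repeats of the same event.

```shell
# Hypothetical helper: count TopologyAwareHintsDisabled warning rows per object.
# Reads default `oc get events` text on stdin.
count_hint_warnings() {
  awk '$2 == "Warning" && $3 == "TopologyAwareHintsDisabled" { n[$4]++ }
       END { for (obj in n) print n[obj], obj }'
}

# Usage against a live cluster (assumes a logged-in `oc` session):
#   oc get events -n openshift-storage | count_hint_warnings
```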


ClusterID: edf2e1a1-aa36-4464-ba23-b83e29cb4aef
ClusterVersion: Stable at "4.14.6"
ClusterOperators:
	All healthy and stable

Comment 6 Subham Rai 2023-12-20 08:55:18 UTC
The only thing I see that we can do here is add the annotation `service.kubernetes.io/topology-mode: auto` to the service. Is this what you are suggesting?

Comment 7 Ramon Gordillo 2023-12-20 09:09:23 UTC
Hi.

Yes.

The rook-ceph-rgw-ocs-storagecluster-cephobjectstore service already has the "service.kubernetes.io/topology-mode: Auto" annotation. That forces the use of Topology Aware Hints (TAH), which, according to the documentation, cannot be provided on master/control-plane nodes.

Please review the reasons and provide an alternative for this use case.

Thanks.
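
The annotation discussed in this comment can be confirmed on the live service. The sketch below is illustrative only: the function name `topology_mode_from_yaml` is hypothetical, and it does a plain text match on the service's YAML rather than a full parse, so it assumes the annotation appears on its own line.

```shell
# Hypothetical helper: extract the value of the topology-mode annotation from
# a Service manifest in YAML form read on stdin. Prints nothing if absent.
topology_mode_from_yaml() {
  sed -n 's/^[[:space:]]*service\.kubernetes\.io\/topology-mode:[[:space:]]*//p' | tr -d '"'
}

# Usage against a live cluster (assumes a logged-in `oc` session):
#   oc get service rook-ceph-rgw-ocs-storagecluster-cephobjectstore \
#     -n openshift-storage -o yaml | topology_mode_from_yaml
#
# The same value can be read without text matching via jsonpath:
#   oc get service rook-ceph-rgw-ocs-storagecluster-cephobjectstore \
#     -n openshift-storage \
#     -o jsonpath='{.metadata.annotations.service\.kubernetes\.io/topology-mode}'
```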

Comment 13 Malay Kumar parida 2024-01-18 05:30:56 UTC
Hi Jiffin, are we planning to revert the change as Blaine suggested, or do we need to move it out to 4.16?

Comment 14 Jiffin 2024-01-19 06:52:50 UTC
PR posted in https://github.com/red-hat-storage/ocs-operator/pull/2398

Comment 19 errata-xmlrpc 2024-03-19 15:25:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383