Description of problem (please be as detailed as possible and provide log snippets):

In OpenShift 4.14, warning events regarding Topology Aware Routing are raised repeatedly:

Service: rook-ceph-rgw-ocs-storagecluster-cephobjectstore
Namespace: openshift-storage
Generated from endpoint-slice-controller
22 times in the last 1 hour
Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4

The ODF cluster is deployed on the control-plane nodes, and there are some limitations in this regard:
https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/#constraints

The EndpointSlice controller ignores nodes with the node-role.kubernetes.io/control-plane or node-role.kubernetes.io/master label set. This could be problematic if workloads are also running on those nodes.

Version of all relevant components (if applicable):
ODF 4.14.2 (and .3), OCP 4.14.2 (and .7)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
It creates noise and false alarms when monitoring is performed on warning events.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Always

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy the products
2. Check through: oc get events -n openshift-storage | grep -i TopologyAwareHintsDisabled

Actual results:
Warning events

Expected results:
No warning

Additional info:
oc get events -n openshift-storage | grep -i TopologyAwareHintsDisabled
4m14s Warning TopologyAwareHintsDisabled service/rook-ceph-rgw-ocs-storagecluster-cephobjectstore Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4

ClusterID: edf2e1a1-aa36-4464-ba23-b83e29cb4aef
ClusterVersion: Stable at "4.14.6"
ClusterOperators: All healthy and stable
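For anyone triaging the same event, here is a minimal Python sketch that flags nodes matching the conditions named in the warning and in the Kubernetes constraints quoted in this report. It is not the controller's actual logic. It assumes the input is the parsed JSON of `oc get nodes -o json`, and that the zone is carried by the well-known `topology.kubernetes.io/zone` label. The function name `diagnose_nodes` and the reason strings are made up for illustration.

```python
# Role labels that, per the constraints quoted in this report, cause the
# EndpointSlice controller to ignore a node.
CONTROL_PLANE_LABELS = (
    "node-role.kubernetes.io/control-plane",
    "node-role.kubernetes.io/master",
)
# Assumption: the zone is read from the well-known topology label.
ZONE_LABEL = "topology.kubernetes.io/zone"


def diagnose_nodes(node_list):
    """Return {node_name: [reasons]} for nodes that may contribute to the warning.

    node_list is expected to look like the parsed output of
    `oc get nodes -o json` (a dict with an "items" list).
    """
    findings = {}
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        labels = meta.get("labels") or {}
        allocatable = node.get("status", {}).get("allocatable") or {}

        reasons = []
        if any(label in labels for label in CONTROL_PLANE_LABELS):
            reasons.append("control-plane/master role label set")
        if not labels.get(ZONE_LABEL):
            reasons.append("zone label missing")
        if not allocatable.get("cpu"):
            reasons.append("allocatable CPU missing")

        if reasons:
            findings[meta.get("name", "<unnamed>")] = reasons
    return findings
```

To use it, load the node list with `json.load` from a file produced by `oc get nodes -o json` and pass the result to `diagnose_nodes`. On a compact cluster where ODF runs on the control-plane nodes, every storage node would be expected to show the role-label reason.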
The only thing I see that we can do here is add the annotation `service.kubernetes.io/topology-mode: auto` to the service. Is this what you are suggesting?
Hi. Yes. The rook-ceph-rgw-ocs-storagecluster-cephobjectstore service uses the "service.kubernetes.io/topology-mode: Auto" annotation. That forces the use of Topology Aware Hints (TAH), which, according to the documentation, cannot be provided for master/control-plane nodes. Please review the reasons and provide an alternative for this use case. Thanks.
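To confirm which Services carry the annotation mentioned above, a small Python sketch along these lines could be used. It assumes the input is the parsed JSON of `oc get svc -n openshift-storage -o json`. The helper name `services_with_topology_mode` is hypothetical.

```python
# Annotation key as quoted in this thread.
TOPOLOGY_MODE_ANNOTATION = "service.kubernetes.io/topology-mode"


def services_with_topology_mode(service_list):
    """Return (namespace, name, value) tuples for annotated Services.

    service_list is expected to look like the parsed output of
    `oc get svc -o json` (a dict with an "items" list).
    """
    hits = []
    for svc in service_list.get("items", []):
        meta = svc.get("metadata", {})
        annotations = meta.get("annotations") or {}
        if TOPOLOGY_MODE_ANNOTATION in annotations:
            hits.append(
                (
                    meta.get("namespace", ""),
                    meta.get("name", ""),
                    annotations[TOPOLOGY_MODE_ANNOTATION],
                )
            )
    return hits
```

Any Service returned here opts in to topology-aware routing, so it is a candidate source of the TopologyAwareHintsDisabled event when its endpoints run on control-plane nodes.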
Hi Jiffin, are we planning to revert the change as Blaine suggested, or do we need to move this out to 4.16?
PR posted in https://github.com/red-hat-storage/ocs-operator/pull/2398
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:1383