Bug 2280342 - ServiceMonitor ramen-hub-operator-metrics-monitor selector too open
Summary: ServiceMonitor ramen-hub-operator-metrics-monitor selector too open
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.16
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ODF 4.16.0
Assignee: rakesh-gm
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-05-14 11:16 UTC by Shyamsundar
Modified: 2024-07-17 13:22 UTC (History)
CC List: 3 users

Fixed In Version: 4.16.0-102
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-07-17 13:22:31 UTC
Embargoed:


Attachments


Links
System                  ID                              Private  Priority  Status  Summary                                           Last Updated
Github                  RamenDR ramen pull 1385         0        None      Merged  add resources to transformer config               2024-05-14 11:19:29 UTC
Github                  red-hat-storage ramen pull 268  0        None      open    Bug 2280342: add resources to transformer config  2024-05-15 12:18:38 UTC
Red Hat Issue Tracker   RHSTOR-5752                     0        None      None    None                                              2024-05-14 11:16:46 UTC
Red Hat Product Errata  RHSA-2024:4591                  0        None      None    None                                              2024-07-17 13:22:32 UTC

Description Shyamsundar 2024-05-14 11:16:47 UTC
Migrated from: https://issues.redhat.com/browse/RHSTOR-5752

The pod selector of the ServiceMonitor ramen-hub-operator-metrics-monitor is too open; it hits other pods as well.

$ oc get servicemonitor ramen-hub-operator-metrics-monitor -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2024-04-05T18:12:32Z"
  generation: 1
  labels:
    control-plane: controller-manager
    olm.managed: "true"
  name: ramen-hub-operator-metrics-monitor
  namespace: openshift-operators
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: ClusterServiceVersion
    name: odr-hub-operator.v4.15.0-rhodf
    uid: c198ece5-952e-4aa6-9809-dc428a13e2c2
  resourceVersion: "635683986"
  uid: a6b5be5c-cea6-4919-b3e1-f81195dec1d3
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    path: /metrics
    port: https
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  selector:
    matchLabels:
      control-plane: controller-manager

$ oc get pods -l control-plane=controller-manager --show-labels
NAME                                                            READY   STATUS    RESTARTS      AGE    LABELS
external-secrets-operator-controller-manager-65f56c8654-crrb2   1/1     Running   0             3h3m   control-plane=controller-manager,pod-template-hash=65f56c8654
ramen-hub-operator-5d7bd796d5-r7wr9                             2/2     Running   3 (53m ago)   3h3m   app=ramen-hub,control-plane=controller-manager,pod-template-hash=5d7bd796d5

Steps to reproduce: install ODF, DR, and the external-secrets operator; the PrometheusOperatorRejectedResources alert then fires after some time.
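
The over-matching can be sketched in a few lines of Python. A matchLabels selector matches any object whose labels contain every listed key/value pair, so a selector with only `control-plane: controller-manager` cannot tell the two pods apart. This is an illustrative sketch, not Ramen or Prometheus Operator code: the `matches` and `selected` helpers are hypothetical, the label sets are copied from the `oc get pods` output above, and the narrowed selector (adding `app: ramen-hub` while keeping the existing label) is an assumption about the shape of the fix.

```python
def matches(match_labels, labels):
    """Return True if every key/value pair in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Pod labels copied from the `oc get pods --show-labels` output in this report.
pods = {
    "external-secrets-operator-controller-manager-65f56c8654-crrb2": {
        "control-plane": "controller-manager",
        "pod-template-hash": "65f56c8654",
    },
    "ramen-hub-operator-5d7bd796d5-r7wr9": {
        "app": "ramen-hub",
        "control-plane": "controller-manager",
        "pod-template-hash": "5d7bd796d5",
    },
}

# Selector as reported in the ServiceMonitor spec.
current = {"control-plane": "controller-manager"}

# Assumed narrowed selector: the existing label plus "app: ramen-hub".
narrowed = {"control-plane": "controller-manager", "app": "ramen-hub"}


def selected(selector):
    """Return the sorted names of all pods whose labels satisfy the selector."""
    return sorted(name for name, labels in pods.items() if matches(selector, labels))


print(selected(current))   # both pods match
print(selected(narrowed))  # only the Ramen hub operator pod matches
```

With the extra label in the selector, the external-secrets pod no longer qualifies because it carries no `app` label at all.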

Comment 3 Sunil Kumar Acharya 2024-05-15 09:21:12 UTC
Moving the non-blocker BZ out of ODF-4.16.0 due to blocker only phase. If this BZ should be considered as blocker, feel free to propose it back with justification note.

Comment 4 Shyamsundar 2024-05-15 11:19:03 UTC
(In reply to Sunil Kumar Acharya from comment #3)
> Moving the non-blocker BZ out of ODF-4.16.0 due to blocker only phase. If
> this BZ should be considered as blocker, feel free to propose it back with
> justification note.

A backport of the fix for this issue is available here: https://github.com/red-hat-storage/ramen/pull/268 (in other words, the bug is in POST state).

The issue is that the ServiceMonitor selector is too open and therefore hits other pods, which is better fixed sooner rather than later.

As a result, I am requesting the 4.16 flags back on this BZ.

Comment 7 krishnaram Karthick 2024-05-15 12:34:37 UTC
What would be the steps to verify this bug?

Comment 8 Shyamsundar 2024-05-16 12:08:15 UTC
(In reply to krishnaram Karthick from comment #7)
> What would be the steps to verify this bug?

1) The ServiceMonitor named ramen-hub-operator-metrics-monitor should include "app: ramen-hub" in its spec.selector
2) The Service and Deployment for Ramen in the same namespace should have the "app: ramen-hub" label (this was already the case, but it needs to be validated now)
3) Metrics should work as expected, i.e., Prometheus should still be able to collect Ramen metrics (checking the default policy metric for the configured DRPolicies is sufficient validation)

NOTE: There are no changes to the dr-cluster components, so only hub validation for the above is required on a fresh install. An upgrade should also "fix" the ServiceMonitor label selector.
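
Steps 1 and 2 above could be checked mechanically against JSON exported with `oc get <resource> -o json`. The following is a minimal Python sketch under stated assumptions: the helper names are hypothetical, and the two embedded manifests are trimmed, hand-written samples (one shaped like the ServiceMonitor in the description, one with the assumed fixed selector), not output captured from a fixed cluster. Step 3, metric collection, still needs to be confirmed in Prometheus itself.

```python
import json


def selector_has_label(service_monitor, key, value):
    """Step 1: check that spec.selector.matchLabels contains key=value."""
    selector = (service_monitor.get("spec") or {}).get("selector") or {}
    match_labels = selector.get("matchLabels") or {}
    return match_labels.get(key) == value


def object_has_label(obj, key, value):
    """Step 2: check that metadata.labels of a Service or Deployment contains key=value."""
    labels = (obj.get("metadata") or {}).get("labels") or {}
    return labels.get(key) == value


# Hypothetical trimmed sample shaped like the ServiceMonitor in the description.
before = json.loads("""
{
  "kind": "ServiceMonitor",
  "metadata": {"name": "ramen-hub-operator-metrics-monitor"},
  "spec": {"selector": {"matchLabels": {"control-plane": "controller-manager"}}}
}
""")

# Hypothetical sample of the expected state after the fix (assumed shape).
after = json.loads("""
{
  "kind": "ServiceMonitor",
  "metadata": {"name": "ramen-hub-operator-metrics-monitor"},
  "spec": {"selector": {"matchLabels": {"app": "ramen-hub",
                                        "control-plane": "controller-manager"}}}
}
""")

print(selector_has_label(before, "app", "ramen-hub"))
print(selector_has_label(after, "app", "ramen-hub"))
```

The same `object_has_label` check applies to the Service and Deployment manifests, since both keep their labels under `metadata.labels`.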

Comment 13 errata-xmlrpc 2024-07-17 13:22:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591

