Bug 2304076 - duplicate metrics being produced
Summary: duplicate metrics being produced
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.18.0
Assignee: Aayush Chouhan
QA Contact: Sagi Hirshfeld
URL:
Whiteboard:
: 2308319 (view as bug list)
Depends On:
Blocks: 2321231 2322896
 
Reported: 2024-08-12 08:37 UTC by iwatson
Modified: 2025-03-26 10:22 UTC

Fixed In Version: 4.17.0-117
Doc Type: Bug Fix
Doc Text:
Cause: A Prometheus client dependency upgrade caused process-level metrics (resident memory, VSS, heap, etc.) to be collected whenever custom metrics are collected.
Consequence: NooBaa reported duplicate metrics, because it collected process-level metrics explicitly in addition to the custom metrics. These duplicates triggered an alert in Prometheus.
Fix: Remove the explicit process-level collection, since those metrics are now collected together with the custom metrics.
Result: Metrics are no longer reported in duplicate, and the Prometheus alert no longer fires.
Clone Of:
: 2321231 2322896 (view as bug list)
Environment:
Last Closed:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | noobaa noobaa-core pull 8370 | 0 | None | Merged | Added a fix for prometheusDuplicateTimestamp error | 2024-09-30 07:36:44 UTC
Github | noobaa noobaa-core pull 8416 | 0 | None | Merged | NC | 5.17 backports | 2024-10-06 05:33:51 UTC
Red Hat Issue Tracker | OCSBZM-8812 | 0 | None | None | None | 2024-08-21 08:56:01 UTC
Red Hat Knowledge Base (Solution) | 7083407 | 0 | None | None | None | 2024-08-20 12:18:05 UTC

Description iwatson 2024-08-12 08:37:08 UTC
The following alert is firing: PrometheusDuplicateTimestamps
 
Turning on debug logs indicates that the noobaa-mgmt-service-monitor and s3-service-monitor are the issue

1 ts=2024-08-09T12:35:11.506Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/openshift-storage/noobaa-mgmt-service-monitor/0 target=http://10.129.2.15:8080/metrics/web_server msg="Duplicate sample for timestamp" series=NooBaa_health_status 
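
To see which scrape pools and series are affected across many such debug lines, the log can be tallied. The following is a minimal sketch, not part of the original report: it assumes the logfmt-style key=value layout shown in the line above, and the helper names `parse_logfmt` and `count_duplicate_samples` are hypothetical.

```python
import shlex
from collections import Counter


def parse_logfmt(line):
    """Parse one key=value log line into a dict.

    Tokens without '=' (for example a leading counter) are skipped.
    Lines that cannot be tokenized (unbalanced quotes) yield an empty dict.
    """
    try:
        tokens = shlex.split(line)
    except ValueError:
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_duplicate_samples(lines):
    """Count 'Duplicate sample for timestamp' messages per (scrape_pool, series)."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("msg") == "Duplicate sample for timestamp":
            counts[(fields.get("scrape_pool"), fields.get("series"))] += 1
    return counts
```

Feeding the Prometheus pod log into `count_duplicate_samples` line by line would give one count per scrape pool and series pair, which makes it easier to confirm that only the NooBaa service monitors are involved.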


This can be shown by manually curling the metrics

oc exec prometheus-k8s-1 -- curl "http://10.129.2.15:8080/metrics/web_server" > metrics.txt 
 
The output does contain duplicate metrics, as can be seen by searching for 'NooBaa_health_status 0', for example.
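
Instead of searching for one metric by hand, every repeated series in the saved output can be listed. This is a rough sketch that assumes the file is in the Prometheus text exposition format; it treats the metric name plus its label set as the series identity, and the function names are hypothetical.

```python
from collections import Counter


def series_key(line):
    """Return the series identity (metric name plus labels) of a sample line."""
    line = line.strip()
    if "{" in line:
        # Label values may contain spaces, so cut at the closing brace
        # rather than at the first whitespace.
        return line[: line.rindex("}") + 1]
    return line.split(None, 1)[0]


def find_duplicate_series(text):
    """Map each series that appears more than once to its occurrence count."""
    counts = Counter()
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        counts[series_key(stripped)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

With the file produced by the curl command above, `find_duplicate_series(open("metrics.txt").read())` would return the series that appear more than once in a single scrape.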
 
Upon investigation, this is caused by the following block:
https://github.com/noobaa/noobaa-core/blob/ad73e9cb3bd483f6f34de9a28a9f4ba3ea060eb3/src/server/analytic_services/prometheus_reporting.js#L44
 
If I call /metrics/web_server/nodejs and /metrics/web_server/core separately, they return the same results.
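
The overlap between the two sub-endpoints can be checked directly once both responses are saved as text. The sketch below is illustrative only: it assumes the Prometheus text exposition format and compares metric names while ignoring labels, and `series_names` and `shared_series` are hypothetical helper names.

```python
def series_names(text):
    """Collect the metric names found in a Prometheus text exposition payload."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            # The metric name ends at the first '{' or at the first whitespace.
            names.add(line.split("{", 1)[0].split(None, 1)[0])
    return names


def shared_series(nodejs_text, core_text):
    """Return the sorted metric names present in both payloads."""
    return sorted(series_names(nodejs_text) & series_names(core_text))
```

Any name returned by `shared_series` would be emitted twice by an endpoint that simply combines both reports, which is consistent with the duplicates seen on /metrics/web_server.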
 
So the solution is to either alter the code above or change the service monitor endpoints to append /nodejs to the path.
 
For example:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: noobaa-mgmt-service-monitor
  labels:
    app: noobaa
spec:
  endpoints:
  - port: mgmt
    path: /metrics/web_server/nodejs
  - port: mgmt
    path: /metrics/bg_workers
  - port: mgmt
    path: /metrics/hosted_agents
  namespaceSelector: {}
  selector:
    matchLabels:
      noobaa-mgmt-svc: "true"

Comment 3 Sonigra Saurab 2024-08-20 12:18:05 UTC
created KCS

Comment 6 Neyder Achahuanco Apaza 2024-08-20 19:42:56 UTC
Greetings. Due to this bug, changes to the cluster-monitoring-config ConfigMap cannot be processed, which is preventing new deployments from configuring persistence.

Comment 12 Adebi Akobi 2024-09-23 12:59:55 UTC
Hello team,

Thank you for the information. I see that the fix is approved for the 4.17 bug fix cycle. Is it possible to get a fix or patch for 4.16 while waiting for the 4.17 cycle? Please and thank you.

Adebi

Comment 13 Liran Mauda 2024-09-30 07:40:09 UTC
Hi,

Fixing the issue in an older version while it still exists in a newer version would introduce a regression.
We will fix it in 4.17 and consider including it in one of the 4.16.z releases.

Best Regards,
Liran.

Comment 15 Sunil Kumar Acharya 2024-10-08 13:17:11 UTC
Please update the RDT flag/text appropriately.

