Bug 2269354
| Summary: | [RFE] Change the default interval duration for two ServiceMonitors, 'rook-ceph-exporter' and 'rook-ceph-mgr' | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | arun kumar mohan <amohan> |
| Component: | ceph-monitoring | Assignee: | arun kumar mohan <amohan> |
| Status: | CLOSED ERRATA | QA Contact: | Daniel Osypenko <dosypenk> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.14 | CC: | asriram, etamir, kbg, muagarwa, nthomas, odf-bz-bot, ramon.gordillo, tnielsen |
| Target Milestone: | --- | Keywords: | FutureFeature |
| Target Release: | ODF 4.16.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 4.16.0-94 | Doc Type: | Bug Fix |
| Doc Text: | .Low default interval duration for two ServiceMonitors, 'rook-ceph-exporter' and 'rook-ceph-mgr' Previously, the exporter data collected by Prometheus added load to the system because the Prometheus scrape interval for the service monitors 'rook-ceph-exporter' and 'rook-ceph-mgr' was only 5 seconds. With this fix, the interval is increased to 30 seconds to balance the Prometheus scraping, thereby reducing the system load. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-07-17 13:16:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2260844 | | |
Description
arun kumar mohan 2024-03-13 10:56:15 UTC
We could fix the issue in either of the following two ways:

A. Change the default interval values for the ServiceMonitors in their respective files in the rook repo, that is, change these two yaml files:
https://github.com/rook/rook/blob/master/deploy/examples/monitoring/exporter-service-monitor.yaml (for rook-ceph-exporter)
https://github.com/rook/rook/blob/master/deploy/examples/monitoring/service-monitor.yaml (for rook-ceph-mgr)

OR

B. Change the cephcluster creation in the ocs-operator repo, adding an 'Interval' field to Spec->Monitoring, which rook-operator will then read and apply to both SMs (rook-ceph-exporter and rook-ceph-mgr). A sketch of both options follows below.
ocs-operator code: https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/storagecluster/cephcluster.go#L455

Will discuss further with the team on which (optimal) path to take.
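For illustration, minimal yaml sketches of the two approaches. The CephCluster fields in the option B sketch match the spec verified later in this bug ({"enabled":true,"interval":"30s"}); the selector label and port name in the option A sketch are assumptions for illustration, not values copied from the rook yamls.

```yaml
# Option A (sketch): raise the default directly in the rook example yamls,
# e.g. exporter-service-monitor.yaml. The selector label and port name here
# are illustrative assumptions, not values copied from the rook repo.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rook-ceph-exporter
  namespace: openshift-storage
spec:
  selector:
    matchLabels:
      app: rook-ceph-exporter            # assumed label
  endpoints:
    - port: ceph-exporter-http-metrics   # assumed port name
      interval: 30s                      # raised from the 5s default
---
# Option B (sketch): set the interval once on the CephCluster and let
# rook-operator propagate it to both ServiceMonitors. These fields match
# the spec verified below: {"enabled":true,"interval":"30s"}
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: ocs-storagecluster-cephcluster
  namespace: openshift-storage
spec:
  monitoring:
    enabled: true
    interval: 30s
```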
Submitted the RFE google form. PS: not very sure about the name of the customer to add in the form.

(In reply to arun kumar mohan from comment #4)
Option B would be recommended. The monitoring.interval setting is the intended way to override this value.

Thanks Travis. Created PR: https://github.com/red-hat-storage/ocs-operator/pull/2506

A jira ticket is raised (in RHSTOR project): https://issues.redhat.com/browse/RHSTOR-5765
Please take a look.

oc get cephclusters.ceph.rook.io ocs-storagecluster-cephcluster -n openshift-storage -o=jsonpath={'.spec.monitoring'}
{"enabled":true,"interval":"30s"}
oc get servicemonitor rook-ceph-exporter -n openshift-storage -o jsonpath='{.spec.endpoints[0].interval}'
30s
oc get servicemonitor rook-ceph-mgr -n openshift-storage -o jsonpath='{.spec.endpoints[0].interval}'
30s
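Putting the two queries together, each generated ServiceMonitor carries the interval in its first endpoints entry, roughly as in this fragment (only the 30s interval is confirmed by the jsonpath output above; the port name is a hypothetical placeholder):

```yaml
# Fragment (sketch) of the generated ServiceMonitor spec checked above.
# Only the 30s interval is confirmed by the jsonpath queries; the port
# name is a hypothetical placeholder.
spec:
  endpoints:
    - port: http-metrics   # assumed port name
      interval: 30s        # verified via .spec.endpoints[0].interval
```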
OC version:
Client Version: 4.13.4
Kustomize Version: v4.5.7
Server Version: 4.16.0-0.nightly-2024-05-15-001800
Kubernetes Version: v1.29.4+4a87b53
OCS version:
ocs-operator.v4.16.0-99.stable OpenShift Container Storage 4.16.0-99.stable Succeeded
Cluster version
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.16.0-0.nightly-2024-05-15-001800 True False 79m Cluster version is 4.16.0-0.nightly-2024-05-15-001800
Rook version:
2024/05/15 10:10:52 maxprocs: Leaving GOMAXPROCS=16: CPU quota undefined
rook: v4.16.0-0.32d64a561bd504448dedcbda3a7a4e6083227ad5
go: go1.21.9 (Red Hat 1.21.9-1.el9_4)
Ceph version:
ceph version 18.2.1-167.el9cp (e8c836edb24adb7717a6c8ba1e93a07e3efede29) reef (stable)
Verified
Additionally, tests from tests/functional/pod_and_daemons/test_mgr_pods.py passed.

Providing the RDT details, please take a look.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591