Description of problem:
Although OSDs track slow ops and bubble them up to the mgr, the metrics that mgr/prometheus generates do not include alert-related metrics. This BZ
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Generate SLOW_OPS by lowering osd_op_complaint_time.
2. Run rados bench to trigger the health check.
3. Observe that no field in the Prometheus scrape exposes this condition.
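Step 3 can be checked mechanically. The following is a minimal sketch, not part of the original report, that lists the metric names in a Prometheus text-format scrape and filters for anything mentioning slow ops. It assumes the mgr/prometheus module's default endpoint (port 9283, path /metrics); the host name is a placeholder, and the "slow_ops" substring filter is an illustrative choice.

```python
import re
import urllib.request

# A Prometheus metric name: starts with a letter, '_' or ':'.
_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def metric_names(exposition_text):
    """Return the set of metric names in Prometheus text exposition format."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _NAME.match(line)
        if match:
            names.add(match.group(0))
    return names


def slow_ops_metrics(exposition_text):
    """Return the sorted metric names that mention slow ops (case-insensitive)."""
    return sorted(n for n in metric_names(exposition_text) if "slow_ops" in n.lower())


def scrape(url):
    """Fetch the raw scrape text from a metrics endpoint."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

For example, `slow_ops_metrics(scrape("http://<mgr-host>:9283/metrics"))` returning an empty list while the cluster reports SLOW_OPS reproduces the behaviour described in this BZ.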
Actual results:
SLOW_OPS triggers a health status change (0->1), but the scrape provides no information about the slow ops themselves.
Expected results:
SLOW_OPS should be visible in the Prometheus data.
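One way to make the expected result concrete is to derive a gauge from the health check itself. The sketch below is illustrative only and does not describe how the shipped fix works. It assumes the JSON shape of `ceph health detail -f json` (a "checks" map keyed by check name, each with summary.message). It also assumes the SLOW_OPS message starts with the count, as in "42 slow ops, oldest one blocked for 33 sec, ...".

```python
import re

# Assumed SLOW_OPS summary format: "<N> slow ops, oldest one blocked for ...".
_SLOW_OPS_COUNT = re.compile(r"^(\d+) slow ops")


def slow_ops_count(health):
    """Return the slow-op count from parsed health-detail JSON (0 if not raised)."""
    check = health.get("checks", {}).get("SLOW_OPS")
    if check is None:
        return 0
    message = check.get("summary", {}).get("message", "")
    match = _SLOW_OPS_COUNT.match(message)
    # If the check is raised but the count cannot be parsed, report 1 so the
    # condition is still visible to alerting.
    return int(match.group(1)) if match else 1


def render_gauge(name, value, help_text):
    """Render a single gauge in the Prometheus text exposition format."""
    return f"# HELP {name} {help_text}\n# TYPE {name} gauge\n{name} {value}\n"
```

Exposing a count rather than a 0/1 flag lets an alert rule distinguish a single stuck request from a widespread stall. Any metric name passed to `render_gauge` here is hypothetical.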
Additional info:
Comment 17 Veera Raghava Reddy
2021-04-06 18:19:03 UTC
Hi Josh, is it targeted for 4.2z1? If not, can we move this to 4.2z2?
(In reply to Veera Raghava Reddy from comment #17)
> Hi Josh, is it targeted for 4.2z1? If not, can we move this to 4.2z2?
Checking with Paul to verify the test setup. Also adding @jefbrown.
If it is confirmed to be an issue I think we should move it to z2.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: Red Hat Ceph Storage security, bug fix, and enhancement Update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:1452