Description of problem:
Although OSDs track slow ops and bubble this up to the mgr, the metrics that mgr/prometheus generates do not include alert-related metrics. This BZ

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Generate SLOW_OPS by lowering osd_op_complaint_time
2. Run rados bench to trigger the health check
3. No field in the Prometheus scrape exposes this condition

Actual results:
SLOW_OPS triggers a health status change (0->1), but the scrape does not provide any information related to slow_ops

Expected results:
SLOW_OPS should be visible in the Prometheus data

Additional info:
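For anyone re-checking step 3, here is a minimal sketch of how the scrape output could be searched for a slow-ops series. It only parses the Prometheus text exposition format. The metric name ceph_healthcheck_slow_ops and the sample values are assumptions for illustration, not taken from this report, so substitute whatever name the fixed build actually exposes. The helper name find_metric_samples is hypothetical.

```python
# Illustrative scrape text only: the slow-ops metric name and all values
# below are assumptions, not output captured from a real cluster.
SAMPLE_SCRAPE = """\
# HELP ceph_health_status Cluster health status
# TYPE ceph_health_status untyped
ceph_health_status 1.0
# HELP ceph_healthcheck_slow_ops OSD or Monitor requests taking a long time to process
# TYPE ceph_healthcheck_slow_ops gauge
ceph_healthcheck_slow_ops 12.0
"""


def find_metric_samples(scrape_text, metric_name):
    """Return a list of (labels, value) pairs for metric_name.

    Assumes sample lines of the form "<name>[{labels}] <value>" with no
    trailing timestamp.
    """
    samples = []
    for line in scrape_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        # Split the series into the metric name and its optional label set.
        name, brace, labels = series.partition("{")
        if name == metric_name:
            samples.append((brace + labels, float(value)))
    return samples


if __name__ == "__main__":
    hits = find_metric_samples(SAMPLE_SCRAPE, "ceph_healthcheck_slow_ops")
    if hits:
        print("slow ops exposed in scrape:", hits)
    else:
        print("no slow-ops series in scrape (the behaviour reported here)")
```

An empty result for the slow-ops name while ceph_health_status reads 1 would match the "Actual results" above. A non-empty result would match the "Expected results".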
Hi Josh, is this targeted for 4.2z1? If not, can we move it to 4.2z2?
(In reply to Veera Raghava Reddy from comment #17)
> Hi Josh, is this targeted for 4.2z1? If not, can we move it to 4.2z2?

Checking with Paul to verify the test setup. Also adding @jefbrown. If it is confirmed to be an issue, I think we should move it to z2.
Created attachment 1769755 [details] Prometheus graph showing the slow_ops metric
Can we have this BZ moved to Verified and raise a new BZ to track alerts?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage security, bug fix, and enhancement Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1452