Description of problem:

Expanding on the work done in https://bugzilla.redhat.com/show_bug.cgi?id=1929756.

Display an OSD counter for the number of slow ops against a specific OSD device, and also list counters for each type of slow request against that OSD. We currently have to use bash scripts to generate these numbers from the Ceph cluster logs. For example, here is the slow request breakdown by OSD from the cluster log:

$ grep 'slow request [3-5][0-9]\.' ceph.log | awk '{print $3}' | sort -g | uniq -c | sort -g
      2 osd.1869
      4 osd.1446
      7 osd.1145
      8 osd.1045
      8 osd.2084
     13 osd.1172
     22 osd.0
     35 osd.17
     49 osd.309
    106 osd.2361
    196 osd.1651
    450 osd.2484
    533 osd.1629
   1849 osd.1237
   4228 osd.2301
   9332 osd.118

We can also break down by slow request type.

Displaying these metrics per OSD in the dashboard will help customers troubleshoot where the actual issue resides, since identifying the largest offender is what usually solves these issues. Being able to classify these slow requests per bucket (host, rack) is also a large part of troubleshooting slow request issues.
https://github.com/ceph/ceph/pull/49519 will be in v17.2.6
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:3623
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days