Bug 2048681 - [RFE] Limit slow request details to cluster log
Summary: [RFE] Limit slow request details to cluster log
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 5.2
Assignee: Prashant Dhange
QA Contact: Tintu Mathew
Docs Contact: Akash Raj
URL:
Whiteboard:
Depends On: 1998330
Blocks: 2102272
Reported: 2022-01-31 17:21 UTC by Vikhyat Umrao
Modified: 2022-09-01 02:33 UTC
CC List: 19 users

Fixed In Version: ceph-16.2.8-2.el8cp
Doc Type: Enhancement
Doc Text:
.OSDs report slow operation details in an aggregated format to the Ceph cluster log. Previously, slow requests could overwhelm the cluster log with too much detail, filling up the monitor database. With this release, slow requests are logged to the cluster log in aggregated form, grouped by operation type and by pool.
Clone Of: 1998330
Environment:
Last Closed: 2022-08-09 17:37:27 UTC
Embargoed:
pdhange: needinfo-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 53944 | 0 | None | None | None | 2022-01-31 17:21:29 UTC
Github | ceph ceph pull 44771 | 0 | None | open | pacific: osd/OSD: Log aggregated slow ops detail to cluster logs | 2022-03-15 22:19:39 UTC
Red Hat Issue Tracker | RHCEPH-3075 | 0 | None | None | None | 2022-01-31 17:24:58 UTC
Red Hat Product Errata | RHSA-2022:5997 | 0 | None | None | None | 2022-08-09 17:38:03 UTC

Comment 7 Tintu Mathew 2022-06-21 04:35:31 UTC
Verified on ceph version 16.2.8-47.el8cp (48087358763c55c41f590e2beabc1fd341b89226) pacific (stable).

Expected log format:

<date>T<time> osd.<osdid> (osd.<osdid>) : cluster [WRN] <number-of-slow-requests> slow requests (by type [ 'slow-op-type1' : <number-of-slow-op-of-type1> 'slow-op-type2' : <number-of-slow-op-of-type2> .. ] most affected pool [ '<max-slow-for-pool-name>' : <number-of-slow-ops-for-pool> ])
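To make the aggregation concrete, here is a minimal Python sketch that turns a list of slow operations, each given as an `(op_type, pool_name)` pair, into the message part of a line in the expected format. This is not Ceph's implementation (the OSD code is C++); the helper name `format_slow_summary`, the input shape, and the single space between multiple type entries (read off the template's `'slow-op-type1' : ... 'slow-op-type2' : ...`) are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def format_slow_summary(ops: Iterable[Tuple[str, str]]) -> str:
    """Aggregate (op_type, pool_name) pairs into one summary message.

    Hypothetical helper that mirrors the expected cluster-log format;
    it is not taken from Ceph's C++ OSD code.
    """
    ops = list(ops)
    if not ops:
        raise ValueError("no slow ops to summarize")
    # Count slow ops per operation type, kept in first-seen order.
    by_type = Counter(op_type for op_type, _ in ops)
    # Count slow ops per pool; only the top pool is reported.
    by_pool = Counter(pool for _, pool in ops)
    # most_common(1) returns the highest count; ties go to the pool
    # encountered first.
    pool, pool_count = by_pool.most_common(1)[0]
    # Assumption: multiple type entries are separated by a single space.
    types = " ".join(f"'{t}' : {n}" for t, n in by_type.items())
    return (
        f"{len(ops)} slow requests (by type [ {types} ] "
        f"most affected pool [ '{pool}' : {pool_count} ])"
    )


print(format_slow_summary(
    [("delayed", "default.rgw.log")] * 2
    + [("delayed", "default.rgw.control"), ("delayed", "default.rgw.meta")]
))
# → 4 slow requests (by type [ 'delayed' : 4 ] most affected pool [ 'default.rgw.log' : 2 ])
```

The example input (four `delayed` ops, two in `default.rgw.log` and one each in two other pools chosen for illustration) reproduces the message text of the osd.15 entry in the output below. Because only the single most affected pool is reported, its count can be lower than the total.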

Output from ceph.log:

2022-06-20T16:39:23.465436+0000 osd.9 (osd.9) 5 : cluster [WRN] 2 slow requests (by type [ 'delayed' : 2 ] most affected pool [ 'default.rgw.control' : 2 ])
2022-06-20T16:39:33.138344+0000 osd.13 (osd.13) 2 : cluster [WRN] 2 slow requests (by type [ 'delayed' : 2 ] most affected pool [ 'default.rgw.control' : 2 ])
2022-06-20T16:39:33.287790+0000 osd.15 (osd.15) 3 : cluster [WRN] 4 slow requests (by type [ 'delayed' : 4 ] most affected pool [ 'default.rgw.log' : 2 ])
2022-06-20T16:39:58.428863+0000 osd.7 (osd.7) 3 : cluster [WRN] 1 slow requests (by type [ 'started' : 1 ] most affected pool [ 'rbdpool' : 1 ])
2022-06-20T16:39:59.484604+0000 osd.0 (osd.0) 1 : cluster [WRN] 1 slow requests (by type [ 'reached pg' : 1 ] most affected pool [ 'ercpool_2' : 1 ])
2022-06-20T16:39:59.529937+0000 osd.2 (osd.2) 9 : cluster [WRN] 1 slow requests (by type [ 'reached pg' : 1 ] most affected pool [ 'ercpool_2' : 1 ])
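When scanning `ceph.log` for these entries, each aggregated line can be parsed back into its fields. The sketch below assumes the line layout shown above; the regular expression, `SlowOpSummary`, and `parse_slow_summary` are hypothetical names, not part of any Ceph tooling. The cluster-log sequence number after `(osd.N)` is treated as optional because the expected-format template omits it while the actual output includes it.

```python
import re
from typing import Dict, NamedTuple, Optional

# Hypothetical pattern for the aggregated slow-request entry; the
# sequence number after "(osd.N)" is optional.
_LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<daemon>osd\.\d+) \(osd\.\d+\)(?: \d+)? : cluster \[WRN\] "
    r"(?P<total>\d+) slow requests "
    r"\(by type \[ (?P<types>.*?) \] "
    r"most affected pool \[ '(?P<pool>[^']*)' : (?P<pool_count>\d+) \]\)$"
)
# One "'<type>' : <count>" entry inside the "by type [...]" list.
_TYPE_RE = re.compile(r"'([^']*)' : (\d+)")


class SlowOpSummary(NamedTuple):
    timestamp: str
    daemon: str
    total: int
    by_type: Dict[str, int]
    pool: str
    pool_count: int


def parse_slow_summary(line: str) -> Optional[SlowOpSummary]:
    """Return the parsed fields, or None if the line is not a summary."""
    m = _LINE_RE.match(line.strip())
    if m is None:
        return None
    by_type = {name: int(n) for name, n in _TYPE_RE.findall(m.group("types"))}
    return SlowOpSummary(
        timestamp=m.group("ts"),
        daemon=m.group("daemon"),
        total=int(m.group("total")),
        by_type=by_type,
        pool=m.group("pool"),
        pool_count=int(m.group("pool_count")),
    )


line = ("2022-06-20T16:39:33.287790+0000 osd.15 (osd.15) 3 : cluster [WRN] "
        "4 slow requests (by type [ 'delayed' : 4 ] "
        "most affected pool [ 'default.rgw.log' : 2 ])")
s = parse_slow_summary(line)
print(s.daemon, s.total, s.by_type, s.pool, s.pool_count)
# → osd.15 4 {'delayed': 4} default.rgw.log 2
```

Returning `None` for non-matching lines lets the function be applied to every line of `ceph.log` and the results filtered, rather than pre-selecting `[WRN]` entries separately.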

Comment 13 errata-xmlrpc 2022-08-09 17:37:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5997

