Bug 2048681

Summary: [RFE] Limit slow request details to cluster log
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Vikhyat Umrao <vumrao>
Component: RADOS Assignee: Prashant Dhange <pdhange>
Status: CLOSED ERRATA QA Contact: Tintu Mathew <tmathew>
Severity: medium Docs Contact: Akash Raj <akraj>
Priority: medium    
Version: 4.2 CC: akraj, akupczyk, amathuri, bhubbard, ceph-eng-bugs, choffman, kdreyer, ksirivad, lflores, linuxkidd, nojha, pdhange, pdhiran, rfriedma, rzarzyns, skanta, sseshasa, vereddy, vumrao
Target Milestone: --- Keywords: FutureFeature, Rebase
Target Release: 5.2 Flags: pdhange: needinfo-
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-16.2.8-2.el8cp Doc Type: Enhancement
Doc Text:
.OSDs report slow operation details to the Ceph cluster log in an aggregated format. Previously, slow requests could overwhelm the cluster log with too many details and fill up the monitor database. With this release, each OSD logs a single summary to the cluster log instead. The summary gives the number of slow requests per operation type and names the most affected pool.
Story Points: ---
Clone Of: 1998330 Environment:
Last Closed: 2022-08-09 17:37:27 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1998330    
Bug Blocks: 2102272    

Comment 7 Tintu Mathew 2022-06-21 04:35:31 UTC
Verified it on ceph version 16.2.8-47.el8cp (48087358763c55c41f590e2beabc1fd341b89226) pacific (stable).

Expected log format:

<date>T<time> osd.<osdid> (osd.<osdid>) : cluster [WRN] <number-of-slow-requests> slow requests (by type [ 'slow-op-type1' : <number-of-slow-op-of-type1> 'slow-op-type2' : <number-of-slow-op-of-type2> .. ] most affected pool [ '<max-slow-for-pool-name>' : <number-of-slow-ops-for-pool> ])
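As a rough illustration of the expected format above, the following Python sketch builds the summary message from a list of slow operations. It only produces the part of the line after `cluster [WRN]`. The input shape is an assumption: a list of `(op_type, pool_name)` tuples. The function name `format_slow_summary` is hypothetical. This is not the OSD's actual implementation, which is Ceph C++ code. Tie-breaking between equally affected pools is also an assumption here.

```python
from collections import Counter


def format_slow_summary(slow_ops):
    """Build an aggregated slow-request message (hypothetical sketch).

    slow_ops: list of (op_type, pool_name) tuples, an assumed input shape.
    Returns None when there are no slow ops to report.
    """
    if not slow_ops:
        return None
    # Count slow ops per operation type, keeping first-seen order.
    by_type = Counter(op_type for op_type, _ in slow_ops)
    # Count slow ops per pool; only the top pool is reported.
    by_pool = Counter(pool for _, pool in slow_ops)
    types = " ".join(f"'{t}' : {n}" for t, n in by_type.items())
    # In this sketch, ties go to whichever pool was seen first.
    pool, pool_count = by_pool.most_common(1)[0]
    return (f"{len(slow_ops)} slow requests (by type [ {types} ] "
            f"most affected pool [ '{pool}' : {pool_count} ])")


# Two delayed ops on one pool reproduce the message tail seen for osd.9 below.
print(format_slow_summary([("delayed", "default.rgw.control")] * 2))
```

In this sketch, the per-type counts always sum to the total. The pool count covers only the single most affected pool, so it can be smaller than the total.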

Output from ceph.log:

2022-06-20T16:39:23.465436+0000 osd.9 (osd.9) 5 : cluster [WRN] 2 slow requests (by type [ 'delayed' : 2 ] most affected pool [ 'default.rgw.control' : 2 ])
2022-06-20T16:39:33.138344+0000 osd.13 (osd.13) 2 : cluster [WRN] 2 slow requests (by type [ 'delayed' : 2 ] most affected pool [ 'default.rgw.control' : 2 ])
2022-06-20T16:39:33.287790+0000 osd.15 (osd.15) 3 : cluster [WRN] 4 slow requests (by type [ 'delayed' : 4 ] most affected pool [ 'default.rgw.log' : 2 ])
2022-06-20T16:39:58.428863+0000 osd.7 (osd.7) 3 : cluster [WRN] 1 slow requests (by type [ 'started' : 1 ] most affected pool [ 'rbdpool' : 1 ])
2022-06-20T16:39:59.484604+0000 osd.0 (osd.0) 1 : cluster [WRN] 1 slow requests (by type [ 'reached pg' : 1 ] most affected pool [ 'ercpool_2' : 1 ])
2022-06-20T16:39:59.529937+0000 osd.2 (osd.2) 9 : cluster [WRN] 1 slow requests (by type [ 'reached pg' : 1 ] most affected pool [ 'ercpool_2' : 1 ])
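To pull these warnings out of ceph.log for triage, a minimal Python sketch like the one below could parse each aggregated line. The regex is written only against the expected format and the sample output in this comment. `parse_slow_line` and `most_affected_pool_totals` are hypothetical helper names. The sample lines carry a number after `(osd.N)` that the expected format omits, so the sketch treats it as an optional, opaque field.

```python
import re
from collections import Counter

# Matches one aggregated slow-request warning. The number after "(osd.N)"
# is optional because the documented format does not include it.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) osd\.(?P<osd>\d+) \(osd\.\d+\)(?: \d+)? : cluster \[WRN\] "
    r"(?P<count>\d+) slow requests \(by type \[ (?P<types>.*?) \] "
    r"most affected pool \[ '(?P<pool>[^']+)' : (?P<pool_count>\d+) \]\)$"
)
# Each "by type" entry is a quoted op type (may contain spaces) and a count.
TYPE_RE = re.compile(r"'([^']+)' : (\d+)")


def parse_slow_line(line):
    """Return a dict for one aggregated warning, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "timestamp": m["ts"],
        "osd": int(m["osd"]),
        "count": int(m["count"]),
        "by_type": {t: int(n) for t, n in TYPE_RE.findall(m["types"])},
        "pool": m["pool"],
        "pool_count": int(m["pool_count"]),
    }


def most_affected_pool_totals(lines):
    """Sum the reported 'most affected pool' counts per pool across lines.

    Each line names only its single most affected pool, so these totals
    are a lower bound on slow requests per pool, not an exact count.
    """
    totals = Counter()
    for line in lines:
        rec = parse_slow_line(line)
        if rec is not None:
            totals[rec["pool"]] += rec["pool_count"]
    return totals
```

For example, passing in the osd.9 and osd.13 lines above yields a total of 4 for `default.rgw.control`.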

Comment 13 errata-xmlrpc 2022-08-09 17:37:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5997