Bug 2240839

Summary: [5.3 backport][RADOS] "currently delayed" slow ops does not provide details on why op has been delayed
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Vikhyat Umrao <vumrao>
Component: RADOS Assignee: Prashant Dhange <pdhange>
Status: CLOSED ERRATA QA Contact: Pawan <pdhiran>
Severity: high Docs Contact: Ranjini M N <rmandyam>
Priority: high    
Version: 5.1 CC: bhubbard, ceph-eng-bugs, cephqe-warriors, nojha, pdhange, pdhiran, rmandyam, tserlin, vereddy, vumrao
Target Milestone: ---   
Target Release: 5.3z6   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-16.2.10-238.el8cp Doc Type: Enhancement
Doc Text:
.New reports available for sub-events for delayed operations
Previously, slow operations were marked as delayed but without a detailed description. With this enhancement, you can view the detailed descriptions of delayed sub-events for operations.
Story Points: ---
Clone Of: 2240832 Environment:
Last Closed: 2024-02-08 16:55:56 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2240832    
Bug Blocks: 2240838, 2258797    

Description Vikhyat Umrao 2023-09-26 21:00:31 UTC
+++ This bug was initially created as a clone of Bug #2240832 +++

Description of problem:
With reference to BZ#2240819, osd.0 reported slow ops, and most of them were marked as delayed without any detail on why the op was delayed, e.g. whether it was "waiting for rw locks", "waiting for missing objects", "waiting for peered", etc.

An op can be marked as delayed for several different reasons; it could be any of the following:
  op->mark_delayed("waiting for missing object");
  op->mark_delayed("waiting for degraded object");
  op->mark_delayed("waiting for cache not full");
  op->mark_delayed("waiting for clean to repair");
  op->mark_delayed("waiting for blocked object");
  op->mark_delayed("waiting for readable");
  op->mark_delayed("waiting for readable");
  op->mark_delayed("waiting for scrub");
  op->mark_delayed("waiting for readable");
  op->mark_delayed("waiting_for_map not empty");
  op->mark_delayed("waiting for peered");
  op->mark_delayed("waiting for flush");
  op->mark_delayed("waiting for active");
  op->mark_delayed("waiting for scrub");
  op->mark_delayed("waiting for ondisk");
  op->mark_delayed("waiting for rw locks");
  op->mark_delayed("waiting for scrub");
  op->mark_delayed("waiting for scrub");
  op->mark_delayed("waiting for missing object");
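
To illustrate the request, here is a minimal sketch of how a reason string passed to mark_delayed() could be kept and surfaced in the op state. It is not the actual Ceph code or the actual fix: the class name TrackedOpSketch, the state_string() output format, and the "queued" and "started" states are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a tracked op; NOT the real Ceph OpRequest.
class TrackedOpSketch {
public:
  // Mark the op as delayed and remember why, both as the current
  // reason and as an entry in the op's event history.
  void mark_delayed(const std::string& reason) {
    assert(!reason.empty());  // a delay without a reason is the bug here
    delayed_ = true;
    last_delay_reason_ = reason;
    events_.push_back(reason);
  }

  // Hypothetical transition out of the delayed state.
  void mark_started() {
    delayed_ = false;
    events_.push_back("started");
  }

  // State as a slow-op report might print it after "currently".
  // The "delayed (<reason>)" format is an assumption of this sketch.
  std::string state_string() const {
    if (delayed_) {
      return "delayed (" + last_delay_reason_ + ")";
    }
    return events_.empty() ? "queued" : events_.back();
  }

  // Full event history, oldest first.
  const std::vector<std::string>& events() const { return events_; }

private:
  bool delayed_ = false;
  std::string last_delay_reason_;
  std::vector<std::string> events_;
};
```

With something along these lines, an op delayed by op.mark_delayed("waiting for rw locks") would report its state as "delayed (waiting for rw locks)" rather than just "delayed", which is the level of detail needed when troubleshooting slow requests.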

Version-Release number of selected component (if applicable):
RHCS 7

How reproducible:
Frequently

Steps to Reproduce:
1. Deploy ceph cluster
2. Run extensive client workload against the ceph cluster 
3. Observe "currently delayed" slow ops

Actual results:
The delayed ops do not provide details on the reason the op was flagged as delayed

Expected results:
The delayed ops should provide details on the reason the op was flagged as delayed

Additional info:

--- Additional comment from Vikhyat Umrao on 2023-09-26 20:44:44 UTC ---

Marking this one as a blocker because it is a kind of regression and is causing issues in troubleshooting slow requests!

--- Additional comment from Vikhyat Umrao on 2023-09-26 20:56:51 UTC ---

The issue was reported in ODF 4.10, which is nothing but 5.1.z2 (16.2.7-126), hence changing the reported version to 5.1!

Comment 8 errata-xmlrpc 2024-02-08 16:55:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.3 Security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:0745