
Bug 2189920

Summary: OSD already marked out, but slow requests are still reported against it.
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: shiqi <qshi>
Component: RADOS
Assignee: Prashant Dhange <pdhange>
Status: CLOSED ERRATA
QA Contact: Pawan <pdhiran>
Severity: high
Docs Contact: Disha Walvekar <dwalveka>
Priority: unspecified
Version: 3.1
CC: bhubbard, ceph-eng-bugs, cephqe-warriors, dwalveka, nojha, pdhange, rzarzyns, sostapov, tserlin, vumrao
Target Milestone: ---
Target Release: 6.1z4
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ceph-17.2.6-183.el9cp
Doc Type: Bug Fix
Doc Text:
Cause: The cluster reported slow requests on an OSD that was already marked out.
Consequence: This is inconsistent behavior, as an OSD that is already out should not have ops registered against it.
Fix: The mgr daemon no longer keeps daemon_state records for OSD daemons that are down and out.
Result: Slow requests are no longer reported for down-and-out OSD daemons.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-02-08 18:11:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2261930

Description shiqi 2023-04-26 12:52:29 UTC
Description of problem:
An OSD is already marked out, but slow requests are still reported against it.

2023-04-18 09:36:58.985728 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 104 slow requests are blocked > 32 sec. Implicated osds 15,46 (REQUEST_SLOW)
2023-04-18 09:37:58.829200 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 114 slow requests are blocked > 32 sec. Implicated osds 15,46 (REQUEST_SLOW)
2023-04-18 09:38:03.829544 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 119 slow requests are blocked > 32 sec. Implicated osds 15,46,91 (REQUEST_SLOW)

2023-04-18 09:38:45.081818 7f4649923700  0 mon.N-PC-SRH310-187@0(leader) e1 handle_command mon_command({"prefix": "osd out", "ids": ["15"]} v 0) v1
2023-04-18 09:38:45.081867 7f4649923700  0 log_channel(audit) log [INF] : from='client.869908682 -' entity='client.admin' cmd=[{"prefix": "osd out", "ids": ["15"]}]: dispatch
2023-04-18 09:38:45.081988 7f4649923700  0 log_channel(cluster) log [INF] : Client client.admin marked osd.15 out, while it was still marked up
2023-04-18 09:38:46.123486 7f464511a700  1 mon.N-PC-SRH310-187@0(leader).osd e30192 e30192: 288 total, 283 up, 266 in
2023-04-18 09:38:46.143387 7f464511a700  0 log_channel(audit) log [INF] : from='client.869908682 -' entity='client.admin' cmd='[{"prefix": "osd out", "ids": ["15"]}]': finished

2023-04-18 09:39:06.392868 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 129 slow requests are blocked > 32 sec. Implicated osds 15,150,224 (REQUEST_SLOW)
2023-04-18 09:39:39.115526 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 176 slow requests are blocked > 32 sec. Implicated osds 15,60,91,99,125,129,146,150,157,224 (REQUEST_SLOW)
2023-04-18 09:40:06.778683 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 76 slow requests are blocked > 32 sec. Implicated osds 15,224 (REQUEST_SLOW)
2023-04-18 09:40:30.580214 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 82 slow requests are blocked > 32 sec. Implicated osds 15,129,224 (REQUEST_SLOW)
2023-04-18 09:40:37.028340 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 75 slow requests are blocked > 32 sec. Implicated osds 15,224 (REQUEST_SLOW)
2023-04-18 09:41:01.157178 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 11 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:41:11.173565 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 6 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:44:50.783392 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 19 slow requests are blocked > 32 sec. Implicated osds 15,110 (REQUEST_SLOW)
2023-04-18 09:44:56.580815 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 22 slow requests are blocked > 32 sec. Implicated osds 15,17,236,247 (REQUEST_SLOW)
2023-04-18 09:45:17.868877 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 26 slow requests are blocked > 32 sec. Implicated osds 15,236 (REQUEST_SLOW)
2023-04-18 09:45:26.620679 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 28 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:45:30.987311 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 29 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:45:40.603139 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 30 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:45:50.918547 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 35 slow requests are blocked > 32 sec. Implicated osds 15,110 (REQUEST_SLOW)
2023-04-18 09:46:00.993006 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 37 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:46:10.744300 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 41 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:46:19.179848 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 48 slow requests are blocked > 32 sec. Implicated osds 15,236,247 (REQUEST_SLOW)
2023-04-18 09:46:55.641033 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 70 slow requests are blocked > 32 sec. Implicated osds 6,15,16,19,56,60,70,91,109,110,125,211,213,236 (REQUEST_SLOW)
2023-04-18 09:56:22.814425 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 118 slow requests are blocked > 32 sec. Implicated osds 15,213,236 (REQUEST_SLOW)

2023-04-18 09:56:30.503006 7f4649923700  0 log_channel(cluster) log [INF] : osd.15 marked itself down

2023-04-18 10:05:29.076344 7f464c128700  0 log_channel(cluster) log [INF] : Cluster is now healthy

Version-Release number of selected component (if applicable):
RHCS 3.1

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
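The eventual fix (per the Doc Text above) was to stop keeping mgr daemon_state records for OSDs that are both down and out, so stale slow-request ops can no longer be attributed to them. A minimal, hypothetical Python sketch of that pruning idea follows; the names (`prune_down_out_osds`, the dict layouts) are illustrative only and are not Ceph's actual internals.

```python
# Hypothetical simplification of the daemon_state pruning described in the fix.
# Not Ceph mgr code: function and field names are invented for illustration.

def prune_down_out_osds(daemon_state, osd_map):
    """Drop daemon_state records for OSDs that are both down and out,
    so slow requests can no longer be registered against them."""
    return {
        name: state
        for name, state in daemon_state.items()
        if not (
            name.startswith("osd.")
            and not osd_map[name]["up"]
            and not osd_map[name]["in"]
        )
    }

# osd.15 is down and out (as in the logs above), so its stale
# slow-request record is pruned; osd.46 is up and in, so it is kept.
daemon_state = {
    "osd.15": {"slow_ops": 11},
    "osd.46": {"slow_ops": 0},
}
osd_map = {
    "osd.15": {"up": False, "in": False},
    "osd.46": {"up": True, "in": True},
}
pruned = prune_down_out_osds(daemon_state, osd_map)
print(sorted(pruned))  # → ['osd.46']
```

With records for down-and-out daemons removed, the REQUEST_SLOW health check has nothing left to implicate for osd.15, matching the "Cluster is now healthy" outcome once the daemon fully stopped.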

Comment 1 Prashant Dhange 2023-06-09 21:50:49 UTC
*** Bug 2189921 has been marked as a duplicate of this bug. ***

Comment 4 Scott Ostapovicz 2023-07-12 12:39:51 UTC
Missed the 6.1 z1 window.  Retargeting to 6.1 z2.

Comment 16 errata-xmlrpc 2024-02-08 18:11:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0747