Bug 2189920

Summary: OSD already marked out, but slow requests are still reported on it
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: shiqi <qshi>
Component: RADOS
Assignee: Prashant Dhange <pdhange>
Status: ASSIGNED ---
QA Contact: Pawan <pdhiran>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.1
CC: bhubbard, ceph-eng-bugs, cephqe-warriors, nojha, pdhange, sostapov, vumrao
Target Milestone: ---
Flags: pdhiran: needinfo? (pdhange)
Target Release: 6.1z2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description shiqi 2023-04-26 12:52:29 UTC
Description of problem:
osd.15 was already marked out, but slow requests are still reported against it.

2023-04-18 09:36:58.985728 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 104 slow requests are blocked > 32 sec. Implicated osds 15,46 (REQUEST_SLOW)
2023-04-18 09:37:58.829200 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 114 slow requests are blocked > 32 sec. Implicated osds 15,46 (REQUEST_SLOW)
2023-04-18 09:38:03.829544 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 119 slow requests are blocked > 32 sec. Implicated osds 15,46,91 (REQUEST_SLOW)

2023-04-18 09:38:45.081818 7f4649923700  0 mon.N-PC-SRH310-187@0(leader) e1 handle_command mon_command({"prefix": "osd out", "ids": ["15"]} v 0) v1
2023-04-18 09:38:45.081867 7f4649923700  0 log_channel(audit) log [INF] : from='client.869908682 -' entity='client.admin' cmd=[{"prefix": "osd out", "ids": ["15"]}]: dispatch
2023-04-18 09:38:45.081988 7f4649923700  0 log_channel(cluster) log [INF] : Client client.admin marked osd.15 out, while it was still marked up
2023-04-18 09:38:46.123486 7f464511a700  1 mon.N-PC-SRH310-187@0(leader).osd e30192 e30192: 288 total, 283 up, 266 in
2023-04-18 09:38:46.143387 7f464511a700  0 log_channel(audit) log [INF] : from='client.869908682 -' entity='client.admin' cmd='[{"prefix": "osd out", "ids": ["15"]}]': finished

2023-04-18 09:39:06.392868 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 129 slow requests are blocked > 32 sec. Implicated osds 15,150,224 (REQUEST_SLOW)
2023-04-18 09:39:39.115526 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 176 slow requests are blocked > 32 sec. Implicated osds 15,60,91,99,125,129,146,150,157,224 (REQUEST_SLOW)
2023-04-18 09:40:06.778683 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 76 slow requests are blocked > 32 sec. Implicated osds 15,224 (REQUEST_SLOW)
2023-04-18 09:40:30.580214 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 82 slow requests are blocked > 32 sec. Implicated osds 15,129,224 (REQUEST_SLOW)
2023-04-18 09:40:37.028340 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 75 slow requests are blocked > 32 sec. Implicated osds 15,224 (REQUEST_SLOW)
2023-04-18 09:41:01.157178 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 11 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:41:11.173565 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 6 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:44:50.783392 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 19 slow requests are blocked > 32 sec. Implicated osds 15,110 (REQUEST_SLOW)
2023-04-18 09:44:56.580815 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 22 slow requests are blocked > 32 sec. Implicated osds 15,17,236,247 (REQUEST_SLOW)
2023-04-18 09:45:17.868877 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 26 slow requests are blocked > 32 sec. Implicated osds 15,236 (REQUEST_SLOW)
2023-04-18 09:45:26.620679 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 28 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:45:30.987311 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 29 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:45:40.603139 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 30 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:45:50.918547 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 35 slow requests are blocked > 32 sec. Implicated osds 15,110 (REQUEST_SLOW)
2023-04-18 09:46:00.993006 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 37 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:46:10.744300 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 41 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:46:19.179848 7f464c128700  0 log_channel(cluster) log [WRN] : Health check failed: 48 slow requests are blocked > 32 sec. Implicated osds 15,236,247 (REQUEST_SLOW)
2023-04-18 09:46:55.641033 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 70 slow requests are blocked > 32 sec. Implicated osds 6,15,16,19,56,60,70,91,109,110,125,211,213,236 (REQUEST_SLOW)
2023-04-18 09:56:22.814425 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 118 slow requests are blocked > 32 sec. Implicated osds 15,213,236 (REQUEST_SLOW)

2023-04-18 09:56:30.503006 7f4649923700  0 log_channel(cluster) log [INF] : osd.15 marked itself down

2023-04-18 10:05:29.076344 7f464c128700  0 log_channel(cluster) log [INF] : Cluster is now healthy
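
The timeline in the log above can be checked mechanically. The following is a minimal Python sketch, not part of the original report, that scans mon log text for the "marked osd.N out" event and then lists every later REQUEST_SLOW health check that still implicates that OSD. The function name slow_reports_after_out and the abridged LOG sample are illustrative assumptions; the regular expressions are fitted only to the line formats shown in this bug and may need adjusting for other Ceph releases.

```python
import re

# Abridged sample copied from the mon log in this report.
LOG = """\
2023-04-18 09:38:03.829544 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 119 slow requests are blocked > 32 sec. Implicated osds 15,46,91 (REQUEST_SLOW)
2023-04-18 09:38:45.081988 7f4649923700  0 log_channel(cluster) log [INF] : Client client.admin marked osd.15 out, while it was still marked up
2023-04-18 09:39:06.392868 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 129 slow requests are blocked > 32 sec. Implicated osds 15,150,224 (REQUEST_SLOW)
2023-04-18 09:56:22.814425 7f464c128700  0 log_channel(cluster) log [WRN] : Health check update: 118 slow requests are blocked > 32 sec. Implicated osds 15,213,236 (REQUEST_SLOW)
2023-04-18 09:56:30.503006 7f4649923700  0 log_channel(cluster) log [INF] : osd.15 marked itself down
"""

# Matches the REQUEST_SLOW health check lines ("update" or "failed").
SLOW_RE = re.compile(
    r"^(?P<ts>\S+ \S+) .*Health check (?:update|failed): (?P<count>\d+) slow requests "
    r"are blocked > \d+ sec\. Implicated osds (?P<osds>[\d,]+)"
)
# Matches the cluster log line written when a client marks an OSD out.
OUT_RE = re.compile(r"^(?P<ts>\S+ \S+) .*marked osd\.(?P<osd>\d+) out")


def slow_reports_after_out(log_text, osd_id):
    """Return (marked_out_ts, [(ts, count), ...]) for slow-request reports
    that still implicate osd_id after it was marked out.

    count is the cluster-wide number of slow requests in that report,
    not the number attributable to osd_id alone.
    """
    out_ts = None
    reports = []
    for line in log_text.splitlines():
        m = OUT_RE.match(line)
        if m and int(m.group("osd")) == osd_id:
            out_ts = m.group("ts")
            continue
        m = SLOW_RE.match(line)
        if m and out_ts is not None:
            osds = {int(x) for x in m.group("osds").split(",")}
            if osd_id in osds:
                reports.append((m.group("ts"), int(m.group("count"))))
    return out_ts, reports


out_ts, reports = slow_reports_after_out(LOG, 15)
print("osd.15 marked out at", out_ts)
for ts, count in reports:
    print(ts, count, "slow requests reported, osd.15 still implicated")
```

Run against the full mon log, this should show whether osd.15 keeps appearing in the implicated list between the "osd out" command at 09:38:45 and the point where it marked itself down at 09:56:30.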

Version-Release number of selected component (if applicable):
RHCS 3.1

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Prashant Dhange 2023-06-09 21:50:49 UTC
*** Bug 2189921 has been marked as a duplicate of this bug. ***

Comment 4 Scott Ostapovicz 2023-07-12 12:39:51 UTC
Missed the 6.1 z1 window.  Retargeting to 6.1 z2.