Bug 2189920
| Summary: | osd already out. but still have slow request on it. | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | shiqi <qshi> |
| Component: | RADOS | Assignee: | Prashant Dhange <pdhange> |
| Status: | CLOSED ERRATA | QA Contact: | Pawan <pdhiran> |
| Severity: | high | Docs Contact: | Disha Walvekar <dwalveka> |
| Priority: | unspecified | ||
| Version: | 3.1 | CC: | bhubbard, ceph-eng-bugs, cephqe-warriors, dwalveka, nojha, pdhange, rzarzyns, sostapov, tserlin, vumrao |
| Target Milestone: | --- | ||
| Target Release: | 6.1z4 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | ceph-17.2.6-183.el9cp | Doc Type: | Bug Fix |
| Doc Text: | Cause: The cluster reports slow requests on an OSD that is already marked out.<br>Consequence: This is inconsistent behavior, as an OSD that is already out should not have ops registered against it.<br>Fix: The daemon_state records for down and out OSD daemons are no longer kept in the mgr daemon.<br>Result: Slow requests are not reported for down-and-out OSD daemons. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-02-08 18:11:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 2261930 | ||
*** Bug 2189921 has been marked as a duplicate of this bug. ***

Missed the 6.1 z1 window. Retargeting to 6.1 z2.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0747
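The fix described in the Doc Text amounts to excluding down-and-out OSDs before the slow-request health warning is generated. The following is a minimal Python sketch of that filtering logic, not the actual ceph-mgr implementation; the function name `implicated_osds` and the dict/set inputs are illustrative assumptions.

```python
# Sketch (hypothetical, not ceph-mgr source) of the behavior the fix
# describes: before raising REQUEST_SLOW, drop slow-op records for OSDs
# that are neither up nor in, so an out-and-down OSD is no longer implicated.

def implicated_osds(slow_ops, up_osds, in_osds):
    """Return the sorted list of OSD ids to report as implicated.

    slow_ops -- dict mapping osd id -> number of blocked requests
    up_osds  -- set of osd ids currently marked up
    in_osds  -- set of osd ids currently marked in
    """
    return sorted(
        osd for osd, count in slow_ops.items()
        if count > 0 and (osd in up_osds or osd in in_osds)
    )

# Example mirroring the log below: once osd.15 is both out and down,
# it drops out of the implicated list even if stale ops remain recorded.
slow_ops = {15: 118, 213: 3, 236: 2}
up_osds = {213, 236}   # osd.15 marked itself down
in_osds = {213, 236}   # osd.15 was marked out earlier
print(implicated_osds(slow_ops, up_osds, in_osds))  # [213, 236]
```

While osd.15 was out but still up (as in the log between 09:38 and 09:56), it would still be reported, which matches the observed warnings; only after it is also down does it disappear from the list.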
Description of problem:

The OSD is already marked out, but slow requests are still reported against it:

```
2023-04-18 09:36:58.985728 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 104 slow requests are blocked > 32 sec. Implicated osds 15,46 (REQUEST_SLOW)
2023-04-18 09:37:58.829200 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 114 slow requests are blocked > 32 sec. Implicated osds 15,46 (REQUEST_SLOW)
2023-04-18 09:38:03.829544 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 119 slow requests are blocked > 32 sec. Implicated osds 15,46,91 (REQUEST_SLOW)
2023-04-18 09:38:45.081818 7f4649923700 0 mon.N-PC-SRH310-187@0(leader) e1 handle_command mon_command({"prefix": "osd out", "ids": ["15"]} v 0) v1
2023-04-18 09:38:45.081867 7f4649923700 0 log_channel(audit) log [INF] : from='client.869908682 -' entity='client.admin' cmd=[{"prefix": "osd out", "ids": ["15"]}]: dispatch
2023-04-18 09:38:45.081988 7f4649923700 0 log_channel(cluster) log [INF] : Client client.admin marked osd.15 out, while it was still marked up
2023-04-18 09:38:46.123486 7f464511a700 1 mon.N-PC-SRH310-187@0(leader).osd e30192 e30192: 288 total, 283 up, 266 in
2023-04-18 09:38:46.143387 7f464511a700 0 log_channel(audit) log [INF] : from='client.869908682 -' entity='client.admin' cmd='[{"prefix": "osd out", "ids": ["15"]}]': finished
2023-04-18 09:39:06.392868 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 129 slow requests are blocked > 32 sec. Implicated osds 15,150,224 (REQUEST_SLOW)
2023-04-18 09:39:39.115526 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 176 slow requests are blocked > 32 sec. Implicated osds 15,60,91,99,125,129,146,150,157,224 (REQUEST_SLOW)
2023-04-18 09:40:06.778683 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 76 slow requests are blocked > 32 sec. Implicated osds 15,224 (REQUEST_SLOW)
2023-04-18 09:40:30.580214 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 82 slow requests are blocked > 32 sec. Implicated osds 15,129,224 (REQUEST_SLOW)
2023-04-18 09:40:37.028340 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 75 slow requests are blocked > 32 sec. Implicated osds 15,224 (REQUEST_SLOW)
2023-04-18 09:41:01.157178 7f464c128700 0 log_channel(cluster) log [WRN] : Health check failed: 11 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:41:11.173565 7f464c128700 0 log_channel(cluster) log [WRN] : Health check failed: 6 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:44:50.783392 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 19 slow requests are blocked > 32 sec. Implicated osds 15,110 (REQUEST_SLOW)
2023-04-18 09:44:56.580815 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 22 slow requests are blocked > 32 sec. Implicated osds 15,17,236,247 (REQUEST_SLOW)
2023-04-18 09:45:17.868877 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 26 slow requests are blocked > 32 sec. Implicated osds 15,236 (REQUEST_SLOW)
2023-04-18 09:45:26.620679 7f464c128700 0 log_channel(cluster) log [WRN] : Health check failed: 28 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:45:30.987311 7f464c128700 0 log_channel(cluster) log [WRN] : Health check failed: 29 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:45:40.603139 7f464c128700 0 log_channel(cluster) log [WRN] : Health check failed: 30 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:45:50.918547 7f464c128700 0 log_channel(cluster) log [WRN] : Health check failed: 35 slow requests are blocked > 32 sec. Implicated osds 15,110 (REQUEST_SLOW)
2023-04-18 09:46:00.993006 7f464c128700 0 log_channel(cluster) log [WRN] : Health check failed: 37 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:46:10.744300 7f464c128700 0 log_channel(cluster) log [WRN] : Health check failed: 41 slow requests are blocked > 32 sec. Implicated osds 15 (REQUEST_SLOW)
2023-04-18 09:46:19.179848 7f464c128700 0 log_channel(cluster) log [WRN] : Health check failed: 48 slow requests are blocked > 32 sec. Implicated osds 15,236,247 (REQUEST_SLOW)
2023-04-18 09:46:55.641033 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 70 slow requests are blocked > 32 sec. Implicated osds 6,15,16,19,56,60,70,91,109,110,125,211,213,236 (REQUEST_SLOW)
2023-04-18 09:56:22.814425 7f464c128700 0 log_channel(cluster) log [WRN] : Health check update: 118 slow requests are blocked > 32 sec. Implicated osds 15,213,236 (REQUEST_SLOW)
2023-04-18 09:56:30.503006 7f4649923700 0 log_channel(cluster) log [INF] : osd.15 marked itself down
2023-04-18 10:05:29.076344 7f464c128700 0 log_channel(cluster) log [INF] : Cluster is now healthy
```

Note that osd.15 is marked out at 09:38:45 yet remains implicated in REQUEST_SLOW warnings until it marks itself down at 09:56:30.

Version-Release number of selected component (if applicable):
RHCS 3.1

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info: