Bug 2228635
Summary: | (mds.1): 3 slow requests are blocked
---|---|---|---
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Scott Nipp <snipp>
Component: | CephFS | Assignee: | Xiubo Li <xiubli>
Status: | CLOSED ERRATA | QA Contact: | Hemanth Kumar <hyelloji>
Severity: | high | Docs Contact: | Rivka Pollack <rpollack>
Priority: | unspecified
Version: | 6.1 | CC: | akraj, bkunal, ceph-eng-bugs, cephqe-warriors, gfarnum, mcaldeir, ngangadh, pdonnell, tserlin, vumrao, xiubli
Target Milestone: | ---
Target Release: | 7.0
Hardware: | x86_64
OS: | Linux
Whiteboard: |
Fixed In Version: | ceph-18.2.0-46.el9cp | Doc Type: | Bug Fix
Doc Text: |
.Deadlocks no longer occur between the unlink and reintegration requests
Previously, while fixing an async dirop bug, earlier commits introduced a regression that caused deadlocks between unlink and reintegration requests.
With this fix, those commits are reverted and deadlocks between unlink and reintegration requests no longer occur.
|
Story Points: | ---
Clone Of: |
: | 2233131 (view as bug list) | Environment: |
Last Closed: | 2023-12-13 15:21:28 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 2233131, 2237662
Description
Scott Nipp
2023-08-02 22:37:34 UTC
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

We have already requested from the customer... Can you get us an SOS report off your lead MON node and also upload all the MDS logs from every node hosting an MDS instance? Can you also list blocked ops and in-flight ops and redirect that output to a file (example commands are sketched at the end of this section)? Attach that file to the case also. Please let us know if there is anything additional you would like for us to obtain from the customer for this BZ.

So BofA is still experiencing occasional occurrences of slow/blocked ops on clusters that have been upgraded to 6.1z1. In their PVCEPH cluster they had another occurrence @ Thu Aug 17 04:50:13 EDT 2023. They provided the following files, uploaded to SupportShell in case 03578367:

ceph-mds.root.host3.wnboxv.log  <-- mds.1 before fail @ Thu Aug 17 04:50:13 EDT 2023
ceph-mds.root.host7.oqqvka.log  <-- mds.1 after fail @ Thu Aug 17 04:50:13 EDT 2023
mds.1.1692262213.failed.tar.gz  <-- taken before mds.1 fail @ Thu Aug 17 04:50:13 EDT 2023
pvceph.ceph.config.dump.mds.txt

In their PTCEPH cluster they are reporting 4 occurrences since 8/6/2023. Here is a snapshot of those files in SupportShell:

$ yank 03590519
Authenticating the user using the OIDC device authorization grant ...
The SSO authentication is successful
Initializing yank for case 03590519 ...
Retrieving attachments listing for case 03590519 ...

| IDX | PRFX | FILENAME                                   | SIZE (KB) | DATE                 | SOURCE | CACHED |
|-----|------|--------------------------------------------|-----------|----------------------|--------|--------|
| 1   | 0010 | mds.1.1692238512.failed.tar.gz             | 33.62     | 2023-08-17 15:22 UTC | S3     | No     |
| 2   | 0020 | ceph-mds.root.host4.duplag.log-20230817.gz | 12417.59  | 2023-08-17 15:22 UTC | S3     | No     |
| 3   | 0030 | ceph-mds.root.host0.djvost.log-20230817.gz | 149948.99 | 2023-08-17 15:22 UTC | S3     | No     |

See KCS article #7031927 (https://access.redhat.com/solutions/7031927).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780
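For reference, a minimal sketch of how the blocked-op and in-flight-op dumps requested above can be collected, assuming a node with an admin keyring and an active MDS daemon named mds.root.host3.wnboxv (the daemon name here is only illustrative, taken from the log file names above); the same dumps can also be taken on the MDS host via the admin socket with `ceph daemon mds.<name> ...`.

# Dump the currently blocked requests and all in-flight ops from the active MDS,
# redirecting each dump to a file that can be attached to the support case.
ceph tell mds.root.host3.wnboxv dump_blocked_ops   > mds.blocked_ops.$(date +%s).txt
ceph tell mds.root.host3.wnboxv dump_ops_in_flight > mds.ops_in_flight.$(date +%s).txt

# After applying the RHBA-2023:7780 advisory, confirm the daemons are running the
# fixed build (ceph-18.2.0-46.el9cp, i.e. 18.2.0 based, or later).
ceph versions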