
Bug 2249651

Summary: multisite: DeleteObjects requests may deadlock in RGWDataChangesLog
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Casey Bodley <cbodley>
Component: RGW-Multisite Assignee: Casey Bodley <cbodley>
Status: CLOSED ERRATA QA Contact: Hemanth Sai <hmaheswa>
Severity: high Docs Contact: Akash Raj <akraj>
Priority: unspecified    
Version: 7.0CC: akraj, ceph-eng-bugs, cephqe-warriors, tserlin, vereddy
Target Milestone: ---   
Target Release: 7.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-18.2.1-21.el9cp Doc Type: Bug Fix
Doc Text:
.Ceph Object Gateway no longer deadlocks during object deletion
Previously, in multi-site deployments, S3 DeleteObjects requests processed several object deletions at a time, which could cause the Ceph Object Gateway to deadlock and stop accepting new requests. With this fix, writes to the replication logs are serialized and the deadlock is prevented.
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-06-13 14:23:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2267614, 2298578, 2298579    

Description Casey Bodley 2023-11-14 15:49:16 UTC
Description of problem:

The S3 DeleteObjects operation was changed in https://github.com/ceph/ceph/pull/48679 to support concurrent object deletes. There have been several reports from upstream users that this leads to deadlocks when multisite is enabled.
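
As an illustration only, the fan-out introduced by that change can be sketched in Python (the real code uses RGW's C++ coroutine framework; `delete_one` is a hypothetical stand-in for the per-object delete path):

```python
from concurrent.futures import ThreadPoolExecutor

def delete_one(key):
    # Stand-in for the per-object delete, which in multisite also
    # appends an entry to the data changes (replication) log.
    return key

keys = [f"obj-{i}" for i in range(10)]

# A single DeleteObjects request now processes several deletions at a time.
with ThreadPoolExecutor(max_workers=4) as pool:
    deleted = list(pool.map(delete_one, keys))

print(len(deleted))
```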

https://tracker.ceph.com/issues/63373 contains the stack traces of several threads that are blocked trying to acquire a mutex in LazyFIFO::lazy_init()
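
A hedged model of how those threads get stuck: if lazy initialization of the log happens under a mutex and itself needs a worker from the same, already-exhausted pool, every delete blocks. The Python sketch below mimics this with a timeout standing in for the indefinite block (names like `lazy_init` are illustrative, not RGW's actual API):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import threading

pool = ThreadPoolExecutor(max_workers=1)   # all workers busy = exhausted pool
init_lock = threading.Lock()
initialized = False

def lazy_init():
    """First caller initializes the shared log under a mutex."""
    global initialized
    with init_lock:
        if not initialized:
            # Initialization itself needs a worker from the SAME pool,
            # but every worker is already blocked here waiting for it.
            fut = pool.submit(lambda: None)
            fut.result(timeout=1)  # the real code would block forever
            initialized = True

def delete_object(key):
    lazy_init()                    # every concurrent delete funnels in here
    return key

future = pool.submit(delete_object, "obj-0")
try:
    future.result(timeout=5)
    outcome = "completed"
except TimeoutError:
    outcome = "starved"            # the init work item can never be scheduled
print(outcome)
```

With one worker the submitted init task can never run, so the delete times out; in the real deadlock there is no timeout and the gateway simply stops accepting requests.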


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy a multisite configuration with at least 2 zones.
2. Create a bucket and upload many objects.
3. Delete the objects in bulk, for example with `s3cmd rm -r s3://some-large-bucket`.
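
The fix described in the Doc Text, serializing the replication-log writes, can be pictured as funneling appends from all concurrent deletes through a single writer, so no delete ever blocks on another's lazy initialization. A minimal Python sketch under that assumption (class and method names are illustrative, not Ceph's):

```python
import queue
import threading

class SerializedLog:
    """Illustrative: funnel appends from many workers through one writer."""
    def __init__(self):
        self._q = queue.Queue()
        self.entries = []
        self._writer = threading.Thread(target=self._drain)
        self._writer.start()

    def _drain(self):
        # Single consumer: log writes happen strictly one at a time.
        while True:
            item = self._q.get()
            if item is None:
                return
            self.entries.append(item)

    def append(self, entry):
        # Callers enqueue and return; they never block on log
        # initialization or on each other.
        self._q.put(entry)

    def close(self):
        self._q.put(None)
        self._writer.join()

log = SerializedLog()
workers = [threading.Thread(target=lambda i=i: [log.append((i, n)) for n in range(100)])
           for i in range(8)]
for w in workers: w.start()
for w in workers: w.join()
log.close()
print(len(log.entries))
```

Eight concurrent "deleters" produce 800 log entries with no shared mutex on the append path, which is the spirit of the fix rather than its actual implementation.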

Comment 1 RHEL Program Management 2023-11-14 18:03:58 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 7 errata-xmlrpc 2024-06-13 14:23:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925