Bug 2112122

Summary: [RFE] RGW should quickly respond HTTP 500 on non-serviceable requests
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Michael J. Kidd <linuxkidd>
Component: RGW
Assignee: Matt Benjamin (redhat) <mbenjamin>
Status: CLOSED NOTABUG
QA Contact: Madhavi Kasturi <mkasturi>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.2
CC: bniver, cbodley, ceph-eng-bugs, cephqe-warriors, gjose, jdurgin, kbader, kkeithle, lithomas, mbenjamin, mmuench, nojha, vumrao
Target Milestone: ---
Keywords: FutureFeature
Target Release: 6.1
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-24 18:05:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Michael J. Kidd 2022-07-28 21:35:55 UTC
Description of problem:
- When backing storage has a problem that makes a request unserviceable (for example, an unfound bucket index object), RGW holds the connection and its thread indefinitely while waiting for the requested object to become available from backing storage.
- For S3 and Swift HTTP requests, most clients time out after 30 to 60 seconds by default and then retry the request.
- Each retry ties up another RGW thread while the earlier ones are still blocked, which eventually exhausts the RGW thread pool and blocks all client requests.
- The only way to recover RGW from this state is to restart the service.
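
The exhaustion mechanism described above can be illustrated with a toy model. This is a sketch only, not RGW code: a small Python thread pool stands in for the RGW thread pool, and `POOL_SIZE`, `simulate_exhaustion`, and the `storage_recovered` event are hypothetical names invented for the illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

POOL_SIZE = 4  # hypothetical; stands in for the RGW thread pool size


def simulate_exhaustion(pool_size=POOL_SIZE, healthy_timeout=0.2):
    """Return True if a healthy request is starved by stuck requests."""
    # Stays unset while the (simulated) bucket index object is unfound.
    storage_recovered = threading.Event()

    def stuck_request():
        # Models a request that waits indefinitely on the unfound object.
        storage_recovered.wait()
        return 500

    def healthy_request():
        # Models a request that touches only healthy objects.
        return 200

    pool = ThreadPoolExecutor(max_workers=pool_size)
    try:
        # Client retries: each one pins another worker thread.
        for _ in range(pool_size):
            pool.submit(stuck_request)
        healthy = pool.submit(healthy_request)
        try:
            healthy.result(timeout=healthy_timeout)
            return False
        except FutureTimeout:
            # No worker was free to run the healthy request in time.
            return True
    finally:
        # Unblock the workers so the simulation itself can exit.
        storage_recovered.set()
        pool.shutdown(wait=True)
```

Once every worker is pinned by a stuck request, the healthy request never starts, which matches the report that all requests are affected and not only those touching the injured bucket.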

Version-Release number of selected component (if applicable):
- 4.2z2

How reproducible:
- 100%

Steps to Reproduce:
1. Create an unfound-object condition for a bucket index object.
2. From one or more clients, start a loop of requests that incur a bucket index op on the injured bucket.
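
A rough sketch of the client loop in step 2, using only the Python standard library. Everything here is an assumption for illustration: `probe` is a hypothetical helper, the URL is supplied by the caller, and the plain GET only reaches the bucket index if the bucket permits anonymous listing. A real reproduction would more likely use a signed S3 or Swift client.

```python
import urllib.error
import urllib.request


def probe(url, attempts=5, timeout=30.0, opener=urllib.request.urlopen):
    """Issue `attempts` GET requests and record the outcome of each.

    Returns a list holding the HTTP status code of each response, or the
    string "failed" when the request timed out or could not connect.
    `opener` is injectable so the loop can be exercised without a network.
    """
    results = []
    for _ in range(attempts):
        try:
            with opener(url, timeout=timeout) as resp:
                results.append(resp.status)
        except urllib.error.HTTPError as err:
            # The server answered, but with an error status (e.g. 500, 504).
            results.append(err.code)
        except OSError:
            # Covers URLError and socket timeouts: no response was received.
            results.append("failed")
    return results
```

Against an affected RGW, a run of "failed" entries followed by 504 codes would be consistent with the behaviour reported under Actual results.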

Actual results:
- After thread exhaustion, RGW begins responding with 504 errors to all requests, not just those which require the injured object.


Expected results:
- Prompt HTTP 500 (or similar) error returned to the client when a request cannot be serviced due to a known lack of object availability.
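
The requested behaviour could be sketched as follows. This is a hypothetical illustration of the fail-fast idea, not how RGW is implemented: `ObjectState`, `Response`, `handle_bucket_index_op`, and the two callables passed in are all invented names, and the sketch assumes the gateway can learn cheaply that an object is unfound.

```python
from dataclasses import dataclass
from enum import Enum


class ObjectState(Enum):
    AVAILABLE = "available"
    UNFOUND = "unfound"


@dataclass
class Response:
    status: int
    body: str


def handle_bucket_index_op(bucket, index_state, read_index):
    """Serve a bucket index op, failing fast if the index is known unfound.

    `index_state(bucket)` returns an ObjectState (assumed to be cheap and
    non-blocking); `read_index(bucket)` performs the real, possibly slow read.
    """
    if index_state(bucket) is ObjectState.UNFOUND:
        # Answer immediately instead of holding a thread until recovery.
        return Response(500, f"bucket index for {bucket!r} is unavailable")
    return Response(200, read_index(bucket))
```

With this shape, a request for the injured bucket returns at once and releases its thread, so requests for healthy buckets keep being served.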