Bug 2112122 - [RFE] RGW should quickly respond HTTP 500 on non-serviceable requests
Summary: [RFE] RGW should quickly respond HTTP 500 on non-serviceable requests
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 4.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 6.1
Assignee: Matt Benjamin (redhat)
QA Contact: Madhavi Kasturi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-28 21:35 UTC by Michael J. Kidd
Modified: 2023-03-24 18:05 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-24 18:05:50 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 56956 | 0 | None | None | None | 2022-07-28 21:55:57 UTC
Github | ceph ceph pull 38350 | 0 | None | open | osdc: Add objecter fastfail | 2022-07-28 21:47:18 UTC
Red Hat Issue Tracker | RHCEPH-4966 | 0 | None | None | None | 2022-07-28 21:37:05 UTC

Description Michael J. Kidd 2022-07-28 21:35:55 UTC
Description of problem:
- When backing storage has a problem that makes a request unserviceable (an unfound bucket index object, for example), RGW holds the connection and its worker thread indefinitely while waiting for the requested object to become available from backing storage.
- Most S3 / Swift clients time out after 30 to 60 seconds by default and then retry the request, while the thread that picked up the original attempt stays blocked.
- Repeated retries therefore eventually exhaust the RGW thread pool, which blocks all client requests.
- The only way to recover RGW from this state is to restart the service.
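
The exhaustion mechanism described above can be illustrated with a small simulation. This is only a toy sketch in Python, not RGW code: the pool size, the shortened timeout, and the names `handle` and `simulate` are assumptions made for illustration, and the 504 stands in for what a front-end proxy might report when no worker is free.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

POOL_SIZE = 4          # hypothetical; stands in for the RGW thread pool size
CLIENT_TIMEOUT = 0.05  # shortened stand-in for the 30 to 60 second client timeout

# Never set while the bucket index object is unfound.
object_recovered = threading.Event()

def handle(bucket):
    """Toy request handler: requests for the injured bucket block on storage."""
    if bucket == "injured":
        object_recovered.wait()  # waits indefinitely and keeps the worker busy
    return 200

def simulate():
    """Return the status seen by a request to a healthy bucket after retries."""
    pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
    # The client times out and retries; every attempt pins one more worker.
    for _ in range(POOL_SIZE):
        attempt = pool.submit(handle, "injured")
        try:
            attempt.result(timeout=CLIENT_TIMEOUT)
        except FutureTimeout:
            pass  # the client gave up, but the worker is still blocked
    # An unrelated request now queues behind the blocked workers.
    healthy = pool.submit(handle, "healthy")
    try:
        status = healthy.result(timeout=CLIENT_TIMEOUT)
    except FutureTimeout:
        status = 504
    object_recovered.set()  # release the workers so the sketch can exit
    pool.shutdown(wait=True)
    return status
```

In this sketch, once `POOL_SIZE` attempts against the injured bucket are outstanding, a request for a healthy bucket also fails, which mirrors the behavior reported under "Actual results".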

Version-Release number of selected component (if applicable):
- 4.2z2

How reproducible:
- 100%

Steps to Reproduce:
1. Create an unfound object condition for a bucket index object
2. From one or more clients, start a loop of requests that trigger a bucket index operation on the injured bucket

Actual results:
- After thread exhaustion, RGW begins responding with 504 errors to all requests, not just those which require the injured object.


Expected results:
- Prompt HTTP 500 (or similar) error returned to the client when a request cannot be serviced due to a known lack of object availability.
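
As a rough illustration of the requested behavior, the sketch below fails fast instead of waiting indefinitely. It is a hypothetical Python model, not the RGW implementation and not the approach of the linked objecter fastfail pull request: the function `serve`, the `unfound` set, and the `deadline` value are all assumptions made for this example.

```python
import threading

def serve(obj, unfound, ready, deadline=0.05):
    """Toy handler that returns an HTTP-style status code.

    obj      -- name of the object the request needs (hypothetical)
    unfound  -- set of objects already known to be unavailable (hypothetical)
    ready    -- threading.Event set once storage can serve the object
    deadline -- upper bound in seconds on how long to wait for storage
    """
    if obj in unfound:
        return 500  # known lack of availability: answer immediately
    if not ready.wait(timeout=deadline):
        return 500  # storage did not answer in time: free the thread
    return 200
```

With either check in place, a worker is held for at most `deadline` seconds per request, so requests for a single unavailable object cannot pin the whole pool.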

