2112122 – [RFE] RGW should quickly respond HTTP 500 on non-serviceable requests

Summary: [RFE] RGW should quickly respond HTTP 500 on non-serviceable requests

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RGW
Sub Component:
Version:	4.2
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	6.1
Assignee:	Matt Benjamin (redhat)
QA Contact:	Madhavi Kasturi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-07-28 21:35 UTC by Michael J. Kidd
Modified:	2023-03-24 18:05 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-03-24 18:05:50 UTC
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Ceph Project Bug Tracker	56956	None	None	None	2022-07-28 21:55:57 UTC
Github	ceph ceph pull 38350	None	open	osdc: Add objecter fastfail	2022-07-28 21:47:18 UTC
Red Hat Issue Tracker	RHCEPH-4966	None	None	None	2022-07-28 21:37:05 UTC

Description Michael J. Kidd 2022-07-28 21:35:55 UTC

Description of problem:
- When backing storage has a problem that makes a request unserviceable (an unfound bucket index object, for example), RGW holds the connection and its worker thread indefinitely, waiting for the requested object to become available from backing storage.
- For S3 / Swift HTTP requests, most clients time out after 30 to 60 seconds by default, then retry the request.
- Each retry ties up another RGW thread, which eventually exhausts the RGW thread pool and blocks all client requests.
- The only way to recover RGW from this state is a service restart.
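
The way retries compound can be illustrated with a small model. The sketch below is not RGW code: it assumes a fixed-size worker pool, a hypothetical number of clients that all hit the injured bucket, and a single client timeout, and it reports how long it takes until every worker is blocked. The function name and its parameters are made up for illustration.

```python
def seconds_until_exhaustion(pool_size: int, stuck_clients: int, client_timeout_s: int) -> int:
    """Return the elapsed time at which every worker thread is blocked.

    Assumptions (all hypothetical): each request to the injured bucket
    blocks one worker forever, and each client retries exactly once per
    timeout interval.
    """
    if pool_size <= 0 or stuck_clients <= 0 or client_timeout_s <= 0:
        raise ValueError("all arguments must be positive")

    blocked = 0
    elapsed = 0
    while True:
        # Every client sends (or retries) a request; the worker that
        # picks it up never returns.
        blocked += stuck_clients
        if blocked >= pool_size:
            return elapsed
        # Clients give up after the timeout and retry.
        elapsed += client_timeout_s


print(seconds_until_exhaustion(pool_size=512, stuck_clients=4, client_timeout_s=60))
```

With these hypothetical numbers (512 workers, 4 clients, 60-second timeout) the model reports exhaustion after 7620 seconds, a little over two hours. A smaller pool or more clients shortens that proportionally.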

Version-Release number of selected component (if applicable):
- 4.2z2

How reproducible:
- 100%

Steps to Reproduce:
1. Put a bucket index object into the unfound state.
2. From one or more clients, start a loop of requests that each trigger a bucket index operation on the injured bucket.

Actual results:
- After thread exhaustion, RGW begins responding with 504 errors to all requests, not just those which require the injured object.


Expected results:
- Prompt HTTP 500 (or similar) error returned to the client when a request cannot be serviced due to a known lack of object availability.
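
One way to picture the requested behaviour is a handler that checks whether the backing object is already known to be unavailable and answers immediately instead of waiting. This is a minimal sketch under that assumption, not how RGW or the objecter is implemented; `ObjectUnavailable`, `read_index`, `handle_list_bucket`, and the `unfound` set are hypothetical names.

```python
class ObjectUnavailable(Exception):
    """Raised when the backing object is known to be unavailable (hypothetical)."""


def read_index(store: dict, unfound: set, key: str) -> list:
    # Fail fast: do not wait for an object that is already known to be unfound.
    if key in unfound:
        raise ObjectUnavailable(key)
    return store[key]


def handle_list_bucket(store: dict, unfound: set, bucket: str) -> tuple:
    """Return (HTTP status, body) for a listing request on `bucket`."""
    try:
        entries = read_index(store, unfound, bucket)
    except ObjectUnavailable:
        # The worker is released right away instead of blocking indefinitely.
        return 500, "bucket index object unavailable"
    return 200, entries


store = {"healthy": ["a.txt", "b.txt"]}
unfound = {"injured"}
print(handle_list_bucket(store, unfound, "healthy"))
print(handle_list_bucket(store, unfound, "injured"))
```

In this sketch only requests for the injured bucket receive the error; requests for healthy buckets are served normally, and no worker stays blocked.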