Description Michael J. Kidd
2022-07-28 21:35:55 UTC
Description of problem:
- When backing storage has a problem that prevents a request from being serviced (an unfound bucket index object, for example), the RGW holds the connection and its thread indefinitely while waiting for the requested asset to become available from backing storage.
- For S3 / Swift HTTP requests, most clients time out after 30 to 60 seconds by default and then retry the request.
- Each retry ties up another thread, which eventually exhausts the RGW thread pool and blocks all client requests.
- The only way to free up the RGW from this state is a service restart.
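The exhaustion mechanism described above can be illustrated with a small simulation. This is only a sketch and not RGW code: `POOL_SIZE`, `CLIENT_TIMEOUT`, `handle_request`, `client_call`, and `simulate` are hypothetical names, the pool is a plain Python thread pool, and the client timeout is scaled down from 30 to 60 seconds to a fraction of a second.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

POOL_SIZE = 4          # hypothetical stand-in for the RGW thread pool size
CLIENT_TIMEOUT = 0.05  # scaled-down stand-in for the 30-60 s client timeout

# Never set during the test: models an index object that stays unfound.
storage_recovered = threading.Event()


def handle_request(bucket):
    """Model a handler that waits on backing storage with no timeout."""
    if bucket == "injured":
        storage_recovered.wait()  # blocks the worker until storage recovers
    return 200


def client_call(pool, bucket):
    """Submit one request and give up after CLIENT_TIMEOUT, like a client."""
    future = pool.submit(handle_request, bucket)
    try:
        return future.result(timeout=CLIENT_TIMEOUT)
    except FutureTimeout:
        # Only the client gave up; the worker thread is still blocked.
        return "timeout"


def simulate():
    pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
    try:
        before = client_call(pool, "healthy")
        # Each retry against the injured bucket pins one more worker.
        retries = [client_call(pool, "injured") for _ in range(POOL_SIZE)]
        # With every worker pinned, a healthy bucket is unreachable too.
        after = client_call(pool, "healthy")
        return before, retries, after
    finally:
        storage_recovered.set()  # release the workers so the process can exit
        pool.shutdown(wait=True)
```

In this model, a request for a healthy bucket succeeds before the retries begin and times out afterwards, even though nothing is wrong with that bucket.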
Version-Release number of selected component (if applicable):
- 4.2z2
How reproducible:
- 100%
Steps to Reproduce:
1. Create a situation in which a bucket index object is unfound.
2. From one or more clients, start a loop of requests that each incur a bucket index operation on the injured bucket.
Actual results:
- After thread exhaustion, RGW begins responding with 504 errors to all requests, not just those which require the injured object.
Expected results:
- Prompt HTTP 500 (or similar) error returned to the client when a request cannot be serviced due to a known lack of object availability.
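The expected behavior amounts to bounding the wait on backing storage. The following is a minimal sketch of that idea, assuming a hypothetical handler `handle_request_bounded` and a hypothetical `op_timeout` parameter; it does not reflect how RGW is implemented or configured.

```python
import threading


def handle_request_bounded(storage_ready, op_timeout):
    """Wait for backing storage for at most op_timeout seconds.

    storage_ready is a threading.Event that is set once the object
    becomes available. Both names are hypothetical.
    """
    if storage_ready.wait(timeout=op_timeout):
        return 200
    # Storage is still unavailable: report an error and free the thread
    # instead of holding it indefinitely.
    return 500
```

With a bound like this, a request for an unavailable object costs one thread for at most `op_timeout` seconds, so client retries cannot accumulate into pool exhaustion as long as the bound is shorter than the rate at which retries arrive.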