Bug 1668368

Summary: image upload fails when swift-object-server has failed on one controller node in a 3-controller-node environment
Product: Red Hat OpenStack
Reporter: Meiyan Zheng <mzheng>
Component: openstack-swift
Assignee: Christian Schwede (cschwede) <cschwede>
Status: CLOSED NOTABUG
QA Contact: Gal Amado <gamado>
Severity: high
Docs Contact: Tana <tberry>
Priority: medium
Version: 13.0 (Queens)
CC: cschwede, derekh, zaitcev
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Flags: cschwede: needinfo-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-12-02 11:55:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 4 Christian Schwede (cschwede) 2019-01-31 10:33:21 UTC
So the problem here is that the container is paused, not stopped.

Pausing freezes the container, so any network request to it simply goes unanswered (the connection is not even reset).

E.g., a request to a stopped swift object server container looks like this:

[root@co...]# curl http://172.17.4.10:6000
curl: (7) Failed connect to 172.17.4.10:6000; Connection refused

However, a curl request to a paused container just hangs - there is no RST sent by the server, it simply waits.
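
To make the difference concrete, here is a minimal Python sketch that classifies an endpoint as refused or hung. It is only an illustration: the `probe` function and its return labels are hypothetical, and the hung case assumes the connection is accepted by the kernel but the frozen process never answers.

```python
import socket


def probe(host, port, timeout=1.0):
    """Classify a TCP endpoint as 'refused', 'hung', 'closed' or 'answered'.

    Hypothetical helper for illustration only; not part of Swift or curl.
    """
    try:
        # The timeout applies to the connect and to every later send/recv.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"GET / HTTP/1.0\r\n\r\n")
            data = conn.recv(1)
            # An empty read means the peer closed without answering.
            return "answered" if data else "closed"
    except ConnectionRefusedError:
        # Stopped container: nothing listens, the kernel replies with RST.
        return "refused"
    except socket.timeout:
        # Paused container (assumed): no reply at all until we give up.
        return "hung"
```

A stopped server should come back as "refused" almost immediately, while a paused one only comes back as "hung" after the full timeout has elapsed.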

The same happens in the Swift code: the request to the paused object server hangs, and it seems the client then hits its own timeout before Swift has a chance to try another server.
I think we need a better way to handle errors like this.
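
As a rough sketch of that timing problem (this is not Swift's actual code; `first_responsive`, `per_node_timeout` and `client_deadline` are hypothetical names), failing over past a hung node only works if the per-node timeout is shorter than the time the client is willing to wait:

```python
import socket
import time


def first_responsive(nodes, per_node_timeout, client_deadline):
    """Return the first (host, port) that answers, or None.

    Illustrative only: each node gets at most per_node_timeout seconds,
    and the whole attempt is abandoned once client_deadline seconds have
    passed, mimicking a client that gives up on the request.
    """
    start = time.monotonic()
    for host, port in nodes:
        remaining = client_deadline - (time.monotonic() - start)
        if remaining <= 0:
            return None  # the client has already timed out
        wait = min(per_node_timeout, remaining)
        try:
            with socket.create_connection((host, port), timeout=wait) as conn:
                conn.sendall(b"ping")
                if conn.recv(1):
                    return (host, port)
        except OSError:
            # Refused and timed-out connections both end up here;
            # move on to the next node.
            continue
    return None
```

With a hung first node, a short `per_node_timeout` leaves time to reach the second node. If `per_node_timeout` is at least as long as `client_deadline`, the deadline is used up on the hung node and the remaining nodes are never tried.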

If you stop the container and try uploading the image, it will succeed (using the two remaining servers).

Comment 9 Christian Schwede (cschwede) 2020-02-06 15:49:14 UTC
*** Bug 1668370 has been marked as a duplicate of this bug. ***