Description of problem:
Currently we are using these proxy timeout settings for all services:

    defaults
        log global
        maxconn 4096
        mode tcp
        retries 3
        timeout http-request 10s
        timeout queue 1m
        timeout connect 10s
        timeout client 1m
        timeout server 1m
        timeout check 10s

Both glance and heat have API calls which cause the service to connect to an external service. For example:

    $ time glance --os-image-api-version 1 image-create --disk-format=qcow2 --container-format=bare --copy-from=http://example.com
    504 Gateway Time-out: The server didn't respond in time. (HTTP N/A)

    real    1m1.020s
    user    0m0.531s
    sys     0m0.095s

glance does not need to download the full URL before it returns, but it does want to make at least a HEAD request first. Long-running API calls are also possible in other situations, for example when you ask for something about 10k items.

In my case the server did not respond in time because it first tried IPv6 (I have no external IPv6 connectivity, just local), but the response can be delayed for other reasons as well (an iptables DROP rule, ...). I would not have noticed the issue, other than it being slow, if the proxy timeout were 3 minutes.

Actual results:
Either haproxy is impatient, or the services do not have built-in response deadlines. haproxy disconnects from the backend server before the backend generates the response, and returns a 504 to the client instead of the real response.
The Glance api log has this kind of trace:

    2016-11-03 07:33:35.043 12918 INFO eventlet.wsgi.server [req-749fa599-f55a-432b-aac7-966fcb5381c4 cab7f2d4f87e46a493775b3fc87be0d4 2ab73c577de146f2816b311485b63340 - default default] Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 512, in handle_one_response
        write(b''.join(towrite))
      File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 453, in write
        wfile.flush()
      File "/usr/lib64/python2.7/socket.py", line 303, in flush
        self._sock.sendall(view[write_offset:write_offset+buffer_size])
      File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 385, in sendall
        tail = self.send(data, flags)
      File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 379, in send
        return self._send_loop(self.fd.send, data, flags)
      File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 366, in _send_loop
        return send_method(data, *args)
    error: [Errno 104] Connection reset by peer

When I connected directly to the backend server, it worked in 130 sec.

Expected results:
haproxy does not break the connection to the backend server when the server is still able to respond. One minute without a response does not mean the server is dead. Three minutes does not mean it either, but in the above case it would have been enough. Alternatively, all services must somehow be convinced to ALWAYS respond within 50 sec (less than the proxy timeout), even if the response is a 503. It is not haproxy's responsibility to kill a (potentially) valid in-progress request.
Ryan, can you please check the current OSP10 haproxy.conf and see what changes could/should be made?
The timeout that is coming into play here is the 'timeout server 1m'. We've had other requests to increase this timeout in the past, but ultimately we chose not to, because it never seems to be enough. If we increase the timeout to 2 minutes, next month somebody will want it to be 3 minutes, and next year 5 minutes. If glance and heat need longer timeouts, I suggest we set them in the proxy definition (listen or frontend block). This will override the timeout from defaults, but only for those proxies.
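As a sketch of that suggestion (the block name, addresses, and the 2m value below are illustrative, not taken from the shipped OSP configuration):

```
# Hypothetical per-service override: a 'timeout server' set inside a
# listen block shadows the 1m value from the defaults section, but
# only for this one proxy; all other services keep the defaults.
listen glance_api
    bind 10.0.0.10:9292
    timeout server 2m
    timeout client 2m
    server controller-0 192.168.24.10:9292 check
```

Note that both client and server timeouts are raised together here; raising only 'timeout server' can still leave the client side cut off first on slow responses.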
Another 504 hit: https://bugzilla.redhat.com/show_bug.cgi?id=1394155 What would happen if we just deleted that option?
Think about it in a different way for a minute: what if I say the proxy has to be at least as patient as the client?
(In reply to Attila Fazekas from comment #4) > Think in a different way for minute, what if I say proxy has to be at least > as patient as the client ? I don't understand this question. Patient in what way? Are you talking about the 'timeout client'?
If the client itself is not giving up on waiting for the server, why would the intermediate proxy break the connection? Why does the proxy not wait as long as the client is willing to wait? Why does the proxy think it can judge an OpenStack API service, and punish it by breaking the client connection just because the server is slow?

<joking> If we would like to keep following the impatient pattern already implemented, I would also recommend automatically deleting the services when they do not respond in time, and I would recommend decreasing the time limit to 5 sec instead of 60 sec. </joking>

I read the haproxy doc, and it looks like the doc favors the limited-timeout approach we are using now, but I have doubts about whether it is really the right way for every OpenStack service.
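The closest way I know to express "at least as patient as the client" in haproxy terms is to keep 'timeout client' and 'timeout server' equal, so neither side of the proxy gives up before the other (the 10m value below is illustrative only):

```
defaults
    mode tcp
    # Illustrative sketch: with the two timeouts equal, haproxy never
    # abandons a backend that the client is still waiting on; both
    # sides hit the same inactivity limit at the same time.
    timeout client 10m
    timeout server 10m
```

One caveat worth keeping in mind: these are inactivity timeouts, not a cap on total request duration, so they reset whenever data flows; they only fire when a side is completely silent for the whole period.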
*** This bug has been marked as a duplicate of bug 1289315 ***
*** Bug 1593811 has been marked as a duplicate of this bug. ***