Bug 1269488

Summary:	ha-proxy causes 'Connection reset by peer' when reloading configuration
Product:	OpenShift Container Platform	Reporter:	Marek Schmidt <maschmid>
Component:	Networking	Assignee:	Ben Bennett <bbennett>
Networking sub component:	router	QA Contact:	zhaozhanqi <zzhao>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	high	CC:	aos-bugs, bbennett, bmeng, ccoleman, dzhukous, eparis, jdiaz, jkaur, jwendell, maschmid, mlazar, pep, rohara, slaskawi, tdawson, xtian, zzhao
Version:	3.0.0	Flags:	bbennett: needinfo-
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: Erroneous documentation. Consequence: The pod did not have enough privilege to edit iptables. Fix: Updated the docs with the correct procedure. Result: It works.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-09-27 09:30:21 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1303130, 1267746

Description Marek Schmidt 2015-10-07 12:21:30 UTC

Description of problem:

Whenever there is any haproxy configuration change (e.g. new route added, new pods scaled up, etc.), some of the client connections will be "reset by peer".

Version-Release number of selected component (if applicable):

OSE 3.0.2.0

rcm-img-docker01.build.eng.bos.redhat.com:5001/openshift3/ose-haproxy-router:v3.0.2.0

How reproducible:

Not easily

Mostly in a load test. Only a small fraction of client requests seem to be affected. The test machine also must not be too close to the router, presumably because the reset occurs in a small window while the connection is being opened (my test machine has about 175 ms ping to the OSE router).

Steps to Reproduce:
1. deploy any HTTP application  (e.g. cakephp-example )
2. create another route definition  (e.g. oc get route -o json > route.json, then edit the names and hostnames so they do not conflict with the cakephp route)

3. run apache bench from a machine not too close to the OSE instances (my test machine has 175ms pings to the OSE router).

ab -v 2 -r -n 20000 -c 64 http://cakephp-example-foo.cloudapps.example.com/ > ab.log 

4. Randomly run   'oc create -f route.json && oc delete route'

Actual results:

ab will report

...
apr_socket_recv: Connection reset by peer (104)
...

immediately after any router configuration change
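
To put a number on the failures, the resets can be counted from the captured ab output. This is only a sketch: count_resets is a hypothetical helper name, and it assumes the error lines were captured in a file. Judging by the transcript in comment 14, ab appears to print these messages to stderr, so the run would need something like "2> ab.err" in addition to "> ab.log".

```shell
# Hypothetical helper: count connection resets recorded in a captured ab log.
# Assumes the file holds the stream where ab printed the apr_socket_recv errors.
count_resets() {
    # $1: path to the captured output, e.g. ab.err
    grep -c 'Connection reset by peer' "$1"
}
```

Usage, assuming stderr was saved to ab.err: count_resets ab.err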

Expected results:

no connection resets on router changes

Additional info:

Comment 2 Clayton Coleman 2015-10-29 17:59:19 UTC

Does cakephp respond to graceful deletion correctly?  What docker image and app are you running when you experience this?

Comment 3 Ben Bennett 2015-10-29 18:26:49 UTC

Looks like we are doing the right thing in our scripts and requesting a soft reload (with -sf $old_pid).  What that does is make haproxy bind a second daemon to the same port, and when it is listening, it signals to the old daemon to finish the requests it's handling, but not listen for more.

UNFORTUNATELY there's a problem with the SYN packets getting put in the wrong queue... so connections get reset.

The issue is well described by:
  http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html

But that fix is rather... intricate.  Other approaches add iptables rules to drop the SYNs while haproxy reloads.  But the haproxy devs appear to be aware of the issue and are looking at fd passing to resolve this.

So... do we want to implement a hack to handle this?

Searching for 'haproxy soft reload "connection reset by peer"' gives good results.

Comment 5 Ben Bennett 2015-10-30 17:47:03 UTC

To summarize what I have found so far:
 - OpenShift 2 seems to have exhibited the same behavior... there is no difference between the way that OS2 and OS3 cause the reload to happen
 - We don't have permission to change iptables from the router container (even when the router is run with host-network=true and the privileged scc it is running under has: allowHostNetwork: true; allowPrivilegedContainer: true)
 - This behavior is known and documented in the haproxy management.txt file:
  http://www.haproxy.org/download/1.6/doc/management.txt


I'm investigating what it would take to implement the iptables solution in a container.  So far it looks ugly:
 - Need to set the SCC to have:
    allowedCapabilities:
    - NET_ADMIN
 - Need to edit the RC to have:
    spec:
      template:
        spec:
          containers:
            securityContext:
              capabilities:
                add:
                - NET_ADMIN
 - Then in the reload script for haproxy:
    iptables -I INPUT -p tcp -m multiport --dports $PORTS --syn -j DROP
    sleep 1
    /usr/sbin/haproxy -f $config_file -p $pid_file -sf $old_pid
    iptables -D INPUT -p tcp -m multiport --dports $PORTS --syn -j DROP
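
The reload steps above could be wrapped so that the insert and the delete always use the same rule text and the rule is removed even if haproxy fails to start. This is a minimal sketch, not the script that ships in the router image: the function names and the IPTABLES, HAPROXY and SYN_DROP_SETTLE variables are hypothetical hooks added for illustration and dry runs.

```shell
# Hypothetical override hooks; by default the real binaries are used.
IPTABLES="${IPTABLES:-iptables}"
HAPROXY="${HAPROXY:-/usr/sbin/haproxy}"

# Build the rule arguments once so the insert and the delete match exactly.
syn_drop_rule() {
    # $1: comma-separated port list, e.g. "80,443"
    echo "INPUT -p tcp -m multiport --dports $1 --syn -j DROP"
}

# Drop incoming SYNs, soft-reload haproxy, then remove the drop rule.
reload_with_syn_drop() {
    local ports="$1" config_file="$2" pid_file="$3" old_pid="$4"
    local rule rc
    rule="$(syn_drop_rule "$ports")"

    # Word splitting of $rule is intentional: it expands to separate arguments.
    $IPTABLES -I $rule || return 1
    sleep "${SYN_DROP_SETTLE:-1}"
    "$HAPROXY" -f "$config_file" -p "$pid_file" -sf "$old_pid"
    rc=$?
    # Remove the rule even if haproxy failed, so new connections are not blocked.
    $IPTABLES -D $rule
    return $rc
}
```

Usage, with the same variables as the commands above: reload_with_syn_drop "$PORTS" "$config_file" "$pid_file" "$old_pid". Clients whose SYN is dropped retransmit it, which is why the reload only looks slower to them instead of resetting the connection.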

Comment 7 Matej Lazar 2015-12-09 13:58:10 UTC

It seems our EAP deployment is affected by this same issue.
We are running OSE 3.0.

To reproduce:
for i in {1..5000}; do  curl http://our.openshift.redhat.com/pnc-rest/rest/running-build-records/1193 && echo " ($(date)) \n" ; done

When we add or delete a dummy route, we get "Connection reset by peer".
The error does not show up every time but it is reproducible in at least 10% of route changes.

Comment 10 Ben Bennett 2016-01-04 20:52:49 UTC

PR is in progress at https://github.com/openshift/origin/pull/6472

Comment 13 Ben Bennett 2016-03-11 13:55:26 UTC

Resolved by: https://github.com/openshift/origin/pull/6472

This fix prevents traffic to haproxy getting dropped if it connects while the reload is in progress.

You need to change your router to have an environment variable set:
oc set env dc/router -c router DROP_SYN_DURING_RESTART=true

Once that has been set, and the router has restarted, any subsequent reload will have an iptables change in place to eat the SYN packets to make the hand-over not drop packets. The downside is that it will make the reloads seem to take longer. The kernel networking team has a bug open on the root cause.

Comment 14 zhaozhanqi 2016-03-14 03:13:27 UTC

This issue can still be reproduced.

Tested on devenv_rhel_3075 with router images

openshift/origin-haproxy-router          latest              b5436007264f        44 hours ago

steps:

1. Create hello-openshift pod/service/route
2. using ab to stress the URL
3. Create another route during the step 2

[root@ip-172-18-0-105 ~]# ab -v 2 -r -n 2000000 -c 64 http://hello-service-default.router.default.svc.cluster.local/ >htllo.log
Completed 200000 requests
Completed 400000 requests
Completed 600000 requests
Completed 800000 requests
Completed 1000000 requests
apr_socket_recv: Connection reset by peer (104)
apr_socket_recv: Connection reset by peer (104)
apr_socket_recv: Connection reset by peer (104)
apr_socket_recv: Connection reset by peer (104)
apr_socket_recv: Connection reset by peer (104)
apr_socket_recv: Connection reset by peer (104)
Completed 1200000 requests
Completed 1400000 requests
Completed 1600000 requests
Completed 1800000 requests
Completed 2000000 requests
Finished 2000000 requests

Comment 15 zhaozhanqi 2016-03-14 03:31:46 UTC

BTW: forgot to mention in the comment 13, I had set 'oc set env dc/router -c router DROP_SYN_DURING_RESTART=true' in the testing

Comment 16 Ben Bennett 2016-03-14 14:03:11 UTC

(In reply to zhaozhanqi from comment #15)
> BTW: forgot to mention in the comment 13, I had set 'oc set env dc/router -c
> router DROP_SYN_DURING_RESTART=true' in the testing

Did you restart the router after setting that environment variable?

Comment 17 zhaozhanqi 2016-03-15 02:27:28 UTC

Yes, Ben.  When an env variable is set on dc/router, the router is redeployed automatically.

Comment 18 Josep 'Pep' Turro Mauri 2016-03-31 13:05:21 UTC

(In reply to Ben Bennett from comment #13)
> The kernel networking team has a bug open on the root cause.

For completeness: do you have a bz id?

Comment 19 Ben Bennett 2016-05-03 15:16:22 UTC

Doc PR https://github.com/openshift/openshift-docs/pull/1987

Comment 20 Ben Bennett 2016-05-27 14:56:21 UTC

Kernel bug - https://bugzilla.redhat.com/show_bug.cgi?id=1203000

Comment 21 Ben Bennett 2016-05-27 14:57:25 UTC

Can you make sure that you followed all the steps in the doc PR to get it set up?  It needs to run in the privileged SCC to be able to use iptables.

Comment 22 zhaozhanqi 2016-05-30 03:43:18 UTC

@Ben Bennett

Just tested this using the privileged SCC when creating the router. The issue did not reproduce.

BTW, since the router uses the hostnetwork SCC by default, I suspect some customers can still hit this issue when using a hostnetwork SCC router.

Comment 23 zhaozhanqi 2016-05-30 03:52:22 UTC

@Ben Bennett

It seems this issue can still be reproduced even when using the privileged SCC.

The weird thing: iptables cannot be initialized even as the root user.

sh-4.2# id      
uid=0(root) gid=0(root) groups=0(root)
sh-4.2# iptables-save
iptables-save v1.4.21: Cannot initialize: Permission denied (you must be root)

Comment 24 Ben Bennett 2016-05-31 17:29:41 UTC

@zhaozhanqi: You need to have CAP_NET_ADMIN... but privileged should give you that.  If you are getting that error, then it is not set up correctly.
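
One way to check for the capability from inside the container is to look at the effective capability mask in /proc. This is a sketch under assumptions: the function names are hypothetical, and it relies on CAP_NET_ADMIN being capability number 12 in the Linux capability list and on the CapEff line of /proc/<pid>/status holding the effective mask in hex.

```shell
# CAP_NET_ADMIN is capability number 12 in linux/capability.h.
CAP_NET_ADMIN_BIT=12

# Return success if the hex capability mask in $1 includes CAP_NET_ADMIN.
mask_has_net_admin() {
    [ $(( (0x$1 >> CAP_NET_ADMIN_BIT) & 1 )) -eq 1 ]
}

# Print the effective capability mask (hex) of the current shell process.
current_cap_eff() {
    awk '/^CapEff:/ {print $2}' "/proc/$$/status"
}
```

Usage: mask_has_net_admin "$(current_cap_eff)" && echo "CAP_NET_ADMIN present". A uid of 0 without this bit set would be consistent with the "Permission denied (you must be root)" message from iptables-save, although other causes are possible.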

Comment 26 Jonh Wendell 2016-06-17 13:38:27 UTC

Same here... I followed the workaround suggestion but it didn't work...

I'm still getting errors like 'Remote host closed connection during handshake' due to connection being dropped by router...

Comment 28 Ben Bennett 2016-08-17 20:03:38 UTC

I'm working on this at the moment and something with the capabilities has changed since the version I tested with.  I'm investigating alternatives for how we can make this work now.

Comment 29 Ben Bennett 2016-08-18 17:56:05 UTC

Added https://github.com/openshift/origin/pull/10514 to support 'true' for DROP_SYN_DURING_RESTART (as the docs stated, but really only '1' was supported)

Fixed the docs with https://github.com/openshift/openshift-docs/pull/2680


For any customers on 3.2, the correct steps are:

$ oadm policy add-scc-to-user privileged -z router

$ oc patch dc router -p '{"spec":{"template":{"spec":{"containers":[{"name":"router","securityContext":{"privileged":true}}],"securityContext":{"runAsUser": 0}}}}}'

$ oc set env dc/router -c router DROP_SYN_DURING_RESTART=1
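
For illustration, accepting both spellings of the variable could look like the check below. This is a hypothetical sketch, not the code from the pull request: the function name syn_drop_enabled is made up, and exact, case-sensitive matching of '1' and 'true' is an assumption.

```shell
# Hypothetical helper, not the router's actual code:
# treat "1" and "true" as enabled, anything else (including unset) as disabled.
syn_drop_enabled() {
    case "${1:-}" in
        1|true) return 0 ;;
        *)      return 1 ;;
    esac
}
```

Usage in a reload script: syn_drop_enabled "$DROP_SYN_DURING_RESTART" && echo "dropping SYNs during reload"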

Comment 30 Troy Dawson 2016-08-19 21:21:02 UTC

This has been merged into ose and is in OSE v3.3.0.23 or newer.
If this is going to be backported to older versions, please let me know or clone this bugzilla for the older versions.

Comment 31 zhaozhanqi 2016-08-22 10:31:32 UTC

Checked this bug on these two versions:

1)# openshift version
openshift v3.3.0.23-dirty
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

with router image v3.3.0.23 (id=3502a6052613)

2)
# openshift version
openshift v3.2.1.13-1-gc2a90e1
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-haproxy-router    v3.2.1.13           f8e807bd101b

In my testing the issue was not reproduced. I also used the hostnetwork SCC for the router; it seems the 'privileged' SCC is not specified for v3.3.0.23.

# ab -v 2 -r -n 2000000 -c 64 http://service-unsecure-zzhao.0822-3yz.qe.rhcloud.com/ > hello.log
Completed 200000 requests
Completed 400000 requests
Completed 600000 requests
Completed 800000 requests
Completed 1000000 requests
Completed 1200000 requests
Completed 1400000 requests
Completed 1600000 requests
Completed 1800000 requests
Completed 2000000 requests
Finished 2000000 requests

Comment 32 zhaozhanqi 2016-08-23 02:53:17 UTC

Verified this bug according to comment 31

Comment 34 errata-xmlrpc 2016-09-27 09:30:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933

Comment 36 Ben Bennett 2018-02-08 16:07:32 UTC

This is fixed in 3.9 by https://bugzilla.redhat.com/show_bug.cgi?id=1464657