Bug 1405440

Summary:	HAProxy is forcefully restarted due to not responding to /healthz probe when under high load
Product:	OpenShift Container Platform	Reporter:	Jiří Mencák <jmencak>
Component:	Networking	Assignee:	Phil Cameron <pcameron>
Networking sub component:	router	QA Contact:	zhaozhanqi <zzhao>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	high	CC:	aos-bugs, bbennett, bperkins, ccoleman, eparis, jeder, jkaur, jmencak, pcameron, ramr, tdawson
Version:	3.4.0
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:	aos-scalability-34
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: Max connections is too low see openshift-docs PR 3609 Consequence: pod restarts Fix: increase default value Result:	Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-04-12 19:07:47 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jiří Mencák 2016-12-16 14:07:51 UTC

Description of problem:

HAProxy has a default (maxconn) limit of 2000 sessions. It is the HAProxy's per-process max number of concurrent connections. When this limit is reached, HAProxy stops listening for new connections including :1936/healthz probes. This triggers a health-check failure and HAProxy is forcefully restarted.

Version-Release number of selected component (if applicable):
$ oc version
oc v3.4.0.34+87d9d8d
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-31-35-3.us-west-2.compute.internal:8443
openshift v3.4.0.34+87d9d8d
kubernetes v1.4.0+776c994

$ oc rsh router-6-zoksy
sh-4.2$ haproxy -vv
HA-Proxy version 1.5.18 2016/05/10
Copyright 2000-2016 Willy Tarreau <willy>

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -DTCP_USER_TIMEOUT=18
OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

How reproducible:
Always when tested with JMeter, vegeta and wrk2 test clients and choosing appropriate test options. First discovered on a 10 node EC2 cluster with m4.xlarge instances (https://mojo.redhat.com/docs/DOC-1086531). Minimal reproducible environment I successfully tested the restart on was a 2 node EC2 cluster.

Steps to Reproduce:
1. Create an EC2 m4.xlarge OCP 3.4.0.34 environment with a master+router and a worker node serving an nginx pod. I've used https://github.com/jmencak/projects/blob/master/haproxy/apps/nginx/nginx.json which also creates a service and a route towards the pod.
2. Get a HTTP load testing tool of your choice, I suggest vegeta or wrk2. For options I used with vegeta, please see the mojo page above. With wrk2 (https://github.com/giltene/wrk2 or https://github.com/jmencak/wrk2), I caused a HAProxy reload with these options:

./wrk \
-t300 -c3000 -d300s \
-R10000 \
http://nginx-route-example-1-nginx-1.router.default.svc.cluster.local/1024.html

The wrk2 load testing tool was started on the node where the nginx pod ran. I could probably lower the rate and the number of connections, but this made HAProxy restart within a minute.

Actual results:
DoS: HAProxy was restarted even though it was perfectly fine.

Expected results:
Do not restart HAProxy when there is no need to.

Additional info:
After raising the limit of maxconn to 20480 as I've seen in one OpenStack installation, I wasn't able to cause the restarts within this or the 10n EC2 environments. I've seen other Linux distributions to configure the default limit of HAProxy to 32000. While raising the limit might be a temporary solution, perhaps having some other way of establishing whether HAProxy is doing fine might be the correct approach to this issue?

Comment 1 Jeremy Eder 2016-12-19 13:20:58 UTC

2k is really low.  Jiri, would you please file another BZ for raising maxconn according to your research please?

---

I believe our health check config for haproxy allows for 30 seconds of failed health checks to the stats listener before restarting the pod.  Lengthening that delay trades reactivity to truly-failed haproxy pods for potential resilience to this issue.

# oc describe pod router-1-rrk87 -n default|grep ness
    Liveness:           http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:          http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3

If we're going to use this method as our health check for haproxy, then I think haproxy should be configured to always allow sessions to be created against the stats listener, even after maxconn is reached (we're looking for a config mix that allows this, might need code changes to haproxy).

The current state means kube will restart haproxy at the worst possible moment, when an app is under high load (hopefully for a good reason).

Comment 2 Jiří Mencák 2016-12-20 10:11:20 UTC

Linking associated BZ to raise the connection limit.

https://bugzilla.redhat.com/show_bug.cgi?id=1406327

Comment 3 Ben Bennett 2016-12-20 15:55:28 UTC

For the stats we need to use: https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#3.1-stats%20maxconn (and perhaps also 'stats timeout')

I'm not sure if we need to expose this as a tunable knob, or if we should just increase it.

Comment 5 Jiří Mencák 2017-01-02 14:27:16 UTC

I believe that increasing 'stats maxconn' would not help in this case.  Check out the function "void listener_accept(int fd)" in haproxy-1.5.18/src/listener.c and look for global.maxconn within that function.  The listener simply stops accepting new connections including those targeting the stats listener (at :1936).

How about changing the health-check to target HAProxy's stats UNIX domain socket in stream mode.

Something like this:

          exec:
            command:
            - /bin/sh
            - -c
            - echo show info | socat - UNIX-CONNECT:/var/lib/haproxy/run/haproxy.sock
              | grep -q "^Name:\s*HAProxy"

instead of

          httpGet:
            host: localhost
            path: /healthz
            port: 1936
            scheme: HTTP

Gave it a quick test and it seem to be working fine.  No more unnecessary HAProxy restarts.

Comment 6 Eric Paris 2017-01-03 18:33:54 UTC

If haproxy is not responding to /healthz it is broken/not working. The whole point of haproxy is to respond to requests. I feel like the only 'bug' here is that when haproxy is so taxed and is failing to do its one and only job the only thing our system can do is restart the container. Which likely doesn't help a lot.

Eventually having kubernetes take other reactions might make sense. Vertical resize of haproxy? Send a page to the guy on duty? Something else?

But to me it seems like 'fixing' the base image to allow more connections is a good thing to do. Changes to the health check feels like a bad idea. haproxy is failing, the health check should fail...

Comment 7 Jeremy Eder 2017-01-04 01:51:54 UTC

(In reply to Eric Paris from comment #6)
> If haproxy is not responding to /healthz it is broken/not working. The whole
> point of haproxy is to respond to requests.

At least IMO, what's wrong here is that there's no way to tell haproxy that kubernetes health-checks are "special" as described in c#1.  Literally the worst thing our platform can do to haproxy is restart it while under legitimate load.

> I feel like the only 'bug' here
> is that when haproxy is so taxed and is failing to do its one and only job
> the only thing our system can do is restart the container. Which likely
> doesn't help a lot.
>
> Eventually having kubernetes take other reactions might make sense. Vertical
> resize of haproxy? Send a page to the guy on duty? Something else?
> 
> But to me it seems like 'fixing' the base image to allow more connections is
> a good thing to do.

Indeed.  I think the increase in maxconns via https://bugzilla.redhat.com/show_bug.cgi?id=1406327 will significantly reduce the likelihood of hitting this in the real world, and is low risk.  But we still need a real fix.

> Changes to the health check feels like a bad idea.
> haproxy is failing, the health check should fail...

Subtle yet important difference:  it's busy, not failed.  Busy is (mostly) transient and failed is forever.

Jiri, would you mind kicking off a thread on the haproxy mailing list please?  Perhaps there's something we're missing?
http://www.haproxy.org/#tact

Comment 8 Clayton Coleman 2017-01-18 17:40:55 UTC

Rejecting connections to the main ports is "healthy" behavior.  Rejecting connections to the main ports because of a broken configuration is "unhealthy" behavior.  What configuration can we do that allows both of those to be distinguished?

Comment 9 Jeremy Eder 2017-01-18 18:44:54 UTC

Upstream post:  https://www.mail-archive.com/haproxy@formilux.org/msg24553.html

Comment 10 Ben Bennett 2017-01-27 16:33:55 UTC

*** Bug 1406327 has been marked as a duplicate of this bug. ***

Comment 11 Phil Cameron 2017-01-27 16:58:05 UTC

There is a process maxconn, frontend maxcon and server maxcoo
nn. The frontend and server can be set for each frontend and server. By default  
all are 2000.

Unless otherwise specified the global maxconn applies everywhere.


Proposal:
Add oadm --router --max-connections= (default 20000) option.
This will generate an environment variable ROUTER_MAX_CONNECTIONS
The default when the env variable is not present is 20000.

Does this satisfy the current needs?

Comment 12 Ben Bennett 2017-01-27 18:11:46 UTC

Seems like a good start.  That will set the global maxconn?

Comment 13 Jeremy Eder 2017-01-30 18:43:07 UTC

Hi, can someone point us to the github commit that changed the router default to 20,000 please?

Comment 14 Phil Cameron 2017-01-30 18:49:47 UTC

Jeremy,
PR 12716 
This is a fix for this bug. oadm router --max-connections=20000
Or edit env var ROUTER_MAX_CONNECTIONS
If you don't have an env the haproxy-template defaults to 20000
PTAL at the proposed fix.

Comment 15 Jeremy Eder 2017-01-31 15:25:21 UTC

Awesome, that looks good for covering both of these BZs.

Question: Where is a user supposed to set ROUTER_MAX_CONNECTIONS?  In their shell environment before running the command?  Somewhere in /etc/sysconfig?

Wasn't immediately clear to me.

Comment 16 Jeremy Eder 2017-01-31 15:27:40 UTC

last question:  is this configurable in openshift-ansible somehow?

Comment 17 Phil Cameron 2017-01-31 15:33:37 UTC

jeder
The DC for the router is where the ROUTER_MAX_CONNECTIONS environment variable goes. See DOCS PR 3609

oadm router will create it. You can add it to DC for existing routers. If its not there, 20000 is used.

Comment 18 Jeremy Eder 2017-01-31 15:36:16 UTC

IMO, this needs to be plumbed up through installer, as well.  Any chance you could look into that, to round-out this new feature?

What happens during upgrades?  Is the default changed when a user goes from 3.4->3.5?

Comment 19 Phil Cameron 2017-01-31 15:36:47 UTC

jeder

I don't know enough about ansible to really comment. It is not a default in any of the config files.

When you create a router
oadm router <name> --max-connections=12345 
will create it.

Comment 20 Phil Cameron 2017-01-31 15:42:55 UTC

jeder

As for upgrades, 3.5 router image uses "maxconn 20000" unless told otherwise (via ROUTER_MAX_CONNECTIONS environment variable).

3.4 and earlier use 2000.

Comment 21 Jeremy Eder 2017-01-31 17:16:44 UTC

Perfect, thanks.  So upgrades will automatically go from 2000->20,000.

I filed an issue for ansible work to plumb ROUTER_MAX_CONNECTIONS through:
https://github.com/openshift/openshift-ansible/issues/3233

Comment 22 Phil Cameron 2017-02-02 18:18:49 UTC

jeder
It appears that increasing the default causes the tests to fail on systems with limited resources. Moving to 20000 by default on upgrade will break otherwise working systems. I changed the PR to default to 2000 which is the current default. The value can be changed as described in openshift-docs PR 3609

Comment 23 Jeremy Eder 2017-02-02 18:35:44 UTC

Hmm, what resource limit are we hitting in those environments?  Why not just also increase that one?  What kind of environments are these, just the CI one for openshift?

Comment 24 Phil Cameron 2017-02-02 18:48:42 UTC

jeder #23

I don't know for sure. The jenkins tests are failing. Rajat ran into 'sysctl fs.nr_open' and/or 'sysctl fs.file-max' being too small. See PR 12716 also docs PR 3609.

I don''t know how to find the setting in the current test environment. The tests work for me on lab machines and I have fs.nr = 1048576, fs.file is 13094283.

Comment 25 Jeremy Eder 2017-02-02 19:19:10 UTC

There has got to be a way to actually debug that test failure.

Comment 26 Ben Bennett 2017-02-03 14:26:38 UTC

Related to the RFE https://bugzilla.redhat.com/show_bug.cgi?id=1418905

Comment 27 Ben Bennett 2017-02-03 15:53:26 UTC

@jeder: Phil and I dug through the log file line-by-line and found that the environment passed to the router config has MAX_CONNS of 0.  Further investigation showed that the cluster for that test was created with 'oc cluster up'.  Digging into that, we found that the defaults are set in 'oc cluster up' and passed in to create the cluster.  I propose that we treat 0 as 'use the default' since the data structure will have 0 when not otherwise initialized, and 0 is a nonsensical value.

Comment 28 Phil Cameron 2017-02-03 16:07:28 UTC

@jeder: Also we decided to change the default to 20000 since that was not the issue and it appears that configurations that we care about have high enough limits.

Comment 29 Jeremy Eder 2017-02-03 16:31:23 UTC

Ack, thank you!

Comment 30 openshift-github-bot 2017-02-07 04:07:45 UTC

Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/119f9b1583f88f5d49fe373850c878f82c8ceb51
Make haproxy maxconn configurable

The haproxy maxconn (maximum connections) is by default 20000. This
was previously 2000. This change makes that configurable through the
oadm router --max-connections= option when creating a router.  For
existing routers the value can be set in the ROUTER_MAX_CONNECTIONS
environment variable. If ROUTER_MAX_CONNECTIONS is missing, the
deafult (20000) is used.

openshift-docs PR 3609

bug 1405440
https://bugzilla.redhat.com/show_bug.cgi?id=1405440

Comment 31 Troy Dawson 2017-02-08 22:18:27 UTC

This has been merged into ocp and is in OCP v3.5.0.18 or newer.

Comment 33 zhaozhanqi 2017-02-09 05:38:58 UTC

Verified this bug on OCP v3.5.0.18

the 'maxconn' will be got from `--max-connections` or 'oc env dc router ROUTER_MAX_CONNECTIONS'

Comment 34 openshift-github-bot 2017-03-07 08:09:18 UTC

Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/5b708a582d25b103f187207b7ac93db553192c67
Fix of BUG 1405440

Using TCPSocketAction as the liveness probe, which will not be affected by
the connection limit set in HAProxy's config file. This is a TRUE fix for
BUG 1405440.

Comment 35 openshift-github-bot 2017-03-11 22:18:34 UTC

Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/0d8009fa6deb5383a74639af795e80aec6ef473f
Revert "Fix of BUG 1405440"

This reverts commit 5b708a582d25b103f187207b7ac93db553192c67.

Comment 36 Jeremy Eder 2017-03-15 14:18:12 UTC

This BZ is in verified state but the fix has been reverted, so I think the BZ state needs to be updated (to ASSIGNED, perhaps?)

https://github.com/openshift/origin/pull/13331#issue-213182468

pecameron commented 6 days ago • edited
The fix to use a TCP connection check to check whether the HAProxy
process is alive or not doesn't work without a iptables rule for port
1936. The original test using HTTPGet works because HTTPGet supports
a Host field that can be set with "localhost" when host networking is
used. The TCPSocketAction does not support a Host field.

Rolling back the fix until a new fix is developed.

bug 1430729

Comment 37 Ben Bennett 2017-03-15 14:27:03 UTC

@Jeremy: I'm not convinced.  The first fix (upping max connections) is still in.  It's only the later change that has been reverted.  Given that the new max connections change will probably resolve the problem, I am okay leaving this in VERIFIED.  The other change should probably have gone in under a separate bug anyway to avoid this kind of confusion.

Comment 39 openshift-github-bot 2017-03-15 15:34:16 UTC

Commit pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/82ea1c37c5cce9aaeeee19d0826e36b312e6365f
Make haproxy maxconn configurable

The haproxy maxconn (maximum connections) is by default 2000. This
was previously 2000. This change makes that configurable through the
oadm router --max-connections= option when creating a router.  For
existing routers the value can be set in the ROUTER_MAX_CONNECTIONS
environment variable. If ROUTER_MAX_CONNECTIONS is missing, the
deafult (20000) is used.

origin PR 12716

bug 1405440
https://bugzilla.redhat.com/show_bug.cgi?id=1405440

Comment 41 errata-xmlrpc 2017-04-12 19:07:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884