Title: Changing Front-End HTTP Server Plug-in Configuration

Describe the issue: When using OpenShift, you run out of sockets (file descriptors) under high load because mod_rewrite, the default proxy engine, does not do connection pooling.

Suggestions for improvement: Have the vhost plugin use mod_proxy, or provide a third proxy option.

Additional information: Switching to the vhost plugin, which would appear to be the correct solution, does not actually help, because it does not use mod_proxy either; it also uses mod_rewrite.
Hi Eric,

Take a look at https://github.com/openshift/origin-server/pull/4850 to see the discussion of how a RewriteRule in the vhost plugin was moved to ProxyPass. We still don't have keepalives turned on yet, but I will supply a patch that will work against the latest 2.0 deployment so they can try out the latest upstream code.
Created attachment 884117 [details]
Use ProxyPass for vhost frontend proxy

This patch requires the vhost frontend plugin. First follow the steps documented here:
https://access.redhat.com/site/documentation/en-US/OpenShift_Enterprise/2/html-single/Deployment_Guide/#Changing_Front-end_HTTP_Server_Plug-in_Configuration

You can ignore the line that says to edit 000000_default.conf; all that should be required at that point is to restart httpd. We're in the process of shipping a change to that documentation.

Once the vhost plugin is live (you may want to test first), you can apply this patch on the node:

  cd /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-frontend-apache-vhost-*/
  patch -p4 < /tmp/4850.patch
  service ruby193-mcollective restart

You can even try turning on keepalives with the following change to the patch:

  - f.puts("ProxyPass #{tpath} #{proxy_proto}://#{uri}/")
  + f.puts("ProxyPass #{tpath} #{proxy_proto}://#{uri}/ keepalive=On")
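For reference, here is a rough sketch of the kind of vhost fragment the plugin would end up writing with that change. The server name, backend address, and path are invented for illustration and are not the plugin's actual output:

  <VirtualHost *:80>
    ServerName myapp-mydomain.example.com
    # Proxy gear traffic through mod_proxy instead of a RewriteRule with [P],
    # so the frontend can keep backend connections open between requests.
    ProxyPass / http://127.4.92.1:8080/ keepalive=On
    ProxyPassReverse / http://127.4.92.1:8080/
  </VirtualHost>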
I wonder if this would simplify the patch further and require less work in maintaining the code:

  - f.puts("RewriteRule ^#{path}(/.*)?$ #{proxy_proto}://#{uri}$1 [P,NS]")
  + f.puts("ProxyPassMatch #{path}(/.*)?$ #{proxy_proto}://#{uri}$1")

I'm basing this on http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypassmatch; however, I wonder whether it puts us in a situation where we have to be concerned about the 'Security Warning' there:

"Take care when constructing the target URL of the rule, considering the security impact from allowing the client influence over the set of URLs to which your server will act as a proxy. Ensure that the scheme and hostname part of the URL is either fixed, or does not allow the client undue influence."
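To make the warning concrete, a hypothetical rendering of that directive for a single gear might look like the following (the path and backend address are invented for illustration). The $1 capture comes from the client-supplied URL, so the scheme and host portion of the target must remain fixed, which is exactly the concern the security warning raises:

  # Rendered ProxyPassMatch for one gear; only the path suffix ($1) is
  # taken from the client request, the scheme and host stay fixed.
  ProxyPassMatch ^/myapp(/.*)?$ http://127.4.92.1:8080$1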
This is still high priority for us; we're just improving the way we track these sorts of bugs.
I opened https://bugzilla.redhat.com/show_bug.cgi?id=1085115 to investigate this upstream. I don't think it's a simple matter of reducing TIME_WAIT connections as I've seen 503s even with relatively low numbers of those. I've also seen the problem with minimal concurrency (ab -c 2). Perhaps the overarching problem is exhaustion of open sockets, and TIME_WAIT connections are just one component in that count.
I meant https://bugzilla.redhat.com/show_bug.cgi?id=1110035 of course.
(In reply to Luke Meyer from comment #19)
> I opened https://bugzilla.redhat.com/show_bug.cgi?id=1085115 to investigate
> this upstream. I don't think it's a simple matter of reducing TIME_WAIT
> connections as I've seen 503s even with relatively low numbers of those.
> I've also seen the problem with minimal concurrency (ab -c 2). Perhaps the
> overarching problem is exhaustion of open sockets, and TIME_WAIT connections
> are just one component in that count.

Luke,

It was confirmed in case 01059675 that the root cause of this issue was port exhaustion caused by the following section of the node configuration in the openshift.sh script:

  # Turn some sysctl knobs.
  configure_sysctl_on_node()
  {
    set_sysctl kernel.sem '250 32000 32 4096' 'Accomodate many httpd instances for OpenShift gears.'
    set_sysctl net.ipv4.ip_local_port_range '15000 35530' 'Move the ephemeral port range to accomodate the OpenShift port proxy.'
    set_sysctl net.netfilter.nf_conntrack_max 1048576 'Increase the connection tracking table size for the OpenShift port proxy.'
    set_sysctl net.ipv4.ip_forward 1 'Enable forwarding for the OpenShift port proxy.'
    set_sysctl net.ipv4.conf.all.route_localnet 1 'Allow the OpenShift port proxy to route using loopback addresses.'
  }

Mainly this line:

  set_sysctl net.ipv4.ip_local_port_range '15000 35530' 'Move the ephemeral port range to accomodate the OpenShift port proxy.'

When you exhaust your ports you run into this issue. Keep in mind that the EAP cartridge behaves differently than other cartridges, so you see this more quickly (with gears that have less than 2 GB of RAM); see https://bugzilla.redhat.com/show_bug.cgi?id=1085115#c14 for more information on this. However, the underlying issue is the same: at around 15k open ports (most in TIME_WAIT) you run out of ports.

So your options with the rewrite proxy are to enable tcp_tw_reuse, or to switch to the vhost plugin and let mod_proxy use keepalive for the TCP connections (which your backend application can take advantage of, provided it has the resources; this refers to my comments about the EAP cartridge).
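As an illustration only (these are standard Linux commands, not steps from this bug), one way to check whether a node is approaching ephemeral port exhaustion, and to try TIME_WAIT reuse on a test node, would be:

  # Show the configured ephemeral port range (about 20k ports with the values above).
  sysctl net.ipv4.ip_local_port_range

  # Count sockets currently in TIME_WAIT; if this approaches the size of the
  # ephemeral range, new outbound proxy connections start failing with 503s.
  netstat -ant | grep -c TIME_WAIT

  # Allow reuse of TIME_WAIT sockets for new outbound connections (test nodes only;
  # weigh the trade-offs before enabling this globally).
  sysctl -w net.ipv4.tcp_tw_reuse=1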
I'm really not convinced it's just TIME_WAIT that's a problem. The problem might go away in particular cases if you enable TIME_WAIT connection reuse, but I'm not sure that's wise to do globally, and I'm not sure it's the only factor in play. Discussing that upstream.

Keepalive from the frontend to gears might seem like a good idea, but I'm concerned it may come with a different set of problems. First, you need to enable the right conditions for it to even happen. Meaning, use the vhost frontend (which as of 2.1 does ProxyPass instead of rewrite), the worker MPM, and a cartridge that doesn't disable keepalive (as you've noted, EAP will under "low" memory conditions in the gear).

Then you need connection pooling to work (which is why we need worker MPM; the size of the pool is limited to the number of threads in a process). Without connection pooling, you'd really only have keepalives to one backend; a continuous stream of requests to two apps would mean you're back to closing the old connection and reopening a new one all the time.

With connection pooling, you have all the synchronization hassles that come with multithreading, mutexes, and resource contention. This would have to be a notable performance hit at the frontend, and you'd have to set your ProxyPass options very carefully to keep from timing out waiting on a connection to be available, trying to reuse a dead connection, etc. (When I worked httpd support, our engineer's first recommendation on any proxy problem tended to be to disable proxy keepalives. It just got rid of so many problems. May not be as relevant in the controlled node environment.)

Finally, connection pooling isn't really built to handle a lot of backends. If you only have one or two active apps, sure, you'll be able to reuse most connections. But if you have a connection pool of 25 (standard ThreadsPerChild under worker) and 50 backend apps getting continuous traffic, you won't be able to reuse pooled connections much at all, you'll be back to lots of TIME_WAITs *and* you'll still have the pool synchronization hit (so, all downside, no upside). That might seem like an extreme scenario but high density is part of what customers are hoping to achieve. You could tune around it some by increasing ThreadsPerChild appropriately, but that's lower-level than most customers probably want to delve. I'm just saying, it's not a panacea. We may have some tough trade-offs to make here.
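To make the tuning concrete, here is a rough sketch (values invented for illustration, not a recommendation from this bug) of the kind of worker MPM and ProxyPass knobs that would need careful setting if frontend keepalive were enabled:

  # worker MPM: the per-process connection pool is bounded by ThreadsPerChild.
  <IfModule worker.c>
      ServerLimit          16
      ThreadsPerChild      25
      MaxClients          400
  </IfModule>

  # Hypothetical per-gear ProxyPass with keepalive: ttl drops idle pooled
  # connections, retry controls how long a failed backend stays blacklisted,
  # and connectiontimeout bounds the wait for a new backend connection.
  ProxyPass / http://127.4.92.1:8080/ keepalive=On ttl=30 retry=5 connectiontimeout=5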
At least make a valiant effort at giving the customer decent documentation on the possible options and a discussion of the pros and cons of each approach. The TIME_WAIT issue (running out of ephemeral ports) is a major headache for any customer trying to do any basic load testing. On my initial attempt, anything above 150 requests per second would lead to 503s (when there was no connection pooling or tw_reuse).

I had to ramp up my httpd.conf worker settings anyway in order to be able to handle any sizable load (more than a few dozen concurrent requests). And don't discount the amount of processing power saved by reusing connections. For the best case of only serving one backend app, the savings can be fairly sizable (for me it was about 30% total CPU reduction for a light JSP workload at 500 requests per second). Obviously the benefits will diminish as the number of worker threads per app is exhausted.
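For reference, a basic load test along these lines (hostname, path, and numbers invented for illustration) is enough to surface the 503s once the ephemeral port range is exhausted:

  # Sustained load against a single JSP page; watch for non-2xx responses in
  # the ab summary as TIME_WAIT sockets pile up on the node frontend.
  ab -c 50 -n 100000 http://myapp-mydomain.example.com/index.jsp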
(In reply to Randall Theobald from comment #23)
> At least make a valiant effort at giving the customer decent documentation
> on the possible options and a discussion of the pros and cons of each
> approach. The TIME_WAIT issue (running out of ephemeral ports) is a major
> headache for any customer trying to do any basic load testing. On my initial
> attempt, anything above 150 requests per second would lead to 503s (when
> there was no connection pooling or tw_reuse).
>
> I had to ramp up my httpd.conf worker settings anyway in order to be able to
> handle any sizable load (more than a few dozen concurrent requests). And
> don't discount the amount of processing power saved by reusing connections.
> For the best case of only serving one backend app, the savings can be fairly
> sizable (for me it was about 30% total CPU reduction for a light JSP
> workload at 500 requests per second). Obviously the benefits will diminish
> as the number of worker threads per app is exhausted.

The https://access.redhat.com/labs/lbconfig/ lab tool can give advice (dynamically) on the settings for httpd load balancing. Note the tool was designed for a flat (or typical) workload (i.e. not OpenShift); however, if you assume 1 httpd server, X cores, Worker, mod_proxy, 1 JBoss server, 1 core, 1 JVM, and all of the boolean options, you get something close to this URL:

https://access.redhat.com/labs/lbconfig/#/?apache_cores=4&apache_instances_per_server=1&apache_servers=1&jboss_cores=1&jboss_jvms=1&jboss_servers=1&long_running=true&mpmtype=Worker&modtype=mod_proxy&jboss_version=6&firewall=true&same_server=true&long_running_num=10

Note this URL uses 4 cores for httpd, which should give you the optimal configuration. Simply replace apache_cores=4 with the number of cores your node has, and this should answer the question of what the optimal configuration is.
Adding Ian Hands on the needinfo, as he is now the primary maintainer of lbconfig. Ian, I believe they want rewrite to be added as one of the configurations.
We shipped tuned-profiles-openshift-node, which sets tw_reuse=1, in RHBA-2015:0220 "Red Hat OpenShift Enterprise 2.2.4 bug fix and enhancement update". (The related Bugzilla report is bug 889539.)

From the discussion in bug 1110035 comment 9 and bug 1110035 comment 10, it seems there are two remaining issues, one being "a few milliseconds of outage at most per scaling event" and the other being the corner case of scaling from 4 gears to 1 in a regular scalable application or from 5 gears to 2 in an HA application, neither of which is relevant when using auto-scaling (which scales up or down 1 gear at a time).

I have two questions:

• Is shipping tuned-profiles-openshift-node, which sets tw_reuse=1, a satisfactory resolution to the problem that tw_reuse=1 addresses? (Currently the installer does not install tuned-profiles-openshift-node.)

• If we have solved the problem that tw_reuse=1 addresses satisfactorily, can we close this Bugzilla report, or are we considering this Bugzilla report to encompass the issues mentioned in my second paragraph as well?
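As an illustration only (standard yum/tuned/sysctl commands, not steps from the erratum, and the exact tuned profile name is not confirmed here), one could verify on a node that the shipped profile takes effect roughly like this:

  # Install the profile package shipped in 2.2.4, then see which profiles
  # it provides (the profile name itself should be confirmed from the package).
  yum install -y tuned-profiles-openshift-node
  tuned-adm list

  # After applying the appropriate profile, confirm TIME_WAIT reuse is enabled.
  sysctl net.ipv4.tcp_tw_reuse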