Bug 1030458 - node-web-proxy consumes lots of CPU when open files limit is reached
Summary: node-web-proxy consumes lots of CPU when open files limit is reached
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Software Collections
Classification: Red Hat
Component: nodejs
Version: nodejs010
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Tomas Hrcka
QA Contact: Miroslav Hradílek
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-11-14 13:49 UTC by Andy Grimm
Modified: 2016-11-08 03:47 UTC
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-04 07:15:05 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:0620 0 normal SHIPPED_LIVE nodejs010 bug fix and enhancement update 2014-06-04 10:54:02 UTC

Description Andy Grimm 2013-11-14 13:49:12 UTC
Description of problem:

Yesterday, Tim Kramer discovered that the CPU usage of node-web-proxy on an OpenShift Online node was much higher than normal.  Using strace, we found that the process was making thousands of accept() calls per second, all of which were failing because the process was out of file descriptors.

I was able to work around the problem by adjusting the process's limit:

echo -n "Max open files=4096:4096" > /proc/88047/limits

at which point it quickly handled the existing connections and settled down to only about 15 open files.
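(Editorial note: writing to /proc/&lt;pid&gt;/limits as above relies on a Red Hat kernel extension; the file is read-only on mainline kernels. On systems shipping util-linux 2.21 or later, the equivalent adjustment can be made with `prlimit` — a sketch, reusing the pid from above:)

```shell
# Raise the open-files limit of a running process (util-linux >= 2.21).
# 88047 is the pid from the workaround above.
prlimit --pid 88047 --nofile=4096:4096

# Verify the new limit took effect:
grep "Max open files" /proc/88047/limits
```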

Version-Release number of selected component (if applicable):

openshift-origin-node-proxy-1.16.1-1.el6oso.noarch

How reproducible:

We have not yet attempted to reproduce this.  I suspect that if you set the ulimit to an artificially low number like 32, you could reproduce it with a fairly small number of concurrent connections through the proxy.
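(Editorial note: a minimal reproduction along those lines might look like the untested sketch below; the proxy path and port are placeholders, not the real deployment values.)

```shell
# Untested sketch: run the proxy under an artificially low fd limit,
# then open more concurrent connections than the limit allows.
(
  ulimit -n 32                        # lowered limit applies to this subshell only
  node /path/to/node-web-proxy.js &   # placeholder path
  sleep 2
  for i in $(seq 1 64); do            # 64 connections > 32 descriptors
    curl -s http://localhost:8000/ >/dev/null &
  done
  wait
)
# Then watch the proxy with: strace -f -e trace=accept -p <pid>
```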

Actual result:

It seems that the system was in a state where it was not servicing existing connections at all, yet was still trying to accept new ones.

Expected result:

I would expect that when the process runs out of file descriptors, it should still be able to service existing connections (or error out and close them) and simply reject incoming connections until enough file descriptors are closed to handle new ones.

Comment 1 Andy Grimm 2013-11-15 19:22:40 UTC
Possibly related to this, I found that node-web-proxy is not closing down some connections where the client has disconnected.  I have several nodes with more than 100 sockets in CLOSE_WAIT state, and they don't ever appear to go away.  One node has 605 such connections.
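(Editorial note: the leaked sockets described above can be counted per process; a sketch using netstat from net-tools — `ss -tnp` works similarly on newer systems. The pid is a placeholder.)

```shell
# Count half-closed sockets belonging to the proxy process.
# CLOSE_WAIT means the peer sent FIN but our side never called close(),
# so each one pins a file descriptor until the process closes it.
PID=88047                            # placeholder pid
netstat -tnp 2>/dev/null | awk -v pid="$PID" \
  '$6 == "CLOSE_WAIT" && $7 ~ "^"pid"/"' | wc -l
```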

Comment 2 Mike McGrath 2013-12-06 18:08:41 UTC
This is causing outages about every other week or so in Online (at least for cloud9, possibly others)

Comment 3 Andy Grimm 2013-12-18 20:47:42 UTC
So, I found this:

https://github.com/einaros/ws/issues/180

which seems related to the file descriptor leak.

and this:

https://github.com/joyent/node/issues/5504

which seems related to the high CPU utilization (which we now see at times, independent of hitting the fd limit).

Comment 4 Andy Grimm 2013-12-18 21:04:04 UTC
Moving this to software collections.  We're actually seeing suspiciously similar behavior in both the OpenShift code which uses node.js and in our users' node.js based apps.  All are currently using nodejs010-nodejs-0.10.5-6.el6.x86_64.

Comment 11 errata-xmlrpc 2014-06-04 07:15:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0620.html

