Bug 1030458 - node-web-proxy consumes lots of CPU when open files limit is reached
Status: CLOSED ERRATA
Product: Red Hat Software Collections
Classification: Red Hat
Component: nodejs
Version: nodejs010
Hardware/OS: Unspecified / Unspecified
Priority: high   Severity: medium
Assigned To: Tomas Hrcka
QA Contact: Miroslav Hradílek
Reported: 2013-11-14 08:49 EST by Andy Grimm
Modified: 2016-11-07 22:47 EST
CC: 3 users

Doc Type: Bug Fix
Last Closed: 2014-06-04 03:15:05 EDT
Type: Bug

Attachments: None
Description Andy Grimm 2013-11-14 08:49:12 EST
Description of problem:

Yesterday, Tim Kramer discovered that the CPU usage of node-web-proxy on an OpenShift Online node was much higher than normal.  Using strace, we found that the process was making thousands of accept() calls per second, all of which were failing because the process was out of file descriptors.

I was able to work around the problem by adjusting the process's limit:

echo -n "Max open files=4096:4096" > /proc/88047/limits

at which point it quickly handled the existing connections and settled down to only about 15 open files.
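As an aside: writing to /proc/PID/limits only works on some kernels. On systems with util-linux 2.21 or later, the same adjustment can be made with the prlimit(1) utility, which uses the prlimit(2) syscall; 88047 here is just the example PID from the strace session above.

```shell
# Raise the soft and hard open-files limit of a running process.
# 88047 is the example PID from this report; substitute the real one.
prlimit --pid 88047 --nofile=4096:4096

# Verify the change took effect.
prlimit --pid 88047 --nofile
```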

Version-Release number of selected component (if applicable):

openshift-origin-node-proxy-1.16.1-1.el6oso.noarch

How reproducible:

We have not yet attempted to reproduce this. I suspect that if you set the ulimit to an artificially low number, such as 32, you could reproduce it with a fairly small number of concurrent connections through the proxy.
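A rough sketch of that reproduction idea (untested; "proxy.js", the port, and the use of ab are stand-ins, not the actual node-web-proxy invocation):

```shell
# Start the proxy under an artificially low open-files limit.
ulimit -n 32
node proxy.js &          # stand-in for the real node-web-proxy command
PROXY_PID=$!

# Drive more concurrent connections than the limit allows, e.g. with ab:
ab -n 1000 -c 64 http://localhost:8080/

# If the bug reproduces, this should show a tight accept()/EMFILE loop:
strace -p "$PROXY_PID" -e trace=accept
```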

Actual result:

It seems that the system was in a state where it was not servicing existing connections at all, yet was still trying to accept new ones.

Expected result:

I would expect that when the process runs out of file descriptors, it should still be able to service existing connections (or error out and close them) and simply reject incoming connections until enough file descriptors are closed to handle new ones.
Comment 1 Andy Grimm 2013-11-15 14:22:40 EST
Possibly related to this, I found that node-web-proxy is not closing down some connections where the client has disconnected.  I have several nodes with more than 100 sockets in CLOSE_WAIT state, and they don't ever appear to go away.  One node has 605 such connections.
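A quick way to spot this kind of leak on a node host (command sketch; ss comes from iproute2, and the state filter needs a reasonably recent version):

```shell
# Count TCP sockets stuck in CLOSE_WAIT (peer closed; we never did).
ss -tan state close-wait | tail -n +2 | wc -l

# Rough equivalent on older hosts that only have netstat:
netstat -tan | grep -c CLOSE_WAIT
```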
Comment 2 Mike McGrath 2013-12-06 13:08:41 EST
This is causing outages about every other week or so in Online (at least for cloud9, possibly others).
Comment 3 Andy Grimm 2013-12-18 15:47:42 EST
So, I found this:

https://github.com/einaros/ws/issues/180

which seems related to the file descriptor leak.

and this:

https://github.com/joyent/node/issues/5504

which seems related to the high CPU utilization (which we now see at times, independent of hitting the fd limit).
Comment 4 Andy Grimm 2013-12-18 16:04:04 EST
Moving this to software collections. We're actually seeing suspiciously similar behavior both in the OpenShift code that uses node.js and in our users' node.js-based apps. All are currently using nodejs010-nodejs-0.10.5-6.el6.x86_64.
Comment 11 errata-xmlrpc 2014-06-04 03:15:05 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0620.html
