Bug 1030458

Summary: node-web-proxy consumes lots of CPU when open files limit is reached
Summary: node-web-proxy consumes lots of CPU when open files limit is reached
Product: Red Hat Software Collections
Component: nodejs
Version: nodejs010
Status: CLOSED ERRATA
Severity: medium
Priority: high
Reporter: Andy Grimm <agrimm>
Assignee: Tomas Hrcka <thrcka>
QA Contact: Miroslav Hradílek <mhradile>
CC: jgoulding, mhradile, mmcgrath
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-06-04 07:15:05 UTC

Description Andy Grimm 2013-11-14 13:49:12 UTC
Description of problem:

Yesterday, Tim Kramer discovered that the CPU usage of node-web-proxy on an OpenShift Online node was much higher than normal.  Using strace, we found that the process was making thousands of accept() calls per second, all of which were failing because the process was out of file descriptors.

I was able to work around the problem by adjusting the process's limit:

echo -n "Max open files=4096:4096" > /proc/88047/limits

at which point it quickly handled the existing connections and settled down to only about 15 open files.

Version-Release number of selected component (if applicable):

openshift-origin-node-proxy-1.16.1-1.el6oso.noarch

How reproducible:

We have not yet attempted to reproduce this.  I suspect that if you set the ulimit to an artificially low number like 32, you could probably reproduce it with a fairly small number of concurrent connections through the proxy.

Actual result:

It seems that the system was in a state where it was not servicing existing connections at all, yet was still trying to accept new ones.

Expected result:

I would expect that when the process runs out of file descriptors, it should still be able to service existing connections (or error out and close them) and simply reject incoming connections until enough file descriptors have been closed to handle new ones.
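The expected behavior above can be sketched as an error handler that pauses accepting and retries later. This is an illustrative pattern, not node-web-proxy's actual code: `pauseFn` and `resumeFn` are hypothetical hooks into whatever stops and restarts the server's accept loop.

```javascript
// Hedged sketch of the expected behaviour: on EMFILE/ENFILE, stop pulling
// new connections and retry after a delay, instead of spinning on accept().
// pauseFn/resumeFn are hypothetical hooks, not a real node API.
function makeAcceptErrorHandler(pauseFn, resumeFn, retryMs) {
  return function (err) {
    if (err.code !== 'EMFILE' && err.code !== 'ENFILE') throw err;
    pauseFn();                      // existing sockets keep being serviced
    setTimeout(resumeFn, retryMs);  // retry once descriptors may have freed up
  };
}
```

The key property is that descriptor exhaustion produces a timed backoff rather than an immediate retry, which is what turned the failure into a CPU-burning loop here.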

Comment 1 Andy Grimm 2013-11-15 19:22:40 UTC
Possibly related to this, I found that node-web-proxy is not closing down some connections where the client has disconnected.  I have several nodes with more than 100 sockets in CLOSE_WAIT state, and they don't ever appear to go away.  One node has 605 such connections.

Comment 2 Mike McGrath 2013-12-06 18:08:41 UTC
This is causing outages about every other week or so in Online (at least for cloud9, possibly others)

Comment 3 Andy Grimm 2013-12-18 20:47:42 UTC
So, I found this:

https://github.com/einaros/ws/issues/180

which seems related to the file descriptor leak.

and this:

https://github.com/joyent/node/issues/5504

which seems related to the high CPU utilization (which we now see at times independently of hitting the fd limit).

Comment 4 Andy Grimm 2013-12-18 21:04:04 UTC
Moving this to software collections.  We're actually seeing suspiciously similar behavior both in the OpenShift code that uses node.js and in our users' node.js-based apps.  All are currently using nodejs010-nodejs-0.10.5-6.el6.x86_64.

Comment 11 errata-xmlrpc 2014-06-04 07:15:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0620.html