Red Hat Bugzilla – Bug 1030458
node-web-proxy consumes lots of CPU when open files limit is reached
Last modified: 2016-11-07 22:47:26 EST
Description of problem:
Yesterday, Tim Kramer discovered that the CPU usage of node-web-proxy on an OpenShift Online node was much higher than normal. Using strace, we found that the process was making thousands of accept() calls per second, all of which were failing because the process was out of file descriptors.
I was able to work around the problem by adjusting the process's limit:
echo -n "Max open files=4096:4096" > /proc/88047/limits
at which point it quickly handled existing connections and settled down to only about 15 open files.
Version-Release number of selected component (if applicable):
We have not yet attempted to reproduce this. I suspect that if you set the ulimit to an artificially low number like 32, you could reproduce it with a fairly small number of concurrent connections through the proxy.
It seems that the system was in a state where it was not servicing existing connections at all, yet still trying to accept new ones.
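The following is a minimal sketch of the suspected mechanism, not a reproduction against node-web-proxy itself. It is written in Python for illustration, and the function name and the default limit of 64 are hypothetical. It lowers the soft open-files limit inside the current process, uses up the remaining descriptors, and then checks two things: which error accept() fails with, and whether the listening socket is still reported as readable afterwards.

```python
import errno
import os
import resource
import select
import socket


def accept_when_out_of_fds(limit=64):
    """Return (errno name or None, listener still readable?) after fd exhaustion."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    client = socket.create_connection(listener.getsockname())
    # Wait until the pending connection is visible on the listener.
    select.select([listener], [], [], 1.0)
    hogs = []
    try:
        # Lower only the soft limit so it can be restored afterwards.
        lower = soft == resource.RLIM_INFINITY or soft > limit
        resource.setrlimit(resource.RLIMIT_NOFILE, (limit if lower else soft, hard))
        while True:
            try:
                hogs.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                break  # no descriptors left
        try:
            conn, _addr = listener.accept()
            conn.close()
            err = None
        except OSError as exc:
            err = errno.errorcode.get(exc.errno, str(exc.errno))
        # Zero timeout: only ask whether the listener is readable right now.
        readable, _, _ = select.select([listener], [], [], 0)
        return err, bool(readable)
    finally:
        for fd in hogs:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
        client.close()
        listener.close()


if __name__ == "__main__":
    print(accept_when_out_of_fds())
```

If the listener is still reported readable after the failed accept(), an event loop that retries immediately would spin without making progress, which would be consistent with the thousands of failing accept() calls per second seen in strace.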
I would expect that when the process runs out of file descriptors, it should still be able to service existing connections (or error out and close them) and simply reject incoming connections until enough file descriptors are closed to handle new connections.
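One way to get the behavior described above is to treat "out of descriptors" as a signal to pause accepting for a short time rather than retrying at once. The sketch below illustrates that idea only; it is in Python rather than node.js, it is not how node-web-proxy or node.js handles the error, and the names accept_or_backoff and backoff_seconds are hypothetical.

```python
import errno


def accept_or_backoff(listener, backoff_seconds=0.1):
    """Try one accept(); return (connection or None, seconds to wait before retrying)."""
    try:
        conn, _addr = listener.accept()
        return conn, 0.0
    except OSError as exc:
        # EMFILE: per-process limit reached. ENFILE: system-wide limit reached.
        if exc.errno in (errno.EMFILE, errno.ENFILE):
            return None, backoff_seconds
        raise
```

A caller would keep servicing and closing its existing connections during the returned delay, so descriptors are freed before the next accept() attempt instead of the process burning CPU on calls that cannot succeed.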
Possibly related to this, I found that node-web-proxy is not closing down some connections where the client has disconnected. I have several nodes with more than 100 sockets in CLOSE_WAIT state, and they don't ever appear to go away. One node has 605 such connections.
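For anyone wanting to track the CLOSE_WAIT count over time, here is a rough sketch that counts such sockets from the Linux /proc/net/tcp tables, where the state column holds the hex code 08 for CLOSE_WAIT. It assumes that file format and counts sockets system-wide rather than per process. The function name is hypothetical.

```python
def count_close_wait(lines):
    """Count rows in /proc/net/tcp-style text whose state column is 08 (CLOSE_WAIT)."""
    count = 0
    for line in lines:
        fields = line.split()
        # Data rows start with an index such as "0:"; the header row does not.
        if len(fields) > 3 and fields[0].endswith(":") and fields[3] == "08":
            count += 1
    return count


if __name__ == "__main__":
    total = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as handle:
                total += count_close_wait(handle)
        except FileNotFoundError:
            pass  # table not present on this system
    print(total)
```

A socket stays in CLOSE_WAIT after the peer has closed its side until the local process closes its own descriptor, so a count that only grows suggests the process is not closing those sockets.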
This is causing outages roughly every other week in Online (at least for cloud9, possibly others).
So, I found this:
which seems related to the file descriptor leak.
which seems related to the high CPU utilization (which we now see at times independent of hitting the fd limit)
Moving this to software collections. We're actually seeing suspiciously similar behavior in both the OpenShift code which uses node.js and in our users' node.js based apps. All are currently using nodejs010-nodejs-0.10.5-6.el6.x86_64 .
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.