Bug 1077353 - multiple nodejs processes running in a gear
Summary: multiple nodejs processes running in a gear
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Image
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Ben Parees
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 1116817
Reported: 2014-03-17 19:27 UTC by Andy Grimm
Modified: 2016-11-08 03:47 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1116817 (view as bug list)
Environment:
Last Closed: 2014-10-10 00:46:58 UTC
Target Upstream Version:
Embargoed:



Description Andy Grimm 2014-03-17 19:27:24 UTC
Description of problem:

I saw three cases today where a gear had multiple nodejs supervisor processes running.  As a result, the second instance's child process kept dying because it could not bind to port 8080.  The retries continued, consuming the gear's entire CPU quota.

Version-Release number of selected component (if applicable):

openshift-origin-cartridge-nodejs-1.22.4-1.el6oso.noarch
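
The bind failure described above can be reproduced in isolation. The following is a minimal sketch, not taken from the cartridge: it uses an OS-assigned free port as a stand-in for 8080, and `try_bind` is a hypothetical helper name. It shows that a second listener on a port already held by another socket fails with EADDRINUSE.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Try to bind a listening TCP socket; return (socket, None) or (None, errno)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


# The first "server" takes an OS-assigned free port (standing in for 8080).
first, _ = try_bind(0)
port = first.getsockname()[1]

# The second "server" attempts the same port and fails while the first holds it.
second, err = try_bind(port)
first.close()

print("second bind failed with EADDRINUSE:", err == errno.EADDRINUSE)
```

A supervisor that restarts its child unconditionally after this failure would loop for as long as the first instance keeps the port.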

Comment 1 Michal Fojtik 2014-03-17 20:25:24 UTC
Andy: Do you have more details? Do the apps use hot_deploy?

Comment 2 Michal Fojtik 2014-04-11 10:40:26 UTC
Andy, ping? ;-)

Comment 4 Andy Grimm 2014-04-16 19:40:59 UTC
It looks like two of the apps where I'm currently seeing this got unidled twice concurrently.  It's not clear what happened with the third; it was started at 19:40:44 and restarted at 19:41:49.  Maybe the first set of processes didn't die?  

The upcoming fix for BZ 1061926 may fix at least two of these three occurrences.

Comment 5 Ben Parees 2014-06-27 20:29:46 UTC
It looks like this could happen if someone removed the pid file and then issued a restart (the cart logic will just start another instance if the pidfile is not found).

A number of our carts share this logic, but nodejs may be the only one that auto-restarts due to the bind failure.

I will look into making the "is started" checking more robust.
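
The failure mode can be illustrated with a small sketch. This is an assumption about the general shape of a pidfile-only check, not the cartridge's actual code. `pid_alive`, `is_running_naive`, and the temporary pidfile path are hypothetical, the current process stands in for the supervisor, and a POSIX system is assumed (signal 0 is used as an existence probe).

```python
import os
import tempfile


def pid_alive(pid):
    """Return True if a process with this pid exists (POSIX: signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def is_running_naive(pidfile):
    """Pidfile-only check: a missing pidfile is treated as 'not running'."""
    if not os.path.exists(pidfile):
        return False
    with open(pidfile) as fh:
        return pid_alive(int(fh.read().strip()))


# Hypothetical pidfile location; the current process plays the supervisor.
pidfile = os.path.join(tempfile.mkdtemp(), "cartridge.pid")
with open(pidfile, "w") as fh:
    fh.write(str(os.getpid()))

before = is_running_naive(pidfile)  # pidfile present, process alive
os.remove(pidfile)                  # someone removes the pidfile
after = is_running_naive(pidfile)   # process is still alive, but the check says no

print(before, after)
```

With a check like this, a restart issued after the pidfile is removed sees "not running" and starts a second instance alongside the first.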

Comment 6 Ben Parees 2014-07-01 14:48:35 UTC
Adding logic to recreate the pid file if it does not exist, prior to checking if the process is started.

https://github.com/openshift/origin-server/pull/5562
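
A rough sketch of the idea, for illustration only. This is not the code from the pull request. `is_running`, `pid_alive`, and the `find_pid` callback are hypothetical names, and how the real cartridge locates the running process is not shown here. A POSIX system is assumed.

```python
import os
import tempfile


def pid_alive(pid):
    """Return True if a process with this pid exists (POSIX: signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def is_running(pidfile, find_pid):
    """Recreate a missing pidfile from a process lookup, then do the usual check."""
    if not os.path.exists(pidfile):
        pid = find_pid()  # hypothetical lookup of an already-running instance
        if pid is not None:
            with open(pidfile, "w") as fh:
                fh.write(str(pid))
    if not os.path.exists(pidfile):
        return False
    with open(pidfile) as fh:
        return pid_alive(int(fh.read().strip()))


workdir = tempfile.mkdtemp()

# Case 1: pidfile is missing but an instance is running (the current process stands in).
pidfile = os.path.join(workdir, "cartridge.pid")
recovered = is_running(pidfile, find_pid=os.getpid)

# Case 2: pidfile is missing and no instance is found.
other_pidfile = os.path.join(workdir, "other.pid")
nothing_running = is_running(other_pidfile, find_pid=lambda: None)

print(recovered, nothing_running)
```

In the first case the check reports the instance as started and the pidfile exists again, so a restart can stop the old process instead of starting a second one.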

Comment 7 Wenjing Zheng 2014-07-02 06:53:04 UTC
Verified on devenv_4932; there are no longer multiple nodejs processes, as shown below:

1. Create a nodejs-0.10 app
2. SSH into the gear, delete the cartridge.pid file under $OPENSHIFT_NODEJS_PID_DIR, and check the processes:
[n10-d.dev.rhcloud.com 53b3df4040b38ce446000001]\> ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
1000      7138     1  0 06:30 ?        00:00:00 node /opt/rh/nodejs010/root/usr
1000      7139     1  0 06:30 ?        00:00:00 /usr/bin/logshifter -tag nodejs
1000      7158  7138  0 06:30 ?        00:00:00 node server.js
1000      9028  9015  0 06:34 ?        00:00:00 sshd: 53b3df4040b38ce446000001@
1000      9029  9028  1 06:34 pts/2    00:00:00 /bin/bash --init-file /usr/bin/
1000      9252  9029  0 06:34 pts/2    00:00:00 ps -ef
3. Restart the gear and re-check the processes:
[n10-d.dev.rhcloud.com 53b3df4040b38ce446000001]\> ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
1000     11887     1  0 06:39 ?        00:00:00 node /opt/rh/nodejs010/root/usr/bin/supervisor
1000     11888     1  0 06:39 ?        00:00:00 /usr/bin/logshifter -tag nodejs
1000     11914 11887  0 06:39 ?        00:00:00 node server.js
1000     12017 12004  0 06:39 ?        00:00:00 sshd: 53b3df4040b38ce446000001@pts/2
1000     12018 12017  3 06:39 pts/2    00:00:00 /bin/bash --init-file /usr/bin/rhcsh -i
1000     12230 12018  0 06:39 pts/2    00:00:00 ps -ef

