Description of problem:
I saw three cases today where a gear had multiple nodejs supervisor processes running. As a result, the second supervisor's child process kept dying because it could not bind to port 8080. The supervisor kept restarting it, consuming the gear's entire CPU quota.

Version-Release number of selected component (if applicable):
openshift-origin-cartridge-nodejs-1.22.4-1.el6oso.noarch
Andy: Do you have more details? Do the apps use hot_deploy?
Andy, ping? ;-)
It looks like two of the apps where I'm currently seeing this got unidled twice concurrently. It's not clear what happened with the third; it was started at 19:40:44 and restarted at 19:41:49. Maybe the first set of processes didn't die? The upcoming fix for BZ 1061926 may fix at least two of these three occurrences.
It looks like this could happen if someone removed the pid file and then issued a restart (the cart logic will just start another instance if the pidfile is not found). A number of our carts share this logic, but nodejs may be the only one that auto-restarts due to the bind failure. I will look into making the "is started" checking more robust.
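To illustrate the failure mode described above, here is a minimal simulation of a start routine that trusts only the pidfile. It is a sketch, not the cartridge's actual control script: `naive_start`, the `running` list, and the fake pids are all hypothetical stand-ins, and no real process is spawned.

```python
import os
import tempfile

# Stand-in for the gear's process table (hypothetical; no real processes).
running = []


def naive_start(pidfile):
    """Start a fake 'supervisor' unless a pidfile already exists."""
    if os.path.exists(pidfile):
        return False  # pidfile present: treated as already started
    pid = 1000 + len(running)  # fake pid for illustration only
    running.append(pid)
    with open(pidfile, "w") as f:
        f.write(str(pid))
    return True


pidfile = os.path.join(tempfile.mkdtemp(), "cartridge.pid")
naive_start(pidfile)   # first start
os.remove(pidfile)     # someone removes the pid file
naive_start(pidfile)   # restart: the check passes, a second instance starts
print(len(running))    # two instances are now "running"
```

The second call cannot tell that the first instance is still alive, because the only evidence of it was the file that was removed.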
Adding logic to recreate the pid file if it does not exist, prior to checking if the process is started. https://github.com/openshift/origin-server/pull/5562
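For readers who do not want to open the pull request, the idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `is_started`, `pid_alive`, and the `find_pid` callback are hypothetical names, and how the real cartridge locates the running supervisor is not shown here.

```python
import os
import subprocess
import sys
import tempfile


def pid_alive(pid):
    """Return True if a process with this pid exists (signal 0 probes only)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def is_started(pidfile, find_pid):
    """Recreate a missing pidfile from a live process, then check liveness.

    find_pid is a hypothetical callback returning the pid of an already
    running supervisor, or None if there is none.
    """
    if not os.path.exists(pidfile):
        pid = find_pid()
        if pid is None:
            return False
        with open(pidfile, "w") as f:
            f.write(str(pid))
    with open(pidfile) as f:
        content = f.read().strip()
    return content.isdigit() and pid_alive(int(content))


# Demo: a sleeping child stands in for the supervisor process.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
pidfile = os.path.join(tempfile.mkdtemp(), "cartridge.pid")

# The pidfile does not exist (as if deleted), yet the check finds the process.
recovered = is_started(pidfile, lambda: child.pid)
with open(pidfile) as f:
    recreated_pid = int(f.read())

child.terminate()
child.wait()
after_exit = is_started(pidfile, lambda: None)
print(recovered, after_exit)
```

With a check like this, a restart issued after the pidfile was removed sees the existing instance and can stop it first instead of starting a second one alongside it.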
Verified on devenv_4932; multiple nodejs processes no longer occur. Steps:

1. Create a nodejs-0.10 app.

2. SSH into the gear, delete the cartridge.pid file under $OPENSHIFT_NODEJS_PID_DIR, and check the processes:

[n10-d.dev.rhcloud.com 53b3df4040b38ce446000001]\> ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
1000      7138     1  0 06:30 ?        00:00:00 node /opt/rh/nodejs010/root/usr
1000      7139     1  0 06:30 ?        00:00:00 /usr/bin/logshifter -tag nodejs
1000      7158  7138  0 06:30 ?        00:00:00 node server.js
1000      9028  9015  0 06:34 ?        00:00:00 sshd: 53b3df4040b38ce446000001@
1000      9029  9028  1 06:34 pts/2    00:00:00 /bin/bash --init-file /usr/bin/
1000      9252  9029  0 06:34 pts/2    00:00:00 ps -ef

3. Restart the gear and re-check the processes:

[n10-d.dev.rhcloud.com 53b3df4040b38ce446000001]\> ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
1000     11887     1  0 06:39 ?        00:00:00 node /opt/rh/nodejs010/root/usr/bin/supervisor
1000     11888     1  0 06:39 ?        00:00:00 /usr/bin/logshifter -tag nodejs
1000     11914 11887  0 06:39 ?        00:00:00 node server.js
1000     12017 12004  0 06:39 ?        00:00:00 sshd: 53b3df4040b38ce446000001@pts/2
1000     12018 12017  3 06:39 pts/2    00:00:00 /bin/bash --init-file /usr/bin/rhcsh -i
1000     12230 12018  0 06:39 pts/2    00:00:00 ps -ef