Bug 1010464

Summary: Allow users to disable node.js supervisor or prune file watches
Product: OpenShift Online Reporter: Andy Grimm <agrimm>
Component: ImageAssignee: Paul Morie <pmorie>
Status: CLOSED CURRENTRELEASE QA Contact: libra bugs <libra-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.xCC: agrimm, dmcphers, jgoulding, jhonce, nduong, ricky.hussmann, trevor, xtian, yadu
Target Milestone: ---Keywords: UpcomingRelease
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-01-30 00:48:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Andy Grimm 2013-09-20 20:12:36 UTC
Description of problem:

node.js supervisor provides several functions.  One of these functions is that it watches files for changes.  The nodejs is configured to watch a very broad set of files (presumably to avoid the need for restarts).  There seems to be a performance issue with this, though, and it's also conceivable that a gear will exceed fs.inotify.max_user_watches, which means it won't perform its job as expected.

I would suggest that we bring back the user's ability to simply not use supervisor, but barring that, we should at least give them a way to control which files are watched.

See also:
http://stackoverflow.com/questions/5394620/node-js-how-does-fs-watchfile-work
https://bugzilla.redhat.com/show_bug.cgi?id=1010461

Version-Release number of selected component (if applicable):

openshift-origin-cartridge-nodejs-1.15.7-1.el6oso.noarch

Comment 1 Jhon Honce 2013-09-20 20:23:22 UTC
If this is too much work for bug, please transfer to a Trello card

Comment 2 Jhon Honce 2013-09-24 23:18:44 UTC
https://github.com/openshift/origin-server/pull/3700 allows tuning of the pool interval is that sufficient?

Comment 3 Andy Grimm 2013-09-25 14:46:02 UTC
I will leave it to users impacted by these two bugs to say whether resolving 1010461 reduces the severity of this one, but I think this issue still needs to be addressed separately.

Comment 6 Paul Morie 2013-11-18 22:45:35 UTC
Fork AMI created

Comment 7 Yan Du 2013-11-19 10:55:59 UTC
Test on fork_ami_nodejs_supervisor_939, met below error when git push nodejs-0.10 app. It is ok with nodejs-0.6 app.

root@openshift-ubuntu:~/test/n10a# git add .; git commit -ama3; git push
[master b377c03] a3
Counting objects: 3, done.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 244 bytes, done.
Total 2 (delta 1), reused 0 (delta 0)
remote: Stopping NodeJS cartridge
remote: Warning: Application 'n10a' unable to stop. Use force-stop to kill.
remote: An error occurred executing 'gear prereceive' (exit code: 141)
remote: Error message: Failed to execute: 'control stop' for /var/lib/openshift/528b4258ea6bd992e4000045/nodejs
remote: 
remote: For more details about the problem, try running the command again with the '--trace' option.
To ssh://528b4258ea6bd992e4000045.rhcloud.com/~/git/n10a.git/
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'ssh://528b4258ea6bd992e4000045.rhcloud.com/~/git/n10a.git/'


BTW, Could you help to give some steps about how to verify this bug? 
Thanks:)

Comment 8 Paul Morie 2013-11-19 19:33:41 UTC
I've made a small code change and started a new fork ami job for this issue.  I wasn't able to reproduce the issue w/ nodejs-0.10.  As for how to test this issue, basically what this change amounts to is that until the hot_deploy marker is added, nodejs will run without supervisor.  When the marker is added, the node process is stopped and supervisor started.  Subsequent pushes will not require a restart until the marker is removed.  When the marker is removed, the supervisor process is stopped and nodejs restarted without supervisor.

Comment 9 Yan Du 2013-11-20 08:12:48 UTC
Hi, Paul Morie

"When the marker is added, the node process is stopped and supervisor started."

Does this mean after hot_deploy marker added, the node server.js process will stop, and just the supervisor works in app?

But when I test the issue on fork_ami_nodejs_supervisor_941, 
seems after added hot_deploy marker and push, check the app process, both supervisor and node server.js are running.

[n6-dyy.dev.rhcloud.com 528c532b663e9eea80000001]\> ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
1001     29467 29453  0 01:52 ?        00:00:00 sshd: 528c532b663e9eea80000001@pts/0
1001     29474 29467  0 01:52 pts/0    00:00:00 /bin/bash --init-file /usr/bin/rhcsh -i
1001     30322     1  0 01:53 ?        00:00:00 node /usr/bin/supervisor -e node|js|coffee -p 1000 -- server.js
1001     31144 30322  0 01:55 ?        00:00:00 node server.js
1001     31581 29474  0 01:55 pts/0    00:00:00 ps -ef

Comment 10 Paul Morie 2013-11-21 02:29:32 UTC
Very good question.  I noticed the same thing, but if you check the pid of the node process before you add the hot_deploy marker and after, you will notice the pid changes.  Supervisor and node run as separate processes so to have them both running is expected.

Comment 11 Yan Du 2013-11-21 09:09:36 UTC
thanks Paul Morie

Test on fork_ami_nodejs_supervisor_941

1. After nodejs-0.6/1.0 app created, nodejs will run without supervisor.
2. Add hot_deploy marker, supervisor started, App won't restart during deploying. Although both the supervisor and nodejs running when check the process in app, but nodejs PID changed after every deploying finished.
3. Remove hot_deploy marker, there is no supervisor running and app can restart normally.

Move bug to verified.

Comment 12 openshift-github-bot 2013-11-21 21:32:22 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/83c292ffbb4d225523b0181b4ccec2cf03f27af2
Fix bug 1010464: only use supervisor is hot_deploy is enabled