Description of problem:
Creating scalable php apps one by one on an ose-2.0 env fails when the 1756th app is created.

Version-Release number of selected component (if applicable):
2.0/2013-11-26.1

How reproducible:
Always

Steps to Reproduce:
1. Set up an ose-2.0 env and create scalable php apps one by one.

The following failure log can be seen in the mcollective log:

I, [2013-12-17T17:16:47.330960 #4006] INFO -- : openshift.rb:150:in `execute_action' Finished executing action [configure] (157)
I, [2013-12-17T17:16:47.388136 #4006] INFO -- : openshift.rb:114:in `cartridge_do_action' cartridge_do_action failed (157)
------
Shell command '/usr/sbin/lsof -i @127.5.97.129:8080' exceeded timeout of 233
------)

The timeout defined in openshift-origin-node/utils/shell_exec.rb is 3600, but this timeout error was actually thrown 6 minutes after the create-app action was sent.

Actual results:

Expected results:
No such timeout error.

Additional info:
This failure is not the same as the one seen when running this test on ose-1.2, as described in Bug 1021601, and the zombie process issue mentioned in Bug 1021601 does not exist on ose-2.0.
I think the reason you would not see a zombie process in ose-2.0 is that https://bugzilla.redhat.com/show_bug.cgi?id=1018009 was fixed.
Encountered the same issue when testing the jbosseap-6 cartridge. Creating scalable jbosseap apps one by one always fails on the 44th app with the "Shell command timeout" issue:

[root@broker scalability]# rhc app create app44 jbosseap-6 -p redhat -l user44 -s --no-git
Application Options
-------------------
Domain: name44
Cartridges: jbosseap-6
Gear Size: default
Scaling: yes

Creating application 'app44' ... Unable to complete the requested operation due to: An invalid exit code (157) was returned from the server node.scalability.com. This indicates an unexpected problem during the execution of your request.. Reference ID: a11d3adb4d3d5de697550fbc08c2e008

Error log in mcollective.log:

I, [2013-12-20T15:22:50.416082 #24184] INFO -- : openshift.rb:150:in `execute_action' Finished executing action [post-configure] (157)
I, [2013-12-20T15:22:50.432202 #24184] INFO -- : openshift.rb:114:in `cartridge_do_action' cartridge_do_action failed (157)
------
Shell command '/sbin/runuser -s /bin/sh 52b3ef4978e213a7cb006b18 -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c1,c21' /bin/sh -c \"set -e; /var/lib/openshift/52b3ef4978e213a7cb006b18/jbosseap/bin/control start \""' exceeded timeout of 234
------)
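For anyone who wants to repeat the one-by-one creation, here is a rough sketch of how the loop could be scripted. It is not the script used in these tests. The rhc flags are copied from the command in the comment above; the app{N}/user{N} naming, the default password, the 2000-app limit, and the assumption that each user and domain already exists are guesses based on that single example. The function names are made up for illustration.

```python
import subprocess
import sys


def build_command(index, cartridge, password="redhat"):
    """Build the rhc call for the index-th app.

    Flags mirror the command in the comment above; the app{N}/user{N}
    naming scheme is an assumption based on that one example.
    """
    return [
        "rhc", "app", "create", f"app{index}", cartridge,
        "-p", password, "-l", f"user{index}", "-s", "--no-git",
    ]


def create_until_failure(cartridge, start=1, limit=2000, runner=subprocess.run):
    """Create apps one by one; return the index of the first failed
    creation, or None if every creation up to `limit` succeeded."""
    for index in range(start, limit + 1):
        result = runner(build_command(index, cartridge))
        if result.returncode != 0:
            return index
    return None


if __name__ == "__main__":
    # Usage (hypothetical file name): python create_apps.py jbosseap-6
    if len(sys.argv) > 1:
        print("first failure at app index:", create_until_failure(sys.argv[1]))
```

The runner argument only exists so the loop can be exercised without a broker; by default it calls rhc through subprocess.run.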
It seems nodejs-0.10 also has a similar issue (2.0/2013-11-26.1 puddle). When creating scalable nodejs apps one by one, the timeout issue appears frequently once more than 500 scalable nodejs applications are already deployed on the node. Interestingly, on nodejs the timeout occurs at many different points during creation. Here is a collection of errors from the mcollective logs:

[root@node log]# grep "exceeded timeout" ruby193-mcollective.log*
...
ruby193-mcollective.log.1:Shell command '/sbin/runuser -s /bin/sh 52cc7f3d78e2138f30003e2c -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c694' /bin/sh -c \"/var/lib/openshift/52cc7f3d78e2138f30003e2c/nodejs/bin/setup --version 0.10\""' exceeded timeout of 8
ruby193-mcollective.log.1:Shell command '/sbin/runuser -s /bin/sh 52cc7fa678e213f5630177a3 -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c695' /bin/sh -c \"/var/lib/openshift/52cc7fa678e213f5630177a3/nodejs/bin/install --version 0.10\""' exceeded timeout of 2
ruby193-mcollective.log.1:Shell command '/usr/sbin/lsof -i @127.5.88.130:8080 -i @127.5.88.131:8080' exceeded timeout of 233
ruby193-mcollective.log.2:Shell command '/sbin/runuser -s /bin/sh 52cc521e78e213f5630175e6 -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c689' /bin/sh -c \"/var/lib/openshift/52cc521e78e213f5630175e6/haproxy/bin/setup --version 1.4\""' exceeded timeout of 9
ruby193-mcollective.log.2:Shell command '/sbin/runuser -s /bin/sh 52cc548978e2138f30003c0d -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c690' /bin/sh -c \"/var/lib/openshift/52cc548978e2138f30003c0d/nodejs/bin/install --version 0.10\""' exceeded timeout of 5
ruby193-mcollective.log.3:I, [2014-01-08T00:30:46.296278 #8027] INFO -- : openshift.rb:335:in `rescue in oo_app_create' Shell command '/sbin/runuser -s /bin/sh 52cc2ab578e213f563017430 -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c682' /bin/sh -c \"/usr/bin/ssh-keygen -N '' -f /var/lib/openshift/52cc2ab578e213f563017430/.openshift_ssh/id_rsa\""' exceeded timeout of 71
ruby193-mcollective.log.3:Shell command '/sbin/runuser -s /bin/sh 52cc2c6778e213f56301744b -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c682' /bin/sh -c \"/var/lib/openshift/52cc2c6778e213f56301744b/nodejs/bin/setup --version 0.10\""' exceeded timeout of 12
...
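Since the timeouts show up at several different steps, it may help to group the grep output by the step that timed out. Below is a minimal sketch that could do this. It assumes every message has the form "Shell command '...' exceeded timeout of N" as in the lines above. The grouping labels (lsof, ssh-keygen, "<cartridge> <script>") and the function names are my own, not anything OpenShift defines.

```python
import re
import sys
from collections import defaultdict

# Greedy match up to the last "' exceeded timeout of N" on the line, because
# the command itself contains nested single quotes.
TIMEOUT_RE = re.compile(r"Shell command '(?P<cmd>.*)' exceeded timeout of (?P<secs>\d+)")
# Cartridge scripts live under /var/lib/openshift/<gear id>/<cartridge>/bin/<script>.
CART_RE = re.compile(r"/var/lib/openshift/[0-9a-f]+/(\w+)/bin/(\w+)")


def classify(cmd):
    """Return a short label for the command that timed out (labels are illustrative)."""
    if cmd.startswith("/usr/sbin/lsof"):
        return "lsof"
    match = CART_RE.search(cmd)
    if match:
        return f"{match.group(1)} {match.group(2)}"
    if "ssh-keygen" in cmd:
        return "ssh-keygen"
    return "other"


def summarize(lines):
    """Map each label to the list of timeout values reported for it."""
    summary = defaultdict(list)
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            summary[classify(match.group("cmd"))].append(int(match.group("secs")))
    return dict(summary)


if __name__ == "__main__":
    # Usage (hypothetical file name): python timeouts.py ruby193-mcollective.log*
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for label, values in sorted(summarize(handle).items()):
                print(f"{path}: {label}: {len(values)} timeouts, values {values}")
```

Run against the rotated log files, this prints one line per file and step, which makes it easier to see whether the small timeout values cluster in a particular cartridge script.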
OpenShift Enterprise v2 has officially reached EoL. This product is no longer supported and bugs will be closed. Please look into the replacement enterprise-grade container option, OpenShift Container Platform v3. https://www.openshift.com/container-platform/ More information can be found here: https://access.redhat.com/support/policy/updates/openshift/