Bug 1044432

Summary: [performance] Failed to create scalable php app when many apps were deployed on the node
Product: OpenShift Container Platform Reporter: Gaoyun Pei <gpei>
Component: Containers Assignee: Brenton Leanhardt <bleanhar>
Status: CLOSED EOL QA Contact: libra bugs <libra-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 2.2.0 CC: anli, gpei, jgoulding, libra-onpremise-devel, rthrashe
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-13 22:38:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Gaoyun Pei 2013-12-18 10:18:45 UTC
Description of problem:
When creating scalable php apps one by one on an ose-2.0 env, creation of the 1756th app fails.

Version-Release number of selected component (if applicable):
2.0/2013-11-26.1

How reproducible:
always

Steps to Reproduce:
1. Set up an ose-2.0 env and create scalable php apps one by one.

The following failure appears in the mcollective log:
I, [2013-12-17T17:16:47.330960 #4006]  INFO -- : openshift.rb:150:in `execute_action' Finished executing action [configure] (157)
I, [2013-12-17T17:16:47.388136 #4006]  INFO -- : openshift.rb:114:in `cartridge_do_action' cartridge_do_action failed (157)
------
Shell command '/usr/sbin/lsof -i @127.5.97.129:8080' exceeded timeout of 233


------)


The timeout defined in openshift-origin-node/utils/shell_exec.rb is 3600, but this timeout error was actually thrown about 6 minutes after the app creation request was sent.
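
One possible reading of the gap between the configured 3600 and the reported 233, which nothing in this report confirms: the per-command timeout may be capped by whatever is left of an overall time budget for the whole request. The sketch below only illustrates that idea. `Deadline`, `effective_timeout` and the 240-second budget are hypothetical names and values made up for illustration, not taken from the OpenShift node code.

```python
import time


class Deadline:
    """Hypothetical helper: tracks how much of an overall time budget is left."""

    def __init__(self, budget_seconds, clock=time.monotonic):
        # The clock is injectable so the behaviour can be checked without sleeping.
        self._clock = clock
        self._expires_at = clock() + budget_seconds

    def remaining(self):
        """Whole seconds left in the budget, never negative."""
        return max(0, int(self._expires_at - self._clock()))


def effective_timeout(configured_default, deadline):
    """Timeout a command would actually get: the smaller of the
    configured default and the time left in the overall budget."""
    return min(configured_default, deadline.remaining())


if __name__ == "__main__":
    # Assumed values: a 240 s request budget and a 3600 s default timeout.
    deadline = Deadline(240)
    print(effective_timeout(3600, deadline))
```

Under this assumption, a command started a few seconds into the request would report a timeout slightly below the budget rather than the configured default, and commands started later would report progressively smaller values.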


Actual results:

Expected results:
No such timeout error

Additional info:
This failure differs from the one seen when running the same test on ose-1.2, as described in Bug 1021601, and the zombie process issue mentioned in Bug 1021601 did not exist on ose-2.0.

Comment 2 Andy Grimm 2013-12-18 16:33:55 UTC
I think the reason you would not see a zombie process in ose-2.0 is that https://bugzilla.redhat.com/show_bug.cgi?id=1018009 was fixed.

Comment 3 Gaoyun Pei 2013-12-20 08:56:51 UTC
Encountered the same issue when testing the jbosseap-6 cartridge.

When creating scalable jbosseap apps one by one, creation of the 44th app always failed with the "Shell command timeout" issue:
[root@broker scalability]# rhc app create app44 jbosseap-6 -p redhat -l user44 -s --no-git
Application Options
-------------------
  Domain:     name44
  Cartridges: jbosseap-6
  Gear Size:  default
  Scaling:    yes

Creating application 'app44' ... 
Unable to complete the requested operation due to: An invalid exit code (157) was returned from the server node.scalability.com.  This indicates
an unexpected problem during the execution of your request..
Reference ID: a11d3adb4d3d5de697550fbc08c2e008
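
The "one by one" creation loop can be sketched as below. The `rhc` arguments mirror the invocation quoted above (`app44`, `user44`, `-p redhat`, `-s`, `--no-git`). The helper names, the assumption that user N and its domain already exist, and the assumption that `rhc` exits with a non-zero status when creation fails are illustrative and not stated in this report.

```python
import subprocess


def build_create_command(index, cartridge, password="redhat"):
    """Build the rhc command line for the index-th scalable app,
    following the appN / userN naming seen in the quoted command."""
    return [
        "rhc", "app", "create", f"app{index}", cartridge,
        "-p", password, "-l", f"user{index}", "-s", "--no-git",
    ]


def create_until_failure(cartridge, limit, run=subprocess.run):
    """Create apps 1..limit in order.

    Returns the index of the first app whose creation fails, or None
    if all succeed. Assumes a non-zero exit status signals failure.
    """
    for index in range(1, limit + 1):
        result = run(build_create_command(index, cartridge))
        if result.returncode != 0:
            return index
    return None


if __name__ == "__main__":
    # Requires a working broker and pre-created users (assumption).
    print(create_until_failure("jbosseap-6", limit=100))
```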


Error log in mcollective.log:
I, [2013-12-20T15:22:50.416082 #24184]  INFO -- : openshift.rb:150:in `execute_action' Finished executing action [post-configure] (157)
I, [2013-12-20T15:22:50.432202 #24184]  INFO -- : openshift.rb:114:in `cartridge_do_action' cartridge_do_action failed (157)
------
Shell command '/sbin/runuser -s /bin/sh 52b3ef4978e213a7cb006b18 -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c1,c21' /bin/sh -c \"set -e; /var/lib/openshift/52b3ef4978e213a7cb006b18/jbosseap/bin/control start \""' exceeded timeout of 234


------)

Comment 6 Gaoyun Pei 2014-01-09 10:46:57 UTC
It seems nodejs-0.10 also has a similar issue.

2.0/2013-11-26.1 puddle

When creating scalable nodejs apps one by one, once more than 500 scalable nodejs applications are already deployed on the node, the timeout issue appears frequently.

Interestingly, the timeout on nodejs occurs at many different points during creation. Here is a collection of errors from the mcollective logs:

[root@node log]# grep "exceeded timeout" ruby193-mcollective.log*
...
ruby193-mcollective.log.1:Shell command '/sbin/runuser -s /bin/sh 52cc7f3d78e2138f30003e2c -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c694' /bin/sh -c \"/var/lib/openshift/52cc7f3d78e2138f30003e2c/nodejs/bin/setup --version 0.10\""' exceeded timeout of 8
ruby193-mcollective.log.1:Shell command '/sbin/runuser -s /bin/sh 52cc7fa678e213f5630177a3 -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c695' /bin/sh -c \"/var/lib/openshift/52cc7fa678e213f5630177a3/nodejs/bin/install --version 0.10\""' exceeded timeout of 2
ruby193-mcollective.log.1:Shell command '/usr/sbin/lsof -i @127.5.88.130:8080 -i @127.5.88.131:8080' exceeded timeout of 233
ruby193-mcollective.log.2:Shell command '/sbin/runuser -s /bin/sh 52cc521e78e213f5630175e6 -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c689' /bin/sh -c \"/var/lib/openshift/52cc521e78e213f5630175e6/haproxy/bin/setup --version 1.4\""' exceeded timeout of 9
ruby193-mcollective.log.2:Shell command '/sbin/runuser -s /bin/sh 52cc548978e2138f30003c0d -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c690' /bin/sh -c \"/var/lib/openshift/52cc548978e2138f30003c0d/nodejs/bin/install --version 0.10\""' exceeded timeout of 5
ruby193-mcollective.log.3:I, [2014-01-08T00:30:46.296278 #8027]  INFO -- : openshift.rb:335:in `rescue in oo_app_create' Shell command '/sbin/runuser -s /bin/sh 52cc2ab578e213f563017430 -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c682' /bin/sh -c \"/usr/bin/ssh-keygen -N '' -f /var/lib/openshift/52cc2ab578e213f563017430/.openshift_ssh/id_rsa\""' exceeded timeout of 71
ruby193-mcollective.log.3:Shell command '/sbin/runuser -s /bin/sh 52cc2c6778e213f56301744b -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c682' /bin/sh -c \"/var/lib/openshift/52cc2c6778e213f56301744b/nodejs/bin/setup --version 0.10\""' exceeded timeout of 12
...
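
To compare which step timed out and with what limit, lines like the ones above can be reduced to (command, seconds) pairs. This is a minimal sketch that relies only on the message format visible in the excerpts, `Shell command '<command>' exceeded timeout of <seconds>`. The function names and the step labels are hypothetical.

```python
import re

# Message format as it appears in the log excerpts above.
TIMEOUT_RE = re.compile(
    r"Shell command '(?P<command>.*)' exceeded timeout of (?P<seconds>\d+)"
)

# Substring -> coarse label; the labels are made up for illustration.
STEP_MARKERS = {
    "lsof": "lsof",
    "ssh-keygen": "ssh-keygen",
    "/bin/setup": "setup",
    "/bin/install": "install",
    "/bin/control": "control",
}


def parse_timeouts(lines):
    """Yield (command, seconds) for every line reporting a shell timeout."""
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            yield match.group("command"), int(match.group("seconds"))


def classify(command):
    """Return a coarse label for the step that timed out."""
    for marker, label in STEP_MARKERS.items():
        if marker in command:
            return label
    return "other"


if __name__ == "__main__":
    import sys
    # Usage: grep "exceeded timeout" ruby193-mcollective.log* | python this_script.py
    for command, seconds in parse_timeouts(sys.stdin):
        print(f"{seconds:>5}  {classify(command)}")
```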

Comment 8 Rory Thrasher 2017-01-13 22:38:57 UTC
OpenShift Enterprise v2 has officially reached EoL.  This product is no longer supported and bugs will be closed.

Please look into the replacement enterprise-grade container option, OpenShift Container Platform v3.  https://www.openshift.com/container-platform/

More information can be found here: https://access.redhat.com/support/policy/updates/openshift/