Bug 1021601 - [performance] Creating a scalable app fails as more and more apps are deployed onto a node, and each failure leaves a zombie process
Status: CLOSED EOL
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Assigned To: Brenton Leanhardt
QA Contact: libra bugs
Depends On:
Blocks:
Reported: 2013-10-21 11:39 EDT by Johnny Liu
Modified: 2017-01-13 17:37 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-13 17:37:24 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Johnny Liu 2013-10-21 11:39:26 EDT
Description of problem:
Create scalable apps one by one. Creating the 1962nd app fails, and every such failure leaves behind a zombie (defunct) lsof process.

[root@node ~]# ps -ef|grep lsof
root      43987  83136 13 03:44 ?        00:01:44 [lsof] <defunct>
root      49196  83136 29 03:51 ?        00:01:48 [lsof] <defunct>
root      53657  80400  0 03:57 pts/2    00:00:00 grep lsof
root      84013  83136  0 Sep24 ?        00:01:48 [lsof] <defunct>
root      88537  83136  0 Sep24 ?        00:01:43 [lsof] <defunct>
root      93748  83136  0 Sep24 ?        00:01:51 [lsof] <defunct>
root      98268  83136  0 Sep24 ?        00:01:56 [lsof] <defunct>
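
A listing like the one above can be summarized by parent PID to confirm which process is leaving the zombies behind. Here every defunct lsof entry has parent 83136, which appears to be the same PID shown in the mcollective log entries (#83136). The following is only a minimal sketch: `defunct_by_parent` is a hypothetical helper name, and it assumes the default `ps -ef` column order (UID, PID, PPID, ...).

```ruby
# Count defunct (zombie) processes per parent PID from `ps -ef` text.
# Hypothetical helper for illustration; assumes the third column is PPID.
def defunct_by_parent(ps_output)
  counts = Hash.new(0)
  ps_output.each_line do |line|
    next unless line.include?('<defunct>')
    fields = line.split
    ppid = Integer(fields[2], 10)
    counts[ppid] += 1
  end
  counts
end

# Sample lines copied from the listing in this report.
sample = <<~PS
  root      43987  83136 13 03:44 ?        00:01:44 [lsof] <defunct>
  root      53657  80400  0 03:57 pts/2    00:00:00 grep lsof
  root      84013  83136  0 Sep24 ?        00:01:48 [lsof] <defunct>
PS

defunct_by_parent(sample).each do |ppid, count|
  puts "parent #{ppid}: #{count} defunct child process(es)"
end
```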


Version-Release number of selected component (if applicable):
1.2/2013-08-23.3

How reproducible:
Always

Steps to Reproduce:
1. Create scalable apps one by one.

Actual results:
Creating the 1962nd app failed.


The following error appears in the mcollective log:
E, [2013-09-25T03:55:19.766496 #83136] ERROR -- : openshift.rb:171:in `rescue in with_container_from_args' /opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:161:in `select'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:161:in `block in read_results'
/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:159:in `read_results'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:124:in `block (2 levels) in oo_spawn'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:93:in `pipe'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:93:in `block in oo_spawn'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:92:in `pipe'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:92:in `oo_spawn'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/model/v2_cart_model.rb:763:in `addresses_bound?'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/model/v2_cart_model.rb:703:in `create_private_endpoints'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/model/v2_cart_model.rb:251:in `block in configure'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/cgroups.rb:70:in `with_no_cpu_limits'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/model/v2_cart_model.rb:237:in `configure'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/model/application_container.rb:97:in `configure'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:586:in `block in oo_configure'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:166:in `with_container_from_args'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:585:in `oo_configure'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:93:in `execute_action'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:65:in `cartridge_do_action'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/rpc/agent.rb:86:in `handlemsg'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:126:in `block (2 levels) in dispatch'
/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:125:in `block in dispatch'
I, [2013-09-25T03:55:19.766735 #83136]  INFO -- : openshift.rb:100:in `execute_action' Finished executing action [configure] (-1)
I, [2013-09-25T03:55:19.766917 #83136]  INFO -- : openshift.rb:73:in `cartridge_do_action' cartridge_do_action failed (-1)
------
execution expired
------)

After a little debugging in the code, I found the following:
As the number of processes on the node grows, lsof takes longer and longer to verify that the listening port is available. The check eventually hits the timeout, and the timed-out lsof process is left behind as a zombie.
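
In general, a child process becomes a zombie when it has exited (or been killed) and its parent has not yet collected its exit status. One common way to keep a timed-out command from lingering is to kill it and then wait on it in the timeout handler. The following is only a minimal sketch of that pattern under stated assumptions: `run_with_timeout` is a hypothetical helper, not the `oo_spawn` implementation from openshift-origin-node, and it does not claim to be the fix applied for this bug.

```ruby
require 'timeout'

# Hypothetical helper for illustration only (not the OpenShift oo_spawn code).
# Runs cmd (an array of program and arguments), returns [output, exit_status],
# and re-raises Timeout::Error after killing AND reaping the child.
def run_with_timeout(cmd, seconds)
  reader, writer = IO.pipe
  pid = Process.spawn(*cmd, out: writer, err: writer)
  writer.close # the parent only reads; the child keeps its own copy
  reaped = false
  begin
    Timeout.timeout(seconds) do
      output = reader.read
      _, status = Process.wait2(pid) # collect the exit status (reap)
      reaped = true
      [output, status.exitstatus]
    end
  rescue Timeout::Error
    unless reaped
      Process.kill('KILL', pid)
      Process.wait(pid) # without this wait the killed child stays defunct
    end
    raise
  ensure
    reader.close
  end
end

output, status = run_with_timeout(['echo', 'ok'], 5)
puts "exit status #{status}: #{output}"
```

The design point is the `Process.wait` after the kill: signalling the child alone ends it, but its process table entry remains until the parent waits on it.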

Expected results:
No error is seen, and no zombie process is generated.

Additional info:
Comment 2 Andy Grimm 2013-12-18 11:31:37 EST
I suspect this bug is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1018009
Comment 3 Andy Grimm 2013-12-18 11:32:42 EST
sorry, I should have said the reason for the _zombie_ processes is the same as the bug mentioned in the previous comment.  The overall bug here is a separate issue.
Comment 4 Johnny Liu 2013-12-18 21:09:37 EST
(In reply to Andy Grimm from comment #3)
> sorry, I should have said the reason for the _zombie_ processes is the same
> as the bug mentioned in the previous comment.  The overall bug here is a
> separate issue.

Yes, agreed. This bug is mainly used to track the ose-1.2 issue.

QE has already opened BZ#1044432 to track the ose-2.0 issue. I guess you have already noticed that bug.
Comment 5 Rory Thrasher 2017-01-13 17:37:24 EST
OpenShift Enterprise v2 has officially reached EoL.  This product is no longer supported and bugs will be closed.

Please look into the replacement enterprise-grade container option, OpenShift Container Platform v3.  https://www.openshift.com/container-platform/

More information can be found here: https://access.redhat.com/support/policy/updates/openshift/
