Description of problem:
When creating scalable apps one by one, creation of the 1962nd app fails, and a zombie process is left behind every time the failure happens.

[root@node ~]# ps -ef|grep lsof
root     43987 83136 13 03:44 ?        00:01:44 [lsof] <defunct>
root     49196 83136 29 03:51 ?        00:01:48 [lsof] <defunct>
root     53657 80400  0 03:57 pts/2    00:00:00 grep lsof
root     84013 83136  0 Sep24 ?        00:01:48 [lsof] <defunct>
root     88537 83136  0 Sep24 ?        00:01:43 [lsof] <defunct>
root     93748 83136  0 Sep24 ?        00:01:51 [lsof] <defunct>
root     98268 83136  0 Sep24 ?        00:01:56 [lsof] <defunct>

Version-Release number of selected component (if applicable):
1.2/2013-08-23.3

How reproducible:
Always

Steps to Reproduce:
1. Create scalable apps one by one.

Actual results:
Failed to create the 1962nd app. The following error is seen in the mcollective log:

E, [2013-09-25T03:55:19.766496 #83136] ERROR -- : openshift.rb:171:in `rescue in with_container_from_args'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:161:in `select'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:161:in `block in read_results'
/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:159:in `read_results'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:124:in `block (2 levels) in oo_spawn'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:93:in `pipe'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:93:in `block in oo_spawn'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:92:in `pipe'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/shell_exec.rb:92:in `oo_spawn'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/model/v2_cart_model.rb:763:in `addresses_bound?'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/model/v2_cart_model.rb:703:in `create_private_endpoints'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/model/v2_cart_model.rb:251:in `block in configure'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/utils/cgroups.rb:70:in `with_no_cpu_limits'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/model/v2_cart_model.rb:237:in `configure'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-node-1.9.14.3/lib/openshift-origin-node/model/application_container.rb:97:in `configure'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:586:in `block in oo_configure'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:166:in `with_container_from_args'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:585:in `oo_configure'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:93:in `execute_action'
/opt/rh/ruby193/root/usr/libexec/mcollective/mcollective/agent/openshift.rb:65:in `cartridge_do_action'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/rpc/agent.rb:86:in `handlemsg'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:126:in `block (2 levels) in dispatch'
/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
/opt/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb:125:in `block in dispatch'
I, [2013-09-25T03:55:19.766735 #83136] INFO -- : openshift.rb:100:in `execute_action' Finished executing action [configure] (-1)
I, [2013-09-25T03:55:19.766917 #83136] INFO -- : openshift.rb:73:in `cartridge_do_action' cartridge_do_action failed (-1)
------
execution expired
------)

After a little debugging in the code, I found that as the number of processes on the node grows, lsof takes longer and longer to verify that the listening port is available. Eventually the check hits a timeout, and each timeout leaves a zombie lsof process behind.

Expected results:
No error is seen, and no zombie process is generated.

Additional info:
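To illustrate the timeout-then-zombie pattern described above, here is a minimal Ruby sketch. It is not the actual `oo_spawn` code from shell_exec.rb; `run_with_timeout` is a hypothetical helper, and the commands used with it are stand-ins for the slow lsof call. It shows the general idea that a child abandoned after a timeout stays `<defunct>` until its parent waits on it, so the timeout path should kill and reap the child.

```ruby
require 'timeout'

# Hypothetical helper for illustration only (not OpenShift's oo_spawn).
# Runs cmd (an array of program + args), waits up to `limit` seconds,
# and returns the exit status, or nil if the command timed out.
def run_with_timeout(cmd, limit)
  pid = Process.spawn(*cmd, out: File::NULL, err: File::NULL)
  begin
    Timeout.timeout(limit) do
      _, status = Process.wait2(pid)   # normal path: child is reaped here
      return status.exitstatus
    end
  rescue Timeout::Error
    begin
      # If we only raised here, the child would keep running and, once it
      # exited, remain as a <defunct> entry until the parent waited on it.
      Process.kill('KILL', pid)
      Process.wait(pid)                # reap the killed child
    rescue Errno::ESRCH, Errno::ECHILD
      # Child had already exited and been reaped; nothing left to clean up.
    end
    nil
  end
end
```

Usage would look like `run_with_timeout(['sleep', '30'], 0.2)`, which returns nil and leaves no defunct child behind. Whether this matches the eventual fix in the node code is an assumption; the sketch only demonstrates the kill-and-wait idea.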
I suspect this bug is the same as https://bugzilla.redhat.com/show_bug.cgi?id=1018009
sorry, I should have said the reason for the _zombie_ processes is the same as the bug mentioned in the previous comment. The overall bug here is a separate issue.
(In reply to Andy Grimm from comment #3)
> sorry, I should have said the reason for the _zombie_ processes is the same
> as the bug mentioned in the previous comment. The overall bug here is a
> separate issue.

Yeah, agreed. This bug is mainly used to track the ose-1.2 issue. QE has already opened BZ#1044432 to track the ose-2.0 issue. I guess you have already noticed that bug.
OpenShift Enterprise v2 has officially reached EoL. This product is no longer supported and bugs will be closed. Please look into the replacement enterprise-grade container option, OpenShift Container Platform v3. https://www.openshift.com/container-platform/ More information can be found here: https://access.redhat.com/support/policy/updates/openshift/