Bug 923369
Summary: | restart stopped application ends in REST service returning 500 "internal server error" | ||
---|---|---|---|
Product: | OKD | Reporter: | Andre Dietisheim <adietish> |
Component: | Pod | Assignee: | Bill DeCoste <wdecoste> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | libra bugs <libra-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 2.x | CC: | dmcphers, jhou, wdecoste |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | https://github.com/openshift/origin-server/pull/1780 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-04-02 13:46:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Andre Dietisheim
2013-03-19 16:42:25 UTC
could not reproduce it using 2 devenv's agupta gave me. Build #27 was stable again (blue) and #28 failed again on the very same error ("could not restart stopped application"). I'm wondering: is this timing related? issue is not reproducible in todays DEVENV. Thus closing The issue started to occurr again in the latest build #34: https://ci.dev.openshift.redhat.com/jenkins/job/openshift-java-client-devenv/34/testReport/com.openshift.internal.client/ApplicationResourceIntegrationTest/shouldRestartStoppedApplication/ Any idea why this keeps flickering? Is this timing related? Based on my debugging, the findings are: The failure is caused when the app_ctl_impl.sh script fails to kill the jboss pid as referenced in the log snippet below. ^[[0;37m2013-03-22 14:13:33.595^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x00000005e76580> (pid:3220) ^[[0;37m2013-03-22 14:13:33.595^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: rpc_client.custom_request('cartridge_do', {:cartridge=>"jbossas-7", :action=>"stop", :args=>{"--with-app-uuid"=>"ee0f90b6931b11e294a512313d2d6078", "--with-app-name"=>"1363975865282", "--with-container-uuid"=>"ee0f90b6931b11e294a512313d2d6078", "--with-container-name"=>"1363975865282", "--with-namespace"=>"1363975618879", "--with-request-id"=>"d849586e24d217b414cceb2f1264ccd2", "--cart-name"=>"jbossas-7"}}, ip-10-62-79-134, {'identity' => ip-10-62-79-134}) (Request ID: d849586e24d217b414cceb2f1264ccd2) (pid:3220) ^[[0;37m2013-03-22 14:13:37.301^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: [#<MCollective::RPC::Result:0x0000000455d8c8 @agent="openshift", @action="cartridge_do", @results={:sender=>"ip-10-62-79-134", :statuscode=>1, :statusmsg=>"cartridge_do_action failed -1. Output Control action 'stop' returned an error. rc=121\nSending SIGTERM to jboss:19877 ...\nkill -TERM 20091\n/usr/libexec/openshift/cartridges/jbossas-7/info/bin/app_ctl_impl.sh: line 34: kill: (20091) - No such process\nFailed to stop jbossas-7\n", :data=>{:time=>nil, :output=>"Control action 'stop' returned an error. rc=121\nSending SIGTERM to jboss:19877 ...\nkill -TERM 20091\n/usr/libexec/openshift/cartridges/jbossas-7/info/bin/app_ctl_impl.sh: line 34: kill: (20091) - No such process\nFailed to stop jbossas-7\n", :exitcode=>-1}}>] (Request ID: d849586e24d217b414cceb2f1264ccd2) (pid:3220) ^[[0;37m2013-03-22 14:13:37.303^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: MCollective Response Time (execute_direct: stop): 3.709499049s (Request ID: d849586e24d217b414cceb2f1264ccd2) (pid:3220) ^[[0;37m2013-03-22 14:13:37.304^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: server results: Control action 'stop' returned an error. rc=121 (pid:3220) ^[[0;37m2013-03-22 14:13:37.304^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: server results: Sending SIGTERM to jboss:19877 ... (pid:3220) ^[[0;37m2013-03-22 14:13:37.304^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: server results: kill -TERM 20091 (pid:3220) ^[[0;37m2013-03-22 14:13:37.304^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: server results: /usr/libexec/openshift/cartridges/jbossas-7/info/bin/app_ctl_impl.sh: line 34: kill: (20091) - No such process (pid:3220) ^[[0;37m2013-03-22 14:13:37.304^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: server results: Failed to stop jbossas-7 (pid:3220) I have been able to reproduce it by starting, restarting, and then stopping the application in sequence. Roughly 1 out of every 5 attempts results in a failure. rhc app start app1 -ppwd; rhc app restart app1 -ppwd; rhc app stop app1 -ppwd; abstract-httpd and abstract-jboss seem to differ in the mechanisms for waiting for a process to stop and pid to disappear. I've recreated some bad behavior. After a stop successfully returns the jboss process is still active for a short period of time: [root@ip-10-12-17-55 ~]# rhc app stop -p 123456 -a as7 RESULT: as7 stopped [root@ip-10-12-17-55 ~]# ps -ef | grep jboss 501 22759 1 28 10:04 ? 00:00:09 /usr/lib/jvm/jre-1.7.0/bin/java -D[Standalone] -server -XX:+TieredCompilation -Xmx256m -XX:MaxPermSize=102m -XX:+AggressiveOpts -Dorg.apache.tomcat.util.LOW_MEMORY=true -Dorg.jboss.resolver.warning=true -Djava.net.preferIPv4Stack=true -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -Djboss.node.name=as7-bdecoste94e.dev.rhcloud.com -Djgroups.bind_addr=127.0.250.129 -Dorg.apache.coyote.http11.Http11Protocol.COMPRESSION=on -Dorg.jboss.boot.log.file=/var/lib/openshift/0af121d693c111e286d412313d2926cd/jbossas-7/jbossas-7/standalone/log/boot.log -Dlogging.configuration=file:/var/lib/openshift/0af121d693c111e286d412313d2926cd/jbossas-7/jbossas-7/standalone/configuration/logging.properties -jar /var/lib/openshift/0af121d693c111e286d412313d2926cd/jbossas-7/jbossas-7/jboss-modules.jar -mp /var/lib/openshift/0af121d693c111e286d412313d2926cd//jbossas-7/jbossas-7/standalone/configuration/modules::/var/lib/openshift/0af121d693c111e286d412313d2926cd//jbossas-7/jbossas-7/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.standalone -Djboss.home.dir=/var/lib/openshift/0af121d693c111e286d412313d2926cd/jbossas-7/jbossas-7 root 23111 11682 0 10:04 pts/3 00:00:00 grep jboss [root@ip-10-12-17-55 ~]# ps -ef | grep jboss root 23115 11682 0 10:04 pts/3 00:00:00 grep jboss Modified killtree logic Commit pushed to master at https://github.com/openshift/origin-server https://github.com/openshift/origin-server/commit/4f5f210aae55c874d443553effb23957d5313d0a Bug 923369 Verified on devenv_2993 1. Create jbossas application 2. SSH into gear [as1-2993t.dev.rhcloud.com 514fada669ef2b087f000001]\> ps -ef|grep jboss 501 6557 1 0 21:56 ? 00:00:00 /bin/sh /var/lib/openshift/514fada669ef2b087f000001//jbossas-7/jbossas-7/bin/standalone.sh 501 6854 6557 10 21:57 ? 00:00:11 /usr/lib/jvm/jre-1.7.0/bin/java -D[Standalone] -server -XX:+TieredCompilation -Xmx256m -XX:MaxPermSize=102m -XX:+AggressiveOpts -Dorg.apache.tomcat.util.LOW_MEMORY=true -Dorg.jboss.resolver.warning=true -Djava.net.preferIPv4Stack=true -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -Djboss.node.name=as1-2993t.dev.rhcloud.com -Djgroups.bind_addr=127.0.250.129 -Dorg.apache.coyote.http11.Http11Protocol.COMPRESSION=on -Dorg.jboss.boot.log.file=/var/lib/openshift/514fada669ef2b087f000001/jbossas-7/jbossas-7/standalone/log/boot.log -Dlogging.configuration=file:/var/lib/openshift/514fada669ef2b087f000001/jbossas-7/jbossas-7/standalone/configuration/logging.properties -jar /var/lib/openshift/514fada669ef2b087f000001/jbossas-7/jbossas-7/jboss-modules.jar -mp /var/lib/openshift/514fada669ef2b087f000001//jbossas-7/jbossas-7/standalone/configuration/modules::/var/lib/openshift/514fada669ef2b087f000001//jbossas-7/jbossas-7/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.standalone -Djboss.home.dir=/var/lib/openshift/514fada669ef2b087f000001/jbossas-7/jbossas-7 3. Stop app 4. On gear side, verify process is killed immediately [as1-2993t.dev.rhcloud.com 514fada669ef2b087f000001]\> ps -ef|grep jboss 501 7961 6262 0 21:58 pts/0 00:00:00 grep jboss 5. Restart the stopped app, restarted successfully Stop and restart app in sequence, tried 5 times, didn't reproduce. so moving this bug to verified. |