Description of problem:
If you try to restart an application that is stopped, the REST service returns 500 "internal server error".

Version-Release number of selected component (if applicable):
latest DEVENV code (happens since a few hours), protocol v1.0

How reproducible:
Always. Integration tests #24, #25 and #26 reproduced this in a row:
https://ci.dev.openshift.redhat.com/jenkins/job/openshift-java-client-devenv/26/testReport/com.openshift.internal.client/ApplicationResourceIntegrationTest/shouldRestartStoppedApplication/

Steps to Reproduce:
1. Make sure you have an application that is stopped (create an application and stop it via):
curl --user "<USER>:<PASS>" -H "Accept: application/xml; version=1.0" https://openshift.redhat.com/broker/rest/domains/<DOMAIN>/applications/<APPLICATION>/events -X POST --form "event=stop"
2. Restart it:
curl --user "<USER>:<PASS>" -H "Accept: application/xml; version=1.0" https://openshift.redhat.com/broker/rest/domains/<DOMAIN>/applications/<APPLICATION>/events -X POST --form "event=restart"

Actual results:
HTTP status code 500, response body:
{"data":null,"errors":{},"messages":[{"exit_code":-1,"field":null,"severity":"error","text":"\nReference ID: 877bbfd174ddd2b8fec93e4fb560b17a"}],"status":"internal_server_error","supported_api_versions":[1.0,1.1,1.2,1.3,1.4],"type":null,"version":"1.0"}

Expected results:
The application gets started.

Additional info:
PROD, STG, INT and even Clayton's DEVENV don't reproduce this. They restart the application just fine.
I could not reproduce it using two devenvs that agupta gave me. Build #27 was stable again (blue) and #28 failed again with the very same error ("could not restart stopped application"). I'm wondering: is this timing-related?
The issue is not reproducible in today's DEVENV, so I'm closing this.
The issue started to occur again in the latest build #34:
https://ci.dev.openshift.redhat.com/jenkins/job/openshift-java-client-devenv/34/testReport/com.openshift.internal.client/ApplicationResourceIntegrationTest/shouldRestartStoppedApplication/
Any idea why this keeps flickering? Is this timing-related?
Based on my debugging, the findings are: the failure is caused when the app_ctl_impl.sh script fails to kill the jboss pid, as referenced in the log snippet below:

2013-03-22 14:13:33.595 [DEBUG] DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x00000005e76580> (pid:3220)
2013-03-22 14:13:33.595 [DEBUG] DEBUG: rpc_client.custom_request('cartridge_do', {:cartridge=>"jbossas-7", :action=>"stop", :args=>{"--with-app-uuid"=>"ee0f90b6931b11e294a512313d2d6078", "--with-app-name"=>"1363975865282", "--with-container-uuid"=>"ee0f90b6931b11e294a512313d2d6078", "--with-container-name"=>"1363975865282", "--with-namespace"=>"1363975618879", "--with-request-id"=>"d849586e24d217b414cceb2f1264ccd2", "--cart-name"=>"jbossas-7"}}, ip-10-62-79-134, {'identity' => ip-10-62-79-134}) (Request ID: d849586e24d217b414cceb2f1264ccd2) (pid:3220)
2013-03-22 14:13:37.301 [DEBUG] DEBUG: [#<MCollective::RPC::Result:0x0000000455d8c8 @agent="openshift", @action="cartridge_do", @results={:sender=>"ip-10-62-79-134", :statuscode=>1, :statusmsg=>"cartridge_do_action failed -1. Output Control action 'stop' returned an error. rc=121\nSending SIGTERM to jboss:19877 ...\nkill -TERM 20091\n/usr/libexec/openshift/cartridges/jbossas-7/info/bin/app_ctl_impl.sh: line 34: kill: (20091) - No such process\nFailed to stop jbossas-7\n", :data=>{:time=>nil, :output=>"Control action 'stop' returned an error. rc=121\nSending SIGTERM to jboss:19877 ...\nkill -TERM 20091\n/usr/libexec/openshift/cartridges/jbossas-7/info/bin/app_ctl_impl.sh: line 34: kill: (20091) - No such process\nFailed to stop jbossas-7\n", :exitcode=>-1}}>] (Request ID: d849586e24d217b414cceb2f1264ccd2) (pid:3220)
2013-03-22 14:13:37.303 [DEBUG] DEBUG: MCollective Response Time (execute_direct: stop): 3.709499049s (Request ID: d849586e24d217b414cceb2f1264ccd2) (pid:3220)
2013-03-22 14:13:37.304 [DEBUG] DEBUG: server results: Control action 'stop' returned an error. rc=121 (pid:3220)
2013-03-22 14:13:37.304 [DEBUG] DEBUG: server results: Sending SIGTERM to jboss:19877 ... (pid:3220)
2013-03-22 14:13:37.304 [DEBUG] DEBUG: server results: kill -TERM 20091 (pid:3220)
2013-03-22 14:13:37.304 [DEBUG] DEBUG: server results: /usr/libexec/openshift/cartridges/jbossas-7/info/bin/app_ctl_impl.sh: line 34: kill: (20091) - No such process (pid:3220)
2013-03-22 14:13:37.304 [DEBUG] DEBUG: server results: Failed to stop jbossas-7 (pid:3220)
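The "kill: (20091) - No such process" line suggests the script treats killing an already-exited pid as a hard error. As a hypothetical sketch (not the actual app_ctl_impl.sh code; function name and structure are illustrative), a stop helper that tolerates a pid that has already gone away could look like:

```shell
#!/bin/sh
# Hypothetical sketch: stop a process by pid, treating "process already
# gone" as success rather than as an error (the race seen in the log
# above, where jboss exits between the pid lookup and the kill).
stop_pid() {
    pid=$1
    if ! kill -0 "$pid" 2>/dev/null; then
        # The process exited on its own; nothing left to stop.
        return 0
    fi
    # kill can still race with process exit, so ignore its errors too.
    kill -TERM "$pid" 2>/dev/null
    return 0
}
```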
I have been able to reproduce it by starting, restarting, and then stopping the application in sequence; roughly 1 out of every 5 attempts results in a failure:

rhc app start app1 -ppwd; rhc app restart app1 -ppwd; rhc app stop app1 -ppwd

abstract-httpd and abstract-jboss seem to differ in their mechanisms for waiting for a process to stop and its pid to disappear.
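To put a number on "roughly 1 out of every 5", the cycle can be looped and the failures counted. A hypothetical harness (run_cycles is my own name; the command passed in would be the rhc sequence above):

```shell
#!/bin/sh
# Hypothetical stress harness: run a command N times and print how many
# runs failed. In the real reproduction the command would be, e.g.:
#   run_cycles 20 sh -c \
#     'rhc app start app1 -ppwd && rhc app restart app1 -ppwd && rhc app stop app1 -ppwd'
run_cycles() {
    n=$1; shift
    failures=0
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```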
I've recreated some bad behavior. After a stop successfully returns, the jboss process is still active for a short period of time:

[root@ip-10-12-17-55 ~]# rhc app stop -p 123456 -a as7
RESULT:
as7 stopped
[root@ip-10-12-17-55 ~]# ps -ef | grep jboss
501 22759 1 28 10:04 ? 00:00:09 /usr/lib/jvm/jre-1.7.0/bin/java -D[Standalone] -server -XX:+TieredCompilation -Xmx256m -XX:MaxPermSize=102m -XX:+AggressiveOpts -Dorg.apache.tomcat.util.LOW_MEMORY=true -Dorg.jboss.resolver.warning=true -Djava.net.preferIPv4Stack=true -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -Djboss.node.name=as7-bdecoste94e.dev.rhcloud.com -Djgroups.bind_addr=127.0.250.129 -Dorg.apache.coyote.http11.Http11Protocol.COMPRESSION=on -Dorg.jboss.boot.log.file=/var/lib/openshift/0af121d693c111e286d412313d2926cd/jbossas-7/jbossas-7/standalone/log/boot.log -Dlogging.configuration=file:/var/lib/openshift/0af121d693c111e286d412313d2926cd/jbossas-7/jbossas-7/standalone/configuration/logging.properties -jar /var/lib/openshift/0af121d693c111e286d412313d2926cd/jbossas-7/jbossas-7/jboss-modules.jar -mp /var/lib/openshift/0af121d693c111e286d412313d2926cd//jbossas-7/jbossas-7/standalone/configuration/modules::/var/lib/openshift/0af121d693c111e286d412313d2926cd//jbossas-7/jbossas-7/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.standalone -Djboss.home.dir=/var/lib/openshift/0af121d693c111e286d412313d2926cd/jbossas-7/jbossas-7
root 23111 11682 0 10:04 pts/3 00:00:00 grep jboss
[root@ip-10-12-17-55 ~]# ps -ef | grep jboss
root 23115 11682 0 10:04 pts/3 00:00:00 grep jboss
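One way to close that window is for the stop path to poll until the pid actually disappears before reporting success, instead of returning as soon as the signal is sent. A hypothetical sketch (wait_for_exit is my own name, not a function from the cartridge scripts):

```shell
#!/bin/sh
# Hypothetical sketch: after signalling the process, wait up to
# `timeout` seconds for the pid to disappear before declaring the stop
# successful (closing the window observed above, where `rhc app stop`
# reports success while the jboss java process is still running).
wait_for_exit() {
    pid=$1
    timeout=${2:-10}
    i=0
    while [ "$i" -lt "$timeout" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # pid is gone
        sleep 1
        i=$((i + 1))
    done
    return 1   # still alive after the timeout
}
```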
Modified killtree logic
Commit pushed to master at https://github.com/openshift/origin-server
https://github.com/openshift/origin-server/commit/4f5f210aae55c874d443553effb23957d5313d0a
Bug 923369
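Without restating the actual patch, the general shape of a tolerant killtree (a hypothetical sketch only, not the code from the commit above; it walks children with pgrep -P and ignores pids that have already exited):

```shell
#!/bin/sh
# Hypothetical sketch of tolerant process-tree termination: signal the
# children first, then the parent, and never treat an already-exited
# pid as an error (the "kill: (20091) - No such process" failure mode).
killtree() {
    _pid=$1
    for _child in $(pgrep -P "$_pid" 2>/dev/null); do
        killtree "$_child"
    done
    kill -TERM "$_pid" 2>/dev/null || :   # ignore vanished pids
    return 0
}
```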
Verified on devenv_2993.

1. Create a jbossas application.
2. SSH into the gear:
[as1-2993t.dev.rhcloud.com 514fada669ef2b087f000001]\> ps -ef|grep jboss
501 6557 1 0 21:56 ? 00:00:00 /bin/sh /var/lib/openshift/514fada669ef2b087f000001//jbossas-7/jbossas-7/bin/standalone.sh
501 6854 6557 10 21:57 ? 00:00:11 /usr/lib/jvm/jre-1.7.0/bin/java -D[Standalone] -server -XX:+TieredCompilation -Xmx256m -XX:MaxPermSize=102m -XX:+AggressiveOpts -Dorg.apache.tomcat.util.LOW_MEMORY=true -Dorg.jboss.resolver.warning=true -Djava.net.preferIPv4Stack=true -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -Djboss.node.name=as1-2993t.dev.rhcloud.com -Djgroups.bind_addr=127.0.250.129 -Dorg.apache.coyote.http11.Http11Protocol.COMPRESSION=on -Dorg.jboss.boot.log.file=/var/lib/openshift/514fada669ef2b087f000001/jbossas-7/jbossas-7/standalone/log/boot.log -Dlogging.configuration=file:/var/lib/openshift/514fada669ef2b087f000001/jbossas-7/jbossas-7/standalone/configuration/logging.properties -jar /var/lib/openshift/514fada669ef2b087f000001/jbossas-7/jbossas-7/jboss-modules.jar -mp /var/lib/openshift/514fada669ef2b087f000001//jbossas-7/jbossas-7/standalone/configuration/modules::/var/lib/openshift/514fada669ef2b087f000001//jbossas-7/jbossas-7/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.standalone -Djboss.home.dir=/var/lib/openshift/514fada669ef2b087f000001/jbossas-7/jbossas-7
3. Stop the app.
4. On the gear side, verify the process is killed immediately:
[as1-2993t.dev.rhcloud.com 514fada669ef2b087f000001]\> ps -ef|grep jboss
501 7961 6262 0 21:58 pts/0 00:00:00 grep jboss
5. Restart the stopped app; it restarted successfully.

Stopped and restarted the app in sequence, tried 5 times, couldn't reproduce, so moving this bug to verified.