Bug 923369 - restart stopped application ends in REST service returning 500 "internal server error"
Summary: restart stopped application ends in REST service returning 500 "internal serv...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: ---
Assignee: Bill DeCoste
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-03-19 16:42 UTC by Andre Dietisheim
Modified: 2015-05-15 02:16 UTC (History)
3 users (show)

Fixed In Version: https://github.com/openshift/origin-server/pull/1780
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-04-02 13:46:22 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker JBIDE-13818 0 Major Closed openshift-java-client: cannot restart stopped application (WATCHER ISSUE) 2017-10-26 14:33:11 UTC

Description Andre Dietisheim 2013-03-19 16:42:25 UTC
Description of problem:
If you try to restart an application that is stopped, the REST service is returning 500 "internal server error"

Version-Release number of selected component (if applicable):
latest DEVENV code (happens since a few hours), protocol v1.0

How reproducible:
always, integration tests #24, #25 and #26 reproduced this in a row:
https://ci.dev.openshift.redhat.com/jenkins/job/openshift-java-client-devenv/26/testReport/com.openshift.internal.client/ApplicationResourceIntegrationTest/shouldRestartStoppedApplication/

Steps to Reproduce:
1. make sure you have an application that is stopped
(create an application and stop it via: curl --user "<USER>:<PASS>" -H "Accept: application/xml; version=1.0" https://openshift.redhat.com/broker/rest/domains/<DOMAIN>/applications/<APPLICATION>/events -X POST --form "event=stop")
2. restart it
(curl --user "<USER>:<PASS>" -H "Accept: application/xml; version=1.0" https://openshift.redhat.com/broker/rest/domains/<DOMAIN>/applications/<APPLICATION>/events -X POST --form "event=restart")
3.
  
Actual results:
HTTP status code 500, response body: 
 {"data":null,"errors":{},"messages":[{"exit_code":-1,"field":null,"severity":"error","text":"\nReference ID: 877bbfd174ddd2b8fec93e4fb560b17a"}],"status":"internal_server_error","supported_api_versions":[1.0,1.1,1.2,1.3,1.4],"type":null,"version":"1.0"}

Expected results:
application gets started

Additional info:
PROD, STG, INT and even claytons DEVENV dont reproduce this. They restart the application just fine.

Comment 1 Andre Dietisheim 2013-03-19 21:42:03 UTC
could not reproduce it using 2 devenv's agupta gave me. Build #27 was stable again (blue) and #28 failed again on the very same error ("could not restart stopped application"). I'm wondering: is this timing related?

Comment 2 Andre Dietisheim 2013-03-20 10:26:15 UTC
issue is not reproducible in todays DEVENV. Thus closing

Comment 3 Andre Dietisheim 2013-03-20 18:39:52 UTC
The issue started to occurr again in the latest build #34:
https://ci.dev.openshift.redhat.com/jenkins/job/openshift-java-client-devenv/34/testReport/com.openshift.internal.client/ApplicationResourceIntegrationTest/shouldRestartStoppedApplication/
Any idea why this keeps flickering? Is this timing related?

Comment 4 Abhishek Gupta 2013-03-22 21:02:34 UTC
Based on my debugging, the findings are:

The failure is caused when the app_ctl_impl.sh script fails to kill the jboss pid as referenced in the log snippet below.

^[[0;37m2013-03-22 14:13:33.595^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x00000005e76580> (pid:3220)
^[[0;37m2013-03-22 14:13:33.595^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: rpc_client.custom_request('cartridge_do', {:cartridge=>"jbossas-7", :action=>"stop", :args=>{"--with-app-uuid"=>"ee0f90b6931b11e294a512313d2d6078", "--with-app-name"=>"1363975865282", "--with-container-uuid"=>"ee0f90b6931b11e294a512313d2d6078", "--with-container-name"=>"1363975865282", "--with-namespace"=>"1363975618879", "--with-request-id"=>"d849586e24d217b414cceb2f1264ccd2", "--cart-name"=>"jbossas-7"}}, ip-10-62-79-134, {'identity' => ip-10-62-79-134}) (Request ID: d849586e24d217b414cceb2f1264ccd2) (pid:3220)
^[[0;37m2013-03-22 14:13:37.301^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: [#<MCollective::RPC::Result:0x0000000455d8c8 @agent="openshift", @action="cartridge_do", @results={:sender=>"ip-10-62-79-134", :statuscode=>1, :statusmsg=>"cartridge_do_action failed -1. Output Control action 'stop' returned an error. rc=121\nSending SIGTERM to jboss:19877 ...\nkill -TERM 20091\n/usr/libexec/openshift/cartridges/jbossas-7/info/bin/app_ctl_impl.sh: line 34: kill: (20091) - No such process\nFailed to stop jbossas-7\n", :data=>{:time=>nil, :output=>"Control action 'stop' returned an error. rc=121\nSending SIGTERM to jboss:19877 ...\nkill -TERM 20091\n/usr/libexec/openshift/cartridges/jbossas-7/info/bin/app_ctl_impl.sh: line 34: kill: (20091) - No such process\nFailed to stop jbossas-7\n", :exitcode=>-1}}>] (Request ID: d849586e24d217b414cceb2f1264ccd2) (pid:3220)
^[[0;37m2013-03-22 14:13:37.303^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: MCollective Response Time (execute_direct: stop): 3.709499049s  (Request ID: d849586e24d217b414cceb2f1264ccd2) (pid:3220)
^[[0;37m2013-03-22 14:13:37.304^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: server results: Control action 'stop' returned an error. rc=121 (pid:3220)
^[[0;37m2013-03-22 14:13:37.304^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: server results: Sending SIGTERM to jboss:19877 ... (pid:3220)
^[[0;37m2013-03-22 14:13:37.304^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: server results: kill -TERM 20091 (pid:3220)
^[[0;37m2013-03-22 14:13:37.304^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: server results: /usr/libexec/openshift/cartridges/jbossas-7/info/bin/app_ctl_impl.sh: line 34: kill: (20091) - No such process (pid:3220)
^[[0;37m2013-03-22 14:13:37.304^[[0m [^[[0;37mDEBUG^[[0m] DEBUG: server results: Failed to stop jbossas-7 (pid:3220)

Comment 5 Abhishek Gupta 2013-03-22 21:09:36 UTC
I have been able to reproduce it by starting, restarting, and then stopping the application in sequence. Roughly 1 out of every 5 attempts results in a failure.

rhc app start app1 -ppwd; rhc app restart app1 -ppwd; rhc app stop app1 -ppwd;

abstract-httpd and abstract-jboss seem to differ in the mechanisms for waiting for a process to stop and pid to disappear.

Comment 6 Bill DeCoste 2013-03-23 14:05:52 UTC
I've recreated some bad behavior. After a stop successfully returns the jboss process is still active for a short period of time:

[root@ip-10-12-17-55 ~]# rhc app stop -p 123456 -a as7
RESULT:
as7 stopped
[root@ip-10-12-17-55 ~]# ps -ef | grep jboss
501      22759     1 28 10:04 ?        00:00:09 /usr/lib/jvm/jre-1.7.0/bin/java -D[Standalone] -server -XX:+TieredCompilation -Xmx256m -XX:MaxPermSize=102m -XX:+AggressiveOpts -Dorg.apache.tomcat.util.LOW_MEMORY=true -Dorg.jboss.resolver.warning=true -Djava.net.preferIPv4Stack=true -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -Djboss.node.name=as7-bdecoste94e.dev.rhcloud.com -Djgroups.bind_addr=127.0.250.129 -Dorg.apache.coyote.http11.Http11Protocol.COMPRESSION=on -Dorg.jboss.boot.log.file=/var/lib/openshift/0af121d693c111e286d412313d2926cd/jbossas-7/jbossas-7/standalone/log/boot.log -Dlogging.configuration=file:/var/lib/openshift/0af121d693c111e286d412313d2926cd/jbossas-7/jbossas-7/standalone/configuration/logging.properties -jar /var/lib/openshift/0af121d693c111e286d412313d2926cd/jbossas-7/jbossas-7/jboss-modules.jar -mp /var/lib/openshift/0af121d693c111e286d412313d2926cd//jbossas-7/jbossas-7/standalone/configuration/modules::/var/lib/openshift/0af121d693c111e286d412313d2926cd//jbossas-7/jbossas-7/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.standalone -Djboss.home.dir=/var/lib/openshift/0af121d693c111e286d412313d2926cd/jbossas-7/jbossas-7
root     23111 11682  0 10:04 pts/3    00:00:00 grep jboss
[root@ip-10-12-17-55 ~]# ps -ef | grep jboss
root     23115 11682  0 10:04 pts/3    00:00:00 grep jboss

Comment 7 Bill DeCoste 2013-03-23 16:20:20 UTC
Modified killtree logic

Comment 9 Jianwei Hou 2013-03-25 02:09:17 UTC
Verified on devenv_2993

1. Create jbossas application
2. SSH into gear
[as1-2993t.dev.rhcloud.com 514fada669ef2b087f000001]\> ps -ef|grep jboss
501       6557     1  0 21:56 ?        00:00:00 /bin/sh /var/lib/openshift/514fada669ef2b087f000001//jbossas-7/jbossas-7/bin/standalone.sh
501       6854  6557 10 21:57 ?        00:00:11 /usr/lib/jvm/jre-1.7.0/bin/java -D[Standalone] -server -XX:+TieredCompilation -Xmx256m -XX:MaxPermSize=102m -XX:+AggressiveOpts -Dorg.apache.tomcat.util.LOW_MEMORY=true -Dorg.jboss.resolver.warning=true -Djava.net.preferIPv4Stack=true -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -Djboss.node.name=as1-2993t.dev.rhcloud.com -Djgroups.bind_addr=127.0.250.129 -Dorg.apache.coyote.http11.Http11Protocol.COMPRESSION=on -Dorg.jboss.boot.log.file=/var/lib/openshift/514fada669ef2b087f000001/jbossas-7/jbossas-7/standalone/log/boot.log -Dlogging.configuration=file:/var/lib/openshift/514fada669ef2b087f000001/jbossas-7/jbossas-7/standalone/configuration/logging.properties -jar /var/lib/openshift/514fada669ef2b087f000001/jbossas-7/jbossas-7/jboss-modules.jar -mp /var/lib/openshift/514fada669ef2b087f000001//jbossas-7/jbossas-7/standalone/configuration/modules::/var/lib/openshift/514fada669ef2b087f000001//jbossas-7/jbossas-7/modules -jaxpmodule javax.xml.jaxp-provider org.jboss.as.standalone -Djboss.home.dir=/var/lib/openshift/514fada669ef2b087f000001/jbossas-7/jbossas-7
3. Stop app
4. On gear side, verify process is killed immediately
[as1-2993t.dev.rhcloud.com 514fada669ef2b087f000001]\> ps -ef|grep jboss
501       7961  6262  0 21:58 pts/0    00:00:00 grep jboss
5. Restart the stopped app, restarted successfully
Stop and restart app in sequence, tried 5 times, didn't reproduce. so moving this bug to verified.


Note You need to log in before you can comment on or make changes to this bug.