Bug 800188 - Failed app create doesn't always clean up properly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dan McPherson
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 767033
 
Reported: 2012-03-05 22:32 UTC by Thomas Wiest
Modified: 2015-05-15 01:47 UTC
CC List: 9 users

Fixed In Version: devenv_1858
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-25 18:27:23 UTC
Target Upstream Version:
Embargoed:



Description Thomas Wiest 2012-03-05 22:32:06 UTC
Description of problem:
When an app fails to create properly, it doesn't always get cleaned up completely.

For instance, one failed app create left around this file:

/etc/httpd/conf.d/libra/224cd723b07d476d9fbb9ddd95c2458e_nagiosmonitor_chkexsrv2.conf

As you can see, this app no longer existed on the system:

[root@ex-std-node2 ~]# grep -ri 224cd723b07d476d9fbb9ddd95c2458e /etc/passwd
[root@ex-std-node2 ~]#

However, even more disturbing is that this file tied the URL to an incorrect IP: when we tried to add an app with the same URL (as we do in our monitoring), this stale config file redirected Apache's proxy pass to the wrong internal IP address.

We've also seen cases where the application directory is left around even though the user and Apache conf files have been removed.

For instance:
[root@ex-std-node2 ~]# ll -d /var/lib/libra/3ad58fce1a3644fc9be230f94d56e864
drwxr-x---. 3 root 6821 4096 Mar  5 03:59 /var/lib/libra/3ad58fce1a3644fc9be230f94d56e864
[root@ex-std-node2 ~]# grep 3ad58fce1a3644fc9be230f94d56e864 /etc/passwd
[root@ex-std-node2 ~]# ll -d /etc/httpd/conf.d/libra/3ad58fce1a3644fc9be230f94d56e864*
ls: cannot access /etc/httpd/conf.d/libra/3ad58fce1a3644fc9be230f94d56e864*: No such file or directory
[root@ex-std-node2 ~]#


In this case, rhc-accept-node correctly alerted us to the problem. However, the failed app create should still clean up properly after itself.


*** IMPORTANT ***

As part of this fix, please add a check to rhc-accept-node that ensures every /etc/httpd/conf.d/libra/$uuid_*.conf file has a corresponding /etc/passwd entry for its $uuid. This way we can ensure consistency (see the sketch below).

*** IMPORTANT ***
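
For reference, a minimal sketch of what such a check could look like (the conf path and the uuid_namespace_app filename convention are taken from this report; the script itself is only an illustration, not the actual rhc-accept-node code):

#!/bin/bash
# For every libra httpd conf, take the uuid prefix (everything before the
# first underscore in the filename) and verify a matching passwd entry exists.
for conf in /etc/httpd/conf.d/libra/*_*.conf; do
    [ -e "$conf" ] || continue                      # glob matched nothing
    uuid=$(basename "$conf" | cut -d_ -f1)
    if ! getent passwd "$uuid" > /dev/null; then
        echo "FAIL: httpd config file $(basename "$conf") doesn't have an associated user"
    fi
done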


Version-Release number of selected component (if applicable):
rhc-node-0.87.14-1.el6_2.x86_64


How reproducible:
App creation has to fail, so pretty hard to reproduce.


Steps to Reproduce:
1. Unsure how to reliably reproduce this; however, we've seen it quite a bit.

  
Actual results:
Sometimes when app creation fails, it doesn't fully clean up the failed app.


Expected results:
When app creation fails, it should always clean up all parts of the failed app.


Additional info:

Comment 1 Rajat Chopra 2012-03-14 21:26:55 UTC
This is hard to reproduce; I have not been able to do so yet.
As part of other bug fixes, what's better handled now is the deletion of the gear (unix user, home dir, etc.).
As for the entries in the httpd conf files, it's really the responsibility of the deconfigure hook. I am still combing through the hooks to see what could be handled differently.

Lazy man's proposal for the meanwhile: shall we put some cleanup code in mcollective as we destroy gears? If there is a destroy-gear call with the uuid, we just go and do an rm -rf /etc/httpd/conf.d/stickshift/$uuid_*?
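
Purely as a sketch of that proposal (the uuid would come from the destroy-gear request; this is not actual mcollective agent code, and the home-dir sweep is an extra suggested by the leftover directory noted above):

# belt-and-suspenders cleanup when a gear with this uuid is destroyed
uuid="${1:?usage: cleanup-gear <uuid>}"          # refuse to run without a uuid
rm -f  /etc/httpd/conf.d/stickshift/"${uuid}"_*.conf
rm -rf /var/lib/libra/"${uuid}"                  # sweep a leftover home dir too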

Comment 2 Thomas Wiest 2012-03-15 03:30:14 UTC
Thanks Rajat. I know this is hard to repro. I hate sporadic bugs like this.

For our part, I've added a check to rhc-accept-node that will cause failures when the proxy pass files are left around. This will enable our monitoring to alert us when this specific case is happening and I can then tell you how often it happens and hopefully show you the problem.

Right now, unfortunately, we've just been running across these leftover fragments, or sometimes seeing our monitoring checks fail because of them.

Comment 3 Thomas Wiest 2012-03-26 15:15:19 UTC
The check that I added to rhc-accept-node went live with the last release, and on the first day we cleaned up all of the errors that we saw.

We have a monitoring check for rhc-accept-node that tells us when it starts failing. When it fails, we go in and clean up / fix whatever caused it to fail.

This bug is still around because we're still seeing new errors from partially destroyed apps.

Here's the latest failure from rhc-accept-node:

FAIL: httpd config file c2e8f2b6db114b54852882551ab21f25_openshiftnagios_chkexsrv2.conf doesn't have an associated user

As you can see, this partially destroyed app is actually from our monitoring check_create_app check.

Basically that check just creates an app, makes sure it's there, and then removes the app.
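
Roughly, that probe amounts to something like this (a hypothetical outline only; the real check_create_app script and the cartridge type it uses are not shown in this bug, and the app name and namespace are just the ones seen in the leftover conf file above):

# create a throwaway app, verify it answers, then remove it again
rhc app create -a chkexsrv2 -t php-5.3                 # create (prompts for the password)
curl -sf "http://chkexsrv2-openshiftnagios.rhcloud.com/" > /dev/null \
    && echo "app is up"                                # make sure it's there
rhc app destroy -a chkexsrv2                           # remove it (confirms interactively)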

I'm going to manually clean this one up. For next time, let me know if there's any information that you would like me to collect from the machine before I clean up the app.

Comment 4 Dan McPherson 2012-03-28 16:39:16 UTC
I think this bug is the same as:

https://bugzilla.redhat.com/show_bug.cgi?id=807638


I gave clear steps there on how to reproduce it and what the issue is.

Comment 5 Rajat Chopra 2012-03-28 20:52:31 UTC
Fixed with rev#dbe8ad2f7cf5b5869871feb58228a84e2d42b563

The steps given in https://bugzilla.redhat.com/show_bug.cgi?id=807638
indeed created leftovers. The fix does not let that happen.

Comment 6 Xiaoli Tian 2012-03-29 03:55:09 UTC
Tested this on devenv_1679 by intentionally making creation fail; it's fixed now.

Comment 7 Thomas Wiest 2012-04-02 14:37:49 UTC
This bug definitely still exists with the new code that's currently in STG.

rhc-node-0.89.2-1.el6_2.x86_64

This morning there are 22 rhc-accept-node failures on ex-std-nodes 1 and 2 in STG.

The errors look like this (the file is different for each error, of course):
FAIL: httpd config file 015d9c4cea6549039d953cfe13a5f0fc_jizhao37777_wsgi.conf doesn't have an associated user

Friday there were 0 rhc-accept-node failures on these nodes. This leads me to believe that QE's testing over the weekend led to these failures.

Re-opening this bug.

Comment 8 Mike McGrath 2012-04-13 21:59:12 UTC
Just saw this again in stg with the new upgrade:


[root@ex-std-node1 ~]# rhc-accept-node 
FAIL: httpd config file 45b16df29d0646c1aae8205059e01e7d_testfotios_45b16df29d.conf doesn't have an associated user
FAIL: httpd config file dc82422b7221489388a549a2126bb73b_testfotios_testscale.conf doesn't have an associated user

Comment 9 Thomas Wiest 2012-04-20 15:50:27 UTC
App creates are still failing in PROD with the latest code.

We found this app: e7ce1e7004f34fe1a456fae7d190ac51
No mongo entry
Gear home dir only had a .env and a .tmp directory in it

Here's the relevant log entry from mcollective on the host:

D, [2012-04-20T08:31:08.922103 #1095] DEBUG -- : libra.rb:60:in `cartridge_do_action' cartridge_do_acti
on call / request = #<MCollective::RPC::Request:0x7fea3a10ab40
 @action="cartridge_do",
 @agent="libra",
 @caller="cert=mcollective-public",
 @data=
  {:cartridge=>"stickshift-node",
   :args=>
    "--with-app-uuid 'e7ce1e7004f34fe1a456fae7d190ac51' --with-container-uuid 'e7ce1e7004f34fe1a456fae7
d190ac51' -i '6441' --named 'chkexsrv2' --with-namespace 'openshiftnagios'",
   :action=>"app-create",
   :process_results=>true},
 @sender="mcollect.cloud.redhat.com",
 @time=1334925068,
 @uniqid="8bed61646e8b332950010b4577d28209">

D, [2012-04-20T08:31:08.922361 #1095] DEBUG -- : libra.rb:61:in `cartridge_do_action' cartridge_do_acti
on validation = stickshift-node app-create --with-app-uuid 'e7ce1e7004f34fe1a456fae7d190ac51' --with-co
ntainer-uuid 'e7ce1e7004f34fe1a456fae7d190ac51' -i '6441' --named 'chkexsrv2' --with-namespace 'openshi
ftnagios'
D, [2012-04-20T08:31:09.723126 #1095] DEBUG -- : libra.rb:102:in `cartridge_do_action' cartridge_do_act
ion (0)
------

------)


Note: This is the ONLY entry for this gear uuid in the mcollective log.

Comment 10 Rajat Chopra 2012-04-23 20:16:40 UTC
It would really help if, in such cases, a corresponding snippet of the broker logs were provided (with respect to that gear uuid or the application). That would help us see why the proper deconfigure/destroy hooks are not called on certain gears.

Comment 11 Thomas Wiest 2012-04-27 15:33:42 UTC
Oh, ok, no prob.

We got an alert in our monitoring that gear 4a0e3154d50e4b8bb62a39e851eda756 was in a bad state.

After investigating, I determined that this was a half-created gear.

Here's its directory:
# ll -a
total 32
drwxr-x---.   4 root 4a0e3154d50e4b8bb62a39e851eda756  4096 Apr 27 04:20 .
drwxr-x--x. 211 root root                             20480 Apr 27 09:36 ..
drwxr-x---.   2 root 4a0e3154d50e4b8bb62a39e851eda756  4096 Apr 27 04:20 .env
d---------.   2 root root                              4096 Apr 27 04:20 .tmp




This is still a problem with the latest code in STG as of this morning.

Here is the log from the ex-srv:

Started POST "/broker/rest/domains/bmeng5s/applications/py1s/events" for 10.77.7.46 at Fri Apr 27 04:20:
30 -0400 2012
  Processing by AppEventsController#create as JSON
  Parameters: {"broker_auth_iv"=>"[FILTERED]", "broker_auth_key"=>"[FILTERED]", "application_id"=>"py1s"
, "domain_id"=>"bmeng5s", "event"=>"scale-up"}
MongoDataStore.find(CloudUser, bmeng+5, bmeng+5)

Adding user bmeng+5...inside base_controller
MongoDataStore.find(CloudUser, bmeng+5, bmeng+5)

DEBUG: find_available_impl: district_uuid: 61ecace7bada4b2b9d14ddc0fc511ef0
DEBUG: rpc_get_fact: fact=active_capacity
DEBUG: rpc_exec: rpc_client=#<MCollective::RPC::Client:0x7f4608891638>
Next server: ex-std-node2.stg.rhcloud.com active capacity: 33.75
Current server: ex-std-node2.stg.rhcloud.com active capacity: 33.75
Next server: ex-std-node1.stg.rhcloud.com active capacity: 31.25
Current server: ex-std-node1.stg.rhcloud.com active capacity: 31.25
CURRENT SERVER: ex-std-node1.stg.rhcloud.com
DEBUG: find_available_impl: current_server: ex-std-node1.stg.rhcloud.com: 31.25
MongoDataStore.reserve_district_uid(61ecace7bada4b2b9d14ddc0fc511ef0)

DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7f4608bcab88>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"app-create", :cartridge=>"stickshift-node", :args=>"--with-app-uuid 'b2dd26b7809640b084e15f434294613d' --with-container-uuid '4a0e3154d50e4b8bb62a39e851eda756' -i '4505' --named 'py1s' --with-namespace 'bmeng5s'"}, @id, {'identity' => @id})
DEBUG: [#<MCollective::RPC::Result:0x7f46089acab8 @action="cartridge_do", @agent="libra", @results={:statuscode=>0, :data=>{:exitcode=>1, :output=>"CLIENT_ERROR: \nCLIENT_ERROR: Could not add job 'jbosstest-build' in Jenkins server:\nCLIENT_ERROR:    \nCLIENT_ERROR: You'll need to correct this error before attempting to embed the Jenkins client again.\n"}, :sender=>"ex-std-node1.stg.rhcloud.com", :statusmsg=>"OK"}>]
uninitialized constant GroupInstance::NodeException
Completed 500 Internal Server Error in 10353ms

NoMethodError (undefined method `code' for #<NameError: uninitialized constant GroupInstance::NodeException>):
  



Here is the log from the ex-node (note: this is the only output with this gear uuid in the log):

D, [2012-04-27T04:20:34.233545 #21673] DEBUG -- : libra.rb:60:in `cartridge_do_action' cartridge_do_act
ion call / request = #<MCollective::RPC::Request:0x7f5cc75c65e0
 @action="cartridge_do",
 @agent="libra",
 @caller="cert=mcollective-public",
 @data=
  {:cartridge=>"stickshift-node",
   :action=>"app-create",
   :process_results=>true,
   :args=>
    "--with-app-uuid 'b2dd26b7809640b084e15f434294613d' --with-container-uuid '4a0e3154d50e4b8bb62a39e8
51eda756' -i '4505' --named 'py1s' --with-namespace 'bmeng5s'"},
 @sender="mcollect.cloud.redhat.com",
 @time=1335514834,
 @uniqid="df2dd5b0d80d5840d73d9ebcfd0e2517">

D, [2012-04-27T04:20:34.233803 #21673] DEBUG -- : libra.rb:61:in `cartridge_do_action' cartridge_do_act
ion validation = stickshift-node app-create --with-app-uuid 'b2dd26b7809640b084e15f434294613d' --with-c
ontainer-uuid '4a0e3154d50e4b8bb62a39e851eda756' -i '4505' --named 'py1s' --with-namespace 'bmeng5s'

Comment 12 Rajat Chopra 2012-05-02 16:30:28 UTC
Further fix with rev#d47868db10f5c84d0613c6fee93690aa0d2a0046

Situation: make gear create fail towards the end (non-zero exit code). This half-creates the gear, which fails the action on the app. Everything else recovers, but the created gear is never destroyed.

Comment 13 Johnny Liu 2012-05-03 10:03:36 UTC
According to comment 12, reproduced this bug on devenv-stage_157.
Steps:
1. Modify /usr/lib/ruby/gems/1.8/gems/stickshift-node-0.7.4/bin/ss-app-create, replacing the following line (a sed one-liner for this is sketched after the listing below):
exit 0
to 
exit 1
2. Try to create an app.
3. The app fails to create, and the app gear is left in a bad state.
[root@ip-10-100-229-82 stickshift]# ls -la c5356021a43f403eaf01dc74b69b62b6
total 16
drwxr-x---. 4 root c5356021a43f403eaf01dc74b69b62b6 4096 May  3 04:56 .
drwxr-x--x. 6 root root                             4096 May  3 04:56 ..
drwxr-x---. 2 root c5356021a43f403eaf01dc74b69b62b6 4096 May  3 04:56 .env
d---------. 2 root root                             4096 May  3 04:56 .tmp
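
For reference, step 1 can be done with a one-liner (assuming the script ends in a literal "exit 0" line, as the comment implies; this is just one way to make the hook fail):

# flip ss-app-create's final exit code so gear creation reports failure
sed -i 's/^exit 0$/exit 1/' /usr/lib/ruby/gems/1.8/gems/stickshift-node-0.7.4/bin/ss-app-create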


Verified this bug on devenv_1752, and it PASSES.
1. Modified ss-app-create to make it fail on purpose, as in the reproduction steps.
2. Whether creating a scalable or a non-scalable app, even when creation fails, no leftovers for the app are seen.


$ create_php_app 
Submitting form:
debug: true
rhlogin: jialiu
Contacting https://ec2-107-21-67-181.compute-1.amazonaws.com
Creating application: phptest in jialiu
Contacting https://ec2-107-21-67-181.compute-1.amazonaws.com
Problem reported from server. Response code was 500.

DEBUG:


Exit Code: 1
broker_c: namespacerhloginsshapp_uuiddebugaltercartridgecart_typeactionapp_nameapi
api_c: placeholder
API version:    1.1.3

RESULT:
Unable to create gear on node

$ curl -k -X POST -H 'Accept: application/xml' -d name=myapp -d cartridge=php-5.3 -d scale=true --user jialiu:214214 https://ec2-107-21-67-181.compute-1.amazonaws.com/broker/rest/domains/jialiu/applications
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <type nil="true"></type>
  <data>
    <datum nil="true"></datum>
  </data>
  <version>1.0</version>
  <messages>
    <message>
      <exit-code nil="true"></exit-code>
      <severity>error</severity>
      <text>Failed to create application myapp due to:Unable to create gear on node</text>
      <field nil="true"></field>
    </message>
  </messages>
  <status>internal_server_error</status>
  <supported-api-versions>
    <supported-api-version>1.0</supported-api-version>
  </supported-api-versions>
</response>

Comment 14 Thomas Wiest 2012-05-12 17:32:27 UTC
This is still a problem in STG with the latest code:

rhc-node-0.92.5-1.el6_2.x86_64
rhc-broker-0.92.8-1.el6_2.noarch


Here's the partially destroyed app:

[ccb8bc46eddd4fcd9cc302a617948c60]# ls -la
total 36
drwxr-x---.   4 root ccb8bc46eddd4fcd9cc302a617948c60  4096 May 12 00:15 .
drwxr-x--x. 278 root root                             24576 May 12 13:03 ..
drwxr-x---.   2 root ccb8bc46eddd4fcd9cc302a617948c60  4096 May 12 00:15 .env
d---------.   3 root root                              4096 May 12 00:25 .tmp
[ccb8bc46eddd4fcd9cc302a617948c60]#


This is from the ex-srv broker logs:
* Note: I've e-mailed this directly to Rajat since it might have sensitive information in it.


This is from the ex-node mcollective logs:

D, [2012-05-12T00:15:06.029439 #10488] DEBUG -- : libra.rb:303:in `cartridge_do_action' cartridge_do_ac
tion call / request = #<MCollective::RPC::Request:0x7f1431d0ce50
 @action="cartridge_do",
 @agent="libra",
 @caller="cert=mcollective-public",
 @data=
  {:cartridge=>"stickshift-node",
   :action=>"app-create",
   :args=>
    {"--named"=>"wsgitest",
     "--with-uid"=>2879,
     "--with-app-uuid"=>"ccb8bc46eddd4fcd9cc302a617948c60",
     "--with-container-uuid"=>"ccb8bc46eddd4fcd9cc302a617948c60",
     "--with-namespace"=>"jialiu1"},
   :process_results=>true},
 @sender="mcollect.cloud.redhat.com",
 @time=1336796105,
 @uniqid="22fa0fad26984a0860e76b787d90aa20">

D, [2012-05-12T00:15:06.029962 #10488] DEBUG -- : libra.rb:304:in `cartridge_do_action' cartridge_do_ac
tion validation = stickshift-node app-create --namedwsgitest--with-uid2879--with-app-uuidccb8bc46eddd4f
cd9cc302a617948c60--with-container-uuidccb8bc46eddd4fcd9cc302a617948c60--with-namespacejialiu1
D, [2012-05-12T00:15:06.030238 #10488] DEBUG -- : libra.rb:59:in `ss_app_create' COMMAND: ss-app-create
D, [2012-05-12T00:15:07.088597 #10488] DEBUG -- : amqp.rb:91:in `receive' Received message

[...snip...]

D, [2012-05-12T00:15:07.226191 #10488] DEBUG -- : libra.rb:303:in `cartridge_do_action' cartridge_do_ac
tion call / request = #<MCollective::RPC::Request:0x7f1431ce3190
 @action="cartridge_do",
 @agent="libra",
 @caller="cert=mcollective-public",
 @data=
  {:cartridge=>"stickshift-node",
   :action=>"app-destroy",
   :args=>
    {"--with-app-uuid"=>"ccb8bc46eddd4fcd9cc302a617948c60",
     "--with-container-uuid"=>"ccb8bc46eddd4fcd9cc302a617948c60"},
   :process_results=>true},
 @sender="mcollect.cloud.redhat.com",
 @time=1336796107,
 @uniqid="0cac21641ae2920fae565b8597b32501">

D, [2012-05-12T00:15:07.226506 #10488] DEBUG -- : libra.rb:304:in `cartridge_do_action' cartridge_do_action validation = stickshift-node app-destroy --with-app-uuidccb8bc46eddd4fcd9cc302a617948c60--with-container-uuidccb8bc46eddd4fcd9cc302a617948c60
D, [2012-05-12T00:15:07.226847 #10488] DEBUG -- : libra.rb:86:in `ss_app_destroy' COMMAND: ss-app-destroy
D, [2012-05-12T00:15:07.227423 #10488] DEBUG -- : libra.rb:95:in `ss_app_destroy' ERROR: unable to destroy user account ccb8bc46eddd4fcd9cc302a617948c60

Comment 15 Xiaoli Tian 2012-05-14 07:54:38 UTC
I met this issue on stage as well.

Failed to create scalephp2:
  rhc app create --app scalephp2 --type php-5.3 -s
   Password: 
   Creating application: scalephp2 in testssh0
   /usr/lib/ruby/gems/1.8/gems/rhc-0.92.11/lib/rhc-common.rb:445:in `create_app':           undefined method `uuid' for []:Array (NoMethodError)
	from /usr/lib/ruby/gems/1.8/gems/rhc-0.92.11/bin/rhc-app:226:in `create_app'
	from /usr/lib/ruby/gems/1.8/gems/rhc-0.92.11/bin/rhc-app:565
	from /usr/bin/rhc-app:19:in `load'
	from /usr/bin/rhc-app:19


But it's listed in rhc domain show:
scalephp2
    Framework: php-5.3
     Creation: 2012-05-14T03:39:25-04:00
         UUID: 285839a125414a50bd1b26557f1e6e69
      Git URL: ssh://285839a125414a50bd1b26557f1e6e69.rhcloud.com/~/git/scalephp2.git/
   Public URL: http://scalephp2-testssh0.stg.rhcloud.com/

 Embedded: 
      haproxy-1.4




But if you ping the URL, it's an unknown host:
#ping scalephp2-testssh0.stg.rhcloud.com
ping: unknown host scalephp2-testssh0.stg.rhcloud.com

Comment 16 Rajat Chopra 2012-05-15 17:57:47 UTC
Fix needed in unix_user.rb;
two items:
  1. Handle the destroy in a stepwise fashion, looking at the uuid, uid, and filesystem separately and logging their failures separately.
  2. Serialize unix_user.create and unix_user.destroy so that there is no race condition when mcollective barfs.
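
As a rough sketch of both items in shell terms (the real change would land in unix_user.rb; the lock path and the exact teardown commands are illustrative assumptions, while the error wording mirrors the log in comment 14):

uuid="${1:?usage: destroy-gear <uuid>}"
lock="/var/lock/ss-unix-user-${uuid}.lock"

# item 2: serialize create/destroy for the same gear with a per-uuid lock
# (a matching create path would take the same lock)
(
  flock -x 200

  # item 1: tear each piece down separately and log each failure separately
  userdel --remove "$uuid" \
    || echo "ERROR: unable to destroy user account ${uuid}"
  rm -rf "/var/lib/libra/${uuid}" \
    || echo "ERROR: unable to remove home dir for ${uuid}"
  rm -f /etc/httpd/conf.d/libra/"${uuid}"_*.conf \
    || echo "ERROR: unable to remove httpd confs for ${uuid}"
) 200>"${lock}"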

Comment 17 Anderson Silva 2012-05-29 15:02:55 UTC
Just an update: over the weekend of May 25th through May 29th there were over 30 instances of this issue in PROD.

Most of them were related to some .conf being left behind on ex-{std,lg}-node.

AS

Comment 18 Anderson Silva 2012-05-30 17:32:58 UTC
FAIL: httpd config file 42b5b7ff73c04513816121cc33ac2d2f_imasen_cashloggerbldr.conf doesn't have an associated user
FAIL: httpd config file cfaea2b1e63c4616b04741b1e488549d_imasen_cashloggerbldr.conf doesn't have an associated user

This *_imasen_cashloggerbldr (user/namespace) has had over 6 failures in the last 2 days.

AS

Comment 19 Dan McPherson 2012-05-30 20:19:01 UTC
I think I have fixed a lot of these issues with bug 825354. I would like to know how much better this issue is after this release is out.

Comment 20 Xiaoli Tian 2012-06-01 11:29:04 UTC
Met this again on int.openshift.redhat.com.

Failed to create an app:

[xiaoli@localhost int]$ rhc app create -a rubyap1 -t ruby-1.8 -l xtian+test1 -p 123456
Creating application: rubyap1 in z5eusdurut
Problem reported from server. Response code was 500.
Re-run with -d for more information.

But it's listed in domain info:

[xiaoli@localhost int]$ rhc domain show -l xtian+test1 -p 123456

User Info
=========
Namespace: z5eusdurut
  RHLogin: xtian+test1

Application Info
================
rubyap1
    Framework: ruby-1.8
     Creation: 2012-06-01T07:19:56-04:00
         UUID: 68e357b50b124be287f154319e73b76e
      Git URL: ssh://68e357b50b124be287f154319e73b76e.rhcloud.com/~/git/rubyap1.git/
   Public URL: http://rubyap1-z5eusdurut.int.rhcloud.com/

 Embedded: 
      None

SSH to it fails:
[xiaoli@localhost int]$ ssh 68e357b50b124be287f154319e73b76e.rhcloud.com
ssh: Could not resolve hostname rubyap1-z5eusdurut.int.rhcloud.com: Name or service not known

Comment 21 Jianwei Hou 2012-06-12 02:17:12 UTC
Met this again on the current stage. The app is shown in rhc domain show, but it cannot be destroyed.

#rhc domain show
Password: ******
User Info
=========
Namespace: jhou
  RHLogin: jhou

Application Info
================
sa
    Framework: jbossas-7
     Creation: 2012-06-11T21:43:46-04:00
         UUID: 6929c459471145d58f462c092cefa699
      Git URL: ssh://6929c459471145d58f462c092cefa699.rhcloud.com/~/git/sa.git/
   Public URL: http://sa-jhou.stg.rhcloud.com/

 Embedded: 
      haproxy-1.4

[hjw@localhost test]$ rhc app destroy -a sa
Password: ******

!!!! WARNING !!!! WARNING !!!! WARNING !!!!
You are about to destroy the sa application.

This is NOT reversible, all remote data for this application will be removed.
Do you want to destroy this application (y/n): y
Problem reported from server. Response code was 400.
Re-run with -d for more information.

RESULT:
Application gears already at zero for 'jhou'

[hjw@localhost test]$ ssh 6929c459471145d58f462c092cefa699.rhcloud.com
ssh: Could not resolve hostname sa-jhou.stg.rhcloud.com: Name or service not known

Comment 22 Xiaoli Tian 2012-06-17 03:08:46 UTC
If the app fails to create as bug 832745 describes, the failed app's data is not removed from mongo, but it cannot be destroyed via the client.

Some log output from the broker:

DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"deconfigure", :args=>"'b69f0e064b' 'freedom3' 'b69f0e064bf64e88abc75bb14180668b'", :cartridge=>"php-5.3"}, @id, {'identity' => @id})
DEBUG: [#<MCollective::RPC::Result:0x7f1859680830 @results={:sender=>"ip-10-62-91-176", :statusmsg=>"OK", :data=>{:exitcode=>0, :output=>"Waiting for stop to finish\n"}, :statuscode=>0}, @action="cartridge_do", @agent="libra">]
DEBUG: Cartridge command php-5.3::deconfigure exitcode = 0
MongoDataStore.save(CloudUser, xtian+b105, xtian+b105, #hidden)

DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7f185966b6b0>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"deconfigure", :args=>"'b3dd6a8997' 'freedom3' 'b3dd6a8997eb409c83764530a2f0342d'", :cartridge=>"php-5.3"}, @id, {'identity' => @id})
DEBUG: [#<MCollective::RPC::Result:0x7f185961ffa8 @results={:sender=>"ip-10-62-91-176", :statusmsg=>"OK", :data=>{:exitcode=>0, :output=>"Waiting for stop to finish\n"}, :statuscode=>0}, @action="cartridge_do", @agent="libra">]
DEBUG: Cartridge command php-5.3::deconfigure exitcode = 0
MongoDataStore.save(CloudUser, xtian+b105, xtian+b105, #hidden)

DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7f18596afef0>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"deconfigure", :args=>"'854dae0d92' 'freedom3' '854dae0d923545c690f9705895d3a7e7'", :cartridge=>"php-5.3"}, @id, {'identity' => @id})
DEBUG: [#<MCollective::RPC::Result:0x7f18597a1408 @results={:sender=>"ip-10-62-91-176", :statusmsg=>"OK", :data=>{:exitcode=>0, :output=>"Waiting for stop to finish\n"}, :statuscode=>0}, @action="cartridge_do", @agent="libra">]
DEBUG: Cartridge command php-5.3::deconfigure exitcode = 0
MongoDataStore.save(CloudUser, xtian+b105, xtian+b105, #hidden)

DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7f185976bc90>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"deconfigure", :args=>"'fbcdc69857' 'freedom3' 'fbcdc69857254724b3bb94f7444903ed'", :cartridge=>"php-5.3"}, @id, {'identity' => @id})
DEBUG: [#<MCollective::RPC::Result:0x7f18596e6608 @results={:sender=>"ip-10-62-91-176", :statusmsg=>"OK", :data=>{:exitcode=>0, :output=>"Waiting for stop to finish\n"}, :statuscode=>0}, @action="cartridge_do", @agent="libra">]
DEBUG: Cartridge command php-5.3::deconfigure exitcode = 0
MongoDataStore.save(CloudUser, xtian+b105, xtian+b105, #hidden)

DEBUG: Deconfiguring embedded application 'haproxy-1.4' in application 'phpapp5' on node 'ip-10-62-91-176'
DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7f18596c4e18>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"deconfigure", :args=>"'phpapp5' 'freedom3' 'c0d1bec318104c7e9569cf9d29d35ec6'", :cartridge=>"embedded/haproxy-1.4"}, @id, {'identity' => @id})
DEBUG: [#<MCollective::RPC::Result:0x7f1859640f78 @results={:sender=>"ip-10-62-91-176", :statusmsg=>"OK", :data=>{:exitcode=>0, :output=>"/usr/libexec/stickshift/cartridges/embedded/haproxy-1.4/info/hooks/deconfigure: line 62: kill: (14315) - No such process\nSSH_KEY_REMOVE: \n"}, :statuscode=>0}, @action="cartridge_do", @agent="libra">]

DEBUG: Cartridge command embedded/haproxy-1.4::deconfigure exitcode = 0
MongoDataStore.save(CloudUser, xtian+b105, xtian+b105, #hidden)

DEBUG: rpc_exec_direct: rpc_client=#<MCollective::RPC::Client:0x7f18597a0f30>
DEBUG: rpc_client.custom_request('cartridge_do', {:action=>"deconfigure", :args=>"'phpapp5' 'freedom3' 'c0d1bec318104c7e9569cf9d29d35ec6'", :cartridge=>"php-5.3"}, @id, {'identity' => @id})
DEBUG: [#<MCollective::RPC::Result:0x7f18596fbc88 @results={:sender=>"ip-10-62-91-176", :statusmsg=>"OK", :data=>{:exitcode=>0, :output=>"Waiting for stop to finish\n"}, :statuscode=>0}, @action="cartridge_do", @agent="libra">]
DEBUG: Cartridge command php-5.3::deconfigure exitcode = 0
MongoDataStore.save(CloudUser, xtian+b105, xtian+b105, #hidden)

MongoDataStore.save(Application, xtian+b105, phpapp5, #hidden)

Completed 500 Internal Server Error in 96585ms
StickShift::UserException (Application limit has reached for 'xtian+b105')



Comment 23 Dan McPherson 2012-06-18 18:09:26 UTC
Comments 20, 21, and 22 appear to be different issues. If you don't believe they are handled in other existing bugs, please feel free to open new bugs. But this bug is about cases where data is left on the node and there is nothing in mongo.

Comment 24 Dan McPherson 2012-06-19 03:37:08 UTC
I have made changes to address the issues the bug was opened for. I would like to know if we have any additional cases of httpd confs or user home dirs left around.

Comment 25 Johnny Liu 2012-06-19 12:15:11 UTC
Re-tested this bug with devenv_1857; it failed. The httpd conf is left around after an app creation failure.


steps:
1. On instance:
# rhc-admin-ctl-user -l jialiu --setmaxgears 1
2. On client:
$ rhc-create-app -a myapp -t php-5.3 -px -s 
Creating application: myapp in jialiu
/usr/local/share/gems/gems/rhc-0.93.18/lib/rhc-rest.rb:134:in `raise': exception object expected (TypeError)
	from /usr/local/share/gems/gems/rhc-0.93.18/lib/rhc-rest.rb:134:in `process_error_response'
	from /usr/local/share/gems/gems/rhc-0.93.18/lib/rhc-rest.rb:86:in `rescue in send'
	from /usr/local/share/gems/gems/rhc-0.93.18/lib/rhc-rest.rb:71:in `send'
	from /usr/local/share/gems/gems/rhc-0.93.18/lib/rhc-rest/domain.rb:30:in `add_application'
	from /usr/local/share/gems/gems/rhc-0.93.18/lib/rhc-common.rb:511:in `create_app'
	from /usr/local/share/gems/gems/rhc-0.93.18/bin/rhc-create-app:226:in `<top (required)>'
	from /usr/local/bin/rhc-create-app:23:in `load'
	from /usr/local/bin/rhc-create-app:23:in `<main>'

3. On instance:
[root@ip-10-85-3-53 stickshift]# pwd
/var/lib/stickshift
[root@ip-10-85-3-53 stickshift]# ls .httpd.d/
11e5d4d172074502b8f9d42e8bfeaec7_jialiu_myapp  bcbf670ed51d45a881b0a674cfc4d0ce_jialiu_bcbf670ed5

Comment 26 Dan McPherson 2012-06-19 14:00:04 UTC
The Fixed In Version says devenv_1858; it was fixed after the last build came out.

Comment 27 Johnny Liu 2012-06-20 09:45:52 UTC
Verified this bug with devenv_1589; PASS.

