Description of problem:
Failed app creates sometimes leave fragments behind.

This morning we got some e-mail from STG showing that rhc-last-access is failing with this error:

EXCEPTION: No such file or directory - /var/lib/stickshift/2300b6a3f8ef4aa0834847ec326730c6/.env/OPENSHIFT_GEAR_DNS

In total, 9 apps are failing with this error in STG. When we look at these apps, here's what we see:

* The app _does not_ have an entry in Mongo
* The app _does not_ have a ProxyPass file
* The app _has_ a user on the machine
* The app _has_ a directory under /var/lib/stickshift

Under the /var/lib/stickshift/uuid directory, it looks like this:

[root@ex-std-node1 2300b6a3f8ef4aa0834847ec326730c6]# ll -a
total 24
drwxr-x---.   4 root 2300b6a3f8ef4aa0834847ec326730c6  4096 Mar 31 05:08 .
drwxr-x--x. 171 root root                             12288 Apr  2 16:51 ..
drwxr-x---.   2 root 2300b6a3f8ef4aa0834847ec326730c6  4096 Mar 31 05:08 .env
d---------.   3 root root                              4096 Mar 31 12:46 .tmp
[root@ex-std-node1 2300b6a3f8ef4aa0834847ec326730c6]#

Again, the other apps are in a similar situation.
This is the only thing in the mcollective log referring to this uuid:

D, [2012-03-31T05:08:39.403655 #10995] DEBUG -- : libra.rb:60:in `cartridge_do_action' cartridge_do_action call / request = #<MCollective::RPC::Request:0x7f03903a54b8 @action="cartridge_do", @agent="libra", @caller="cert=mcollective-public", @data= {:cartridge=>"stickshift-node", :args=> "--with-app-uuid '419a0a962d0243a980bc99892341b65f' --with-container-uuid '2300b6a3f8ef4aa0834847ec326730c6' -i '5185' --named 'scalephp' --with-namespace 'bmeng7s'", :action=>"app-create", :process_results=>true}, @sender="mcollect.cloud.redhat.com", @time=1333184919, @uniqid="c88b6e359f3e824eb5ea5f7dbf86c491">
D, [2012-03-31T05:08:39.403945 #10995] DEBUG -- : libra.rb:61:in `cartridge_do_action' cartridge_do_action validation = stickshift-node app-create --with-app-uuid '419a0a962d0243a980bc99892341b65f' --with-container-uuid '2300b6a3f8ef4aa0834847ec326730c6' -i '5185' --named 'scalephp' --with-namespace 'bmeng7s'
D, [2012-03-31T05:08:40.368987 #10995] DEBUG -- : libra.rb:102:in `cartridge_do_action' cartridge_do_action (0)
------
CART_DATA: PROXY_HOST=b5053c3948-bmeng8s.stg.rhcloud.com
CART_DATA: PROXY_PORT=58906
CART_DATA: HOST=127.10.27.129
CART_DATA: PORT=8080
------)

Version-Release number of selected component (if applicable):
rhc-node-0.89.2-1.el6_2.x86_64

How reproducible:
Very sporadic, but since it's happening in STG with the latest release, I think the bug still exists.

Steps to Reproduce:
1. unknown

Actual results:
Failed app create leaves fragments.

Expected results:
Nothing should be left around.
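For triage, the symptom in the description (a directory under /var/lib/stickshift with no .env/OPENSHIFT_GEAR_DNS file) can be looked for with a short script. This is only a sketch, not part of rhc-node: the function name find_gear_fragments is made up for illustration, and it assumes that every healthy gear directory contains .env/OPENSHIFT_GEAR_DNS and that hidden entries and plain files under the base directory are not gears.

```python
import os


def find_gear_fragments(base_dir):
    """Return sorted names of gear directories lacking .env/OPENSHIFT_GEAR_DNS.

    Assumption (not confirmed by the report): a healthy gear always has this
    file, and hidden entries or plain files under base_dir are not gears.
    """
    fragments = []
    for name in sorted(os.listdir(base_dir)):
        gear_dir = os.path.join(base_dir, name)
        # Skip plain files and hidden directories; only gear directories count.
        if name.startswith(".") or not os.path.isdir(gear_dir):
            continue
        marker = os.path.join(gear_dir, ".env", "OPENSHIFT_GEAR_DNS")
        if not os.path.isfile(marker):
            fragments.append(name)
    return fragments


if __name__ == "__main__":
    base = "/var/lib/stickshift"  # path taken from the report
    if os.path.isdir(base):
        for uuid in find_gear_fragments(base):
            print(uuid)
```

A directory reported by this check still needs to be compared against Mongo and the ProxyPass files by hand before anything is removed.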
Rajat, can you take a look at this and ensure we don't have a case where deconfigure and destroy aren't being called?
Some more fixes in rev#04dff1e6fdb228ebea0aba2678658e8d94c5688c: the app's gears were not being cleaned up properly when the Mongo save failed. Keeping this open for other issues that may still be lurking.
Found a code path where, if the max_gears limit is reached and the application is denied creation, gears are left behind. Fixed that with rev#1be5f271b378b8425702997b80e02f9ab346c464.
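For anyone reviewing similar paths, the general shape of both fixes (Mongo save failure and max_gears denial) is "check the limit before touching a node, and destroy whatever was created if a later step fails". The following is a rough sketch of that pattern only, not the code in the revisions above; create_app, GearLimitReached, and the callback parameters are all hypothetical names.

```python
class GearLimitReached(Exception):
    """Raised when an app would exceed the user's gear limit (hypothetical)."""


def create_app(gears_needed, max_gears, used_gears,
               create_gear, destroy_gear, save_app):
    """Create gears for an app, rolling back on any failure.

    create_gear, destroy_gear and save_app are caller-supplied callbacks
    standing in for the node and datastore operations (assumptions).
    """
    # Deny the request before any gear exists, so nothing can be left behind.
    if used_gears + gears_needed > max_gears:
        raise GearLimitReached(
            "need %d gears, %d of %d already used"
            % (gears_needed, used_gears, max_gears))

    created = []
    try:
        for _ in range(gears_needed):
            created.append(create_gear())
        save_app(created)  # e.g. the datastore save that may fail
    except Exception:
        # Undo in reverse order, then re-raise the original error.
        for gear in reversed(created):
            destroy_gear(gear)
        raise
    return created
```

The sketch does not cover a create_gear call that fails halfway and leaves its own partial directory; that case would need cleanup inside the node-side create itself.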
Patch 04dff1e6fdb228ebea0aba2678658e8d94c5688c has been verified in BUG 807045.

Patch 1be5f271b378b8425702997b80e02f9ab346c464 was verified on devenv_stage_169: PASS.

Steps to reproduce:
1. Launch an instance with the sprint 8 release version, devenv_stage_157.
2. Create 2 apps, which means only 1 more gear is allowed to be created.
3. Try to create a scalable app that will consume at least 2 gears; creation fails because max_gears is reached.
4. On the instance, check /var/lib/stickshift; some leftover gear is seen.