Bug 809251

Summary: Failed app creates sometimes leave around fragments
Product: OKD Reporter: Thomas Wiest <twiest>
Component: Containers Assignee: Rajat Chopra <rchopra>
Status: CLOSED CURRENTRELEASE QA Contact: libra bugs <libra-bugs>
Severity: high Docs Contact:
Priority: medium    
Version: 2.x CC: jialiu, mpatel
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-04-13 18:35:00 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 811192    

Description Thomas Wiest 2012-04-02 21:00:53 UTC
Description of problem:
Failed app creates sometimes leave around fragments.

This morning we got some e-mail from STG that showed that rhc-last-access is failing with this error:
EXCEPTION: No such file or directory - /var/lib/stickshift/2300b6a3f8ef4aa0834847ec326730c6/.env/OPENSHIFT_GEAR_DNS

In total, there are 9 apps that are failing with this error in STG.

When we go to look at these apps, here's what we see:
* The app _does not_ have an entry in Mongo
* The app _does not_ have a ProxyPass file
* The app _has_ a user on the machine
* The app _has_ a directory under /var/lib/stickshift


Under the /var/lib/stickshift/uuid directory, it looks like this:
[root@ex-std-node1 2300b6a3f8ef4aa0834847ec326730c6]# ll -a
total 24
drwxr-x---.   4 root 2300b6a3f8ef4aa0834847ec326730c6  4096 Mar 31 05:08 .
drwxr-x--x. 171 root root                             12288 Apr  2 16:51 ..
drwxr-x---.   2 root 2300b6a3f8ef4aa0834847ec326730c6  4096 Mar 31 05:08 .env
d---------.   3 root root                              4096 Mar 31 12:46 .tmp
[root@ex-std-node1 2300b6a3f8ef4aa0834847ec326730c6]#


Again, the other apps are in a similar situation.
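
The symptom described above (a gear directory that exists but has no .env/OPENSHIFT_GEAR_DNS file) could be scanned for with something like the following. This is only a minimal sketch: the function name is made up for illustration, and it assumes gear directories are 32-character hex UUIDs directly under /var/lib/stickshift, as in the listing above. It does not check Mongo, ProxyPass files, or system users.

```python
import os
import re

# Assumption: gear directories are named by 32-character lowercase hex UUIDs.
GEAR_UUID_RE = re.compile(r"^[0-9a-f]{32}$")


def find_fragment_gears(base_dir="/var/lib/stickshift"):
    """Return gear UUIDs whose directory lacks .env/OPENSHIFT_GEAR_DNS."""
    fragments = []
    for name in sorted(os.listdir(base_dir)):
        gear_dir = os.path.join(base_dir, name)
        # Skip anything that does not look like a gear directory.
        if not GEAR_UUID_RE.match(name) or not os.path.isdir(gear_dir):
            continue
        dns_file = os.path.join(gear_dir, ".env", "OPENSHIFT_GEAR_DNS")
        if not os.path.isfile(dns_file):
            fragments.append(name)
    return fragments
```

Running it as root on a node (the gear directories are not world-readable) would list candidates to cross-check against Mongo before removing anything.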

This is the only thing in the mcollective log referring to this uuid:

D, [2012-03-31T05:08:39.403655 #10995] DEBUG -- : libra.rb:60:in `cartridge_do_action' cartridge_do_action call / request = #<MCollective::RPC::Request:0x7f03903a54b8
 @action="cartridge_do",
 @agent="libra",
 @caller="cert=mcollective-public",
 @data=
  {:cartridge=>"stickshift-node",
   :args=>
    "--with-app-uuid '419a0a962d0243a980bc99892341b65f' --with-container-uuid '2300b6a3f8ef4aa0834847ec326730c6' -i '5185' --named 'scalephp' --with-namespace 'bmeng7s'",
   :action=>"app-create",
   :process_results=>true},
 @sender="mcollect.cloud.redhat.com",
 @time=1333184919,
 @uniqid="c88b6e359f3e824eb5ea5f7dbf86c491">

D, [2012-03-31T05:08:39.403945 #10995] DEBUG -- : libra.rb:61:in `cartridge_do_action' cartridge_do_action validation = stickshift-node app-create --with-app-uuid '419a0a962d0243a980bc99892341b65f' --with-container-uuid '2300b6a3f8ef4aa0834847ec326730c6' -i '5185' --named 'scalephp' --with-namespace 'bmeng7s'
D, [2012-03-31T05:08:40.368987 #10995] DEBUG -- : libra.rb:102:in `cartridge_do_action' cartridge_do_action (0)
------
CART_DATA: PROXY_HOST=b5053c3948-bmeng8s.stg.rhcloud.com
CART_DATA: PROXY_PORT=58906
CART_DATA: HOST=127.10.27.129
CART_DATA: PORT=8080

------)



Version-Release number of selected component (if applicable):
rhc-node-0.89.2-1.el6_2.x86_64


How reproducible:
Very sporadic, but since it's happening in STG with the latest release, I think the bug still exists.


Steps to Reproduce:
1. unknown

  
Actual results:
failed app create leaves fragments.


Expected results:
nothing should be left around

Comment 1 Dan McPherson 2012-04-02 22:18:24 UTC
Rajat, can you take a look at this and make sure we don't have a case where deconfigure and destroy aren't being called?

Comment 2 Rajat Chopra 2012-04-02 23:10:08 UTC
Some more fixes with rev#04dff1e6fdb228ebea0aba2678658e8d94c5688c
The app's gears were not being cleaned up properly in case mongo save failed.

Keeping this open for now in case other issues are still lurking.
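
The failure mode described in this comment (gears left behind when the Mongo save fails) is the usual "roll back on error" situation. The sketch below is not the code from the revision above; it only illustrates the pattern, and every name in it (create_app, create_gear, destroy_gear, save_app) is hypothetical. It assumes destroy_gear is safe to call on a gear that was only partially created.

```python
def create_app(gear_ids, create_gear, destroy_gear, save_app):
    """Create all gears, then persist the app; undo the gears on any failure."""
    attempted = []
    try:
        for gear_id in gear_ids:
            # Record the gear before creating it so a half-created gear
            # is also cleaned up if create_gear itself fails.
            attempted.append(gear_id)
            create_gear(gear_id)
        # Persisting the app record (e.g. to Mongo) can still fail here.
        save_app(gear_ids)
    except Exception:
        # Roll back in reverse order so nothing is left on the node,
        # then re-raise so the caller still sees the original error.
        for gear_id in reversed(attempted):
            destroy_gear(gear_id)
        raise
    return attempted
```

The point of the design is that cleanup lives in one place and runs for every failure after the first gear is attempted, rather than being repeated in each individual error branch.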

Comment 3 Rajat Chopra 2012-04-11 19:17:08 UTC
Found a code path where, if the max_gears limit is reached and the application is denied creation, gears are left behind. Fixed that with rev#1be5f271b378b8425702997b80e02f9ab346c464.
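
One way to avoid leftovers on this path is to check the gear quota before any gear is created, so a denied request never touches the node. The sketch below illustrates that idea only; it is not taken from the revision above, and the names (GearLimitReached, create_scalable_app, consumed_gears, gears_needed) are hypothetical.

```python
class GearLimitReached(Exception):
    """Hypothetical error for a request that would exceed max_gears."""


def create_scalable_app(consumed_gears, max_gears, gears_needed, create_gear):
    """Refuse the request up front if it would exceed the gear limit."""
    if consumed_gears + gears_needed > max_gears:
        # Nothing has been created yet, so there is nothing to clean up.
        raise GearLimitReached(
            "need %d gears, only %d available"
            % (gears_needed, max_gears - consumed_gears)
        )
    # Only reached when the whole request fits within the limit.
    return [create_gear() for _ in range(gears_needed)]
```

If the limit can only be detected after some gears already exist, the alternative is to destroy those gears before reporting the denial.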

Comment 4 Johnny Liu 2012-04-13 16:15:01 UTC
Patch 04dff1e6fdb228ebea0aba2678658e8d94c5688c has been verified in BUG 807045.

Patch 1be5f271b378b8425702997b80e02f9ab346c464 was verified on devenv_stage_169: PASS.

Steps to reproduce:
1). Launch an instance with the sprint 8 release version (devenv_stage_157).
2). Create 2 apps, which means only 1 more gear is allowed to be created.
3). Try to create a scalable app that will consume at least 2 gears; creation fails because max_gears is reached.
4). On the instance, check /var/lib/stickshift; leftover gear fragments are seen.