Bug 1038634

Summary: Scaled app, deployments retained over configured limit on a single gear
Product: OpenShift Online
Reporter: David Boyer <dave>
Component: Image
Assignee: Andy Goldstein <agoldste>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 2.x
CC: agoldste, bmeng, chunchen, dave
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Windows   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1038753 (view as bug list) Environment:
Last Closed: 2014-01-30 00:52:09 UTC
Type: Bug
Bug Depends On:    
Bug Blocks: 1038753    

Description David Boyer 2013-12-05 14:29:43 UTC
Description of problem:
On a scaled application with 3 gears (2 NodeJS, 1 MongoDB), the number of deployments to keep is set to 1. However, one of the two NodeJS gears is retaining multiple deployments. Eventually the app reaches its file quota (40,000 files), which prevents further deployments.

So far it is always the same gear that retains the extra deployments; the other correctly retains a single one.

Version-Release number of selected component (if applicable):
rhc 1.16.9

How reproducible:
Unsure how easily reproducible this is. The steps below reflect my theory about what caused it.

Steps to Reproduce:
1. Create a scaled application.
2. Set scaling settings to 1 gear minimum.
3. Set configuration to retain 1 deployment.
4. Set scaling settings to 2 gears minimum.
5. Make some deployments, then check the "2nd" gear's "app-deployments" folder.

Actual results:
The folder on the 2nd gear contains multiple deployments.

Expected results:
The folder should contain only 1 deployment on each gear.

Additional info:
Current workaround: ssh to the gear at fault, change to the app-deployments folder, and run "rm -rf 2*" to remove the old deployment directories.
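
A less destructive variant of this workaround would keep the newest deployment instead of removing everything that matches "2*". The following is only a sketch under stated assumptions: the function name prune_deployments is hypothetical, it is not part of rhc or OpenShift, and it assumes that deployment directory names begin with "2" (as the glob above implies) and sort chronologically by name.

```python
import os
import shutil


def prune_deployments(deploy_dir, keep=1):
    """Remove all but the `keep` newest deployment directories.

    Assumption: deployment directories have names that start with "2"
    and sort chronologically (timestamp-style names). Symlinks such as
    "current" and any other entries are left untouched.
    """
    entries = sorted(
        name
        for name in os.listdir(deploy_dir)
        if name.startswith("2")
        and os.path.isdir(os.path.join(deploy_dir, name))
        and not os.path.islink(os.path.join(deploy_dir, name))
    )
    # entries[:-0] would be empty, so handle keep == 0 explicitly.
    stale = entries[:-keep] if keep > 0 else entries
    for name in stale:
        shutil.rmtree(os.path.join(deploy_dir, name))
    return stale
```

Calling prune_deployments with the path of the gear's app-deployments folder and keep=1 would leave only the most recent deployment directory in place and return the names it removed.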

Comment 1 Andy Goldstein 2013-12-05 14:51:41 UTC
David, are you saying that app-deployments has more than 1 directory named 2013-…?

Comment 2 David Boyer 2013-12-05 14:53:38 UTC
That's right. But only in my 2nd gear.

(In reply to Andy Goldstein from comment #1)
> David, are you saying that app-deployments has more than 1 directory named
> 2013-…?

Comment 3 Andy Goldstein 2013-12-05 15:10:16 UTC
Thanks David, I have been able to reproduce what you're seeing. I'm investigating the root cause now. Thanks for letting us know about the issue.

Comment 4 David Boyer 2013-12-05 15:12:40 UTC
You're more than welcome and I'm kind of glad it wasn't something I did ;-) Good luck with the bug hunt.

Comment 5 Andy Goldstein 2013-12-05 16:06:47 UTC
Root cause identified; working on a fix now.

Comment 6 Andy Goldstein 2013-12-05 16:49:34 UTC
https://github.com/openshift/origin-server/pull/4290

Comment 7 openshift-github-bot 2013-12-05 21:10:47 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/754840e2b9dd32a1e30e05392af3cfbb195e3a8a
Prune from child gear app-deployments dir

Modify distribute implementation so it keeps the entire app-deployments
dir (excluding the "current" symlink) in sync between the proxy gear
and the child gears. Previously, each new deployment would create a new
deployment dir on the children in app-deployments without pruning older
deployment dirs, eventually filling up a gear's quota.

Bug 1038634
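
The behavior described in the commit message can be illustrated with a small mirror routine. This is a sketch only, not the origin-server implementation: the function sync_deployments and its parameters are hypothetical, both directories are assumed to be on the local filesystem (real gears may live on different hosts), and entries that already exist on the child are assumed to be up to date and are not re-copied.

```python
import os
import shutil


def sync_deployments(proxy_dir, child_dir, exclude=("current",)):
    """Make child_dir mirror proxy_dir, ignoring names in `exclude`.

    Returns the sorted list of entries pruned from the child.
    """
    wanted = {n for n in os.listdir(proxy_dir) if n not in exclude}
    present = {n for n in os.listdir(child_dir) if n not in exclude}

    # Prune entries the proxy no longer has. Per the commit message,
    # this is the step that was missing before the fix.
    pruned = sorted(present - wanted)
    for name in pruned:
        path = os.path.join(child_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)

    # Copy entries that exist on the proxy but not yet on the child.
    for name in sorted(wanted - present):
        src = os.path.join(proxy_dir, name)
        dst = os.path.join(child_dir, name)
        if os.path.isdir(src) and not os.path.islink(src):
            shutil.copytree(src, dst, symlinks=True)
        else:
            shutil.copy2(src, dst, follow_symlinks=False)

    return pruned
```

Because "current" is excluded on both sides, each gear's own symlink is left as it is while everything else in app-deployments converges on the proxy's contents.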

Comment 8 chunchen 2013-12-06 05:25:41 UTC
It's fixed, verified on devenv_4102.