Description of problem:
We hit several apps in production which were leaving tmp/vendor behind.

Version-Release number of selected component (if applicable):

How reproducible:
Not sure. We need to validate that we do everything we can to clean up tmp/vendor when the build is complete. Technically we should be moving the contents out of tmp/vendor, not copying them, so it is not clear why we hit this in the first place.

Actual results:
tmp/vendor exists for several gears.

Expected results:
tmp/vendor shouldn't exist after the builds are complete.

Additional info:
For the gears we looked at, it didn't appear that jenkins was involved.
As of today, tmp/vendor is cleaned up at 3 spots:

- in tidy(): https://github.com/openshift/origin-server/blob/d1335eb6779c8e7906fe6ffb8908f788969b6e11/cartridges/openshift-origin-cartridge-ruby/bin/control#L48
- in pre-repo-archive(): https://github.com/openshift/origin-server/blob/d1335eb6779c8e7906fe6ffb8908f788969b6e11/cartridges/openshift-origin-cartridge-ruby/bin/control#L53
- in build(): https://github.com/openshift/origin-server/blob/d1335eb6779c8e7906fe6ffb8908f788969b6e11/cartridges/openshift-origin-cartridge-ruby/bin/control#L96

The only one that is involved in the build process is the last one, which only takes place if the tmp/.bundle directory exists: https://github.com/openshift/origin-server/blob/d1335eb6779c8e7906fe6ffb8908f788969b6e11/cartridges/openshift-origin-cartridge-ruby/bin/control#L81

If this directory does not exist, we will not touch tmp/vendor. Is this not a correct assumption?
There may be nothing we can do. But here is an example of a case that's broken: the user has used 995 MB of their quota and their app size is 10 MB (say the old app size, before the latest push, was 1 MB). So we:

- Back up their bundles
- Try to git archive their git repo contents into app-root/repo
- Fail because they don't have 10 MB free
- Never get to the build step where we remove tmp/vendor

We basically need a rescue on build failures to be able to clean up tmp/vendor in this case.
In this particular example, we need to agree on a signal that the failure (disk quota reached) would send, and trap that signal on the 'control' side, and perform cleanup. This should happen on all cartridges, not just ruby. Seems to me that this is a much bigger issue than just a simple bug.
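For discussion, here is a minimal bash sketch of what trapping the failure on the 'control' side might look like. It is not taken from the existing cartridge code: the function names cleanup_vendor and run_build_with_cleanup and the app_dir argument are hypothetical, and it assumes the failing step simply exits non-zero rather than sending a dedicated signal.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual cartridge 'control' script.

# Remove the temporary bundle backup if it is still present.
# app_dir is an assumed argument pointing at the directory that contains tmp/.
cleanup_vendor() {
    local app_dir="$1"
    rm -rf "${app_dir}/tmp/vendor"
}

# Run an arbitrary build command and always clean up tmp/vendor afterwards,
# whether the command succeeds or fails (for example, on a quota error).
run_build_with_cleanup() {
    local app_dir="$1"
    shift
    (
        # The EXIT trap fires when this subshell ends, on success or failure.
        trap 'cleanup_vendor "$app_dir"' EXIT
        "$@"
        # Propagate the build command's status to the caller.
        exit $?
    )
}
```

A caller would wrap whatever step can fail, e.g. `run_build_with_cleanup "$some_app_dir" some_build_step`, and still see the original exit status. Running the step in a subshell keeps the trap from leaking into the rest of the script; whether a shared helper like this could be reused by all cartridges is exactly the open question above.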
Story here: https://trello.com/card/clean-up-after-application-build-failure/50fc6bb487602f214b003f71/180