Bug 1289287

Summary: cron job to clean out heat.raw_templates
Product: Red Hat OpenStack
Reporter: Dan Yocum <dyocum>
Component: openstack-puppet-modules
Assignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA
QA Contact: Amit Ugol <augol>
Severity: high
Priority: unspecified
Version: 7.0 (Kilo)
CC: augol, ealcaniz, emacchi, ggillies, hbrock, jcoufal, jguiditt, jjoyce, jschluet, mburns, mlopes, ochalups, rhel-osp-director-maint, sbaker, shardy, srevivo, vcojot, zbitter
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: 7.0 (Kilo)
Hardware: All
OS: All
Doc Type: Bug Fix
Clones: 1313392, 1313403 (view as bug list)
Last Closed: 2017-06-20 12:25:11 UTC
Type: Bug
Bug Blocks: 1313403, 1313405, 1339488

Description Dan Yocum 2015-12-07 20:16:35 UTC
Description of problem:

heat.raw_template grows without bound when constantly redeploying the cloud, especially when keystone.token also grows without bound, causing deployments to fail!

Version-Release number of selected component (if applicable):
7.1


How reproducible:

every

Steps to Reproduce:
1. deploy overcloud
2. delete overcloud
3. deploy overcloud
4. wash, rinse, repeat

Actual results:

heat.raw_template has grown to 7.7GB

Expected results:

heat.raw_template should get purged of old templates.


Additional info:

Comment 2 Dan Yocum 2015-12-07 20:18:53 UTC
Specifically, the command to run is:

heat-manage purge_deleted
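A minimal sketch of what such a cron entry could look like (the schedule, user, and age here are illustrative assumptions, not values from this bug):

```shell
# /etc/cron.d sketch (illustrative -- schedule, user, and age are assumptions):
# purge stacks deleted more than 1 day ago, once per day, on a single node only.
0 4 * * * heat heat-manage purge_deleted -g days 1
```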

Comment 3 Steve Baker 2015-12-07 21:32:52 UTC
This cron job is something which the heat puppet module needs to configure, just like the keystone-manage token_flush cron job (which I think is also missing).

Comment 4 Steve Baker 2015-12-12 00:01:13 UTC
I'm moving this back to openstack-heat so that the cron job can be added in packaging.

Keystone's token flush cron needs to be created in the puppet module instead of packaging because knowledge of the configured token type is needed. No such config-time information is needed for the heat purge_deleted cron, so it can be created by the package.

Comment 5 Zane Bitter 2016-01-06 21:58:59 UTC
No way should a cron job be set up by the package; in general it is totally up to the operator to decide if/when/how they should purge deleted stacks from the DB. For a start, you only want to run this on one machine, not every machine that Heat is installed on.

If we're talking about this problem purely in the undercloud context then perhaps this should be assigned to instack-undercloud.

Comment 6 Steve Baker 2016-01-13 21:11:20 UTC
It looks like the best place for setting up a heat-manage purge_deleted cron job is in the heat puppet modules. Both the overcloud and undercloud heat need this so that the db tables don't blow out.

Comment 7 Emilien Macchi 2016-01-14 21:58:10 UTC
I'm going to write Puppet code for that.

heat-manage purge_deleted [-g {days,hours,minutes,seconds}] [age]

What are the defaults you want to see?

Comment 8 Dan Yocum 2016-01-14 22:17:39 UTC
I think 1 day is sufficient.

Comment 9 Graeme Gillies 2016-01-15 00:32:30 UTC
I think 1 day for the undercloud is more than fine, for the overcloud it might want to be longer (30 days maybe?)

Comment 10 Zane Bitter 2016-02-01 13:31:15 UTC
Agree with Graeme, I think 30 days seems about right for the overcloud, but for the undercloud we probably want to clean them out a lot quicker so probably one or two days would be good there.

Comment 11 Steve Baker 2016-02-11 23:22:07 UTC
Yes, default to 30 days in the puppet module, and we'll set it to 1 on the undercloud.
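In puppet-heat terms, this might be exposed roughly as follows (the class and parameter names are assumptions based on the upstream reviews, not verified against the merged module):

```puppet
# Sketch only: class/parameter names assumed, not verified against the
# merged puppet-heat module.
class { '::heat::cron::purge_deleted':
  age      => 30,     # proposed module default
  age_type => 'days', # granularity passed to `heat-manage purge_deleted -g`
}
# An undercloud profile would instead set age => 1.
```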

Comment 12 Steve Baker 2016-02-12 00:44:28 UTC
Emilien, could you please confirm that this change is what is required to install the cron job on the undercloud?

https://review.openstack.org/#/c/279338/

Comment 13 Steve Baker 2016-02-12 01:09:34 UTC
And here is the corresponding overcloud change https://review.openstack.org/#/c/279342/

Comment 14 Graeme Gillies 2016-02-22 00:36:07 UTC
It looks like this missed Director 7.3; can we please get this prioritised to make it into 7.4?

Regards,

Graeme

Comment 15 Emilien Macchi 2016-02-29 21:36:32 UTC
Steve, Graeme, to have the patches in OSP8, they'll have to be backported to stable/liberty and then rebased in the product.

Here is a patch for OPM backport: https://review.openstack.org/#/c/286290

I'll let you manage the TripleO patches.

Comment 16 Jason Guiditta 2016-03-01 14:46:04 UTC
Moving this to assigned, as the current patches are for master and liberty/OSP8. I have cloned the bug to OSP 8 to cover those, as well as split out the instack-undercloud portion so the fixes there can be tracked as well (for Kilo/7 and Liberty/8), resulting in a total of 4 bugs for this issue.

Comment 17 Jason Guiditta 2016-03-02 13:59:33 UTC
As the other half of this bug was closed deferred to OSP 8, I think this should be as well. Mike, do you agree?

Comment 18 Zane Bitter 2016-03-02 14:21:03 UTC
Not necessarily; it's still useful to have a cron job clearing out deleted stacks older than 30 days, especially in the overcloud. The fact that we're not reducing the age to 1 day in the undercloud is neither here nor there.

Comment 19 Mike Burns 2016-03-02 14:59:18 UTC
Agree with Zane, this isn't dependent on the undercloud part being done and has value by itself.  Whether we actually fix it, though is a question for PM.

Comment 22 Edu Alcaniz 2016-06-06 06:53:34 UTC
Could you update this BZ please. 

Thanks
Edu Alcaniz

Comment 23 Emilien Macchi 2016-06-06 12:34:25 UTC
we need info from PM

Comment 24 Jason Guiditta 2016-06-06 13:35:14 UTC
Agreed, though at this point I will say I have been told that in general only major security issues are to be considered, as all changes are risky this far into the lifecycle.

Comment 25 Dan Yocum 2016-06-06 14:18:29 UTC
Has it been addressed in v10?  If not, then update the version for this BZ.  If it has, then close this BZ since this should be documented in the Director Install and Config guide in the Tuning section.

Comment 27 Dan Yocum 2016-06-07 17:03:07 UTC
Is this addressed in v10?  If so, close it for v7 and move on.

When the heat.raw_templates table grows too large, sql queries time out, which cascades to rabbitmq queries timing out, which prevents TripleO from completing the provisioning successfully.

I'd say to just move on as long as the problem is documented in the tuning section of the guide (iirc, it is).

Comment 28 Steve Baker 2016-06-07 20:56:33 UTC
This should be fixed for v8 on both the undercloud and the overcloud. The only caveat is an undercloud which was upgraded from v7; in that case they will still need to follow the documented workaround to manually create the cron entry.
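The manual workaround for an upgraded undercloud would amount to something like the following (the schedule, age, and user here are placeholders; consult the documented workaround for the real entry):

```shell
# Manual workaround sketch for underclouds upgraded from v7 (values are
# placeholders, not quoted from the docs).
# Append the purge job to the heat user's crontab without clobbering it:
( crontab -l -u heat 2>/dev/null
  echo '0 4 * * * heat-manage purge_deleted -g days 1' ) | crontab -u heat -
```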

Comment 29 Amit Ugol 2016-06-22 06:15:04 UTC
Hi Steve,
I see no cron jobs. I looked at crontab -e as root and as stack on all nodes and on the undercloud, and it's not there, nor in /etc/cron.*
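One possibility worth checking (an assumption about how the puppet module defines its cron resource, not something confirmed in this bug): puppet's cron resource installs per-user crontab entries, not files under /etc/cron.*, so the job may only be visible in the heat user's crontab.

```shell
# Puppet's cron resource writes per-user crontab entries (typically stored
# under /var/spool/cron/<user>), not files in /etc/cron.* -- so the job may
# only be visible via `crontab -l -u heat`. (The heat user is an assumption.)
# Simulated check against an example crontab dump:
dump='0 4 * * * heat-manage purge_deleted -g days 1'
if printf '%s\n' "$dump" | grep -q 'heat-manage purge_deleted'; then
  echo "purge job installed"
else
  echo "purge job missing"
fi
```

On a live undercloud the real check would be `crontab -l -u heat`; the snippet above only simulates it so it can run anywhere.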

Comment 30 Steve Baker 2016-06-22 23:16:42 UTC
Is this with a fresh install of RHOS-8.0?

Comment 31 Steve Baker 2016-06-23 03:23:23 UTC
These changes need to be on the undercloud for this to work.
https://review.openstack.org/#/c/279338/
https://review.openstack.org/#/c/286290/

Comment 32 Amit Ugol 2016-07-26 04:56:15 UTC
This bug is about creating a cron job, and it is created where needed. Actually testing that the job does what it's told is a different issue.

Comment 38 errata-xmlrpc 2017-06-20 12:25:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1538