Description of problem:
Heat updates raw_template even when there is no change in raw_template.

Looking into the Heat code, it looks like Heat should update raw_template only when there is a change in the template (stack update). The UPDATEs do not seem to change anything: the same values are written over and over again. This leads to explosive disk usage with large templates (100K-200K per line) on MariaDB.

Version-Release number of selected component (if applicable):

How reproducible:
OSP5 RHEL 6

Steps to Reproduce:
1.
2.
3.

Actual results:
When there are no changes in raw_template, stack updates still write to it.

Expected results:

Additional info:
We suspect that the logic is not functioning correctly and is sending extra updates. This is the Heat code that we suspect of sending the extra updates:

-----------------------
def raw_template_update(context, template_id, values):
    raw_template_ref = raw_template_get(context, template_id)
    # get only the changed values
    values = dict((k, v) for k, v in values.items()
                  if getattr(raw_template_ref, k) != v)
    if values:
        raw_template_ref.update_and_save(values)
    return raw_template_ref
-----------------------

Reference Logs:
http://10.65.231.4/sysreports/1554147/01504448/
10-heat-api.log.gz
10-MariaDB.txt.gz
20-heat-engine.log.gz
I looked through the MariaDB log and confirmed that it appears to be rewriting the raw_template table with the exact same template and files that it already contained (i.e. the SET and WHERE sections are identical except for the modification time).

I'm not sure yet why this is happening - the code you pasted above is intended to check for this possibility and avoid unnecessary updates. The most likely reason is that the template deserialised from the DB compares differently to the one in memory, even though they both serialise to the same JSON representation - but I can't see how that could occur (the obvious one - unicode vs str keys and values - compares correctly). The way to check that would be to compare the serialised versions instead:

    # get only the changed values
    values = dict((k, v) for k, v in values.items()
                  if json.loads(getattr(raw_template_ref, k)) != json.loads(v))
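For illustration only, here is a minimal standalone sketch of filtering on a canonical serialised form. It assumes the stored and incoming values are plain dicts rather than JSON strings, and uses a plain dict as a stand-in for raw_template_ref. The names changed_values and canon are hypothetical and are not part of Heat.

```python
import json


def canon(obj):
    # Canonical JSON text: sorted keys so that key order cannot cause a mismatch.
    return json.dumps(obj, sort_keys=True)


def changed_values(current, values):
    # Keep only the entries whose canonical JSON differs from what is stored.
    # `current` is a plain dict standing in for the DB row (an assumption).
    return {k: v for k, v in values.items()
            if canon(current.get(k)) != canon(v)}


stored = {
    "template": {"heat_template_version": "2013-05-23", "resources": {}},
    "files": {},
}
# Same content as `stored`, but with the template keys in a different order.
incoming = {
    "template": {"resources": {}, "heat_template_version": "2013-05-23"},
    "files": {},
}

unchanged = changed_values(stored, incoming)
modified = changed_values(stored, {"files": {"a.yaml": "x"}})
```

If unchanged comes back empty for the real data while the plain != comparison does not, that would point at a comparison problem rather than a real change.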
I've reproduced locally and have filed an upstream bug.
Please ignore comment 10.
Created attachment 1072014 [details]
raw_template_update patch

I haven't reproduced locally yet, but can you please apply the attached patch then restart heat-engine and confirm if the updates are still occurring?

I do have a local unit test which confirms that this patch doesn't cause a regression, but it sounds like you've reproduced in a test environment anyway.
Hi Steve,

I also wanted to confirm whether this patch applies to Juno too. The file looks similar.

Regards,
Jaison R
Created attachment 1072461 [details]
Make ResourceDefinition round-trip stable to avoid extra writes

The part of a ResourceDefinition that lists explicit dependencies was not round-trip stable. As a result, when we copied a new resource definition into the existing template during a stack update, we would end up rewriting the template unnecessarily (i.e. even though we check for changes) every time if depends_on was not specified in the resource originally. At the end of each update, we write the new template to the DB in its entirety, which removes these extra lines again, ensuring that we will experience the same problem on every update. This was causing a *lot* of unnecessary writes.

This change ensures that the definition remains stable across a round-trip, so that no unnecessary changes appear in the template.
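To make "round-trip stable" concrete, here is a toy sketch. It is not the ResourceDefinition code: render_unstable and render_stable are hypothetical helpers, and the assumption is simply that the unstable renderer emits a depends_on entry even when the source never had one.

```python
def render_unstable(definition):
    # Always emits depends_on, even when the source snippet never mentioned it,
    # so the rendered form differs from the original.
    return {
        "type": definition["type"],
        "depends_on": list(definition.get("depends_on", [])),
    }


def render_stable(definition):
    # Emits depends_on only when there are explicit dependencies, so rendering
    # a definition gives back exactly what was put in.
    rendered = {"type": definition["type"]}
    if definition.get("depends_on"):
        rendered["depends_on"] = list(definition["depends_on"])
    return rendered


original = {"type": "OS::Nova::Server"}
with_deps = {"type": "OS::Nova::Server", "depends_on": ["network"]}

unstable_matches = render_unstable(original) == original
stable_matches = render_stable(original) == original
stable_keeps_deps = render_stable(with_deps) == with_deps
```

With the unstable renderer, a change check against the stored template sees a difference on every update even though nothing meaningful changed.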
Created attachment 1073870 [details]
Log all calls to raw_template_update()

I'm attaching a patch to log all calls to raw_template_update(). I still can't reproduce this issue locally, but if we try it on the system where we are experiencing the problem, this should either give us a good idea of what the cause is if it's in this function or it will rule out this function as the source.
Created attachment 1073871 [details]
Work with copies of the DB contents

My current best guess is that the problem is *not* caused by raw_template_update(). Most likely we are not calling update_and_save() on the DB object, but rather making some innocuous-seeming change to the DB object itself and committing it as part of some other transaction. (I don't see the larger transaction in the MariaDB logs, but then I don't see *any* operations other than writes to the raw_template table in the logs - not even reads.)

In particular, I think we wrote the code with the assumption that the files dict is immutable, but in some cases (a TemplateResource where the template is not available and we have to fetch it by URL) we do actually update that dict. The templates being used are full of TemplateResources.

I've attached a completely untested patch that ensures that the Template class works only with copies of the template data, and not the ones retrieved directly from the DB proxy object. Steve, please have a play around with this and check that it doesn't crash and burn horribly, and see if you can come up with a reproducer.
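A small sketch of the aliasing problem described here, using a hypothetical FakeRow class as a stand-in for the DB proxy object (this is not the attached patch, and the URL is a placeholder):

```python
import copy


class FakeRow:
    """Hypothetical stand-in for a DB proxy object holding template data."""

    def __init__(self, template, files):
        self.template = template
        self.files = files


# Case 1: work directly on the dict that belongs to the row.
shared_row = FakeRow({"resources": {}}, {})
shared_files = shared_row.files
shared_files["http://example.com/nested.yaml"] = "resources: {}"

# Case 2: work on a deep copy, leaving the row's own data untouched.
copied_row = FakeRow({"resources": {}}, {})
private_files = copy.deepcopy(copied_row.files)
private_files["http://example.com/nested.yaml"] = "resources: {}"

row_changed_when_shared = shared_row.files != {}
row_changed_when_copied = copied_row.files != {}
```

In the first case the row's own dict has been modified without any explicit save call, which is the kind of innocuous-seeming change that could end up committed as part of another transaction.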
I believe I can make the following assertions about this bug now:

- an UPDATE raw_template is triggered on heat resource-list for templates with template resources (likely for other calls too)
- It is the files dict being modified which triggers the update (not the template dict)
- It is caused by template data being written to the dict whether it has changed or not [1], which causes an UPDATE because the Json type is backed by a MutableDict [2]
- This affects Juno and Kilo but not Liberty because we no longer use Mutable sqlalchemy types [3]

I think the appropriate course would be to fix [1] in Juno, Kilo and Liberty. I will attach a WIP patch for Juno which solves the problem in my environment.

[1] https://github.com/openstack/heat/blob/master/heat/engine/resources/template_resource.py#L206
[2] https://github.com/openstack/heat/blob/stable/juno/heat/db/sqlalchemy/types.py#L47
[3] https://github.com/openstack/heat/commit/27bdb8ce794bfc80b4d15043b2ff37bd2e52b332
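The mechanism can be sketched without SQLAlchemy. TrackingDict below is a hypothetical stand-in that, like a mutation-tracked dict, flags itself as changed on every item assignment regardless of whether the value differs; store_always and store_if_changed are illustrative helpers, not the Heat code.

```python
class TrackingDict(dict):
    """Hypothetical dict that marks itself dirty on every item assignment."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dirty = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # Flagged even if `value` equals what was already stored.
        self.dirty = True


def store_always(files, name, data):
    # Unconditional write: always trips the change flag.
    files[name] = data


def store_if_changed(files, name, data):
    # Guarded write: only touch the dict when the data really differs.
    if files.get(name) != data:
        files[name] = data


always = TrackingDict({"nested.yaml": "resources: {}"})
store_always(always, "nested.yaml", "resources: {}")

guarded = TrackingDict({"nested.yaml": "resources: {}"})
store_if_changed(guarded, "nested.yaml", "resources: {}")

guarded_new = TrackingDict({"nested.yaml": "resources: {}"})
store_if_changed(guarded_new, "nested.yaml", "resources: {a: b}")
```

Only the unconditional write and the genuinely changed value set the dirty flag; a guarded write of identical data leaves it clear, so nothing would need to be flushed.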
Created attachment 1074247 [details] Only write to files dict if template data has changed
tested it better, thanks therve
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-1900.html