Bug 1261512

Summary: heat stack create error "Data too long for column 'resource_properties' at row 1"
Product: Red Hat OpenStack Reporter: Marios Andreou <mandreou>
Component: openstack-heat    Assignee: Zane Bitter <zbitter>
Status: CLOSED ERRATA QA Contact: Amit Ugol <augol>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.0 (Kilo)    CC: ddomingo, flfuchs, jcoufal, mandreou, mburns, ohochman, rhel-osp-director-maint, rhosp-bugs-internal, sbaker, shardy, yeylon
Target Milestone: z2    Keywords: ZStream
Target Release: 7.0 (Kilo)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-heat-2015.1.1-2.el7ost Doc Type: Bug Fix
Doc Text:
On each resource event, the Orchestration service stores the current properties of the resource to enable debugging and auditing of the stack's history. However, the Orchestration service can only store 64kB (encoded) of such information. As such, if a stack contained a resource with particularly large properties (for example, a SoftwareConfig with a large script), the Orchestration service could not record events for the resource in question, thereby preventing the stack from being created. With this release, properties that are too large to store in the event table are elided. This, in turn, allows users to create stacks regardless of the size of each resource's properties.
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-08 12:21:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Log entries from heat-engine.log containing 'oslo_db' none

Description Marios Andreou 2015-09-09 13:46:54 UTC
Description of problem:

Our overcloud_controller_pacemaker.pp manifest [1] has grown beyond 65535 characters, resulting in:

 DEBUG oslo_db.sqlalchemy.session [req-a797cf52-99e4-4a85-8af0-db44167ead0b - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/session.py:513\n2015-09-08 21:09:39.788 32609 ERROR oslo_messaging._drivers.impl_rabbit [-] Failed to consume message from queue: \n2015-09-08 21:09:40.778 2133 DEBUG oslo_db.sqlalchemy.session [req-1a71f3cf-993a-45b2-89db-96bf22664aca - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/session.py:513\n2015-09-08 22:00:37.468 2133 ERROR oslo_db.sqlalchemy.exc_filters [-] DBAPIError exception wrapped from (_mysql_exceptions.DataError) (1406, \"Data too long for column 'resource_properties' at row 1\") [SQL: u'INSERT INTO event (created_at, updated_at, stack_id, uuid, resource_action, resource_status, resource_name, physical_resource_id, resource_status_reason, resource_type, resource_properties) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)'] [parameters: (datetime.datetime(2015, 9, 9, 2, 0, 37, 466247), None, '8526154b-6ca6-424a-af5a-c91429707932', '17be30fd-e595-4dfa-9f3d-7bf2ada982c5', 'CREATE', 'IN_PROGRESS', 'ControllerPuppetConfigImpl', None, 'state changed', 'OS::Heat::SoftwareConfig', '\\x80\\x02}q\\x01(U\\x06inputsq\\x02NU\\x06configq\\x03X\\x17\\x05\\x01\\x00# Copyright 2015 Red Hat, Inc.\\n# All Rights Reserved.\\n#\\n# 

(the rest of the trace is the continuation of [1] in its entirety)

It seems Heat is trying to store all of that as the resource_properties of an event, as at [3]. Looking a bit closer, this column is of type PickleType [4][5]. I am not sure what we can do about this type - any ideas? (See the sketch after the links below for how the pickled payload overruns the column limit.)

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/manifests/overcloud_controller_pacemaker.pp
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/controller-config-pacemaker.yaml
[3] https://github.com/openstack/heat/blob/e4264917fa2f457c159415e5ea8e29411c5812a7/heat/engine/event.py#L42
[4] https://github.com/openstack/heat/blob/master/heat/db/sqlalchemy/models.py#L242
[5] http://docs.sqlalchemy.org/en/latest/core/type_basics.html?highlight=pickletype#sqlalchemy.types.PickleType
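
For illustration only, here is a minimal Python sketch (not Heat's actual code; the oversized manifest is approximated with a repeated string) of why the row is rejected: PickleType serializes the properties dict with pickle before storage, and the resulting blob must fit in a MySQL column capped at 65535 bytes, which STRICT_ALL_TABLES mode enforces with error 1406 instead of silently truncating.

import pickle

# Hypothetical stand-in for the SoftwareConfig properties: 'config' would hold
# the full overcloud_controller_pacemaker.pp manifest (> 65535 characters).
properties = {'inputs': None, 'config': 'A' * 68000}

# PickleType pickles the value before it is written to the
# 'resource_properties' column, so it is the pickled size that must fit
# in the 64kB column, not just the length of the source text.
pickled = pickle.dumps(properties)
print(len(pickled))           # > 65535 bytes
print(len(pickled) > 65535)   # True -> MySQL raises 1406 "Data too long" in STRICT mode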

Comment 3 Marios Andreou 2015-09-09 13:47:44 UTC
upstream bug report at https://bugs.launchpad.net/heat/+bug/1493858

Comment 5 Zane Bitter 2015-09-14 18:32:41 UTC
We have basically agreed on a medium-term upstream fix: https://review.openstack.org/#/c/222352/

Comment 9 Amit Ugol 2015-09-17 05:04:16 UTC
Care to add steps to reproduce, please?

Comment 10 Marios Andreou 2015-09-17 06:17:40 UTC
Hi Amit, this was happening on all deploys until the fix landed (i.e. you didn't need to do anything special to reproduce it, just deploy). It looks like the band-aid has already landed (openstack-heat-2015.1.1-2.el7ost in the Fixed In Version field above, though I am not sure that is correct), so it may not be possible to reproduce this anymore.

thanks, marios

Comment 11 Zane Bitter 2015-09-17 12:09:14 UTC
I reproduced this by just generating a template with a huge property and trying to create a stack with it. This Python code should create something you can test with:

import yaml

# Template with a single SoftwareConfig resource whose 'config' property is
# 64kB (1 << 16 bytes) of 'A's, enough to overflow the event table column.
templ_data = {
    'heat_template_version': '2013-05-23',
    'resources': {
        'huge': {
            'type': 'OS::Heat::SoftwareConfig',
            'properties': {'config': 'A' * (1 << 16)},
        }
    }
}

with open('huge.yaml', 'w') as op:
    op.write(yaml.safe_dump(templ_data))
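
The resulting huge.yaml can then be fed to a stack create to trigger the oversized event insert; for example, with the Kilo-era heat client (this exact invocation is an illustration, not part of the original report):

    heat stack-create huge-test -f huge.yaml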

Comment 12 Zane Bitter 2015-09-17 15:12:55 UTC
The original fix has been replaced with the upstream version (which makes it easier for clients to detect the missing data) in openstack-heat-2015.1.1-4.el7ost.

Comment 13 Amit Ugol 2015-09-20 11:53:15 UTC
The above example (1 << 16 bytes) works well. 1MB is the limit; above that, the user gets a JSON size limit error as expected.

Comment 14 Florian Fuchs 2015-09-24 13:25:30 UTC
I still found some occurrences of the "Data too long for column 'resource_properties' at row 1" error in a current undercloud heat log. (HA deployments from the UI are failing on that undercloud.)

See the attached file. ("grep 'oslo_db' /var/log/heat/heat-engine.log > heat-engine-oslo_db.log")

The heat packages currently installed: 

openstack-heat-engine-2015.1.1-4.el7ost.noarch
openstack-heat-api-2015.1.1-4.el7ost.noarch

Comment 15 Florian Fuchs 2015-09-24 13:26:36 UTC
Created attachment 1076549 [details]
Log entries from heat-engine.log containing 'oslo_db'

Comment 16 Zane Bitter 2015-09-24 15:45:59 UTC
We now handle this issue by catching the exception and retrying, so it's possible that something is just logging the exception even though it is handled appropriately. I wouldn't necessarily read too much into the presence of those log entries.
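
For illustration only, a minimal Python sketch of the catch-and-retry behaviour described above (this is not Heat's actual code; the function and placeholder names are hypothetical): the first insert attempt fails with a DB "data too long" error, which gets logged, and the event is then stored again with the oversized resource_properties elided.

import logging

LOG = logging.getLogger(__name__)

# Placeholder marker for elided properties; the real value used by Heat may differ.
PROPERTIES_ELIDED = '<Deleted, too large>'


class DataTooLongError(Exception):
    """Stand-in for the DB API 'Data too long for column' error."""


def store_event(db_insert, event_values):
    """Try to store the event; if the row is too large, elide the properties and retry."""
    try:
        return db_insert(event_values)
    except DataTooLongError:
        # This is where the error still shows up in heat-engine.log even though
        # it is handled: the exception is logged, then the insert is retried
        # with the oversized properties replaced by a small placeholder.
        LOG.exception('Event properties too large to store; eliding them')
        slim = dict(event_values, resource_properties={'Error': PROPERTIES_ELIDED})
        return db_insert(slim)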

Comment 18 errata-xmlrpc 2015-10-08 12:21:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1865

Comment 19 Zane Bitter 2015-12-08 14:08:42 UTC
Note to avoid confusion: the log message described here *always* appears in the logs for TripleO. If you see it, that does *not* mean that the bug is still a problem. We now catch and handle the exception appropriately (by storing the event but omitting the huge data), but not before it is logged.