Bug 1303084 - String values for comma_delimited_list parameters can fail stack-update
Summary: String values for comma_delimited_list parameters can fail stack-update
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: Crag Wolfe
QA Contact: Jiri Stransky
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-01-29 13:56 UTC by Jiri Stransky
Modified: 2016-04-26 14:31 UTC (History)

Fixed In Version: openstack-heat-5.0.1-3.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, heat would attempt to validate old property values against the current property definitions. Consequently, during director upgrades where a property definition changed type, the process would fail with a 'TypeError' when heat tried to validate the old property value. With this fix, heat no longer tries to validate old property values. As a result, heat now handles property schema definition changes gracefully by validating only new property values.
Clone Of:
Environment:
Last Closed: 2016-04-07 21:26:49 UTC
Target Upstream Version:
Embargoed:


Attachments
Nova 404 errors seens with last deploy attempt (155.54 KB, text/plain)
2016-02-06 06:40 UTC, Crag Wolfe
entire heat-engine.log with upgrade error (4.41 MB, application/x-bzip)
2016-03-01 19:30 UTC, Crag Wolfe


Links
System                  ID              Private  Priority  Status        Summary                                            Last Updated
Launchpad               1538551         0        None      None          None                                               2016-01-29 13:57:28 UTC
OpenStack gerrit        286879          0        None      None          None                                               2016-03-02 01:01:09 UTC
Red Hat Product Errata  RHEA-2016:0603  0        normal    SHIPPED_LIVE  Red Hat OpenStack Platform 8 Enhancement Advisory  2016-04-08 00:53:53 UTC

Description Jiri Stransky 2016-01-29 13:56:41 UTC
Description of problem:

When trying to upgrade the overcloud from the latest OSP 7 poodle to OSP 8.0 (including the undercloud upgrade), the stack update can fail with errors similar to:

ValueError: resources[0]: "u'clock.redhat.com'" is not a list

This particular error concerns the NtpServer parameter. Providing a new value for NtpServer via an environment file doesn't fix the issue; the update still fails on the old value. See the linked upstream bug for more details.

Other parameters cause the same failure on stack-update. It seems to always involve either newly introduced comma_delimited_list parameters (ones that weren't present in the stack when it was deployed) or parameters that were strings but changed to comma_delimited_list with the update.

I will try to add more info when I reproduce those issues.
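
To illustrate the type mismatch described above, here is a minimal Python 3 sketch. It is not Heat's code: `get_list` and `parse_comma_delimited` are hypothetical stand-ins, and `clock2.example.com` is a made-up second server. It only shows how a value stored while the parameter was a plain string can fail a list check, while a freshly parsed comma-delimited value passes.

```python
def get_list(value):
    """Hypothetical list-typed getter: reject anything that is not a list or tuple."""
    if not isinstance(value, (list, tuple)):
        raise TypeError('"%s" is not a list' % repr(value))
    return list(value)


def parse_comma_delimited(raw):
    """Split a comma-delimited string into a list of trimmed, non-empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


# Value saved in the stack while the parameter was still a plain string.
stored_value = "clock.redhat.com"

# Value supplied with the update, parsed as a comma-delimited list.
new_value = parse_comma_delimited("clock.redhat.com, clock2.example.com")
```

In this sketch, `get_list(new_value)` succeeds but `get_list(stored_value)` raises, which matches the observation that supplying a new value does not help when the old one is still being read.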

Comment 2 Jiri Stransky 2016-02-01 09:48:10 UTC
Other parameters where this happens in TripleO are NeutronTunnelIdRanges and NeutronVniRanges. On the first upgrade attempt I see:

2016-02-01 09:40:24 [overcloud-Controller-m2dyrff7g74p]: UPDATE_FAILED  TypeError: resources[0]: "u'1:1000'" is not a list
2016-02-01 09:40:24 [0]: UPDATE_FAILED  TypeError: resources[0]: "u'1:1000'" is not a list
2016-02-01 09:40:25 [Controller]: UPDATE_FAILED  resources.Controller: TypeError: resources[0]: "u'1:1000'" is not a list
2016-02-01 09:40:25 [overcloud-Compute-yaggukg6qibn]: UPDATE_FAILED  TypeError: resources[0]: "u'1:1000'" is not a list
2016-02-01 09:40:26 [Compute]: UPDATE_FAILED  resources.Compute: TypeError: resources[0]: "u'1:1000'" is not a list
2016-02-01 09:40:27 [BlockStorage]: UPDATE_COMPLETE  state changed
2016-02-01 09:40:27 [overcloud]: UPDATE_FAILED  resources.Controller: TypeError: resources[0]: "u'1:1000'" is not a list

However, this error goes away on the second update attempt. When I run heat stack-show on the overcloud between the attempts, I see:

"NeutronTunnelIdRanges": "[u'1:1000']",
"NeutronVniRanges": "[u'1:1000']",

so it looks like Heat has converted these values to lists by itself, and the second upgrade attempt doesn't stop on this issue. The corresponding stack trace from heat-engine.log:

2016-02-01 04:40:24.937 15707 DEBUG heat.engine.scheduler [-] Task _resource_update from Stack "overcloud" [58d13951-8a07-44be-8134-8b49e76aea04] Update running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:214
2016-02-01 04:40:24.937 15707 DEBUG heat.engine.scheduler [-] Task _run_to_completion from ResourceGroup "Controller" [22b083a5-6b63-48fa-bf62-1e57b75b0731] Stack "overcloud" [58d13951-8a07-44be-8134-8b49e76aea04] running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:214
2016-02-01 04:40:24.974 15707 INFO heat.engine.resource [-] UPDATE: ResourceGroup "Controller" [22b083a5-6b63-48fa-bf62-1e57b75b0731] Stack "overcloud" [58d13951-8a07-44be-8134-8b49e76aea04]
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource Traceback (most recent call last):
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 620, in _action_recorder
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     yield
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 949, in update
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     prop_diff])
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 309, in wrapper
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     step = next(subtask)
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 664, in action_handler_task
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     while not check(handler_data):
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/heat/resource_group.py", line 400, in check_update_complete
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     if not checker.step():
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 217, in step
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     next(self._runner)
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/heat/resource_group.py", line 388, in _run_to_completion
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     self).check_update_complete(updater):
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 442, in check_update_complete
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     cookie=cookie)
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 372, in _check_status_complete
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     action=action)
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource ResourceFailure: resources.Controller: TypeError: resources[0]: "u'1:1000'" is not a list
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource

Comment 3 Crag Wolfe 2016-02-03 07:46:04 UTC
Not yet proven, but may be worth trying this patch:
https://review.openstack.org/#/c/275544/1/heat/engine/properties.py

Comment 4 Crag Wolfe 2016-02-06 06:40:01 UTC
Created attachment 1121612
Nova 404 errors seens with last deploy attempt

Comment 5 Crag Wolfe 2016-02-06 06:43:39 UTC
I was able to reproduce the issue.

Also, I tried the patch to properties.py from Comment 3 on a fresh install (i.e., the first attempt at "openstack overcloud deploy" in the environment) and that seemed to fix the issue.

However, I ran into another issue, which I don't think is related, during my last deploy attempt -- I uploaded the relevant details in the previous comment.

Comment 6 Zane Bitter 2016-02-29 21:20:46 UTC
I'm pretty sure this is already resolved by the fix for bug 1310879. (I also left a comment to this effect on the upstream bug.)

That also explains why it doesn't fail every time. It should happen when:

* There was a previous update failure; and
* Said update failure occurred _prior_ to the resource in question being updated

Can you retest with the openstack-heat-5.0.1-2.el7ost build to confirm that it's fixed?

Comment 7 Jiri Stransky 2016-03-01 15:45:51 UTC
Retested with unmodified openstack-heat-engine-5.0.1-2.el7ost.noarch and got the error again:

2016-03-01 15:08:47 [overcloud-Controller-r62lbjjc6oda]: UPDATE_IN_PROGRESS  Stack UPDATE started
2016-03-01 15:08:47 [0]: UPDATE_IN_PROGRESS  state changed
2016-03-01 15:08:47 [0]: UPDATE_FAILED  TypeError: resources[0]: "u'1:1000'" is not a list
2016-03-01 15:08:48 [overcloud-BlockStorage-pq2x3sffw2if]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2016-03-01 15:08:48 [overcloud-Controller-r62lbjjc6oda]: UPDATE_FAILED  TypeError: resources[0]: "u'1:1000'" is not a list
2016-03-01 15:08:48 [overcloud-Compute-armrkngq7g4u]: UPDATE_IN_PROGRESS  Stack UPDATE started
2016-03-01 15:08:48 [0]: UPDATE_IN_PROGRESS  state changed
2016-03-01 15:08:48 [0]: UPDATE_FAILED  TypeError: resources[0]: "u'1:1000'" is not a list
2016-03-01 15:08:50 [overcloud-Compute-armrkngq7g4u]: UPDATE_FAILED  TypeError: resources[0]: "u'1:1000'" is not a list
Stack overcloud UPDATE_FAILED
Heat Stack update failed.


Looks like we do need Crag's patch too. After applying the patch and restarting heat-engine, I no longer hit this error.

Comment 8 Zane Bitter 2016-03-01 17:02:49 UTC
Ah, I've just noticed that in the Launchpad bug the exception is ValueError, but here it is TypeError. We may just need to catch both. Can you paste the traceback of the exception from heat-engine.log?
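
As a rough sketch of what "catch both" could look like (Python 3, not the actual Heat patch; `read_old_value` and `strict_read` are hypothetical helpers, and `DnsServers` is a made-up parameter name), the idea is to treat either exception type as "the stored value could not be read" rather than letting it abort the update:

```python
def read_old_value(read, key):
    """Return (ok, value); ok is False when reading the stored value fails validation."""
    try:
        return True, read(key)
    except (ValueError, TypeError):
        # Both exception types have been reported with the same
        # "is not a list" message, so handle them identically.
        return False, None


def strict_read(key):
    """Hypothetical reader that rejects a string where a list is expected."""
    stored = {
        "NtpServer": "clock.redhat.com",   # saved while the parameter was a string
        "DnsServers": ["10.0.0.1"],        # made-up parameter already stored as a list
    }
    value = stored[key]
    if isinstance(value, str):
        raise TypeError('"%s" is not a list' % repr(value))
    return value
```

A caller could then treat a failed read of the old value as "property changed" and continue with the new value; whether that is the right policy for Heat is an assumption of this sketch.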

Comment 9 Zane Bitter 2016-03-01 17:06:27 UTC
Actually, I'll assume it's the same traceback from comment #2. It looks like we may have a couple of different errors here, one of which should already be fixed and the other we're still hitting.

Comment 10 Zane Bitter 2016-03-01 17:11:31 UTC
Ignore comment #9; the traceback in comment #2 is from the parent stack, so we can't actually see what the proximate cause is. It's still likely to be the same issue but with TypeError instead of ValueError. Can you upload the whole section of the log file covering the update that failed?

Comment 11 Crag Wolfe 2016-03-01 19:30:54 UTC
Created attachment 1132015
entire heat-engine.log with upgrade error

Comment 12 Crag Wolfe 2016-03-01 19:38:55 UTC
Stack trace from attached log:

2016-03-01 14:12:14.662 29747 INFO heat.engine.resource [-] UPDATE: TemplateResource "0" [17065956-ece1-4717-a38b-d67a986c312c] Stack "overcloud-Controller-rnd6grj6l3t7" [e7e9a121-50fb-467e-b140-2a2ad0d22b12]
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource Traceback (most recent call last):
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 638, in _action_recorder
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     yield
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 965, in update
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     before_props)
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 512, in update_template_diff_properties
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     changed_properties_set = set(k for k in after_props if prop_changed(k))
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 512, in <genexpr>
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     changed_properties_set = set(k for k in after_props if prop_changed(k))
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 497, in prop_changed
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     before = before_props.get(key)
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib64/python2.7/_abcoll.py", line 363, in get
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     return self[key]
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 456, in __getitem__
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     return self._get_property_value(key)
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 451, in _get_property_value
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     return prop.get_value(None, validate)
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 326, in get_value
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     _value = self._get_list(value, validate)
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 296, in _get_list
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     raise TypeError(_('"%s" is not a list') % repr(value))
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource TypeError: "u'1:1000'" is not a list
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource
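
The traceback above passes through the `get` method in _abcoll.py, which calls `self[key]`. As a toy Python 3 model of that call chain (not Heat's code; `ValidatingProps` is a hypothetical class that validates on access), the `get` inherited from `Mapping` only handles KeyError, so a TypeError raised inside `__getitem__` propagates to the caller:

```python
from collections.abc import Mapping


class ValidatingProps(Mapping):
    """Toy properties mapping that checks list-typed keys when a value is read."""

    def __init__(self, data, list_keys):
        self._data = dict(data)
        self._list_keys = set(list_keys)

    def __getitem__(self, key):
        value = self._data[key]  # raises KeyError for unknown keys
        if key in self._list_keys and not isinstance(value, list):
            raise TypeError('"%s" is not a list' % repr(value))
        return value

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Old value stored as a string, but the key is now declared list-typed.
old_props = ValidatingProps({"NeutronVniRanges": "1:1000"},
                            list_keys=["NeutronVniRanges"])

# A missing key is handled inside get(); a validation failure is not.
missing = old_props.get("Missing")
try:
    old_props.get("NeutronVniRanges")
except TypeError as exc:
    escaped = str(exc)
```

This is why a plain `before_props.get(key)` offers no protection here: the default-returning behavior of `get` covers only absent keys, not values that fail validation.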

Comment 13 Crag Wolfe 2016-03-02 00:01:18 UTC
This looks like the better approach: https://review.openstack.org/#/c/286874/1 -- works for me.

Comment 16 Jiri Stransky 2016-03-04 13:02:01 UTC
openstack-heat-5.0.1-3.el7ost works for me too, no patching needed anymore, thanks!

Comment 17 Jiri Stransky 2016-03-29 11:10:34 UTC
I ran the overcloud major upgrade several times with openstack-heat-5.0.1-3.el7ost and never hit this issue again. Marking verified.

Comment 18 errata-xmlrpc 2016-04-07 21:26:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0603.html

