Description of problem:
When upgrading the overcloud from the latest OSP 7 poodle to OSP 8.0 (including the undercloud upgrade), the stack update can fail with errors similar to:
ValueError: resources[0]: "u'clock.redhat.com'" is not a list
This particular error is on the NtpServer parameter. Providing a new value for NtpServer via an environment file doesn't fix the issue; it still errors on the old value. See the linked upstream bug for more details.
Other parameters cause the same failure on stack-update. It seems to always involve either newly introduced comma_delimited_list parameters (which weren't present in the stack when it was deployed) or parameters that were strings but changed to comma_delimited_list with the update. I will try to add more info when I reproduce those issues.
Another parameter where this happens in TripleO is NeutronTunnelIdRanges or NeutronVniRanges. On the first upgrade attempt I see:
2016-02-01 09:40:24 [overcloud-Controller-m2dyrff7g74p]: UPDATE_FAILED TypeError: resources[0]: "u'1:1000'" is not a list
2016-02-01 09:40:24 [0]: UPDATE_FAILED TypeError: resources[0]: "u'1:1000'" is not a list
2016-02-01 09:40:25 [Controller]: UPDATE_FAILED resources.Controller: TypeError: resources[0]: "u'1:1000'" is not a list
2016-02-01 09:40:25 [overcloud-Compute-yaggukg6qibn]: UPDATE_FAILED TypeError: resources[0]: "u'1:1000'" is not a list
2016-02-01 09:40:26 [Compute]: UPDATE_FAILED resources.Compute: TypeError: resources[0]: "u'1:1000'" is not a list
2016-02-01 09:40:27 [BlockStorage]: UPDATE_COMPLETE state changed
2016-02-01 09:40:27 [overcloud]: UPDATE_FAILED resources.Controller: TypeError: resources[0]: "u'1:1000'" is not a list
However, this error goes away on the second update attempt. When I run heat stack-show on the overcloud between the attempts, I see:
"NeutronTunnelIdRanges": "[u'1:1000']",
"NeutronVniRanges": "[u'1:1000']",
So it looks like Heat has converted these values to arrays by itself, and the second upgrade attempt doesn't stop on this issue.
A corresponding stack trace from heat-engine.log:
2016-02-01 04:40:24.937 15707 DEBUG heat.engine.scheduler [-] Task _resource_update from Stack "overcloud" [58d13951-8a07-44be-8134-8b49e76aea04] Update running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:214
2016-02-01 04:40:24.937 15707 DEBUG heat.engine.scheduler [-] Task _run_to_completion from ResourceGroup "Controller" [22b083a5-6b63-48fa-bf62-1e57b75b0731] Stack "overcloud" [58d13951-8a07-44be-8134-8b49e76aea04] running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:214
2016-02-01 04:40:24.974 15707 INFO heat.engine.resource [-] UPDATE: ResourceGroup "Controller" [22b083a5-6b63-48fa-bf62-1e57b75b0731] Stack "overcloud" [58d13951-8a07-44be-8134-8b49e76aea04]
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource Traceback (most recent call last):
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 620, in _action_recorder
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     yield
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 949, in update
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     prop_diff])
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 309, in wrapper
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     step = next(subtask)
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 664, in action_handler_task
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     while not check(handler_data):
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/heat/resource_group.py", line 400, in check_update_complete
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     if not checker.step():
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 217, in step
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     next(self._runner)
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/heat/resource_group.py", line 388, in _run_to_completion
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     self).check_update_complete(updater):
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 442, in check_update_complete
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     cookie=cookie)
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 372, in _check_status_complete
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource     action=action)
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource ResourceFailure: resources.Controller: TypeError: resources[0]: "u'1:1000'" is not a list
2016-02-01 04:40:24.974 15707 ERROR heat.engine.resource
Not yet proven, but may be worth trying this patch: https://review.openstack.org/#/c/275544/1/heat/engine/properties.py
Created attachment 1121612 [details] Nova 404 errors seen with last deploy attempt
I was able to reproduce the issue. I also tried the patch to properties.py from Comment 3 on a fresh install (i.e., the first attempt at "openstack overcloud deploy" in the environment), and that seemed to fix the issue. However, during my last deploy attempt I ran into another issue that I don't think is related -- I uploaded the relevant details in the previous comment.
I'm pretty sure this is already resolved by the fix for bug 1310879. (I also left a comment to this effect on the upstream bug.) That also explains why it doesn't fail every time. It should happen when:
* There was a previous update failure; and
* Said update failure occurred _prior_ to the resource in question being updated
Can you retest with the openstack-heat-5.0.1-2.el7ost build to confirm that it's fixed?
Retested with unmodified openstack-heat-engine-5.0.1-2.el7ost.noarch and got the error again:
2016-03-01 15:08:47 [overcloud-Controller-r62lbjjc6oda]: UPDATE_IN_PROGRESS Stack UPDATE started
2016-03-01 15:08:47 [0]: UPDATE_IN_PROGRESS state changed
2016-03-01 15:08:47 [0]: UPDATE_FAILED TypeError: resources[0]: "u'1:1000'" is not a list
2016-03-01 15:08:48 [overcloud-BlockStorage-pq2x3sffw2if]: UPDATE_COMPLETE Stack UPDATE completed successfully
2016-03-01 15:08:48 [overcloud-Controller-r62lbjjc6oda]: UPDATE_FAILED TypeError: resources[0]: "u'1:1000'" is not a list
2016-03-01 15:08:48 [overcloud-Compute-armrkngq7g4u]: UPDATE_IN_PROGRESS Stack UPDATE started
2016-03-01 15:08:48 [0]: UPDATE_IN_PROGRESS state changed
2016-03-01 15:08:48 [0]: UPDATE_FAILED TypeError: resources[0]: "u'1:1000'" is not a list
2016-03-01 15:08:50 [overcloud-Compute-armrkngq7g4u]: UPDATE_FAILED TypeError: resources[0]: "u'1:1000'" is not a list
Stack overcloud UPDATE_FAILED
Heat Stack update failed.
Looks like we do need Crag's patch too. After applying the patch and restarting heat-engine, I no longer hit this error.
Ah, I've just noticed that in the Launchpad bug the exception is ValueError, but here it is TypeError. We may just need to catch both. Can you paste the traceback of the exception from heat-engine.log?
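To illustrate the "catch both" idea, here is a minimal Python 3 sketch. It is not the Heat patch. It is modeled loosely on the prop_changed frame that appears in the heat-engine traceback in this report, and StrictListProps and changed_properties are hypothetical names invented for the example. The assumption is that reading a stored value validates it against the new schema, so a stale string value raises instead of returning.

```python
class StrictListProps(dict):
    """Hypothetical stand-in for a property mapping that validates on read."""

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        if not isinstance(value, list):
            # Same message shape as the error in this report.
            raise TypeError('"%s" is not a list' % repr(value))
        return value


def prop_changed(key, before_props, after_props):
    """Return True if the property differs between the two mappings."""
    try:
        before = before_props[key]
    except (TypeError, ValueError):
        # The stored value no longer fits the schema (for example a string
        # that is now expected to be a list). Treat it as changed so the
        # update can replace it, whichever of the two exceptions was raised.
        return True
    return before != after_props[key]


def changed_properties(before_props, after_props):
    """Collect the keys whose values changed or could not be read."""
    return {k for k in after_props if prop_changed(k, before_props, after_props)}


before = StrictListProps(NeutronVniRanges="1:1000", NtpServer=["clock.redhat.com"])
after = StrictListProps(NeutronVniRanges=["1:1000"], NtpServer=["clock.redhat.com"])
changed = changed_properties(before, after)
```

In this sketch only the key whose old value fails validation ends up in the changed set; the unchanged list-valued key is left alone.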
Actually, I'll assume it's the same traceback from comment #2. It looks like we may have a couple of different errors here, one of which should already be fixed and the other we're still hitting.
Ignore comment #9; the traceback in comment #2 is from the parent stack, so we can't actually see what the proximate cause is. It's still likely to be the same issue but with TypeError instead of ValueError. Can you upload the whole section of the log file covering the update that failed?
Created attachment 1132015 [details] entire heat-engine.log with upgrade error
Stack trace from attached log:
2016-03-01 14:12:14.662 29747 INFO heat.engine.resource [-] UPDATE: TemplateResource "0" [17065956-ece1-4717-a38b-d67a986c312c] Stack "overcloud-Controller-rnd6grj6l3t7" [e7e9a121-50fb-467e-b140-2a2ad0d22b12]
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource Traceback (most recent call last):
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 638, in _action_recorder
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     yield
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 965, in update
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     before_props)
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 512, in update_template_diff_properties
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     changed_properties_set = set(k for k in after_props if prop_changed(k))
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 512, in <genexpr>
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     changed_properties_set = set(k for k in after_props if prop_changed(k))
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 497, in prop_changed
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     before = before_props.get(key)
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib64/python2.7/_abcoll.py", line 363, in get
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     return self[key]
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 456, in __getitem__
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     return self._get_property_value(key)
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 451, in _get_property_value
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     return prop.get_value(None, validate)
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 326, in get_value
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     _value = self._get_list(value, validate)
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 296, in _get_list
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource     raise TypeError(_('"%s" is not a list') % repr(value))
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource TypeError: "u'1:1000'" is not a list
2016-03-01 14:12:14.662 29747 ERROR heat.engine.resource
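For readers following the last frame of this traceback, the following is a rough Python 3 sketch of a strict list check that raises the same kind of error when handed a plain string, next to a lenient variant that splits a comma-delimited string first. Both functions are hypothetical illustrations, not the code in heat/engine/properties.py; only the error message format is taken from the traceback. Note that Python 3 prints the value without the u prefix seen in the Python 2.7 logs.

```python
def get_list(value):
    """Strict check: accept only real sequences and reject plain strings."""
    if value is None:
        return []
    if isinstance(value, str) or not isinstance(value, (list, tuple)):
        raise TypeError('"%s" is not a list' % repr(value))
    return list(value)


def coerce_comma_delimited(value):
    """Lenient variant: split a comma-delimited string before checking."""
    if isinstance(value, str):
        return [item.strip() for item in value.split(",") if item.strip()]
    return get_list(value)


# A value stored while the parameter was still a plain string.
stored = "1:1000"

try:
    get_list(stored)
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)

coerced = coerce_comma_delimited(stored)
```

With the stored string above, the strict check fails with an "is not a list" message, while the lenient variant yields a one-element list, which matches the shape that stack-show reported after the first failed attempt.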
This looks like the better approach: https://review.openstack.org/#/c/286874/1 -- works for me.
openstack-heat-5.0.1-3.el7ost works for me too, no patching needed anymore, thanks!
I ran overcloud major upgrade several times with openstack-heat-5.0.1-3.el7ost and never hit this issue again. Marking verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-0603.html