rhel-osp-director: Overcloud update from 7.2 -> 7.3 fails: "resources.SwiftDevicesAndProxyConfig: Property controller_swift_proxy_memcaches_v6 not assigned"

Environment:
openstack-swift-container-2.3.0-2.el7ost.noarch
python-swiftclient-2.4.0-1.el7ost.noarch
openstack-swift-object-2.3.0-2.el7ost.noarch
openstack-swift-proxy-2.3.0-2.el7ost.noarch
instack-undercloud-2.1.2-37.el7ost.noarch
openstack-swift-account-2.3.0-2.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-106.el7ost.noarch
openstack-swift-2.3.0-2.el7ost.noarch

Steps to reproduce:
1. Deploy overcloud 7.1 with director.
2. Register the nodes with Sat5 (pointing to 7.3).
3. Update the undercloud.
4. Attempt to update the OC nodes.

Result:
After running for some time and actually yum updating the nodes, the deployment fails:
...
IN_PROGRESS
IN_PROGRESS
FAILED
update finished with status FAILED

Some debugging points to SwiftDevicesAndProxyConfig:

[stack@undercloud ~]$ heat resource-list overcloud|grep -v COMPLE
+----------------------------+--------------------------------------+---------------------------------------------------+-----------------+----------------------+
| resource_name              | physical_resource_id                 | resource_type                                     | resource_status | updated_time         |
+----------------------------+--------------------------------------+---------------------------------------------------+-----------------+----------------------+
| SwiftDevicesAndProxyConfig | 515b477a-4261-4b10-bb53-4d9adae478eb | OS::TripleO::SwiftDevicesAndProxy::SoftwareConfig | UPDATE_FAILED   | 2016-01-21T21:53:39Z |
+----------------------------+--------------------------------------+---------------------------------------------------+-----------------+----------------------+

Additional debugging shows:
resources.SwiftDevicesAndProxyConfig: Property controller_swift_proxy_memcaches_v6 not assigned

[stack@undercloud ~]$ heat resource-show overcloud SwiftDevicesAndProxyConfig
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property               | Value |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| attributes             | { |
|                        | "config_id": "1f069129-cd23-4761-ac8a-769847c8163b" |
|                        | } |
| description            | |
| links                  | http://192.168.0.1:8004/v1/4484871019d94a2aa5933630632d3f47/stacks/overcloud/9799dd92-c319-4878-a605-bda9dd5ea6d0/resources/SwiftDevicesAndProxyConfig (self) |
|                        | http://192.168.0.1:8004/v1/4484871019d94a2aa5933630632d3f47/stacks/overcloud/9799dd92-c319-4878-a605-bda9dd5ea6d0 (stack) |
|                        | http://192.168.0.1:8004/v1/4484871019d94a2aa5933630632d3f47/stacks/overcloud-SwiftDevicesAndProxyConfig-dbbgtzmy7ewg/515b477a-4261-4b10-bb53-4d9adae478eb (nested) |
| logical_resource_id    | SwiftDevicesAndProxyConfig |
| physical_resource_id   | 515b477a-4261-4b10-bb53-4d9adae478eb |
| required_by            | ObjectStorageSwiftDeployment |
|                        | ControllerSwiftDeployment |
| resource_name          | SwiftDevicesAndProxyConfig |
| resource_status        | UPDATE_FAILED |
| resource_status_reason | ValueError: resources.SwiftDevicesAndProxyConfig: Property controller_swift_proxy_memcaches_v6 not assigned |
| resource_type          | OS::TripleO::SwiftDevicesAndProxy::SoftwareConfig |
| updated_time           | 2016-01-21T21:53:39Z |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Expected result:
Completed update.
In 7.2, the resource looked like:

  SwiftDevicesAndProxyConfig:
    type: OS::TripleO::SwiftDevicesAndProxy::SoftwareConfig
    properties:
      controller_swift_devices: {get_attr: [Controller, swift_device]}
      object_store_swift_devices: {get_attr: [ObjectStorage, swift_device]}
      controller_swift_proxy_memcaches: {get_attr: [Controller, swift_proxy_memcache]}

In 7.3, we've added some *_v6 properties (which should always be set):

  SwiftDevicesAndProxyConfig:
    type: OS::TripleO::SwiftDevicesAndProxy::SoftwareConfig
    properties:
      controller_swift_devices: {get_attr: [Controller, swift_device]}
      controller_swift_devices_v6: {get_attr: [Controller, swift_device_v6]}
      object_store_swift_devices: {get_attr: [ObjectStorage, swift_device]}
      object_store_swift_devices_v6: {get_attr: [ObjectStorage, swift_device_v6]}
      controller_swift_proxy_memcaches: {get_attr: [Controller, swift_proxy_memcache]}
      controller_swift_proxy_memcaches_v6: {get_attr: [Controller, swift_proxy_memcache_v6]}

Also in 7.3 the *_v6 attributes are *new* outputs on the Controller resource. So I'm wondering if we're hitting some slight nuance here with the Update logic since this is failing during an update.

Do we need to add an explicit depends_on: Controller on the SwiftDevicesAndProxyConfig resource? Would that ensure that Controller is updated first and provides the new outputs (assuming that's even what's triggering the error message)?
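For reference, the difference between the two definitions can be listed mechanically. The sketch below only transcribes the property names and get_attr targets quoted above into Python dictionaries; the variable names are made up for illustration and nothing here talks to Heat.

```python
# Properties of SwiftDevicesAndProxyConfig as quoted for 7.2. Each value is the
# (resource, attribute) pair taken from the corresponding get_attr call.
PROPS_7_2 = {
    "controller_swift_devices": ("Controller", "swift_device"),
    "object_store_swift_devices": ("ObjectStorage", "swift_device"),
    "controller_swift_proxy_memcaches": ("Controller", "swift_proxy_memcache"),
}

# 7.3 keeps the 7.2 properties and adds the *_v6 ones.
PROPS_7_3 = dict(PROPS_7_2)
PROPS_7_3.update({
    "controller_swift_devices_v6": ("Controller", "swift_device_v6"),
    "object_store_swift_devices_v6": ("ObjectStorage", "swift_device_v6"),
    "controller_swift_proxy_memcaches_v6": ("Controller", "swift_proxy_memcache_v6"),
})

# Properties that exist only in the 7.3 definition.
added_props = sorted(set(PROPS_7_3) - set(PROPS_7_2))

# Controller attributes that only the new properties read, i.e. the outputs
# that must exist on Controller before the new definition can resolve.
new_controller_attrs = sorted(
    attr for name, (resource, attr) in PROPS_7_3.items()
    if name in added_props and resource == "Controller"
)

print(added_props)
print(new_controller_attrs)
```

The property named in the error message, controller_swift_proxy_memcaches_v6, is one of the three additions.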
What's the original deploy command? What's the update command? Can you also reproduce the issue by running the update command again with --debug, capture the output, and attach it?
The other thing to check: get the stack ID for the Controller ResourceGroup, then run heat stack-show <ID>.
Deployment command:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --ceph-storage-scale 3 --ntp-server x.x.x.x --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /home/stack/network-environment.yaml

Update command:
openstack overcloud update stack overcloud -i --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/updates/update-from-vip.yaml -e network-environment.yaml -e custom.yaml
Created attachment 1117272: the debug log from the update
I've confirmed that all of the outputs needed by the new SwiftDevicesAndProxyConfig were indeed present on the nested Controller stacks. Since other calls to get_attr are working fine, shardy thinks that there is still something wrong in the templates.
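A check of this kind can be scripted once the stack details are available as data. The sketch below assumes each nested stack is a dictionary whose "outputs" entry is a list of items with an "output_key" field, which is the shape the Heat API uses for stack outputs; the function name and the sample stacks are hypothetical, and the sample values are made up.

```python
# Attributes read via get_attr by the 7.3 SwiftDevicesAndProxyConfig
# definition from the Controller resource.
REQUIRED_OUTPUTS = {
    "swift_device",
    "swift_device_v6",
    "swift_proxy_memcache",
    "swift_proxy_memcache_v6",
}


def missing_outputs(stack, required=REQUIRED_OUTPUTS):
    """Return the required output keys that the stack does not expose."""
    present = {item["output_key"] for item in stack.get("outputs", [])}
    return sorted(set(required) - present)


# Illustrative data only: one stack still on the old template, one updated.
old_style_stack = {
    "outputs": [
        {"output_key": "swift_device", "output_value": "r1z1-192.0.2.10:%PORT%/d1"},
        {"output_key": "swift_proxy_memcache", "output_value": "192.0.2.10:11211"},
    ]
}
updated_stack = {
    "outputs": [{"output_key": key, "output_value": ""} for key in REQUIRED_OUTPUTS]
}

print(missing_outputs(old_style_stack))
print(missing_outputs(updated_stack))
```

An empty list for every nested Controller stack corresponds to the situation described here: the outputs exist, so a missing output is not what triggers the error.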
Note that this issue does not seem to be 100% reproducible. There have been successful updates to 7.3 (using the same new templates), so the issue seems transient. It may be triggered by something timing related, or by how Heat orders the resource updates.
(In reply to James Slagle from comment #2)
> Also in 7.3 the *_v6 attributes are *new* outputs on the Controller
> resource. So I'm wondering if we're hitting some slight nuance here with the
> Update logic since this is failing during an update.

This is a plausible class of bug, and worth investigating.

The specific error message we're getting occurs when a property value is straight up not specified in the template - not just when the value is empty or something due to some problem in getting the value, but the key just plain does not exist in the template. i.e. no matter how badly get_attr fails, we shouldn't see this error because of that AFAICT.

The obvious possible cause then would be a mismatch between the name of the parameter in the nested template and the name of the property specified in overcloud_without_merge_py.yaml. That doesn't appear to be the case, so we should at least suspect that Heat could be incorrectly trying to validate the old resource definition against the new nested template.

Examining the traceback:

  Traceback (most recent call last):
    File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 528, in _action_recorder
      yield
    File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 794, in update
      before_props)
    File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 417, in update_template_diff_properties
      changed_properties_set = set(k for k in after_props
    File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 418, in <genexpr>
      if before_props.get(k) !=
    File "/usr/lib64/python2.7/_abcoll.py", line 363, in get
      return self[key]
    File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 453, in __getitem__
      return self._get_property_value(key)
    File "/usr/lib/python2.7/site-packages/heat/engine/properties.py", line 450, in _get_property_value
      raise ValueError(_('Property %s not assigned') % key)
  ValueError: Property controller_swift_proxy_memcaches_v6 not assigned

The full statement at line 417 in update_template_diff_properties() reads:

  # Create a set of keys which differ (or are missing/added)
  changed_properties_set = set(k for k in after_props
                               if before_props.get(k) !=
                               after_props.get(k))

so it's not actually possible to tell whether it's before_props or after_props that's failing, but it seems likely that it would be before_props and that we shouldn't attempt to read values from it for properties that were introduced in the update.

A quick workaround would be to add a default value in the nested template, that would get past this check and the correct value would no doubt be used in the update.

It would be interesting to know what version of Heat we're using here; openstack-heat-2015.1.2-7.el7ost fixes a bug (bug 1298811) that caused Heat to sneakily read local files when it shouldn't have, and since the client is being run from the undercloud server machine that could have been masking this bug.

> Do we need to add an
> explicit depends_on: Controller on the SwiftDevicesAndProxyConfig resource?

No, the get_attr adds a dependency.
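The traceback passes through the abstract Mapping class's get(), which only turns a KeyError into the default value, so a ValueError raised inside __getitem__ escapes to the caller. The following is a minimal sketch of that behaviour, not Heat's code: ToyProperties is a hypothetical, heavily simplified stand-in for heat.engine.properties.Properties, its schema format is an assumption, and the sample address is made up.

```python
from collections.abc import Mapping


class ToyProperties(Mapping):
    """Hypothetical, simplified stand-in for Heat's Properties mapping."""

    def __init__(self, schema, data):
        self.schema = schema  # property name -> dict, optionally with "default"
        self.data = data      # property name -> value supplied by the template

    def __getitem__(self, key):
        if key not in self.schema:
            raise KeyError(key)  # unknown name: Mapping.get() returns None
        if key in self.data:
            return self.data[key]
        if "default" in self.schema[key]:
            return self.schema[key]["default"]
        # Known to the schema, but neither a value nor a default is available.
        raise ValueError("Property %s not assigned" % key)

    def __iter__(self):
        return iter(self.schema)

    def __len__(self):
        return len(self.schema)


def changed_keys(before_props, after_props):
    # Same shape as the statement quoted from update_template_diff_properties().
    return set(k for k in after_props
               if before_props.get(k) != after_props.get(k))


V6 = "controller_swift_proxy_memcaches_v6"
after = ToyProperties({V6: {}}, {V6: ["[fd00::10]:11211"]})  # made-up value

# "before" knows the property name but holds no value for it: get() raises.
stale_before = ToyProperties({V6: {}}, {})
try:
    changed_keys(stale_before, after)
    error = None
except ValueError as exc:
    error = str(exc)

# "before" built from a schema that does not know the property: get() is None.
clean_before = ToyProperties({}, {})
changed_clean = changed_keys(clean_before, after)

# Workaround: the schema carries a default, so the lookup succeeds.
patched_before = ToyProperties({V6: {"default": []}}, {})
changed_patched = changed_keys(patched_before, after)

print(error)
print(changed_clean, changed_patched)
```

In this toy model the error only appears when the schema used for the old properties already contains the new name, which matches the suspicion that the old definition is being read against the new nested template.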
I can't imagine a mechanism for this to happen intermittently, unless it's just different people testing with different versions of Heat.

Sasha confirmed that he found this on 2015.1.2-6. That is consistent with a possible cause that, happily, should already be fixed (by the patch for bug 1298811) in 2015.1.2-7:

- When loading the *existing* stack, Heat erroneously reads the template for the nested stack from the local filesystem instead of from the DB.
- Because the template has been modified on the local filesystem in preparation for the update, Heat believes that the *new* template (with extra parameters) was the definition of the old stack.
- It calculates old_props using the schema thus obtained from the *new* template, combined with the property values stored in the DB.
- Attempting to compare one of the new property values to the old one fails, because the property is present in the "old" (but actually new) schema, while there is no old value.

Competing hypotheses don't seem convincing:

- If the before_props are generated with the correct schema then we never get to the check that is failing, and the schema for that is derived from the *existing* stack right at the beginning of the update. It's hard to see this going wrong.
- If it's actually after_props that is failing then it would have to not be getting the new property values from the template at all... if that could happen then I wouldn't expect any updates to ever do anything.

So please retest with the latest build, because it's highly likely this is fixed already.

Just reiterating, an easy workaround would be to supply all of the new parameters in the nested template (puppet/swift-devices-and-proxy-config-v6.yaml) with default values.
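The hypothesised sequence can be modelled in a few lines. This is a toy model under stated assumptions, not Heat's loading code: resolve() and try_load() are hypothetical helpers, the templates are reduced to parameter-name dictionaries, the stored value is made up, and the empty-list default is only a guess at what a sensible default might be.

```python
def resolve(schema, stored_values):
    """Pair a parameter schema with stored values, as when loading a stack."""
    resolved = {}
    for name, spec in schema.items():
        if name in stored_values:
            resolved[name] = stored_values[name]
        elif "default" in spec:
            resolved[name] = spec["default"]
        else:
            raise ValueError("Property %s not assigned" % name)
    return resolved


V4 = "controller_swift_proxy_memcaches"
V6 = "controller_swift_proxy_memcaches_v6"

# Nested template as stored in the DB for the existing stack.
old_template = {V4: {}}
# Nested template as modified on the local filesystem for the update.
new_template = {V4: {}, V6: {}}
# The same new template with the suggested workaround applied.
new_template_with_default = {V4: {}, V6: {"default": []}}

# Property values stored in the DB for the existing stack (made-up value).
stored_values = {V4: ["192.0.2.10:11211"]}


def try_load(schema):
    """Resolve the existing stack's properties, reporting any error as text."""
    try:
        return resolve(schema, stored_values)
    except ValueError as exc:
        return "ValueError: %s" % exc


correct = try_load(old_template)                     # schema from the DB
buggy = try_load(new_template)                       # schema from local file
worked_around = try_load(new_template_with_default)  # local file + default

print(correct)
print(buggy)
print(worked_around)
```

Only the middle case fails, and it fails with the message seen in this report, because the new parameter is in the schema but has neither a stored value nor a default.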
Please retest with 2015.1.2-7
FailedQA:

Environment:
openstack-heat-api-cfn-2015.1.2-7.el7ost.noarch
openstack-heat-common-2015.1.2-7.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
openstack-heat-engine-2015.1.2-7.el7ost.noarch
openstack-heat-api-2015.1.2-7.el7ost.noarch

Deployed with:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server x.x.x.x --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml

Updated with:
openstack overcloud update stack overcloud -i --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/updates/update-from-vip.yaml -e network-environment.yaml

[stack@instack ~]$ heat resource-list -n5 overcloud|grep -v COMPLE
+----------------------------+--------------------------------------+---------------------------------------------------+-----------------+----------------------+-----------------+
| resource_name              | physical_resource_id                 | resource_type                                     | resource_status | updated_time         | parent_resource |
+----------------------------+--------------------------------------+---------------------------------------------------+-----------------+----------------------+-----------------+
| SwiftDevicesAndProxyConfig | 5d292e57-c498-480d-8a49-4df0ada59147 | OS::TripleO::SwiftDevicesAndProxy::SoftwareConfig | UPDATE_FAILED   | 2016-02-01T23:25:08Z |                 |
+----------------------------+--------------------------------------+---------------------------------------------------+-----------------+----------------------+-----------------+

[stack@instack ~]$ heat resource-show overcloud SwiftDevicesAndProxyConfig
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property               | Value |
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| attributes             | { |
|                        | "config_id": "8a7f0821-0e15-47fe-829d-b13b73a81b06" |
|                        | } |
| description            | |
| links                  | http://192.0.2.1:8004/v1/b0f31da7e7004e7aa1b75d1a119d9001/stacks/overcloud/410fe874-1d6d-474f-830e-187bd45ea44a/resources/SwiftDevicesAndProxyConfig (self) |
|                        | http://192.0.2.1:8004/v1/b0f31da7e7004e7aa1b75d1a119d9001/stacks/overcloud/410fe874-1d6d-474f-830e-187bd45ea44a (stack) |
|                        | http://192.0.2.1:8004/v1/b0f31da7e7004e7aa1b75d1a119d9001/stacks/overcloud-SwiftDevicesAndProxyConfig-d24ksrub4pqy/5d292e57-c498-480d-8a49-4df0ada59147 (nested) |
| logical_resource_id    | SwiftDevicesAndProxyConfig |
| physical_resource_id   | 5d292e57-c498-480d-8a49-4df0ada59147 |
| required_by            | ObjectStorageSwiftDeployment |
|                        | ControllerSwiftDeployment |
| resource_name          | SwiftDevicesAndProxyConfig |
| resource_status        | UPDATE_FAILED |
| resource_status_reason | ValueError: resources.SwiftDevicesAndProxyConfig: Property controller_swift_proxy_memcaches_v6 not assigned |
| resource_type          | OS::TripleO::SwiftDevicesAndProxy::SoftwareConfig |
| updated_time           | 2016-02-01T23:25:08Z |
+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Ryan has confirmed that adding default parameter values to the templates in question is sufficient to work around the issue, so he is going to add defaults to the parameters in t-h-t in order to get this unblocked in the short term.
Thomas has established that this occurs when:

* You update the stack with a child template that adds new mandatory properties
* The update fails for any reason _before_ the point of updating this resource
* You run another update
Verified:

Environment:
openstack-heat-engine-2015.1.2-9.el7ost.noarch

Was able to successfully update 7.2 to 7.3.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-0266.html