Osp-director-10: Overcloud Upgrade 9 -> 10 fails during the init stage command.

Environment:
------------
instack-5.0.0-0.20160802165724.5aabf5c.el7ost.noarch
instack-undercloud-5.0.0-0.20160818065636.41ef775.el7ost.noarch
puppet-heat-9.1.0-0.20160815142726.d364553.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-33.el7ost.noarch
openstack-heat-api-7.0.0-0.20160822053245.7c70288.el7ost.noarch
openstack-heat-engine-7.0.0-0.20160822053245.7c70288.el7ost.noarch
python-heatclient-1.3.0-0.20160802194627.44dfe53.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-0.20160820164503.6c537d2.1.el7ost.noarch
openstack-heat-api-cfn-7.0.0-0.20160822053245.7c70288.el7ost.noarch
openstack-heat-common-7.0.0-0.20160822053245.7c70288.el7ost.noarch
openstack-heat-templates-0.0.1-0.20160802165947.051822a.el7ost.noarch
python-heat-tests-7.0.0-0.20160822053245.7c70288.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch

Steps:
--------
(1) Finish the undercloud upgrade successfully.
(2) Follow the instructions to upgrade the overcloud:
https://gitlab.cee.redhat.com/sathlang/ospd-9-to-10-upgrade#controller-and-block-storage-upgrade

Run the init stage command:
--------------------
#!/usr/bin/bash
. stackrc
cat > overcloud-repos.yaml <<EOF
parameter_defaults:
  UpgradeInitCommand: |
    set -e
    yum localinstall -y http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm
    rhos-release -P 10 -d
    # Workaround for bz-1361148
    ! [ -e /usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.d ] || rm /usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.d
EOF

openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 \
  --neutron-network-type vxlan --neutron-tunnel-types vxlan \
  --ntp-server 10.5.26.10 --timeout 90 \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-init.yaml \
  -e /home/stack/overcloud-repos.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/updates/update-from-overcloud-compute-hostnames.yaml

2016-08-24 10:40:44 [48]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:45 [56]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:45 [49]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:45 [27]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:46 [40]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:46 [51]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:47 [12]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:47 [NodeUserData]: UPDATE_IN_PROGRESS state changed
2016-08-24 10:40:48 [34]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:48 [UpdateConfig]: UPDATE_IN_PROGRESS state changed
2016-08-24 10:40:48 [NodeAdminUserData]: UPDATE_IN_PROGRESS state changed
2016-08-24 10:40:49 [1]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:50 [15]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:51 [14]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:51 [NodeUserData]: UPDATE_COMPLETE state changed
2016-08-24 10:40:52 [32]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:52 [60]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:53 [NodeAdminUserData]: UPDATE_COMPLETE state changed
2016-08-24 10:40:53 [26]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:54 [17]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:54 [NovaCompute]: UPDATE_IN_PROGRESS state changed
2016-08-24 10:40:54 [46]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:55 [3]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:55 [24]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:56 [47]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:56 [UpdateConfig]: UPDATE_COMPLETE state changed
2016-08-24 10:40:56 [4]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:57 [5]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:57 [NovaCompute]: UPDATE_COMPLETE state changed
2016-08-24 10:40:58 [UpdateDeployment]: UPDATE_IN_PROGRESS state changed
2016-08-24 10:40:58 [31]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:58 [UpdateDeployment]: UPDATE_FAILED NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
    return func(
2016-08-24 10:40:59 [overcloud-Compute-uydvzmzehkxk-0-wge43u3s7l4v]: UPDATE_FAILED NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
    return func(
2016-08-24 10:40:59 [overcloud-Compute-uydvzmzehkxk-0-wge43u3s7l4v]: UPDATE_FAILED NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
    return func(
2016-08-24 10:40:59 [35]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:00 [0]: UPDATE_FAILED resources[0]: NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
2016-08-24 10:41:00 [overcloud-Compute-uydvzmzehkxk]: UPDATE_FAILED resources[0]: NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
2016-08-24 10:41:00 [21]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:01 [0]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:01 [45]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:02 [Compute]: UPDATE_FAILED resources.Compute: resources[0]: NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line
2016-08-24 10:41:02 [6]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:02 [9]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:02 [ControllerServiceChain]: CREATE_FAILED CREATE aborted
2016-08-24 10:41:03 [overcloud]: UPDATE_FAILED resources.Compute: resources[0]: NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line
2016-08-24 10:41:03 [ServiceChain]: CREATE_FAILED CREATE aborted
2016-08-24 10:41:03 [36]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:03 [overcloud-ControllerServiceChain-626ya4wtgir2]: CREATE_FAILED Resource CREATE failed: Operation cancelled
2016-08-24 10:41:04 [30]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:05 [44]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:05 [10]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:06 [43]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:07 [55]: CREATE_IN_PROGRESS state changed
Stack overcloud UPDATE_FAILED
Heat Stack update failed.
[stack@undercloud72 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+---------------+---------------------+---------------------+
| id                                   | stack_name | stack_status  | creation_time       | updated_time        |
+--------------------------------------+------------+---------------+---------------------+---------------------+
| 9f5b4dec-a9f1-496c-8c55-19f09e735f65 | overcloud  | UPDATE_FAILED | 2016-08-23T17:42:27 | 2016-08-24T11:07:52 |
+--------------------------------------+------------+---------------+---------------------+---------------------+
[stack@undercloud72 ~]$ heat deployment-show 870771e7-6c54-4df8-a47e-d91e7ae41aa1
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "COMPLETE",
  "server_id": "6a5d918a-00a0-4ac2-9669-dd6e6604a15a",
  "config_id": "77833a6d-ff4e-4a09-8211-b9b83a583df3",
  "output_values": {
    "deploy_stdout": "Started yum_update.sh on server 6a5d918a-00a0-4ac2-9669-dd6e6604a15a at Mon Aug 29 02:09:05 EDT 2016\nNot running due to unset update_identifier\n",
    "deploy_stderr": "",
    "update_managed_packages": "false",
    "deploy_status_code": 0
  },
  "creation_time": "2016-08-23T17:55:17",
  "updated_time": "2016-08-23T17:57:58",
  "input_values": {
    "update_identifier": ""
  },
  "action": "CREATE",
  "status_reason": "Outputs received",
  "id": "870771e7-6c54-4df8-a47e-d91e7ae41aa1"
}
I've changed the init stage command according to the changes in:
https://gitlab.cee.redhat.com/sathlang/ospd-9-to-10-upgrade#controller-and-block-storage-upgrade
to:

#!/usr/bin/bash
. stackrc
cat > overcloud-repos.yaml <<EOF
parameter_defaults:
  UpgradeInitCommand: |
    set -e
    yum localinstall -y http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm
    rhos-release -P 10 -d
    # Workaround for bz-1361148
    ! [ -e /usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.d ] || rm /usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.d
EOF
$DEPLOY -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-init.yaml \
  -e /home/stack/overcloud-repos.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/updates/update-from-overcloud-compute-hostnames.yaml

Results:
----------
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
2016-08-24 11:11:13 [UpdateDeployment]: UPDATE_FAILED NotFound_Remote: resources.UpdateDeployment: Software config with id 5a80c923-abaa-44a1-8c16-679acb1b8b49 not found
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
    return func(
2016-08-24 11:11:14 [StorageMgmtPort]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:14 [ExternalPort]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:15 [Controller]: UPDATE_FAILED resources.Controller: resources[0]: NotFound_Remote: resources.UpdateDeployment: Software config with id efdb9688-0279-4173-854f-c2be1c83fe3e not found
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", l
2016-08-24 11:11:15 [TenantPort]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:15 [Compute]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:15 [overcloud]: UPDATE_FAILED resources.Controller: resources[0]: NotFound_Remote: resources.UpdateDeployment: Software config with id efdb9688-0279-4173-854f-c2be1c83fe3e not found
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", l
2016-08-24 11:11:15 [0]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:16 [overcloud-Compute-uydvzmzehkxk]: UPDATE_FAILED Operation cancelled
2016-08-24 11:11:16 [ManagementPort]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:17 [InternalApiPort]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:17 [overcloud-Controller-mrkekqec3nea-2-37cg6jdefkv6]: UPDATE_FAILED NotFound_Remote: resources.UpdateDeployment: Software config with id 5a80c923-abaa-44a1-8c16-679acb1b8b49 not found
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
    return func(
Stack overcloud UPDATE_FAILED
Heat Stack update failed.
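When triaging output like this, it can help to reduce the event stream to the distinct software config IDs that Heat reported as missing. The following is only a sketch: the function name `missing_config_ids` is hypothetical, and it assumes the deploy output was saved to a file and that each ID appears unbroken on a single line.

```shell
#!/usr/bin/bash
# Hypothetical helper (not part of any OpenStack tool): read deploy output
# on stdin and print each distinct software config ID that Heat reported
# as "not found", one per line.
missing_config_ids() {
    # Keep only the "Software config with id <uuid> not found" fragments,
    # take the fifth word (the UUID), and drop duplicates.
    grep -oE 'Software config with id [0-9a-f-]{36} not found' \
        | awk '{ print $5 }' \
        | sort -u
}

# Example usage, assuming the output was captured in deploy.log:
#   missing_config_ids < deploy.log
```

Each ID printed this way could then be checked on the undercloud, for instance with `openstack software config show <id>`, to confirm whether the config really is absent.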
Hi, it looks like the overcloud lost connectivity with the undercloud:
[UpdateDeployment]: UPDATE_FAILED NotFound_Remote: resources.UpdateDeployment: Software config with id 5a80c923-abaa-44a1-8c16-679acb1b8b49 not found
Let's see if we can reproduce this one, as we had the systemctl timeout issue and finished the undercloud upgrade manually.
Oki, an upstream bug made its way into the latest puddle. Here are the related upstream bug and fix. It appears to be caused by too many nested stacks in TripleO, which exposes a corner case in Heat that is not guaranteed to work. The full description is in Launchpad.
Adding the first required review.
Verified a temporary workaround (added to the Git doc):
curl -o software_deployment.py \
  'https://git.openstack.org/cgit/openstack/heat/plain/heat/engine/resources/openstack/heat/software_deployment.py?id=8fcebfae3c2a9e86bffb8a66f8bc84fbf4237d22'
sudo cp software_deployment.py \
  /usr/lib/python2.7/site-packages/heat/engine/resources/openstack/heat/software_deployment.py
sudo systemctl restart openstack-heat-engine.service
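Because the workaround overwrites a packaged file, it may be worth keeping a copy of the original so the change can be reverted once the fixed package is installed. The sketch below is an assumption, not part of the documented workaround: `replace_with_backup` is a hypothetical helper, and the `.orig` suffix is an arbitrary choice.

```shell
#!/usr/bin/bash
# Hypothetical helper: copy new_file over target, keeping the first
# original as target.orig. Refuses to act if the downloaded file is
# missing or empty (for example after a failed curl).
replace_with_backup() {
    local new_file="$1" target="$2"
    if [ ! -s "$new_file" ]; then
        echo "refusing: $new_file is missing or empty" >&2
        return 1
    fi
    # Only back up once, so a second run does not overwrite the pristine copy.
    [ -e "$target.orig" ] || cp -p "$target" "$target.orig" || return 1
    cp "$new_file" "$target"
}

# Example usage from a root shell on the undercloud:
#   replace_with_backup software_deployment.py \
#     /usr/lib/python2.7/site-packages/heat/engine/resources/openstack/heat/software_deployment.py
#   systemctl restart openstack-heat-engine.service
```

To revert, copy the `.orig` file back over the target and restart openstack-heat-engine.service again.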
Moving this to POST, as the related changes linked above merged upstream a while ago. As Omri posted in comment #6, that fix overcomes the issue reported here.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html