Bug 1371316 - Osp-director-10: Overcloud Upgrade 9 -> 10 fails during the init stage command.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Zane Bitter
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks: 1337794
 
Reported: 2016-08-29 22:06 UTC by Omri Hochman
Modified: 2016-12-29 16:54 UTC (History)

Fixed In Version: openstack-heat-7.0.0-2.el7ost
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-14 15:54:28 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:2948 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 enhancement update 2016-12-14 19:55:27 UTC
OpenStack gerrit 360122 None None None 2016-08-31 21:42:13 UTC
OpenStack gerrit 360831 None None None 2016-08-31 21:42:43 UTC
Launchpad 1616550 None None None 2016-08-31 21:34:10 UTC

Description Omri Hochman 2016-08-29 22:06:08 UTC
Osp-director-10:  Overcloud Upgrade 9 -> 10 fails during the init stage command. 


Environment: 
------------
instack-5.0.0-0.20160802165724.5aabf5c.el7ost.noarch
instack-undercloud-5.0.0-0.20160818065636.41ef775.el7ost.noarch
puppet-heat-9.1.0-0.20160815142726.d364553.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-33.el7ost.noarch
openstack-heat-api-7.0.0-0.20160822053245.7c70288.el7ost.noarch
openstack-heat-engine-7.0.0-0.20160822053245.7c70288.el7ost.noarch
python-heatclient-1.3.0-0.20160802194627.44dfe53.el7ost.noarch
openstack-tripleo-heat-templates-5.0.0-0.20160820164503.6c537d2.1.el7ost.noarch
openstack-heat-api-cfn-7.0.0-0.20160822053245.7c70288.el7ost.noarch
openstack-heat-common-7.0.0-0.20160822053245.7c70288.el7ost.noarch
openstack-heat-templates-0.0.1-0.20160802165947.051822a.el7ost.noarch
python-heat-tests-7.0.0-0.20160822053245.7c70288.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch


Steps: 
--------
(1) Finish the undercloud upgrade successfully.
(2) Follow the instructions to upgrade the overcloud: https://gitlab.cee.redhat.com/sathlang/ospd-9-to-10-upgrade#controller-and-block-storage-upgrade


(3) Run the init stage command:
--------------------
#!/usr/bin/bash

. stackrc

cat > overcloud-repos.yaml <<EOF
parameter_defaults:
  UpgradeInitCommand: |
    set -e
    yum localinstall -y http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm
    rhos-release -P 10 -d
    # Workaround for bz-1361148
    ! [ -e /usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.d ] || rm /usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.d
EOF

openstack overcloud deploy --templates \
    --control-scale 3 --compute-scale 1 \
    --neutron-network-type vxlan --neutron-tunnel-types vxlan \
    --ntp-server 10.5.26.10 --timeout 90 \
    -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e network-environment.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-init.yaml \
    -e /home/stack/overcloud-repos.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/updates/update-from-overcloud-compute-hostnames.yaml




2016-08-24 10:40:44 [48]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:45 [56]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:45 [49]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:45 [27]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:46 [40]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:46 [51]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:47 [12]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:47 [NodeUserData]: UPDATE_IN_PROGRESS state changed
2016-08-24 10:40:48 [34]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:48 [UpdateConfig]: UPDATE_IN_PROGRESS state changed
2016-08-24 10:40:48 [NodeAdminUserData]: UPDATE_IN_PROGRESS state changed
2016-08-24 10:40:49 [1]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:50 [15]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:51 [14]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:51 [NodeUserData]: UPDATE_COMPLETE state changed
2016-08-24 10:40:52 [32]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:52 [60]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:53 [NodeAdminUserData]: UPDATE_COMPLETE state changed
2016-08-24 10:40:53 [26]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:54 [17]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:54 [NovaCompute]: UPDATE_IN_PROGRESS state changed
2016-08-24 10:40:54 [46]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:55 [3]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:55 [24]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:56 [47]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:56 [UpdateConfig]: UPDATE_COMPLETE state changed
2016-08-24 10:40:56 [4]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:57 [5]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:57 [NovaCompute]: UPDATE_COMPLETE state changed
2016-08-24 10:40:58 [UpdateDeployment]: UPDATE_IN_PROGRESS state changed
2016-08-24 10:40:58 [31]: CREATE_IN_PROGRESS state changed
2016-08-24 10:40:58 [UpdateDeployment]: UPDATE_FAILED NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
    return func(
2016-08-24 10:40:59 [overcloud-Compute-uydvzmzehkxk-0-wge43u3s7l4v]: UPDATE_FAILED NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
    return func(
2016-08-24 10:40:59 [overcloud-Compute-uydvzmzehkxk-0-wge43u3s7l4v]: UPDATE_FAILED NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
    return func(
2016-08-24 10:40:59 [35]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:00 [0]: UPDATE_FAILED resources[0]: NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped

2016-08-24 10:41:00 [overcloud-Compute-uydvzmzehkxk]: UPDATE_FAILED resources[0]: NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped

2016-08-24 10:41:00 [21]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:01 [0]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:01 [45]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:02 [Compute]: UPDATE_FAILED resources.Compute: resources[0]: NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line
2016-08-24 10:41:02 [6]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:02 [9]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:02 [ControllerServiceChain]: CREATE_FAILED CREATE aborted
2016-08-24 10:41:03 [overcloud]: UPDATE_FAILED resources.Compute: resources[0]: NotFound_Remote: resources.UpdateDeployment: Software config with id 6df446b1-0196-47ea-b97b-aeca011bc8b0 not found
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line
2016-08-24 10:41:03 [ServiceChain]: CREATE_FAILED CREATE aborted
2016-08-24 10:41:03 [36]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:03 [overcloud-ControllerServiceChain-626ya4wtgir2]: CREATE_FAILED Resource CREATE failed: Operation cancelled
2016-08-24 10:41:04 [30]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:05 [44]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:05 [10]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:06 [43]: CREATE_IN_PROGRESS state changed
2016-08-24 10:41:07 [55]: CREATE_IN_PROGRESS state changed
Stack overcloud UPDATE_FAILED
Heat Stack update failed. 
[stack@undercloud72 ~]$
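
The event stream above repeats the same error at several levels of the nested stack. As a rough aid for reading such output, the distinct missing config IDs can be extracted with a small filter. This is only a sketch: the function name missing_config_ids is hypothetical, and it assumes the log text is fed on stdin in the format shown above (IDs cut by terminal wrapping are skipped).

```shell
#!/usr/bin/bash
# Hypothetical helper: print each distinct ID that appears in a
# "Software config with id <uuid>" message read from stdin.
missing_config_ids() {
    # grep -o keeps only the matching part of each line;
    # the UUID is the fifth whitespace-separated field of that match.
    grep -o 'Software config with id [0-9a-f-]\{36\}' \
        | awk '{ print $5 }' \
        | sort -u
}
```

For example, saving the deploy output to a file and running `missing_config_ids < deploy.log` would list each ID once, which makes it easier to see whether one config or several are affected.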



[stack@undercloud72 ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+---------------+---------------------+---------------------+
| id                                   | stack_name | stack_status  | creation_time       | updated_time        |
+--------------------------------------+------------+---------------+---------------------+---------------------+
| 9f5b4dec-a9f1-496c-8c55-19f09e735f65 | overcloud  | UPDATE_FAILED | 2016-08-23T17:42:27 | 2016-08-24T11:07:52 |
+--------------------------------------+------------+---------------+---------------------+---------------------+


[stack@undercloud72 ~]$ heat deployment-show 870771e7-6c54-4df8-a47e-d91e7ae41aa1
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{ 
  "status": "COMPLETE",
  "server_id": "6a5d918a-00a0-4ac2-9669-dd6e6604a15a",
  "config_id": "77833a6d-ff4e-4a09-8211-b9b83a583df3",
  "output_values": {
    "deploy_stdout": "Started yum_update.sh on server 6a5d918a-00a0-4ac2-9669-dd6e6604a15a at Mon Aug 29 02:09:05 EDT 2016\nNot running due to unset update_identifier\n",
    "deploy_stderr": "",
    "update_managed_packages": "false",
    "deploy_status_code": 0
  },
  "creation_time": "2016-08-23T17:55:17",
  "updated_time": "2016-08-23T17:57:58",
  "input_values": {
    "update_identifier": ""
  },
  "action": "CREATE",
  "status_reason": "Outputs received",
  "id": "870771e7-6c54-4df8-a47e-d91e7ae41aa1"
}
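
Note that the config_id of this deployment differs from the ID named in the error. To check which config IDs the undercloud Heat can still resolve, something like the following could be used. It is a sketch under assumptions: the function name report_missing_configs and the OPENSTACK override variable are hypothetical, and any non-zero exit from "openstack software config show" (including an authentication failure) is treated as "missing".

```shell
#!/usr/bin/bash
# Assumption: OPENSTACK names the CLI to run; it is overridable so the
# function can be exercised without a real cloud.
OPENSTACK=${OPENSTACK:-openstack}

# Hypothetical helper: read config IDs from stdin (one per line) and
# print those that "software config show" fails to return.
report_missing_configs() {
    local id
    while read -r id; do
        [ -n "$id" ] || continue
        # </dev/null stops the CLI from consuming the remaining IDs on stdin.
        "$OPENSTACK" software config show "$id" </dev/null >/dev/null 2>&1 \
            || echo "$id"
    done
}
```

With stackrc sourced, `echo 6df446b1-0196-47ea-b97b-aeca011bc8b0 | report_missing_configs` would print the ID back only if Heat cannot show that config.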

Comment 2 Omri Hochman 2016-08-29 22:09:42 UTC
I've changed the init stage command according to the changes in:
https://gitlab.cee.redhat.com/sathlang/ospd-9-to-10-upgrade#controller-and-block-storage-upgrade

to: 
#!/usr/bin/bash

. stackrc

cat > overcloud-repos.yaml <<EOF
parameter_defaults:
  UpgradeInitCommand: |
    set -e
    yum localinstall -y http://rhos-release.virt.bos.redhat.com/repos/rhos-release/rhos-release-latest.noarch.rpm
    rhos-release -P 10 -d
    # Workaround for bz-1361148
    ! [ -e /usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.d ] || rm /usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.d
EOF

$DEPLOY -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker-init.yaml \
        -e /home/stack/overcloud-repos.yaml \
        -e /usr/share/openstack-tripleo-heat-templates/environments/updates/update-from-overcloud-compute-hostnames.yaml





Results:
----------
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
  
2016-08-24 11:11:13 [UpdateDeployment]: UPDATE_FAILED NotFound_Remote: resources.UpdateDeployment: Software config with id 5a80c923-abaa-44a1-8c16-679acb1b8b49 not found
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
    return func(
2016-08-24 11:11:14 [StorageMgmtPort]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:14 [ExternalPort]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:15 [Controller]: UPDATE_FAILED resources.Controller: resources[0]: NotFound_Remote: resources.UpdateDeployment: Software config with id efdb9688-0279-4173-854f-c2be1c83fe3e not found
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/heat/common/context.py", l
2016-08-24 11:11:15 [TenantPort]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:15 [Compute]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:15 [overcloud]: UPDATE_FAILED resources.Controller: resources[0]: NotFound_Remote: resources.UpdateDeployment: Software config with id efdb9688-0279-4173-854f-c2be1c83fe3e not found
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/heat/common/context.py", l
2016-08-24 11:11:15 [0]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:16 [overcloud-Compute-uydvzmzehkxk]: UPDATE_FAILED Operation cancelled
2016-08-24 11:11:16 [ManagementPort]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:17 [InternalApiPort]: UPDATE_FAILED UPDATE aborted
2016-08-24 11:11:17 [overcloud-Controller-mrkekqec3nea-2-37cg6jdefkv6]: UPDATE_FAILED NotFound_Remote: resources.UpdateDeployment: Software config with id 5a80c923-abaa-44a1-8c16-679acb1b8b49 not found
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 424, in wrapped
    return func(
Stack overcloud UPDATE_FAILED
Heat Stack update failed.

Comment 3 Sofer Athlan-Guyot 2016-08-30 17:23:49 UTC
Hi,

It looks like the overcloud lost connectivity with the undercloud:

    [UpdateDeployment]: UPDATE_FAILED NotFound_Remote: resources.UpdateDeployment: Software config with id 5a80c923-abaa-44a1-8c16-679acb1b8b49 not found

Let's see if we can reproduce this one, as we had the systemctl timeout issue and finished the undercloud upgrade manually.

Comment 4 Sofer Athlan-Guyot 2016-08-31 21:34:11 UTC
OK, an upstream bug made its way into the latest puddle.  Here are the related upstream bug and fix.

This is likely caused by the large number of nested stacks in TripleO, which exposes corner cases in Heat that are not guaranteed to work.  The full description is in Launchpad.

Comment 5 Sofer Athlan-Guyot 2016-08-31 21:42:14 UTC
Adding the first required review.

Comment 6 Omri Hochman 2016-09-02 14:49:30 UTC
Verified a temporary workaround (added to the Git doc):

# Quote the URL so the shell does not treat "?" as a glob character.
curl -o software_deployment.py \
    'https://git.openstack.org/cgit/openstack/heat/plain/heat/engine/resources/openstack/heat/software_deployment.py?id=8fcebfae3c2a9e86bffb8a66f8bc84fbf4237d22'

sudo cp software_deployment.py \
    /usr/lib/python2.7/site-packages/heat/engine/resources/openstack/heat/software_deployment.py

sudo systemctl restart openstack-heat-engine.service
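
Since the workaround overwrites a file owned by the openstack-heat package, a slightly more defensive variant would keep a copy of the packaged file so it can be restored later. The following is a minimal sketch; the function name install_patched_file and the ".orig" backup suffix are hypothetical choices, not part of the verified workaround.

```shell
#!/usr/bin/bash
# Hypothetical helper: replace DEST with SRC, keeping the original
# packaged file as DEST.orig.
install_patched_file() {
    local src=$1 dest=$2
    # Refuse an empty or missing download instead of clobbering DEST.
    [ -s "$src" ] || { echo "refusing to install empty or missing file: $src" >&2; return 1; }
    # Back up only once, so a re-run does not overwrite the pristine copy.
    [ -e "$dest.orig" ] || cp -p "$dest" "$dest.orig" || return 1
    cp "$src" "$dest"
}
```

It would be called as root with the downloaded software_deployment.py as the first argument and the site-packages path above as the second, followed by the same openstack-heat-engine restart.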

Comment 7 Marios Andreou 2016-10-13 10:25:54 UTC
Moving this to POST, as the related changes linked above merged upstream a while ago. As Omri posted in comment #6, that fix overcomes the issue reported here.

Comment 10 errata-xmlrpc 2016-12-14 15:54:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

