Bug 1315467 - rhel-osp-director: 7.3->8.0 upgrade fails with ERROR: Timed out waiting for a reply to message ID 84a44ca3ed724eda991ba689cc364852.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: Marios Andreou
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-03-07 20:04 UTC by Alexander Chuzhoy
Modified: 2016-04-07 21:48 UTC (History)

Fixed In Version: python-tripleoclient-0.3.4-1.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, the openstack-nova-api service was not restarted after the undercloud was upgraded. As a result, overcloud upgrades failed with a timeout that reported the error "ERROR: Timed out waiting for a reply to message ID 84a44ca3ed724eda991ba689cc364852". Now, the openstack-nova-api service is restarted as part of the undercloud upgrade process, so the overcloud upgrade can proceed without hitting this timeout.
Clone Of:
Environment:
Last Closed: 2016-04-07 21:48:58 UTC
Target Upstream Version:
Embargoed:


Attachments
nova-api.log (6.13 MB, application/x-gzip), added 2016-03-07 21:11 UTC by Alexander Chuzhoy


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1558495 | 0 | None | None | None | 2016-03-17 11:42:03 UTC
OpenStack gerrit | 293960 | 0 | None | MERGED | Add a restart of openstack-nova-api after upgrading undercloud | 2020-02-17 22:25:51 UTC
OpenStack gerrit | 296797 | 0 | None | MERGED | Add a restart of openstack-nova-api after upgrading undercloud | 2020-02-17 22:25:51 UTC
Red Hat Product Errata | RHEA-2016:0604 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 8 director Enhancement Advisory | 2016-04-08 01:03:56 UTC

Description Alexander Chuzhoy 2016-03-07 20:04:05 UTC
rhel-osp-director: 7.3->8.0 upgrade fails with ERROR: Timed out waiting for a reply to message ID 84a44ca3ed724eda991ba689cc364852.


Environment:
openstack-tripleo-heat-templates-kilo-0.8.9-1.el7ost.noarch
instack-undercloud-2.2.4-1.el7ost.noarch
openstack-puppet-modules-7.0.12-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.9-1.el7ost.noarch

Steps to reproduce:
1. Deploy 7.3 (3 controllers + 2 computes) with network isolation.
Deployment command: openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server x.x.x.x --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml

2. Upgrade the undercloud to 8.0.
3. Attempt to update the overcloud with:
openstack overcloud deploy --templates tripleo-heat-templates -e tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e tripleo-heat-templates/environments/puppet-pacemaker.yaml -e tripleo-heat-templates/environments/network-isolation.yaml -e tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e network-environment.yaml -e tripleo-heat-templates/environments/major-upgrade-script-delivery.yaml


Result:
2016-03-07 15:31:50 [NodeTLSData]: UPDATE_COMPLETE  state changed                       
2016-03-07 15:31:51 [ControllerConfig]: UPDATE_IN_PROGRESS  state changed               
2016-03-07 15:31:52 [NetworkConfig]: UPDATE_COMPLETE  state changed                     
2016-03-07 15:31:52 [NodeTLSCAData]: UPDATE_IN_PROGRESS  state changed                  
2016-03-07 15:31:53 [ControllerConfig]: CREATE_IN_PROGRESS  state changed               
2016-03-07 15:31:54 [ControllerConfig]: CREATE_COMPLETE  state changed                  
2016-03-07 15:31:55 [NodeTLSCAData]: UPDATE_COMPLETE  state changed                     
2016-03-07 15:31:55 [NodeTLSData]: UPDATE_IN_PROGRESS  state changed                    
2016-03-07 15:31:55 [ControllerDeployment]: UPDATE_IN_PROGRESS  state changed           
2016-03-07 15:31:57 [NodeTLSData]: UPDATE_COMPLETE  state changed                       
2016-03-07 15:31:57 [ControllerConfig]: UPDATE_IN_PROGRESS  state changed               
2016-03-07 15:31:58 [ControllerConfig]: CREATE_IN_PROGRESS  state changed               
2016-03-07 15:31:59 [ControllerConfig]: CREATE_COMPLETE  state changed                  
2016-03-07 15:32:19 [UpdateDeployment]: SIGNAL_IN_PROGRESS  Signal: deployment succeeded
2016-03-07 15:32:19 [UpdateDeployment]: UPDATE_COMPLETE  state changed                  
2016-03-07 15:32:20 [ControllerDeployment]: UPDATE_IN_PROGRESS  state changed       

Broadcast message from systemd-journald (Mon 2016-03-07 12:48:12 EST):                                           
haproxy[27435]: proxy ironic has no server available!                                                                                                                                                               
ERROR: Timed out waiting for a reply to message ID 84a44ca3ed724eda991ba689cc364852



Checking os-collect-config on the controller for errors shows these repeating messages:
Mar 07 19:04:10 overcloud-controller-0.localdomain os-collect-config[3829]: 2016-03-07 19:04:10.710 3829 WARNING os_collect_config.ec2 [-] 500 Server Error: Internal Server Error
Mar 07 19:04:41 overcloud-controller-0.localdomain os-collect-config[3829]: 2016-03-07 19:04:41.352 3829 WARNING os_collect_config.ec2 [-] 500 Server Error: Internal Server Error
Mar 07 19:05:12 overcloud-controller-0.localdomain os-collect-config[3829]: 2016-03-07 19:05:12.036 3829 WARNING os_collect_config.ec2 [-] 500 Server Error: Internal Server Error
Mar 07 19:05:42 overcloud-controller-0.localdomain os-collect-config[3829]: 2016-03-07 19:05:42.642 3829 WARNING os_collect_config.ec2 [-] 500 Server Error: Internal Server Error
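
To gauge how often the warning above repeats, a small helper along these lines could be used. This is only a sketch: the function name count_metadata_500s is made up for illustration, and the pattern assumes the journal lines look like the excerpt above.

```shell
#!/usr/bin/env bash
# Count os-collect-config "500 Server Error" warnings in journal text on stdin.
# Assumption: lines carry "os_collect_config.ec2" followed by "500 Server Error",
# as in the excerpt above. The function name is hypothetical.
count_metadata_500s() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # function's exit status at 0 when there are no matches (count is 0).
    grep -c 'os_collect_config\.ec2 .*500 Server Error' || true
}
```

On an overcloud controller it might be fed from the journal, for example: sudo journalctl -u os-collect-config | count_metadata_500s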




Expected result:
Successful update of the overcloud.

Comment 2 Alexander Chuzhoy 2016-03-07 21:11:13 UTC
Created attachment 1133904
nova-api.log

Comment 3 Marios Andreou 2016-03-17 11:32:56 UTC
So I can confirm that I've hit this many times while testing the upgrades in a virt environment. The fix discussed on IRC yesterday, restarting openstack-nova-api after upgrading the undercloud, seems to fix it for me. I've added the restart to the tripleoclient undercloud upgrade at https://review.openstack.org/#/c/293960/
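
For anyone on a build without the fix, the restart described in this comment could be done by hand on the undercloud. The following is a minimal sketch, not the code from the gerrit change: it assumes a systemd-managed undercloud, the unit name openstack-nova-api, and root privileges. The function name restart_nova_api and the SYSTEMCTL override variable are hypothetical, added only so the steps can be dry-run.

```shell
#!/usr/bin/env bash
# Restart nova-api on the undercloud and confirm that it is active again.
# Assumptions: systemd manages the service, the unit is "openstack-nova-api",
# and the caller is root. SYSTEMCTL is a hypothetical hook that lets the
# systemctl binary be swapped out for a dry run.
restart_nova_api() {
    local systemctl_cmd="${SYSTEMCTL:-systemctl}"
    # Stop here if the restart itself fails.
    "$systemctl_cmd" restart openstack-nova-api || return 1
    # is-active --quiet prints nothing and exits 0 only when the unit is active.
    "$systemctl_cmd" is-active --quiet openstack-nova-api
}
```

After sourcing the function, run restart_nova_api as root once the undercloud upgrade has finished and before starting the overcloud upgrade; a non-zero exit status means the service did not come back.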

Comment 5 Alexander Chuzhoy 2016-04-04 20:15:42 UTC
Verified:

Environment:
python-tripleoclient-0.3.4-2.el7ost.noarch

Was able to upgrade the overcloud (OC) from 7.3 to 8.0.

Comment 7 errata-xmlrpc 2016-04-07 21:48:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0604.html

