Bug 1315467 - rhel-osp-director: 7.3->8.0 upgrade fails with ERROR: Timed out waiting for a reply to message ID 84a44ca3ed724eda991ba689cc364852.
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assigned To: Marios Andreou
QA Contact: Alexander Chuzhoy
Reported: 2016-03-07 15:04 EST by Alexander Chuzhoy
Modified: 2016-04-07 17:48 EDT (History)
Fixed In Version: python-tripleoclient-0.3.4-1.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, the openstack-nova-api service was not restarted after the undercloud was upgraded. As a result, overcloud upgrades failed with a timeout, reporting the error "ERROR: Timed out waiting for a reply to message ID 84a44ca3ed724eda991ba689cc364852". Now, the openstack-nova-api service is restarted as part of the undercloud upgrade process, and the overcloud upgrade proceeds without encountering this timeout.
Last Closed: 2016-04-07 17:48:58 EDT
Type: Bug


Attachments
nova-api.log (6.13 MB, application/x-gzip)
2016-03-07 16:11 EST, Alexander Chuzhoy


External Trackers
Tracker           ID       Priority  Status  Summary  Last Updated
Launchpad         1558495  None      None    None     2016-03-17 07:42 EDT
OpenStack gerrit  293960   None      None    None     2016-03-17 07:40 EDT
OpenStack gerrit  296797   None      None    None     2016-03-28 07:02 EDT

Description Alexander Chuzhoy 2016-03-07 15:04:05 EST
rhel-osp-director: 7.3->8.0 upgrade fails with ERROR: Timed out waiting for a reply to message ID 84a44ca3ed724eda991ba689cc364852.


Environment:
openstack-tripleo-heat-templates-kilo-0.8.9-1.el7ost.noarch
instack-undercloud-2.2.4-1.el7ost.noarch
openstack-puppet-modules-7.0.12-1.el7ost.noarch
openstack-tripleo-heat-templates-0.8.9-1.el7ost.noarch

Steps to reproduce:
1. Deploy 7.3 (3 controllers + 2 computes) with network isolation.
Deployment command: openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server x.x.x.x --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml
 
2. Upgrade the undercloud to 8.0
3. Attempt to update the overcloud with:
openstack overcloud deploy --templates tripleo-heat-templates -e tripleo-heat-templates/overcloud-resource-registry-puppet.yaml -e tripleo-heat-templates/environments/puppet-pacemaker.yaml -e tripleo-heat-templates/environments/network-isolation.yaml -e tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e network-environment.yaml -e tripleo-heat-templates/environments/major-upgrade-script-delivery.yaml


Result:
2016-03-07 15:31:50 [NodeTLSData]: UPDATE_COMPLETE  state changed                       
2016-03-07 15:31:51 [ControllerConfig]: UPDATE_IN_PROGRESS  state changed               
2016-03-07 15:31:52 [NetworkConfig]: UPDATE_COMPLETE  state changed                     
2016-03-07 15:31:52 [NodeTLSCAData]: UPDATE_IN_PROGRESS  state changed                  
2016-03-07 15:31:53 [ControllerConfig]: CREATE_IN_PROGRESS  state changed               
2016-03-07 15:31:54 [ControllerConfig]: CREATE_COMPLETE  state changed                  
2016-03-07 15:31:55 [NodeTLSCAData]: UPDATE_COMPLETE  state changed                     
2016-03-07 15:31:55 [NodeTLSData]: UPDATE_IN_PROGRESS  state changed                    
2016-03-07 15:31:55 [ControllerDeployment]: UPDATE_IN_PROGRESS  state changed           
2016-03-07 15:31:57 [NodeTLSData]: UPDATE_COMPLETE  state changed                       
2016-03-07 15:31:57 [ControllerConfig]: UPDATE_IN_PROGRESS  state changed               
2016-03-07 15:31:58 [ControllerConfig]: CREATE_IN_PROGRESS  state changed               
2016-03-07 15:31:59 [ControllerConfig]: CREATE_COMPLETE  state changed                  
2016-03-07 15:32:19 [UpdateDeployment]: SIGNAL_IN_PROGRESS  Signal: deployment succeeded
2016-03-07 15:32:19 [UpdateDeployment]: UPDATE_COMPLETE  state changed                  
2016-03-07 15:32:20 [ControllerDeployment]: UPDATE_IN_PROGRESS  state changed       

Broadcast message from systemd-journald@instack.localdomain (Mon 2016-03-07 12:48:12 EST):                                           
haproxy[27435]: proxy ironic has no server available!                                                                                                                                                               
ERROR: Timed out waiting for a reply to message ID 84a44ca3ed724eda991ba689cc364852
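
The event output above can be scanned mechanically to see which resource the update was still waiting on when the timeout hit. The following is a rough sketch and not part of the original report: it assumes event lines in exactly the format shown above (date, time, "[Resource]:", status), and stuck_resources is a hypothetical helper name.

```shell
# Print each resource whose most recent event is *_IN_PROGRESS.
# Reads event lines in the format shown above on stdin.
stuck_resources() {
    awk '
        # Field 3 is "[Name]:" and field 4 is the status.
        $3 ~ /^\[.*\]:$/ {
            name = substr($3, 2, length($3) - 3)
            last[name] = $4
        }
        END {
            for (name in last)
                if (last[name] ~ /_IN_PROGRESS$/)
                    print name, last[name]
        }
    ' | sort
}
```

Piping the Result lines above into stuck_resources prints "ControllerDeployment UPDATE_IN_PROGRESS". That only means no completion event was logged before the output ended; it does not by itself prove the resource was hung.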



Checking os-collect-config on overcloud-controller-0 for errors shows the following messages repeating:
Mar 07 19:04:10 overcloud-controller-0.localdomain os-collect-config[3829]: 2016-03-07 19:04:10.710 3829 WARNING os_collect_config.ec2 [-] 500 Server Error: Internal Server Error
Mar 07 19:04:41 overcloud-controller-0.localdomain os-collect-config[3829]: 2016-03-07 19:04:41.352 3829 WARNING os_collect_config.ec2 [-] 500 Server Error: Internal Server Error
Mar 07 19:05:12 overcloud-controller-0.localdomain os-collect-config[3829]: 2016-03-07 19:05:12.036 3829 WARNING os_collect_config.ec2 [-] 500 Server Error: Internal Server Error
Mar 07 19:05:42 overcloud-controller-0.localdomain os-collect-config[3829]: 2016-03-07 19:05:42.642 3829 WARNING os_collect_config.ec2 [-] 500 Server Error: Internal Server Error
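
To gauge how often the warning above repeats, the matching lines can be counted. This is a minimal sketch under stated assumptions: the input is journal text in the format shown above, and count_ec2_500s is a hypothetical helper name. The warnings come from the os_collect_config.ec2 collector; that its requests are ultimately answered by nova-api on the undercloud is an assumption here, though it would be consistent with the failure going away once that service is restarted.

```shell
# Count os-collect-config ec2 collector warnings that report an HTTP 500.
# Reads journal lines on stdin and prints the number of matching lines.
count_ec2_500s() {
    grep -c 'os_collect_config\.ec2 .* 500 Server Error'
}
```

For example, on a controller the journal for the service could be piped in with something like "journalctl -u os-collect-config | count_ec2_500s", assuming the unit is named os-collect-config as the log prefix suggests.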




Expected result:
Successful update of the overcloud.
Comment 2 Alexander Chuzhoy 2016-03-07 16:11 EST
Created attachment 1133904 [details]
nova-api.log
Comment 3 Marios Andreou 2016-03-17 07:32:56 EDT
So I can confirm that I've hit this many times while testing the upgrades in a virt environment. The fix discussed on IRC yesterday, restarting openstack-nova-api after upgrading the undercloud, seems to fix it for me. I've added the restart to the tripleoclient undercloud upgrade at https://review.openstack.org/#/c/293960/
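
As a manual workaround on releases without the fix, the restart described in this comment could be run by hand on the undercloud before retrying the overcloud deploy. The sketch below is not the change in the linked review. It assumes the systemd unit is named openstack-nova-api (the service name used in this bug) and that the user has sudo access; restart_nova_api and the DRY_RUN variable are hypothetical names introduced for illustration.

```shell
# Restart nova-api on the undercloud, then confirm the unit is active.
# With DRY_RUN=1 the restart command is printed instead of executed.
restart_nova_api() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo systemctl restart openstack-nova-api"
        return 0
    fi
    sudo systemctl restart openstack-nova-api &&
        systemctl is-active --quiet openstack-nova-api
}
```

Calling restart_nova_api returns non-zero if either the restart or the is-active check fails, so it can gate the subsequent "openstack overcloud deploy" command.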
Comment 5 Alexander Chuzhoy 2016-04-04 16:15:42 EDT
Verified:

Environment:
python-tripleoclient-0.3.4-2.el7ost.noarch

Was able to upgrade the overcloud (OC) from 7.3 to 8.0.
Comment 7 errata-xmlrpc 2016-04-07 17:48:58 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0604.html
