Bug 1309738

Summary: "ERROR: openstack ERROR: 504 Gateway Time-out The server didn't respond in time" thrown in stdout while the update actually keeps working
Product: Red Hat OpenStack Reporter: Dan Yasny <dyasny>
Component: rhosp-director Assignee: Angus Thomas <athomas>
Status: CLOSED CURRENTRELEASE QA Contact: Shai Revivo <srevivo>
Severity: low Docs Contact:
Priority: low    
Version: 7.0 (Kilo) CC: dbecker, jcoufal, kbasil, mburns, morazi, rhel-osp-director-maint
Target Milestone: ---   
Target Release: 10.0 (Newton)   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-04 19:38:06 UTC Type: Bug

Description Dan Yasny 2016-02-18 15:12:20 UTC
Description of problem:
Upgrading from 7.0 (also reproduced with 7.1) to 7.3-GOLD in an overcommitted virtual environment produced a failure in stdout:

ERROR: openstack ERROR: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.

Checking Heat shows that the stack update is still running:
+--------------------------------------+------------+--------------------+----------------------+
| id                                   | stack_name | stack_status       | creation_time        |
+--------------------------------------+------------+--------------------+----------------------+
| 132ffafa-12a5-4bea-8d3d-887dc6138b3c | overcloud  | UPDATE_IN_PROGRESS | 2016-02-17T16:34:40Z |
+--------------------------------------+------------+--------------------+----------------------+

A few hours later, the update actually succeeds:

+--------------------------------------+------------+-----------------+----------------------+
| id                                   | stack_name | stack_status    | creation_time        |
+--------------------------------------+------------+-----------------+----------------------+
| 132ffafa-12a5-4bea-8d3d-887dc6138b3c | overcloud  | UPDATE_COMPLETE | 2016-02-17T16:34:40Z |
+--------------------------------------+------------+-----------------+----------------------+


Version-Release number of selected component (if applicable):
[stack@instack ~]$ rpm -qa |grep tripleo
openstack-tripleo-image-elements-0.9.6-10.el7ost.noarch
openstack-tripleo-common-0.0.1.dev6-6.git49b57eb.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-5.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-121.el7ost.noarch
openstack-tripleo-0.0.7-0.1.1664e566.el7ost.noarch
python-openstackclient-1.0.3-3.el7ost.noarch
python-cinderclient-1.2.1-1.el7ost.noarch
python-swiftclient-2.4.0-1.el7ost.noarch
python-neutronclient-2.4.0-2.el7ost.noarch
python-novaclient-2.23.0-2.el7ost.noarch
python-ceilometerclient-1.0.14-1.el7ost.noarch
python-heatclient-0.6.0-1.el7ost.noarch
python-glanceclient-0.17.3-2.el7ost.noarch
python-ncclient-0.4.2-2.el7ost.noarch
python-saharaclient-0.9.0-1.el7ost.noarch
python-troveclient-1.0.9-1.el7ost.noarch
python-keystoneclient-1.3.0-2.el7ost.noarch
python-ironicclient-0.5.1-12.el7ost.noarch
python-tuskarclient-0.1.18-5.el7ost.noarch
python-rdomanager-oscplugin-0.0.10-28.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy a large number of nodes in an underpowered environment (I used 3 controllers, 2 computes, 1 swift, 1 ceph and 1 cinder)
2. Run the upgrade to 7.3


Actual results:
ERROR: openstack ERROR: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.


Expected results:
The client should keep monitoring the Heat stack status after a gateway timeout and fail only when the stack update itself fails.
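
The expected behaviour could be sketched roughly as the polling loop below. This is only an illustration, not the actual client code: GatewayTimeout, wait_for_stack, get_status and the retry limit are hypothetical names and values chosen for the example, and get_status stands in for whatever call the client uses to read the stack status from Heat.

```python
import time


class GatewayTimeout(Exception):
    """Hypothetical stand-in for a 504 returned by the proxy in front of Heat."""


def wait_for_stack(get_status, poll_interval=30.0, max_gateway_errors=10,
                   sleep=time.sleep):
    """Poll until the stack reaches a terminal state.

    get_status is a hypothetical callable that returns a status string such
    as "UPDATE_IN_PROGRESS", or raises GatewayTimeout on a 504 response.
    Returns True on *_COMPLETE and False on *_FAILED.
    """
    consecutive_errors = 0
    while True:
        try:
            status = get_status()
        except GatewayTimeout:
            # A 504 only means the API did not answer in time; the update
            # may still be running, so retry instead of aborting.
            consecutive_errors += 1
            if consecutive_errors > max_gateway_errors:
                raise
            sleep(poll_interval)
            continue
        consecutive_errors = 0
        if status.endswith("_COMPLETE"):
            return True
        if status.endswith("_FAILED"):
            return False
        # Any other status (e.g. UPDATE_IN_PROGRESS): keep waiting.
        sleep(poll_interval)
```

With a loop like this, a single 504 would not end the command; only a *_FAILED stack status, or an assumed limit of consecutive timeouts, would be reported as a failure.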

Additional info:

Comment 2 Mike Burns 2016-04-07 21:11:06 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.