Bug 1313674 - Failing to update the existing overcloud adding more compute nodes. [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 10.0 (Newton)
Assigned To: Jiri Stransky
QA Contact: Omri Hochman
Keywords: TestOnly, Triaged
Reported: 2016-03-02 03:06 EST by Francesco Vollero
Modified: 2016-12-14 10:25 EST (History)
Fixed In Version: openstack-tripleo-heat-templates-5.0.0-0.5.0rc3.el7ost
Doc Type: Bug Fix
Last Closed: 2016-12-14 10:25:11 EST
Type: Bug
jcoufal: needinfo? (fvollero)


Attachments
heat-engine (14.65 KB, text/plain)
2016-03-02 03:06 EST, Francesco Vollero
nova-conductor (6.80 KB, text/plain)
2016-03-02 03:08 EST, Francesco Vollero
nova-api (18.69 KB, text/plain)
2016-03-02 03:08 EST, Francesco Vollero
grep of a failing request (2.76 KB, text/plain)
2016-03-02 03:12 EST, Francesco Vollero

Description Francesco Vollero 2016-03-02 03:06:50 EST
Created attachment 1132164 [details]
heat-engine

Description of problem:

When I try to update an existing overcloud, I hit the most hated error in Heat: UPDATE_FAILED.

I am running these deployments on physical hardware, with a director that has 4 vcores (running as a KVM instance) and 32 GB of RAM, and with engine_num_workers set to 8 as suggested.

A deployment executed from scratch succeeds without any issues, but an update always fails (in all 15 of my attempts).

I collected the logs from heat-engine, nova-conductor and neutron. The problem seems to be related to Neutron, but that is just speculation.


Version-Release number of selected component (if applicable):
7.3

How reproducible:
Run a deployment to update the number of compute nodes.

Steps to Reproduce:
The same as for creating a normal deployment.

Actual results:
UPDATE_FAILED
Stack failed with status: resources.Compute: ResourceInError: resources[2].resources.NovaCompute: Went to status ERROR due to "Message: Unknown, Code: Unknown"
ERROR: openstack Heat Stack update failed.
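
The status string above encodes which nested resource failed. As a rough illustration only, the following Python sketch splits it into its parts. The helper name `parse_status_reason` and the regular expression are assumptions made for this example; they are not part of Heat or python-heatclient, and the pattern is only known to match the message quoted in this report.

```python
import re

# The failure line exactly as quoted in this report.
STATUS = (
    'Stack failed with status: resources.Compute: ResourceInError: '
    'resources[2].resources.NovaCompute: Went to status ERROR due to '
    '"Message: Unknown, Code: Unknown"'
)

# Hypothetical pattern, written only against the message above.
PATTERN = re.compile(
    r'resources\.(?P<group>\w+): '
    r'(?P<error>\w+): '
    r'resources\[(?P<index>\d+)\]\.resources\.(?P<resource>\w+): '
    r'(?P<detail>.*)$'
)


def parse_status_reason(reason):
    """Return the failing group, member index, resource and detail, or None."""
    match = PATTERN.search(reason)
    if match is None:
        return None
    fields = match.groupdict()
    fields['index'] = int(fields['index'])  # zero-based member of the group
    return fields


info = parse_status_reason(STATUS)
print(info)
```

Here the failing resource is `NovaCompute` inside member index 2 of the `Compute` group, and the detail ("Message: Unknown, Code: Unknown") carries no usable cause, which is why the Nova and Neutron logs are needed.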


Expected results:
Deployment succeeded

Additional info:
Comment 2 Francesco Vollero 2016-03-02 03:08 EST
Created attachment 1132165 [details]
nova-conductor
Comment 3 Francesco Vollero 2016-03-02 03:08 EST
Created attachment 1132166 [details]
nova-api
Comment 4 Francesco Vollero 2016-03-02 03:12 EST
Created attachment 1132167 [details]
grep of a failing request
Comment 5 Ryan Brown 2016-03-02 14:18:42 EST
Extended nova logs: http://chunk.io/f/c1078acec8ee4286995813db6e020481
Comment 6 Ryan Brown 2016-03-02 14:24:55 EST
It looks to me like a source of this problem is in Neutron - sometimes a 404 can indicate not enough floating IPs. The next step would be to get the neutron logs to see what's causing the 404.
Comment 7 Ryan Brown 2016-03-03 11:23:56 EST
Neutron logs http://chunk.io/f/7da633892d50410aa17dec6963afbc41
Comment 8 Francesco Vollero 2016-03-04 05:57:46 EST
The floating IP range holds 20 addresses, while the number of nodes is at most 9.
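
A quick way to sanity-check that claim is to count the addresses in the range and compare with the node count. This is a minimal sketch: the start and end addresses are placeholders from the RFC 5737 documentation block (the real range is not given in this report), and it assumes each node consumes at most one address from the range.

```python
import ipaddress


def pool_size(start, end):
    """Number of addresses in an inclusive start..end range."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    if last < first:
        raise ValueError("pool end precedes pool start")
    return int(last) - int(first) + 1


# Placeholder values: a 20-address range, not the one used in this deployment.
FLOATING_START = "192.0.2.10"
FLOATING_END = "192.0.2.29"
MAX_NODES = 9  # maximum node count stated in this comment

size = pool_size(FLOATING_START, FLOATING_END)
spare = size - MAX_NODES
print("pool size:", size, "spare addresses:", spare)
```

Under those assumptions the range has 11 addresses to spare, so exhaustion of the range alone would not explain the 404.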
Comment 9 Mike Burns 2016-04-07 17:14:44 EDT
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.
Comment 10 Jaromir Coufal 2016-10-11 09:19:45 EDT
Francesco, are you still experiencing similar issues? This seems like a one-off.
Comment 11 James Slagle 2016-10-14 12:34:08 EDT
setting TestOnly
Comment 13 Marius Cornea 2016-11-22 08:47:08 EST
Moving this to verified since scaling out with an additional compute node completes OK on OSP10.
Comment 16 errata-xmlrpc 2016-12-14 10:25:11 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html
