Bug 1313674 - Failing to update the existing overcloud adding more compute nodes. [NEEDINFO]
Summary: Failing to update the existing overcloud adding more compute nodes.
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: x86_64
OS: Unspecified
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Jiri Stransky
QA Contact: Omri Hochman
Depends On:
Reported: 2016-03-02 08:06 UTC by Francesco Vollero
Modified: 2016-12-14 15:25 UTC

Fixed In Version: openstack-tripleo-heat-templates-5.0.0-0.5.0rc3.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2016-12-14 15:25:11 UTC
jcoufal: needinfo? (fvollero)

Attachments
heat-engine (14.65 KB, text/plain)
2016-03-02 08:06 UTC, Francesco Vollero
nova-conductor (6.80 KB, text/plain)
2016-03-02 08:08 UTC, Francesco Vollero
nova-api (18.69 KB, text/plain)
2016-03-02 08:08 UTC, Francesco Vollero
grep of a failing request (2.76 KB, text/plain)
2016-03-02 08:12 UTC, Francesco Vollero

System: Red Hat Product Errata
ID: RHEA-2016:2948
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenStack Platform 10 enhancement update
Last Updated: 2016-12-14 19:55:27 UTC

Description Francesco Vollero 2016-03-02 08:06:50 UTC
Created attachment 1132164

Description of problem:

When I try to update an existing overcloud, I hit the most hated error in Heat: UPDATE_FAILED.

I am running this deployment on physical hardware, with a director that has 4 vcores (running as a KVM instance) and 32 GB of RAM, and with engine_num_workers set to 8 as suggested.

If the deployment is executed from scratch, it succeeds without any issues, but if it is an update, it always fails (in all 15 of my attempts so far).

I collected the logs from heat-engine, nova-conductor and neutron, and it seems to be a problem 'related' to neutron, but that is just speculation.

Version-Release number of selected component (if applicable):

How reproducible:
Run a deployment to update the number of compute nodes.

Steps to Reproduce:
The same as for creating a normal deployment.

Actual results:
Stack failed with status: resources.Compute: ResourceInError: resources[2].resources.NovaCompute: Went to status ERROR due to "Message: Unknown, Code: Unknown"
ERROR: openstack Heat Stack update failed.
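
The status text above nests several pieces of information: the parent resource group, the exception type, the index of the failing member, and the resource that went into ERROR. As a reading aid only, here is a minimal Python sketch that splits the string into those parts. The helper name `parse_heat_failure` and the field names are hypothetical, and the pattern is written against this one message, not against any documented Heat format.

```python
import re

# Status text copied from the failure reported above.
STATUS = (
    'resources.Compute: ResourceInError: '
    'resources[2].resources.NovaCompute: '
    'Went to status ERROR due to "Message: Unknown, Code: Unknown"'
)


def parse_heat_failure(status):
    """Hypothetical helper: split a nested Heat status reason into parts.

    Returns a dict, or None if the text does not have the expected shape.
    """
    pattern = re.compile(
        r'^(?P<parent>resources\.\w+): '
        r'(?P<error>\w+): '
        r'resources\[(?P<index>\d+)\]\.resources\.(?P<resource>\w+): '
        r'(?P<reason>.*)$'
    )
    match = pattern.match(status)
    if match is None:
        return None
    info = match.groupdict()
    info['index'] = int(info['index'])  # position within the Compute group
    return info


info = parse_heat_failure(STATUS)
print(info)
```

Read this way, the failure is in member 2 of the Compute group, on its NovaCompute resource, and Heat itself received no usable message or code from Nova.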

Expected results:
The deployment succeeds.

Additional info:

Comment 2 Francesco Vollero 2016-03-02 08:08:02 UTC
Created attachment 1132165

Comment 3 Francesco Vollero 2016-03-02 08:08:57 UTC
Created attachment 1132166

Comment 4 Francesco Vollero 2016-03-02 08:12:37 UTC
Created attachment 1132167
grep of a failing request

Comment 5 Ryan Brown 2016-03-02 19:18:42 UTC
Extended nova logs: http://chunk.io/f/c1078acec8ee4286995813db6e020481

Comment 6 Ryan Brown 2016-03-02 19:24:55 UTC
It looks to me like a source of this problem is in Neutron; a 404 can sometimes indicate that there are not enough floating IPs. The next step would be to get the neutron logs to see what's causing the 404.

Comment 7 Ryan Brown 2016-03-03 16:23:56 UTC
Neutron logs http://chunk.io/f/7da633892d50410aa17dec6963afbc41

Comment 8 Francesco Vollero 2016-03-04 10:57:46 UTC
The floating IP range has 20 addresses, while the number of nodes is at most 9.
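
The capacity argument can be checked with a short Python sketch. The start and end addresses below are hypothetical (the actual range is not given in this report) and were chosen only so that the pool holds 20 addresses. The sketch also assumes at most one floating IP per node, which the report does not state.

```python
import ipaddress


def pool_size(start, end):
    """Number of addresses in an inclusive allocation range."""
    first = int(ipaddress.ip_address(start))
    last = int(ipaddress.ip_address(end))
    return last - first + 1


# Hypothetical range of 20 addresses; the real range is not in this report.
POOL_START = "192.0.2.100"
POOL_END = "192.0.2.119"
MAX_NODES = 9  # maximum node count stated in the comment above

size = pool_size(POOL_START, POOL_END)
spare = size - MAX_NODES  # assumes at most one floating IP per node
print(f"pool={size} nodes={MAX_NODES} spare={spare}")
```

Under those assumptions the pool is not exhausted, which argues against a floating IP shortage as the cause of the 404.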

Comment 9 Mike Burns 2016-04-07 21:14:44 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 10 Jaromir Coufal 2016-10-11 13:19:45 UTC
Francesco, are you still experiencing similar issues? This seems like a one-off.

Comment 11 James Slagle 2016-10-14 16:34:08 UTC
setting TestOnly

Comment 13 Marius Cornea 2016-11-22 13:47:08 UTC
Moving this to verified since scaling out with an additional compute node completes OK on OSP 10.

Comment 16 errata-xmlrpc 2016-12-14 15:25:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

