Bug 1313674

Summary: Failing to update the existing overcloud adding more compute nodes.
Product: Red Hat OpenStack Reporter: Francesco Vollero <fvollero>
Component: openstack-tripleo-heat-templates Assignee: Jiri Stransky <jstransk>
Status: CLOSED ERRATA QA Contact: Omri Hochman <ohochman>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.0 (Kilo) CC: astellwa, dbecker, fvollero, jcoufal, jraju, jslagle, mburns, mcornea, mgandolf, morazi, rhel-osp-director-maint, riontel, rybrown
Target Milestone: rc Keywords: TestOnly, Triaged
Target Release: 10.0 (Newton)   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-5.0.0-0.5.0rc3.el7ost Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-12-14 15:25:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                  Flags
heat-engine                  none
nova-conductor               none
nova-api                     none
grep of a failing request    none

Description Francesco Vollero 2016-03-02 08:06:50 UTC
Created attachment 1132164
heat-engine

Description of problem:

When I try to update an existing overcloud, I hit the most hated error in Heat: UPDATE_FAILED.

I am running these deployments on physical hardware, with a director that has 4 vCPUs (running in a KVM instance) and 32 GB of RAM, and with engine_num_workers set to 8 as suggested.

When the deployment is run from scratch, it succeeds without any issues; when it is an update, it always fails (in all 15 of my attempts).

I collected the logs from heat-engine, nova-conductor and neutron. The problem seems to be related to neutron, but that is just speculation.


Version-Release number of selected component (if applicable):
7.3

How reproducible:
Run a deployment to update the number of compute nodes.

Steps to Reproduce:
The same steps as for creating a normal deployment, but run against the existing overcloud with an increased number of compute nodes.

Actual results:
UPDATE_FAILED
Stack failed with status: resources.Compute: ResourceInError: resources[2].resources.NovaCompute: Went to status ERROR due to "Message: Unknown, Code: Unknown"
ERROR: openstack Heat Stack update failed.


Expected results:
The deployment succeeds.

Additional info:

Comment 2 Francesco Vollero 2016-03-02 08:08:02 UTC
Created attachment 1132165
nova-conductor

Comment 3 Francesco Vollero 2016-03-02 08:08:57 UTC
Created attachment 1132166
nova-api

Comment 4 Francesco Vollero 2016-03-02 08:12:37 UTC
Created attachment 1132167
grep of a failing request

Comment 5 Ryan Brown 2016-03-02 19:18:42 UTC
Extended nova logs: http://chunk.io/f/c1078acec8ee4286995813db6e020481

Comment 6 Ryan Brown 2016-03-02 19:24:55 UTC
It looks to me like the source of this problem is in Neutron: a 404 can sometimes indicate that there are not enough floating IPs. The next step would be to get the neutron logs to see what is causing the 404.

Comment 7 Ryan Brown 2016-03-03 16:23:56 UTC
Neutron logs http://chunk.io/f/7da633892d50410aa17dec6963afbc41

Comment 8 Francesco Vollero 2016-03-04 10:57:46 UTC
The floating IP range has 20 addresses, while the maximum number of nodes is 9.
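As a quick check of the arithmetic in this comment, the sketch below counts the addresses in an inclusive start/end range and compares that count with a node count. The function names and the 192.0.2.x addresses are hypothetical (chosen only so the range holds 20 addresses). The sketch does not query Neutron and ignores addresses already held by other ports, so it only shows whether the pool is large enough on paper.

```python
import ipaddress


def pool_size(start: str, end: str) -> int:
    """Count the addresses in an inclusive start..end range (hypothetical helper)."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    if last < first:
        raise ValueError("end address precedes start address")
    return int(last) - int(first) + 1


def has_headroom(start: str, end: str, nodes: int) -> bool:
    """True if the range could hand one address to each node (ignores existing allocations)."""
    return pool_size(start, end) >= nodes


# Hypothetical 20-address pool checked against a maximum of 9 nodes.
print(pool_size("192.0.2.100", "192.0.2.119"))
print(has_headroom("192.0.2.100", "192.0.2.119", 9))
```

With these inputs the range holds 20 addresses, so 9 nodes fit with room to spare, which is consistent with the point that pool size alone does not explain the 404.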

Comment 9 Mike Burns 2016-04-07 21:14:44 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 10 Jaromir Coufal 2016-10-11 13:19:45 UTC
Francesco, are you still experiencing similar issues? This seems like a one-off.

Comment 11 James Slagle 2016-10-14 16:34:08 UTC
Setting TestOnly.

Comment 13 Marius Cornea 2016-11-22 13:47:08 UTC
Moving this to VERIFIED, since scaling out with an additional compute node completed successfully on OSP 10.

Comment 16 errata-xmlrpc 2016-12-14 15:25:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

Comment 17 Red Hat Bugzilla 2023-09-14 03:18:47 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days