Created attachment 1132164 [details]
Description of problem:
When I try to update an existing overcloud, I hit the most hated error in Heat: UPDATE_FAILED.
I am running these deployments on physical hardware, with a director that has 4 vCPUs (running on a KVM instance) and 32 GB of RAM, and with engine_num_workers set to 8 as suggested.
If the deployment is executed from scratch it succeeds without any issues, but if it is an update, it always fails (in all 15 of my attempts so far).
I collected the logs from heat-engine, nova-conductor and neutron. The problem seems to be related to Neutron, but that is just speculation.
Version-Release number of selected component (if applicable):
Run a deployment to update the number of compute nodes.
Steps to Reproduce:
The same steps as for creating a normal deployment.
Stack failed with status: resources.Compute: ResourceInError: resources.resources.NovaCompute: Went to status ERROR due to "Message: Unknown, Code: Unknown"
ERROR: openstack Heat Stack update failed.
Created attachment 1132165 [details]
Created attachment 1132166 [details]
Created attachment 1132167 [details]
grep of a failing request
Extended nova logs: http://chunk.io/f/c1078acec8ee4286995813db6e020481
It looks to me like the source of this problem is in Neutron; a 404 can sometimes indicate that there are not enough floating IPs. The next step would be to get the Neutron logs to see what is causing the 404.
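To narrow down which request returns the 404, something like the following could be run over the Neutron server log. This is only a rough sketch: the access-log shape (a quoted request line followed by a three-digit status code) is an assumption about the log format, and the sample lines and the function name are made up for illustration, not taken from the attached logs.

```python
import re

# Assumed access-log shape: ... "METHOD /path HTTP/1.1" STATUS ...
# This pattern is a guess and may need adjusting to the real log layout.
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"\s+(?P<status>\d{3})'
)


def requests_with_status(lines, status=404):
    """Return (method, path) for every logged request with the given status."""
    hits = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and int(match.group("status")) == status:
            hits.append((match.group("method"), match.group("path")))
    return hits


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    '... "GET /v2.0/ports.json HTTP/1.1" 200 512',
    '... "GET /v2.0/floatingips/abc HTTP/1.1" 404 128',
]
sample_hits = requests_with_status(SAMPLE)
```

For a real log file, the file object can be passed directly, for example `requests_with_status(open(path))`, where `path` is wherever the Neutron server log lives on the director.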
Neutron logs: http://chunk.io/f/7da633892d50410aa17dec6963afbc41
The floating IP range contains 20 addresses, while the number of nodes is at most 9.
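As a quick sanity check of that arithmetic, here is a minimal sketch that counts the addresses in an inclusive allocation range and compares the count against the node count. The start and end addresses are placeholders from the documentation range, not the real pool, and one address per node is an assumption.

```python
import ipaddress


def pool_size(start, end):
    """Number of addresses in an inclusive allocation range."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    if last < first:
        raise ValueError("range end precedes range start")
    return int(last) - int(first) + 1


def has_capacity(start, end, nodes, ips_per_node=1):
    """True if the range can cover `nodes` nodes (one IP each is assumed)."""
    return pool_size(start, end) >= nodes * ips_per_node


# Placeholder range holding 20 addresses; the real pool is not shown here.
size = pool_size("192.0.2.5", "192.0.2.24")
enough = has_capacity("192.0.2.5", "192.0.2.24", nodes=9)
```

With 20 addresses and at most 9 nodes the check passes, which suggests pool exhaustion alone would not explain the 404 unless other consumers are holding addresses from the same range.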
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Francesco, are you still experiencing similar issues? This seems like a one-off.
Moving this to verified, since scaling out with an additional compute node completes OK on OSP 10.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.