Bug 1313674
Summary: | Failing to update the existing overcloud adding more compute nodes. | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | Francesco Vollero <fvollero>
Component: | openstack-tripleo-heat-templates | Assignee: | Jiri Stransky <jstransk>
Status: | CLOSED ERRATA | QA Contact: | Omri Hochman <ohochman>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 7.0 (Kilo) | CC: | astellwa, dbecker, fvollero, jcoufal, jraju, jslagle, mburns, mcornea, mgandolf, morazi, rhel-osp-director-maint, riontel, rybrown
Target Milestone: | rc | Keywords: | TestOnly, Triaged
Target Release: | 10.0 (Newton) | |
Hardware: | x86_64 | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | openstack-tripleo-heat-templates-5.0.0-0.5.0rc3.el7ost | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2016-12-14 15:25:11 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Created attachment 1132165 [details]
nova-conductor
Created attachment 1132166 [details]
nova-api
Created attachment 1132167 [details]
grep of a failing request
Extended nova logs: http://chunk.io/f/c1078acec8ee4286995813db6e020481

It looks to me like a source of this problem is in Neutron; sometimes a 404 can indicate that there are not enough floating IPs. The next step would be to get the Neutron logs to see what is causing the 404.

Neutron logs: http://chunk.io/f/7da633892d50410aa17dec6963afbc41
The floating IP range has a size of 20, while the maximum number of nodes is 9.

This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.

Francesco, are you still experiencing similar issues? This seems like a one-off. Setting TestOnly.

Moving this to verified, since scaling out with an additional compute node completes OK on OSP 10.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHEA-2016-2948.html

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
Created attachment 1132164 [details]
heat-engine

Description of problem:
When I try to update an existing overcloud, I hit the most hated error in Heat: UPDATE_FAILED. I am running these deployments on physical hardware, with a director that has 4 vcores (running on a KVM instance) and 32 GB of RAM, and with engine_num_workers set to 8 as suggested. A deployment executed from scratch succeeds without any issues, but an update always fails (15 attempts so far). I collected the logs from heat-engine, nova-conductor and neutron; the problem seems to be related to Neutron, but that is just speculation.

Version-Release number of selected component (if applicable):
7.3

How reproducible:
Run a deployment to update the number of compute nodes.

Steps to Reproduce:
The same as for creating a normal deployment.

Actual results:
UPDATE_FAILED
Stack failed with status: resources.Compute: ResourceInError: resources[2].resources.NovaCompute: Went to status ERROR due to "Message: Unknown, Code: Unknown"
ERROR: openstack Heat Stack update failed.

Expected results:
The deployment succeeds.

Additional info:
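The status reason quoted under "Actual results" nests the path to the failing resource, which is easy to misread in a single line. The following is a minimal Python sketch that splits that exact string into its parts. The helper name `parse_status_reason` and the regular expression are illustrative assumptions, not part of Heat or any OpenStack client, and the pattern only covers the one-level nesting seen in this report.

```python
import re

# Status reason copied verbatim from the "Actual results" in this report.
STATUS_REASON = (
    'resources.Compute: ResourceInError: '
    'resources[2].resources.NovaCompute: '
    'Went to status ERROR due to "Message: Unknown, Code: Unknown"'
)


def parse_status_reason(reason):
    """Split a nested Heat status reason into its parts.

    Hypothetical helper: handles only the single level of nesting
    shown in this bug report and returns None for anything else.
    """
    pattern = re.compile(
        r'^resources\.(?P<parent>\w+): '        # top-level resource name
        r'(?P<error>\w+): '                     # exception class name
        r'resources\[(?P<index>\d+)\]'          # member index in the group
        r'\.resources\.(?P<child>\w+): '        # failing nested resource
        r'(?P<detail>.*)$'                      # remaining message text
    )
    match = pattern.match(reason)
    if match is None:
        return None
    parts = match.groupdict()
    parts['index'] = int(parts['index'])
    return parts


parsed = parse_status_reason(STATUS_REASON)
print(parsed)
```

Read this way, the failure is reported for the NovaCompute resource of member 2 of the Compute resource, which suggests looking at the nova logs for that particular server rather than at the stack as a whole.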