Bug 1257401 - Extended delay between calls to create nova instances from heat-engine
Status: CLOSED CURRENTRELEASE
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: z4
Target Release: 7.0 (Kilo)
Assigned To: Steven Hardy
QA Contact: Amit Ugol
Keywords: Triaged, ZStream
Depends On:
Blocks:
 
Reported: 2015-08-26 22:53 EDT by Mark Wagner
Modified: 2016-04-26 10:44 EDT
CC: 9 users

See Also:
Fixed In Version: openstack-heat-2015.1.0-2.el7ost
Doc Type: Bug Fix
Last Closed: 2016-01-06 17:53:47 EST
Type: Bug


Attachments: None
Description Mark Wagner 2015-08-26 22:53:34 EDT
Description of problem:
Heat does not appear to honor the value of max_concurrent_builds set in the nova.conf file. (Yes, I know this is a heat bug and max_concurrent_builds is a nova option.)


Version-Release number of selected component (if applicable):


How reproducible:
Every time.

Steps to Reproduce:
1. Modify the value of max_concurrent_builds in nova.conf
2. Restart services
3. Deploy
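
For reference, a minimal sketch of the tuning from step 1 (the value is illustrative; max_concurrent_builds normally lives in nova.conf's [DEFAULT] section, with a nova default of 10):

```
# /etc/nova/nova.conf (illustrative value)
[DEFAULT]
# Cap on how many instance builds nova-compute will run at once.
max_concurrent_builds = 20
```

After editing, restart the nova services (step 2) before deploying.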

Actual results:
Not all of the requested builds happen concurrently.

Expected results:
The requested level of concurrency should be honored.

Additional info:
Comment 4 Steven Hardy 2015-08-27 13:50:19 EDT
I did some profiling today and discovered that most of the time is spent on the recursive calculation of the number of resources. During Liberty we made this optional (it can be disabled via the config file) by adding an unlimited option to max_resources_per_stack - see this patch:

https://review.openstack.org/#/c/185894/7

Unfortunately that isn't yet backported to kilo (I'm working on a backport; there are a few issues to sort out with the tests, but otherwise it should be straightforward). To prove this is the bottleneck, it's possible to comment out the validation in the code:

https://github.com/openstack/heat/blob/stable/kilo/heat/engine/resources/stack_resource.py#L197

In my testing I observed a 500% performance penalty with the resource-number check enabled (admittedly something of a worst case, since I tailored a template to magnify the issue).

The root_stack and total resource count thing has long been acknowledged as a problem in the heat community, but it wasn't until my testing today that I fully understood the severity of the impact, particularly for highly nested deployments like tripleo.
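
A minimal Python sketch of the cost described above (illustrative only, not Heat's actual code): if validation re-counts every resource in the nested-stack tree before each creation, the work grows with creations x tree size.

```python
# Illustrative sketch of the recursive total-resource count; the class
# and function names here are hypothetical, not Heat's real API.

class Stack:
    def __init__(self, resources=0, children=None):
        self.resources = resources      # resources directly in this stack
        self.children = children or []  # nested stacks

    def total_resources(self):
        # Walks the entire tree from the root every time it is called.
        return self.resources + sum(
            c.total_resources() for c in self.children
        )

def create_resources(root, n):
    # Mimics per-creation validation: n creations, each paying a full
    # O(tree size) traversal, so total work is roughly n * tree size.
    checks = 0
    for _ in range(n):
        checks += root.total_resources()
    return checks
```

For a deeply nested tree such as a tripleo deployment, disabling this check removes a traversal from every resource creation.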

I raised an upstream bug to discuss disabling the check by default (this was rejected previously, but I think it's worth revisiting; that discussion shouldn't influence the viability of the backport of the unlimited option mentioned above):

https://bugs.launchpad.net/heat/+bug/1489548
Comment 5 Steven Hardy 2015-08-28 06:43:01 EDT
Stable/kilo backport proposed for the unlimited option mentioned in comment #4:

https://review.openstack.org/#/c/218193/
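
Once that backport lands, the check could be disabled with something like the following (illustrative; the option name comes from the patch above, and the assumption here is that a negative value means unlimited):

```
# /etc/heat/heat.conf (illustrative; requires the backported patch)
[DEFAULT]
# Disable the recursive per-stack resource count (unlimited).
max_resources_per_stack = -1
```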
