Bug 1257401 - Extended delay between calls to create nova instances from heat-engine
Status: CLOSED CURRENTRELEASE
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: z4
Target Release: 7.0 (Kilo)
Assigned To: Steven Hardy
QA Contact: Amit Ugol
Keywords: Triaged, ZStream
Depends On:
Blocks:
 
Reported: 2015-08-26 22:53 EDT by Mark Wagner
Modified: 2016-04-26 10:44 EDT
CC: 9 users

See Also:
Fixed In Version: openstack-heat-2015.1.0-2.el7ost
Doc Type: Bug Fix
Last Closed: 2016-01-06 17:53:47 EST
Type: Bug


Attachments: None
Description Mark Wagner 2015-08-26 22:53:34 EDT
Description of problem:
Heat does not appear to honor the value of max_concurrent_builds set in the nova.conf file. (Yes, I know this is a heat bug and max_concurrent_builds is a nova option.)


Version-Release number of selected component (if applicable):


How reproducible:
Every time.

Steps to Reproduce:
1. Modify the value of max_concurrent_builds in nova.conf
2. Restart services
3. Deploy
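
For reference, a minimal sketch of the tuning from step 1 (the value is illustrative; max_concurrent_builds normally lives in nova.conf's [DEFAULT] section, with a nova default of 10):

```
# /etc/nova/nova.conf (illustrative value)
[DEFAULT]
# Cap on how many instance builds nova-compute will run at once.
max_concurrent_builds = 20
```

After editing, restart the nova services (step 2) before deploying.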

Actual results:
Not all of the requested builds happen concurrently.

Expected results:
The requested level of concurrency should be honored.

Additional info:
Comment 4 Steven Hardy 2015-08-27 13:50:19 EDT
I did some profiling today and discovered that most of the time is spent on the recursive calculation of the number of resources. During Liberty we made this optional (it can be disabled via the config file) by adding an unlimited option to max_resources_per_stack - see this patch:

https://review.openstack.org/#/c/185894/7

Unfortunately that isn't yet backported to kilo (I'm working on a backport; there are a few issues to sort out with the tests, but otherwise it should be straightforward). To prove this is the bottleneck, it's possible to comment out the validation in the code:

https://github.com/openstack/heat/blob/stable/kilo/heat/engine/resources/stack_resource.py#L197

In my testing I observed a 500% performance penalty with the resource-number check enabled (admittedly something of a worst case, since I tailored a template to magnify the issue).

The root_stack and total resource count thing has long been acknowledged as a problem in the heat community, but it wasn't until my testing today that I fully understood the severity of the impact, particularly for highly nested deployments like tripleo.
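
A minimal Python sketch of the cost described above (illustrative only, not Heat's actual code): if validation re-counts every resource in the nested-stack tree before each creation, the work grows with creations x tree size.

```python
# Illustrative sketch of the recursive total-resource count; the class
# and function names here are hypothetical, not Heat's real API.

class Stack:
    def __init__(self, resources=0, children=None):
        self.resources = resources      # resources directly in this stack
        self.children = children or []  # nested stacks

    def total_resources(self):
        # Walks the entire tree from the root every time it is called.
        return self.resources + sum(
            c.total_resources() for c in self.children
        )

def create_resources(root, n):
    # Mimics per-creation validation: n creations, each paying a full
    # O(tree size) traversal, so total work is roughly n * tree size.
    checks = 0
    for _ in range(n):
        checks += root.total_resources()
    return checks
```

For a deeply nested tree such as a tripleo deployment, disabling this check removes a traversal from every resource creation.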

I raised an upstream bug to discuss disabling the check by default (this was rejected previously, but I think it's worth revisiting; that discussion shouldn't influence the viability of the backport of the unlimited option mentioned above):

https://bugs.launchpad.net/heat/+bug/1489548
Comment 5 Steven Hardy 2015-08-28 06:43:01 EDT
Stable/kilo backport proposed for the unlimited option mentioned in comment #4:

https://review.openstack.org/#/c/218193/
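
Once that backport lands, the check could be disabled with something like the following (illustrative; the option name comes from the patch above, and the assumption here is that a negative value means unlimited):

```
# /etc/heat/heat.conf (illustrative; requires the backported patch)
[DEFAULT]
# Disable the recursive per-stack resource count (unlimited).
max_resources_per_stack = -1
```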
