Bug 1396398

Summary: listing of stack events is slow on a big stack
Product: Red Hat OpenStack
Reporter: Jan Provaznik <jprovazn>
Component: openstack-heat
Assignee: Thomas Hervé <therve>
Status: CLOSED WONTFIX
QA Contact: Amit Ugol <augol>
Severity: low
Priority: unspecified
Version: 10.0 (Newton)
CC: mburns, rhel-osp-director-maint, sbaker, shardy, srevivo, therve, zbitter
Whiteboard: aos-scalability-34
Last Closed: 2017-01-12 15:21:56 UTC
Type: Bug
Attachments: stack events (nested depth 2) (flags: none)

Description Jan Provaznik 2016-11-18 09:11:44 UTC
Description of problem:
Listing events on a stack with 350 nodes is too slow (about 50 seconds). The stack structure is not complex (one level of nested stacks). The template definitions are here:
https://github.com/redhat-openstack/openshift-on-openstack


Version-Release number of selected component (if applicable):
python-heatclient-1.4.0-0.20160831084943.fb7802e.el7ost.noarch
openstack-heat-engine-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
openstack-heat-api-cloudwatch-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
openstack-heat-api-cfn-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
puppet-heat-9.2.0-0.20160901072004.4d7b5be.el7ost.noarch
openstack-heat-common-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
openstack-heat-api-7.0.0-0.20160907124808.21e49dc.el7ost.noarch


How reproducible:
I understand this is hard to reproduce because of the hardware resources required: in this case the openshift-on-openstack templates were deployed with 350 OpenShift compute nodes. I don't think this is specific to the openshift-on-openstack templates, so any big stack that includes nested stacks should work as a reproducer.

My guess is that the events of a nested stack carry no reference to the top-level stack, so Heat has to run many SELECTs (one per nested stack) to collect all events.
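To make the guess above concrete, here is a minimal Python sketch using an in-memory SQLite database. The `stack` and `event` tables and the `root_id` / `root_stack_id` columns are hypothetical and only illustrate the two access patterns; this is not Heat's actual schema or code.

```python
import sqlite3

# Hypothetical schema, for illustration only (not Heat's real tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stack (id INTEGER PRIMARY KEY, root_id INTEGER);
    CREATE TABLE event (id INTEGER PRIMARY KEY,
                        stack_id INTEGER,
                        root_stack_id INTEGER);
""")

# One top-level stack (id 1) with 350 nested stacks, 3 events each.
conn.execute("INSERT INTO stack VALUES (1, NULL)")
for stack_id in range(2, 352):
    conn.execute("INSERT INTO stack VALUES (?, 1)", (stack_id,))
    conn.executemany(
        "INSERT INTO event (stack_id, root_stack_id) VALUES (?, 1)",
        [(stack_id,)] * 3,
    )


def events_per_stack(conn, root_id):
    """Guessed slow path: one SELECT per nested stack."""
    stack_ids = [row[0] for row in conn.execute(
        "SELECT id FROM stack WHERE root_id = ?", (root_id,))]
    queries = 1
    events = []
    for stack_id in stack_ids:
        events.extend(conn.execute(
            "SELECT id FROM event WHERE stack_id = ?", (stack_id,)).fetchall())
        queries += 1
    return events, queries


def events_by_root(conn, root_id):
    """Alternative: events reference the top-level stack, so one SELECT suffices."""
    events = conn.execute(
        "SELECT id FROM event WHERE root_stack_id = ?", (root_id,)).fetchall()
    return events, 1


slow_events, slow_queries = events_per_stack(conn, 1)
fast_events, fast_queries = events_by_root(conn, 1)
```

Both functions return the same events; the difference is only in the number of round trips to the database, which grows with the number of nested stacks in the first case.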

Actual results:
$ time openstack stack event list --nested-depth=2 test
...
real	0m52.846s
user	0m3.366s
sys	0m2.363s


Expected results:
The list of events is returned in a few seconds.

Related/similar to BZ https://bugzilla.redhat.com/show_bug.cgi?id=1396391#c0

Comment 1 Thomas Hervé 2016-12-01 15:49:20 UTC
According to https://review.openstack.org/#/c/326229/, this should have been optimized already. It seems we have a regression; I'll try to have a look.

Comment 2 Thomas Hervé 2016-12-02 10:18:59 UTC
I tested with master on tripleo (rather big stack), and it seems fine:

$ time openstack stack event list --nested-depth 4  overcloud | wc -l
1120

real    0m3.671s
user    0m1.316s
sys     0m0.108s

Not blazing fast, but good enough it seems. How many events do you have?

Comment 3 Zane Bitter 2016-12-02 13:41:42 UTC
Some back-of-the-envelope calculations suggest it's probably a little over 9000 events just to create a stack like this (about 7700 resources, but the stack is scaled out in stages). And those are spread over 350+ stacks.

However, as you pointed out, the code appears to be doing exactly 3 DB queries regardless of the number of stacks involved. That suggests that if there's a problem it's likely to be in DB optimisation rather than application optimisation.

I wonder if we're missing an index we should have?
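One way to check for a missing index is to compare the query plan before and after adding one. The sketch below does this in Python with SQLite's EXPLAIN QUERY PLAN. The `event` table, its columns, and the index name `ix_event_stack_id` are assumptions made for illustration; Heat's real schema and its production database will differ, so treat this as a method rather than a diagnosis.

```python
import sqlite3

# Hypothetical events table, roughly 9000 rows spread over 350 stacks.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (id INTEGER PRIMARY KEY, stack_id TEXT, resource_name TEXT)")
conn.executemany(
    "INSERT INTO event (stack_id, resource_name) VALUES (?, ?)",
    [(f"stack-{i % 350}", f"res-{i}") for i in range(9000)],
)

QUERY = "SELECT id, resource_name FROM event WHERE stack_id IN (?, ?, ?)"
PARAMS = ("stack-1", "stack-2", "stack-3")


def query_plan(conn):
    """Return the planner's description of how QUERY would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index on stack_id yet: full table scan
conn.execute("CREATE INDEX ix_event_stack_id ON event (stack_id)")
plan_after = query_plan(conn)   # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

On the real deployment the equivalent step would be running the database's own EXPLAIN on the events queries and checking whether they scan the whole table.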

Comment 4 Jan Provaznik 2016-12-02 14:53:28 UTC
Created attachment 1227353
stack events (nested depth 2)

Comment 6 Thomas Hervé 2017-01-12 15:21:56 UTC
So the given list contains 30k events. I don't think there is much we can do to improve it. We probably should have had a default limit, but it's hard to change that for backward compatibility.

In the meantime, you should use limits, markers and filters if you need quick responses. You may also want to use stack failures list.
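The limit and marker approach can be sketched in Python as follows. `fetch_page` is a hypothetical callable standing in for whatever call lists events with a limit and a marker (for example a wrapper around the events API); `FAKE_EVENTS` and `fake_fetch_page` exist only so the sketch runs on its own. On the command line, `openstack stack event list` should offer corresponding options such as --limit and --marker; check --help for your client version.

```python
from typing import Callable, Dict, Iterator, List, Optional

Event = Dict[str, str]


def iter_events(fetch_page: Callable[[int, Optional[str]], List[Event]],
                limit: int = 100) -> Iterator[Event]:
    """Yield events page by page instead of requesting the full list at once."""
    marker: Optional[str] = None
    while True:
        page = fetch_page(limit, marker)
        if not page:
            return
        yield from page
        if len(page) < limit:
            return  # a short page means there is nothing left
        marker = page[-1]["id"]  # resume after the last event seen


# Stand-in data source for the sketch (hypothetical, not a real Heat call).
FAKE_EVENTS: List[Event] = [{"id": f"ev-{i:05d}"} for i in range(250)]


def fake_fetch_page(limit: int, marker: Optional[str]) -> List[Event]:
    start = 0
    if marker is not None:
        ids = [event["id"] for event in FAKE_EVENTS]
        start = ids.index(marker) + 1
    return FAKE_EVENTS[start:start + limit]


# Only the first page is fetched if the caller stops early.
first_event = next(iter_events(fake_fetch_page, limit=100))
all_events = list(iter_events(fake_fetch_page, limit=100))
```

Because the generator is lazy, a caller that only needs the first few events pays for a single small request rather than the full 30k-event listing.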