Bug 1396398 - listing of stack events is slow on a big stack
Summary: listing of stack events is slow on a big stack
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Thomas Hervé
QA Contact: Amit Ugol
URL:
Whiteboard: aos-scalability-34
Depends On:
Blocks:
 
Reported: 2016-11-18 09:11 UTC by Jan Provaznik
Modified: 2017-01-12 15:21 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-12 15:21:56 UTC
Target Upstream Version:


Attachments (Terms of Use)
stack events (nested depth 2) (192.49 KB, application/x-gzip)
2016-12-02 14:53 UTC, Jan Provaznik

Description Jan Provaznik 2016-11-18 09:11:44 UTC
Description of problem:
Listing events on a stack with 350 nodes is too slow (about 50 seconds). The structure of the stack is not overly complex (one level of nested stacks); the template definitions are here:
https://github.com/redhat-openstack/openshift-on-openstack


Version-Release number of selected component (if applicable):
python-heatclient-1.4.0-0.20160831084943.fb7802e.el7ost.noarch
openstack-heat-engine-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
openstack-heat-api-cloudwatch-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
openstack-heat-api-cfn-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
puppet-heat-9.2.0-0.20160901072004.4d7b5be.el7ost.noarch
openstack-heat-common-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
openstack-heat-api-7.0.0-0.20160907124808.21e49dc.el7ost.noarch


How reproducible:
I understand it's hard to reproduce because of the required HW resources; in this case the openshift-on-openstack templates were deployed with 350 OpenShift compute nodes. I don't think this is specific to the openshift-on-openstack templates, so any big stack that includes nested stacks should work as a reproducer.

My guess is that events of a nested stack carry no reference to the top-level stack, so Heat has to run many SELECTs (one per nested stack) to collect all the events.

Actual results:
$ time openstack stack event list --nested-depth=2 test
...
real	0m52.846s
user	0m3.366s
sys	0m2.363s


Expected results:
The list of events is returned within a few seconds.

Related/similar to BZ https://bugzilla.redhat.com/show_bug.cgi?id=1396391#c0

Comment 1 Thomas Hervé 2016-12-01 15:49:20 UTC
According to https://review.openstack.org/#/c/326229/, this should have been optimized already. It seems we have a regression; I'll try to take a look.

Comment 2 Thomas Hervé 2016-12-02 10:18:59 UTC
I tested with master on a TripleO deployment (a rather big stack), and it seems fine:

$ time openstack stack event list --nested-depth 4  overcloud | wc -l
1120

real    0m3.671s
user    0m1.316s
sys     0m0.108s

Not blazing fast, but good enough it seems. How many events do you have?

Comment 3 Zane Bitter 2016-12-02 13:41:42 UTC
Some back-of-the-envelope calculations suggest it's probably a little over 9000 events just to create a stack like this (about 7700 resources, but the stack is scaled out in stages). And those are spread over 350+ stacks.

However, as you pointed out, the code appears to do exactly 3 DB queries regardless of the number of stacks involved. That suggests that if there's a problem, it's likely in DB optimisation rather than application optimisation.

I wonder if we're missing an index we should have?
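One way to check that hypothesis is to look at the query plan for the per-stack event lookup. A sketch, assuming the default MySQL backend and Heat's "event" table with a "stack_id" column; the stack UUID is a placeholder:

```shell
# Hypothetical diagnostic: if the plan shows a full table scan
# (type: ALL, key: NULL), an index on stack_id would likely help.
mysql heat -e "EXPLAIN SELECT * FROM event WHERE stack_id = '<stack-uuid>';"
```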

Comment 4 Jan Provaznik 2016-12-02 14:53:28 UTC
Created attachment 1227353 [details]
stack events (nested depth 2)

Comment 6 Thomas Hervé 2017-01-12 15:21:56 UTC
So the given list contains 30k events. I don't think there is much we can do to improve it. We probably should have had a default limit, but that's hard to change now for backward-compatibility reasons.

In the meantime, you should use limits, markers, and filters if you need quick responses. You may also want to use the stack failures list command.
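For example (a sketch using python-heatclient's OSC plugin flags from this era; the marker value is a placeholder for the ID of the last event already seen):

```shell
# Page through events instead of fetching all 30k at once:
openstack stack event list --limit 50 test

# Continue from where the previous page ended:
openstack stack event list --limit 50 --marker <last-event-id> test

# Filter server-side rather than fetching everything:
openstack stack event list --filter resource_status=FAILED test

# Usually what you actually want when debugging a failed deploy:
openstack stack failures list test
```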

