Bug 1396398

Summary:

listing of stack events is slow on a big stack

Product:

Red Hat OpenStack

Reporter:

Jan Provaznik <jprovazn>

Component:

openstack-heat

Assignee:

Thomas Hervé <therve>

Status:

CLOSED WONTFIX

QA Contact:

Amit Ugol <augol>

Severity:

low

Priority:

unspecified

Version:

10.0 (Newton)

CC:

mburns, rhel-osp-director-maint, sbaker, shardy, srevivo, therve, zbitter

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

aos-scalability-34

Last Closed:

2017-01-12 15:21:56 UTC

Type:

Bug

Attachments:

Description	Flags
stack events (nested depth 2)	none

Description Jan Provaznik 2016-11-18 09:11:44 UTC

Description of problem:
Listing the events of a stack with 350 nodes is too slow (about 50 seconds). The stack structure is not very complex (one level of nested stacks); the template definitions are here:
https://github.com/redhat-openstack/openshift-on-openstack


Version-Release number of selected component (if applicable):
python-heatclient-1.4.0-0.20160831084943.fb7802e.el7ost.noarch
openstack-heat-engine-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
openstack-heat-api-cloudwatch-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
openstack-heat-api-cfn-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
puppet-heat-9.2.0-0.20160901072004.4d7b5be.el7ost.noarch
openstack-heat-common-7.0.0-0.20160907124808.21e49dc.el7ost.noarch
openstack-heat-api-7.0.0-0.20160907124808.21e49dc.el7ost.noarch


How reproducible:
I understand it's hard to reproduce because of the required hardware resources: in this case, the openshift-on-openstack templates were deployed with 350 OpenShift compute nodes. I don't think this is specific to the openshift-on-openstack templates, so any big stack that includes nested stacks should work as a reproducer.

My guess is that events of a nested stack hold no reference to the top-level stack, so Heat has to run many SELECTs (one per nested stack) to collect all the events.
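To illustrate the difference between the guessed pattern (one SELECT per nested stack) and a constant number of queries, here is a minimal sketch using Python's built-in sqlite3 module. The schema (tables stack and event, a parent_id column) and the helper names are hypothetical simplifications, not Heat's actual schema or query code; the sketch only counts queries to show that the per-stack approach grows with the number of nested stacks while a single query with a subquery does not.

```python
import sqlite3


class CountingConnection:
    """Wraps a sqlite3 connection and counts the queries run through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def build_db(nested_stacks, events_per_stack):
    """Hypothetical toy schema: stack 1 is the root, the rest are its children."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stack (id INTEGER PRIMARY KEY, parent_id INTEGER)")
    conn.execute(
        "CREATE TABLE event (id INTEGER PRIMARY KEY, stack_id INTEGER, name TEXT)")
    conn.execute("INSERT INTO stack (id, parent_id) VALUES (1, NULL)")
    for child_id in range(2, nested_stacks + 2):
        conn.execute("INSERT INTO stack (id, parent_id) VALUES (?, 1)", (child_id,))
    for stack_id in range(1, nested_stacks + 2):
        for n in range(events_per_stack):
            conn.execute("INSERT INTO event (stack_id, name) VALUES (?, ?)",
                         (stack_id, f"event-{n}"))
    conn.commit()
    return conn


def events_n_plus_one(db, root_id):
    """One SELECT for the child stacks, then one SELECT per stack."""
    child_ids = [row[0] for row in db.execute(
        "SELECT id FROM stack WHERE parent_id = ?", (root_id,))]
    events = []
    for stack_id in [root_id] + child_ids:
        events.extend(db.execute(
            "SELECT id, stack_id, name FROM event WHERE stack_id = ?", (stack_id,)))
    return events


def events_single_query(db, root_id):
    """All events of the root stack and its direct children in one SELECT."""
    return list(db.execute(
        "SELECT id, stack_id, name FROM event WHERE stack_id IN "
        "(SELECT id FROM stack WHERE id = ? OR parent_id = ?)",
        (root_id, root_id)))


if __name__ == "__main__":
    db = CountingConnection(build_db(nested_stacks=350, events_per_stack=3))
    print(len(events_n_plus_one(db, 1)), db.queries)    # 1053 352
    db.queries = 0
    print(len(events_single_query(db, 1)), db.queries)  # 1053 1
```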

Actual results:
$ time openstack stack event list --nested-depth=2 test
...
real	0m52.846s
user	0m3.366s
sys	0m2.363s


Expected results:
The list of events is returned within a few seconds.

Related/similar to BZ https://bugzilla.redhat.com/show_bug.cgi?id=1396391#c0

Comment 1 Thomas Hervé 2016-12-01 15:49:20 UTC

According to https://review.openstack.org/#/c/326229/, this should already have been optimized. It seems we have a regression; I'll try to have a look.

Comment 2 Thomas Hervé 2016-12-02 10:18:59 UTC

I tested with master on TripleO (a rather big stack), and it seems fine:

$ time openstack stack event list --nested-depth 4  overcloud | wc -l
1120

real    0m3.671s
user    0m1.316s
sys     0m0.108s

Not blazing fast, but good enough it seems. How many events do you have?

Comment 3 Zane Bitter 2016-12-02 13:41:42 UTC

Some back-of-the-envelope calculations suggest it's probably a little over 9000 events just to create a stack like this (about 7700 resources, but the stack is scaled out in stages). And those are spread over 350+ stacks.

However, as you pointed out, the code appears to be doing exactly 3 DB queries regardless of the number of stacks involved. That suggests that if there's a problem it's likely to be in DB optimisation rather than application optimisation.

I wonder if we're missing an index we should have?
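One way to check the missing-index question is to compare the database's query plan before and after adding an index. The sketch below uses SQLite's EXPLAIN QUERY PLAN via Python's sqlite3 module on a hypothetical, simplified event table; the index name ix_event_stack_id is made up for illustration, and a production database such as MySQL would be inspected with its own EXPLAIN instead. Without an index the lookup by stack_id is a full table scan; with one it becomes an index search.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN steps joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, stack_id INTEGER, name TEXT)")

EVENTS_FOR_STACK = "SELECT * FROM event WHERE stack_id = ?"

before = query_plan(conn, EVENTS_FOR_STACK, (42,))
conn.execute("CREATE INDEX ix_event_stack_id ON event (stack_id)")
after = query_plan(conn, EVENTS_FOR_STACK, (42,))

print("before:", before)  # a full table SCAN of event
print("after: ", after)   # a SEARCH using ix_event_stack_id
```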

Comment 4 Jan Provaznik 2016-12-02 14:53:28 UTC

Created attachment 1227353
stack events (nested depth 2)

Comment 6 Thomas Hervé 2017-01-12 15:21:56 UTC

So the given list contains 30k events. I don't think there is much we can do to improve that. We probably should have had a default limit, but it's hard to add one now without breaking backward compatibility.

In the meantime, you should use limits, markers and filters if you need quick responses. You may also want to use openstack stack failures list.
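As a rough illustration of why limits, markers and filters help with a 30k-event stack, here is a minimal, self-contained sketch of marker-based pagination with a filter. Everything in it (fetch_page, iter_events, the event dictionaries and their resource_status values) is hypothetical and only mirrors the general limit/marker/filter idea; it is not Heat's server implementation or the heatclient API. Each request returns a bounded page, the marker is the id of the last event already seen, and a filter can reduce tens of thousands of events to the handful that matter.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, List, Optional

Event = Dict[str, str]


def fetch_page(events: List[Event], limit: int, marker: Optional[str] = None,
               filters: Optional[Dict[str, str]] = None) -> List[Event]:
    """Hypothetical server side: up to `limit` matching events after `marker`.

    `events` is assumed to be already sorted; `marker` is the id of the last
    event the client has seen.
    """
    start = 0
    if marker is not None:
        start = next(i for i, e in enumerate(events) if e["id"] == marker) + 1
    wanted = filters or {}
    matching = (e for e in events[start:]
                if all(e.get(key) == value for key, value in wanted.items()))
    return list(islice(matching, limit))


def iter_events(fetch: Callable[..., List[Event]], limit: int,
                filters: Optional[Dict[str, str]] = None) -> Iterator[Event]:
    """Hypothetical client side: request pages until a short page comes back."""
    marker = None
    while True:
        page = fetch(limit=limit, marker=marker, filters=filters)
        yield from page
        if len(page) < limit:
            return
        marker = page[-1]["id"]


if __name__ == "__main__":
    # 30,000 fake events, one in every thousand marked FAILED.
    events = [{"id": f"e{i}",
               "resource_status": "FAILED" if i % 1000 == 0 else "COMPLETE"}
              for i in range(30000)]
    first_page = fetch_page(events, limit=50)
    failed = list(iter_events(lambda **kw: fetch_page(events, **kw), limit=50,
                              filters={"resource_status": "FAILED"}))
    print(len(first_page), len(failed))  # 50 30
```

A marker (the id of the last item seen) is used here rather than a numeric offset because it stays stable if new events are appended while the client is paging.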