1056064 – [SCALE] Events are being pulled from audit_log in a very inefficient way

Bug 1056064 - [SCALE] Events are being pulled from audit_log in a very inefficient way

Summary: [SCALE] Events are being pulled from audit_log in a very inefficient way

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	3.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	3.4.0
Assignee:	Liran Zelkha
QA Contact:	Eldad Marciano
Docs Contact:
URL:
Whiteboard:	infra
Depends On:
Blocks:	GSS_RHEV_33_BETA 1057561 1061165 rhev3.4beta 1142926
TreeView+	depends on / blocked

Reported:	2014-01-21 13:45 UTC by Tomas Dosek
Modified:	2019-04-28 10:49 UTC (History)
CC List:	13 users (show)
Fixed In Version:	ovirt-3.4.0-beta2
Doc Type:	Bug Fix
Doc Text:	Previously, an inefficient query caused information pulled from the database to take a long time. This resulted in the query taking longer than expected to display results. Now, an updated query has fixed the problem. As a result, information is displayed more quickly.
Clone Of:
Clones:	1057561 1061165 (view as bug list)
Environment:
Last Closed:	2014-06-09 15:09:09 UTC
oVirt Team:	Infra
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
Postgres log (7.74 MB, text/plain) 2014-01-21 13:45 UTC, Tomas Dosek	no flags	Details
View All

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2014:0506	normal	SHIPPED_LIVE	Moderate: Red Hat Enterprise Virtualization Manager 3.4.0 update	2014-06-09 18:55:38 UTC
oVirt gerrit	23567	None	None	None	Never
oVirt gerrit	24048	None	None	None	Never
oVirt gerrit	24094	None	None	None	Never

Description Tomas Dosek 2014-01-21 13:45:30 UTC

Created attachment 853237 [details]
Postgres log

Description of problem:

Events displayed in UI (used by reports and API as well) are being pulled from database by very inefficient query.

This was found during scalability testing done by strategic EMEA customer.

The query is (taken from postgres logs with enhanced verbosity level):
[pid 3766 127.0.0.1(39426)]DEBUG:  00000: parse <unnamed>: SELECT * FROM (SELECT * FROM audit_log WHERE ( audit_log_id IN (SELECT audit_log.audit_log_id FROM  audit_log   WHERE  audit_log.severity = '10' ))  ORDER BY audit_log_id DESC ) as T1 OFFSET (1 -1) LIMIT 100


Version-Release number of selected component (if applicable):
is29 (basically any RHEV-M until now)

How reproducible:
100 %

Steps to Reproduce:
1. enhance log verbosity of postgres (in postgresql.conf):
        log_error_verbosity = verbose
        log_min_messages = debug2
        log_line_prefix = '[pid %p %r]'
2. restart postgres
3. go to events in portal, or fetch event via api

Actual results:
The query used is very inefficient and takes very long time to finish.

Expected results:
We should most likely use just one SELECT (ain't ;))

Additional info:
Hopefully not needed. But still for the record attaching the postgres log

Comment 2 Itamar Heim 2014-01-21 22:15:58 UTC

liran/eli - didn't we fix something in these for 3.4?

Comment 3 Eli Mesika 2014-01-22 09:07:13 UTC

(In reply to Itamar Heim from comment #2)
> liran/eli - didn't we fix something in these for 3.4?

As far as I remember Liran handled that for a diffrent query for 3.4 
Liran ?

Comment 4 Eli Mesika 2014-01-22 09:14:37 UTC

Actually , looking again in the query described in BZ description , it seems that this is the query that populate the Alerts TAB ( WHERE  audit_log.severity = '10' ) => 10 is the severity of Alerts 

So, we had optimized the regular query that populates the Events TAB but not that one

Comment 6 Tareq Alayan 2014-02-17 10:17:38 UTC

what is considered to be OK time  ?
now it takes ~2seconds is it ok?

tested on ovirt-engine-3.4.0-0.7.beta2.el6.noarch

Comment 7 Tomas Dosek 2014-02-17 10:39:37 UTC

That depends on the amount events you have in your audit_log. Is that a clear environment or some long-termly used one?

Comment 8 Tareq Alayan 2014-02-17 11:30:55 UTC

clear environment

Comment 9 Tomas Dosek 2014-02-17 11:35:28 UTC

In that case 2 seconds is quite a lot, but we should try to feed the audit log with some entries to see how much UI limits this and what is the performance degradation per 100, 1000, 10 000 lines of audit log (10k should be the upper limit where we start cleaning up the audit_log table automatically so there's no need to test more than that).

Comment 10 Tareq Alayan 2014-02-17 11:43:06 UTC

functional testing point of view: i see that the audit_log and event_log are viewed ok. 

Gil, do u see apropriate to do scale testing here by Yuri's team? 
Or shall i move it to verify?

Comment 11 Gil Klein 2014-02-17 16:18:41 UTC

Eldad, would you be able to verify this on the scale lab?

Comment 13 Eldad Marciano 2014-05-13 11:03:08 UTC

Not reproduced 

not in scale setup and even small one,

tested on 3.3 and also 3.4

Comment 14 errata-xmlrpc 2014-06-09 15:09:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html

Note You need to log in before you can comment on or make changes to this bug.