Created attachment 853237 [details] Postgres log Description of problem: Events displayed in UI (used by reports and API as well) are being pulled from database by very inefficient query. This was found during scalability testing done by strategic EMEA customer. The query is (taken from postgres logs with enhanced verbosity level): [pid 3766 127.0.0.1(39426)]DEBUG: 00000: parse <unnamed>: SELECT * FROM (SELECT * FROM audit_log WHERE ( audit_log_id IN (SELECT audit_log.audit_log_id FROM audit_log WHERE audit_log.severity = '10' )) ORDER BY audit_log_id DESC ) as T1 OFFSET (1 -1) LIMIT 100 Version-Release number of selected component (if applicable): is29 (basically any RHEV-M until now) How reproducible: 100 % Steps to Reproduce: 1. enhance log verbosity of postgres (in postgresql.conf): log_error_verbosity = verbose log_min_messages = debug2 log_line_prefix = '[pid %p %r]' 2. restart postgres 3. go to events in portal, or fetch event via api Actual results: The query used is very inefficient and takes very long time to finish. Expected results: We should most likely use just one SELECT (ain't ;)) Additional info: Hopefully not needed. But still for the record attaching the postgres log
liran/eli - didn't we fix something in these for 3.4?
(In reply to Itamar Heim from comment #2) > liran/eli - didn't we fix something in these for 3.4? As far as I remember Liran handled that for a diffrent query for 3.4 Liran ?
Actually , looking again in the query described in BZ description , it seems that this is the query that populate the Alerts TAB ( WHERE audit_log.severity = '10' ) => 10 is the severity of Alerts So, we had optimized the regular query that populates the Events TAB but not that one
what is considered to be OK time ? now it takes ~2seconds is it ok? tested on ovirt-engine-3.4.0-0.7.beta2.el6.noarch
That depends on the amount events you have in your audit_log. Is that a clear environment or some long-termly used one?
clear environment
In that case 2 seconds is quite a lot, but we should try to feed the audit log with some entries to see how much UI limits this and what is the performance degradation per 100, 1000, 10 000 lines of audit log (10k should be the upper limit where we start cleaning up the audit_log table automatically so there's no need to test more than that).
functional testing point of view: i see that the audit_log and event_log are viewed ok. Gil, do u see apropriate to do scale testing here by Yuri's team? Or shall i move it to verify?
Eldad, would you be able to verify this on the scale lab?
Not reproduced not in scale setup and even small one, tested on 3.3 and also 3.4
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2014-0506.html