Bug 1035767 - Showing metric schedules for compatible group is very slow
Summary: Showing metric schedules for compatible group is very slow
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.10
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: RHQ 4.10
Assignee: RHQ Project Maintainer
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1034796
TreeView+ depends on / blocked
 
Reported: 2013-11-28 12:34 UTC by Lukas Krejci
Modified: 2014-10-13 11:51 UTC (History)
5 users (show)

Fixed In Version:
Clone Of: 1034796
Environment:
Last Closed: 2014-04-23 12:31:51 UTC
Embargoed:


Attachments (Terms of Use)

Description Lukas Krejci 2013-11-28 12:34:04 UTC
+++ This bug was initially created as a clone of Bug #1034796 +++

Description of problem:

There seems to be a performance regression comparing to JON 3.1.2 in showing metric schedules for compatible group

Version-Release number of selected component (if applicable):
JON 3.2.ER7

How reproducible: always


Steps to Reproduce:
1. create compatible group of your network interfaces (2 on linux by default)
2. go to this group -> Monitoring -> Schedules
3. Hit refresh

Actual results:

on JON 3.2.0.ER7 it takes 4-5 seconds to load 
on JON 3.1.2.GA it takes 1second (almost instant)


Expected results:

refreshing/loading schedules on JON 3.2.0 takes more or less same time as on 3.1.2

Additional info: Setup: 1 server with 1 local agent + storage node

--- Additional comment from Mike Foley on 2013-11-26 09:19:57 EST ---

Detected as part of JON 3.2 ER7 Test Day.

Supporting facts:
here is 3.2 http://testday5.bc.jonqe.lab.eng.bos.redhat.com:7080/coregui/#ResourceGroup/10011/Monitoring/Schedules

and here 3.1.2 http://lzoubek4.bc.jonqe.lab.eng.bos.redhat.com:7080/coregui/#ResourceGroup/10001/Monitoring/Schedules

this looks like a serious performance regression in an area where JON 3.2 is supposed to be more performant and scalable.  

adding the JON 3.2 GA blocker (although this still needs to be triaged by the triage team).  preliminarily targetting for CR1.

--- Additional comment from Heiko W. Rupp on 2013-11-26 12:14:56 EST ---

Assigning to jsanda - even as we did not triage it yet, so that he can investigate this.

--- Additional comment from John Sanda on 2013-11-26 16:50:07 EST ---

The schedules tab doesn't execute any C* queries, only JPQL queries generated by our criteria query framework. The reason for the slow down has to do with changes made for bug 855674. In Libor's server log, there should be warnings that look like,

16:43:56,213 WARN  [org.rhq.enterprise.server.util.CriteriaQueryRunner] (http-/0.0.0.0:7080-7) Could not get consistent results of the paged data and a total count for MeasurementScheduleCriteria[filters: [filterResourceGroupId=10001, filterResourceInventoryStatuses=[COMMITTED]], fetches: [], sorts: []]. After 10 attempts, the collection size is 22, while the count query reports 220 for PageControl[page=0, size=-1, sort[definition.displayName ASC]]. The discrepancy has not cleared up in 3900ms so we're giving up, returning inconsistent results. Note that is most possibly NOT an error. It is likely caused by concurrent database activity that changes the contents of the database that the criteria query is querying.: java.lang.Exception
	at org.rhq.enterprise.server.util.CriteriaQueryRunner.execute(CriteriaQueryRunner.java:82) [rhq-server.jar:4.10.0-SNAPSHOT]
	at org.rhq.enterprise.server.measurement.MeasurementScheduleManagerBean.getMeasurementScheduleCompositesByContext(MeasurementScheduleManagerBean.java:1490) [rhq-server.jar:4.10.0-SNAPSHOT]


Enable debug logging on the class org.rhq.enterprise.server.util.QueryUtility and you will see DEBUG statements like,

6:43:55,187 DEBUG [org.rhq.enterprise.server.util.QueryUtility] (http-/0.0.0.0:7080-7) Possible phantom read detected while running a query. The collection size = 22, count = 220, pageControl = PageControl[page=0, size=-1, sort[definition.displayName ASC]]. Attempt number 9. Will wait for 1000ms.: java.lang.Exception
	at org.rhq.enterprise.server.util.QueryUtility.fetchPagedDataAndCount(QueryUtility.java:311) [rhq-server.jar:4.10.0-SNAPSHOT]
	at org.rhq.enterprise.server.util.CriteriaQueryRunner.execute(CriteriaQueryRunner.java:79) [rhq-server.jar:4.10.0-SNAPSHOT]


jshaughn pointed out that the phantom read detection stuff might be having some problems with the query in question due to its alter projection. 

I am of the opinion that this is not a blocker as the full result set does get returned to the UI, albeit more slowly. I am reassigning to Lukas since he can better assess this.

Comment 1 Lukas Krejci 2013-11-28 12:40:36 UTC
commit 25f768afb5c72e5eb0fa17c6789b3b8aaa895be9
Author: Lukas Krejci <lkrejci>
Date:   Thu Nov 28 13:39:26 2013 +0100

    [BZ 1035767] - Enable altering count query along with alterProjection
    If CriteriaQueryGeneration.alterProjection changes the projection such
    that the actual number of returned results changes (e.g. using distinct),
    the default count query would no longer match and the returned results
    would be inconsistent with the count found by the count query.
    
    Because alterProjection is essentially free-form JPQL, we need to provide
    a similar means to alter the count query so that the data query and count
    query can be modified to return consistent data.

Comment 2 Heiko W. Rupp 2014-04-23 12:31:51 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.


Note You need to log in before you can comment on or make changes to this bug.