Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display groups due to UI timeouts
Summary: Group inventory pages (compatible, mixed, all) fail to display groups due to ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Operations Network
Classification: JBoss
Component: Database
Version: JON 3.1.2
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: post-GA
: One-off release
Assignee: Libor Zoubek
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1244941 1245701
TreeView+ depends on / blocked
 
Reported: 2015-07-07 23:11 UTC by Larry O'Leary
Modified: 2019-08-15 04:50 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1244941 (view as bug list)
Environment:
Last Closed: 2015-10-06 14:22:48 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 534293 0 high CLOSED slowness on resource group browser 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 535237 0 high CLOSED group rendering is slow 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 720497 0 urgent CLOSED w/ lots of compat groups defined, scrolling quickly to bottom of table on #Inventory/Groups/AllGroups view, causes gwt R... 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 865698 0 unspecified NEW Usage of implicitResources in ClusterManagerBean#buildQuery causes performance issue 2022-03-31 04:27:57 UTC

Internal Links: 534293 535237 720497 865698

Description Larry O'Leary 2015-07-07 23:11:58 UTC
Description of problem:
When a JBoss ON system has a large number of groups (100) which have a large number of explicit (100) and implicit (100) members with an inventory that contains 10s of 1000s of resources with a vast availability history, the resource group pages take several seconds to display and in extreme cases will not display at all.

Version-Release number of selected component (if applicable):
3.1.2

How reproducible:
Frequently

Steps to Reproduce:
1. Install, configure, and start JBoss ON 3.1.2 system.
2. Add to inventory 100 EAP 6 servers or other resources that will allow a topology of 100 resource groups made up of 100 resources each with 100 or more child servers/services.
3. Create 100 compatible and recursive groups containing 100 explicit members each.
4. Navigate to the _Compatible Groups Inventory_ page.

Actual results:
After several seconds a modal dialog appears containing the message "Server returned FAILURE with no error message." The message center contains the following warning:

    Failed to load group composite data. This occurred because the server is taking a long time to complete this request. Please be aware that the server may still be processing your request and it may complete shortly. You can check the server logs to see if any abnormal errors occurred.

Expected results:
The list of groups should appear after a couple of seconds.

Additional info:
This issue is a direct result of Hibernate generating a very poor native query:

    select * from ( 
      select (
        select count(resourceav3_.ID) from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre1_, RHQ_RESOURCE resource2_ 
        inner join RHQ_RESOURCE_AVAIL resourceav3_ on resource2_.ID=resourceav3_.RESOURCE_ID 
        where resourcegr0_.ID=explicitre1_.RESOURCE_GROUP_ID and explicitre1_.RESOURCE_ID=resource2_.ID and resource2_.INVENTORY_STATUS='COMMITTED'
      ) as col_0_0_, (
        select count(resourceav6_.ID) from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre4_, RHQ_RESOURCE resource5_ 
        inner join RHQ_RESOURCE_AVAIL resourceav6_ on resource5_.ID=resourceav6_.RESOURCE_ID 
        where resourcegr0_.ID=explicitre4_.RESOURCE_GROUP_ID and explicitre4_.RESOURCE_ID=resource5_.ID and resource5_.INVENTORY_STATUS='COMMITTED' and resourceav6_.AVAILABILITY_TYPE=0
      ) as col_1_0_, (
        select count(resourceav9_.ID) from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre7_, RHQ_RESOURCE resource8_ 
        inner join RHQ_RESOURCE_AVAIL resourceav9_ on resource8_.ID=resourceav9_.RESOURCE_ID 
        where resourcegr0_.ID=explicitre7_.RESOURCE_GROUP_ID and explicitre7_.RESOURCE_ID=resource8_.ID and resource8_.INVENTORY_STATUS='COMMITTED' and resourceav9_.AVAILABILITY_TYPE=2
      ) as col_2_0_, (
        select count(resourceav12_.ID) from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre10_, RHQ_RESOURCE resource11_ 
        inner join RHQ_RESOURCE_AVAIL resourceav12_ on resource11_.ID=resourceav12_.RESOURCE_ID 
        where resourcegr0_.ID=explicitre10_.RESOURCE_GROUP_ID and explicitre10_.RESOURCE_ID=resource11_.ID and resource11_.INVENTORY_STATUS='COMMITTED' and resourceav12_.AVAILABILITY_TYPE=3
      ) as col_3_0_, (
        select count(resourceav15_.ID) from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre13_, RHQ_RESOURCE resource14_ 
        inner join RHQ_RESOURCE_AVAIL resourceav15_ on resource14_.ID=resourceav15_.RESOURCE_ID 
        where resourcegr0_.ID=implicitre13_.RESOURCE_GROUP_ID and implicitre13_.RESOURCE_ID=resource14_.ID and resource14_.INVENTORY_STATUS='COMMITTED'
      ) as col_4_0_, (
        select count(resourceav18_.ID) from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre16_, RHQ_RESOURCE resource17_ 
        inner join RHQ_RESOURCE_AVAIL resourceav18_ on resource17_.ID=resourceav18_.RESOURCE_ID 
        where resourcegr0_.ID=implicitre16_.RESOURCE_GROUP_ID and implicitre16_.RESOURCE_ID=resource17_.ID and resource17_.INVENTORY_STATUS='COMMITTED' and resourceav18_.AVAILABILITY_TYPE=0
      ) as col_5_0_, (
        select count(resourceav21_.ID) from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre19_, RHQ_RESOURCE resource20_ 
        inner join RHQ_RESOURCE_AVAIL resourceav21_ on resource20_.ID=resourceav21_.RESOURCE_ID 
        where resourcegr0_.ID=implicitre19_.RESOURCE_GROUP_ID and implicitre19_.RESOURCE_ID=resource20_.ID and resource20_.INVENTORY_STATUS='COMMITTED' and resourceav21_.AVAILABILITY_TYPE=2
      ) as col_6_0_, (
        select count(resourceav24_.ID) from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre22_, RHQ_RESOURCE resource23_ 
        inner join RHQ_RESOURCE_AVAIL resourceav24_ on resource23_.ID=resourceav24_.RESOURCE_ID 
        where resourcegr0_.ID=implicitre22_.RESOURCE_GROUP_ID and implicitre22_.RESOURCE_ID=resource23_.ID and resource23_.INVENTORY_STATUS='COMMITTED' and resourceav24_.AVAILABILITY_TYPE=3
      ) as col_7_0_, resourcegr0_.ID as col_8_0_ 
      from RHQ_RESOURCE_GROUP resourcegr0_ 
      where resourcegr0_.CATEGORY='COMPATIBLE' and resourcegr0_.visible=1
    ) where rownum <= 50; 

Because each column is a sub-select in itself, it is like executing 8 queries per 50 rows. In my own testing the above query took 11.1 seconds to return results. In the user's case which reported this issue, their database has 100s of 1000s more resource and resource availability records then mine. In their case execution time was 29.7 seconds.

Using an optimized approach to generate an identical result set using a native query which will work with PostgreSQL and Oracle execution time was reduced. In my case the 50 rows were returned in 0.2 seconds and in the user's case it was 1.1 seconds. 

It should also be noted that using the optimized query, the number of items per page could be increased from 50 and yield similar query execution times.

The optimized query I tested with is:

    SELECT  explicit.totalCount AS col_0_0_, explicit.availDown AS col_1_0_, explicit.availUnknown AS col_2_0_, explicit.availDisabled AS col_3_0_, implicit.totalCount AS col_4_0_, implicit.availDown AS col_5_0_, implicit.availUnknown AS col_6_0_, implicit.availDisabled AS col_7_0_, explicit.resource_group_id AS col_8_0_
    FROM    (
                SELECT      COUNT(rhq_resource_avail.resource_id) AS totalCount, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 0 THEN 1 ELSE 0 END) AS availDown, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 2 THEN 1 ELSE 0 END) AS availUnknown, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 3 THEN 1 ELSE 0 END) AS availDisabled, rhq_resource_group.id AS resource_group_id
                FROM        rhq_resource_group
                LEFT JOIN   rhq_resource_group_res_exp_map ON rhq_resource_group.id = rhq_resource_group_res_exp_map.resource_group_id
                LEFT JOIN   rhq_resource ON rhq_resource_group_res_exp_map.resource_id = rhq_resource.id
                LEFT JOIN   rhq_resource_avail ON rhq_resource.id = rhq_resource_avail.resource_id
                WHERE       rhq_resource_group.category = 'COMPATIBLE' AND rhq_resource_group.visible = 1
                GROUP BY    rhq_resource_group.id
            ) explicit
            LEFT JOIN 
            (
                SELECT      COUNT(rhq_resource_avail.resource_id) AS totalCount, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 0 THEN 1 ELSE 0 END) AS availDown, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 2 THEN 1 ELSE 0 END) AS availUnknown, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 3 THEN 1 ELSE 0 END) AS availDisabled, rhq_resource_group.id AS resource_group_id
                FROM        rhq_resource_group
                LEFT JOIN   rhq_resource_group_res_imp_map ON rhq_resource_group.id = rhq_resource_group_res_imp_map.resource_group_id
                LEFT JOIN   rhq_resource ON rhq_resource_group_res_imp_map.resource_id = rhq_resource.id
                LEFT JOIN   rhq_resource_avail ON rhq_resource.id = rhq_resource_avail.resource_id
                WHERE       rhq_resource_group.category = 'COMPATIBLE' AND rhq_resource_group.visible = 1
                GROUP BY    rhq_resource_group.id
            ) implicit ON explicit.resource_group_id = implicit.resource_group_id
    WHERE   rownum <= 50;

Please note that the projected column names were used to match that of the Hibernate native query. This was for testing and comparison purposes.

Comment 1 Larry O'Leary 2015-07-07 23:14:56 UTC
This seems to be related/regression of some upstream bug reports that discuss the same issue.

Bug 865698 - Usage of implicitResources in ClusterManagerBean#buildQuery causes performance issue
Bug 534293 - (RHQ-1103) slowness on resource group browser
Bug 535237 - (RHQ-1955) group rendering is slow
Bug 720497 - w/ lots of compat groups defined, scrolling quickly to bottom of table on #Inventory/Groups/AllGroups view, causes gwt RequestTimeoutException
    

However, many of these appear to have resulted in some performance optimization or discussion of such optimizations. I would therefore think this is either a regression or the upstream bugs were closed without actually resulting in any changes.

Comment 2 Jay Shaughnessy 2015-07-08 13:43:58 UTC
The code for ResourceGroupManagerBean.findResourceGroupCompositesByCriteria() is specialized in various ways to return the composite data.  It's likely these variations are causing an inefficiency.  We likely need to carefully examine the criteria query being generated and see if we can optimize.  I haven't studied Larry's optimized query above, but if it's a dedicated query just to solve this use case it would require a new call chain, because the current logic utilizes the general-purpose criteria call.

As far as I can tell this code is not changed between 3.1.x and 3.3.x.  Let me know if you want me to look into this, it's probably code I wrote...

Comment 3 Libor Zoubek 2015-07-08 15:03:21 UTC
Yeah, it is pretty tricky, because the code combines ResourceGroupCriteria together with JPA queries.

I am working on rewriting the original JPA subquery to what Larry suggests. (not sure if it's doable, since I did not get to valid query yet and JPA may not support CASE statements inside SUM statements.

Comment 4 Libor Zoubek 2015-07-14 07:40:47 UTC
branch:  master
link:    https://github.com/rhq-project/rhq/commit/5cc77db6b
time:    2015-07-09 14:34:39 +0200
commit:  5cc77db6b8ec6873a95d596d04376f4dccf08b32
author:  Libor Zoubek - lzoubek
message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to
         display groups due to UI timeouts

         Replace subqueries by JOINs. Due to JPA limitation of
         subqueries (One cannot write select x from (select y from z)).
         One query has been replaced with 2
         (or 3) queries which we have to join in memory. This should not
         be a perf issue, since we're joining composite instance sets
         (max count of such set equals to total resource group count in
         DB). A case when 2 (or 3) subsequent queries could return sets
         of different sizes (due to underlying group deletion/addition)
         is handled as well.


branch:  master
link:    https://github.com/rhq-project/rhq/commit/0f6a0f949
time:    2015-07-13 16:17:30 +0200
commit:  0f6a0f9498139d0baa49f4cd830204a1b80138b8
author:  Libor Zoubek - lzoubek
message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to
         display
         Fix counting total resource count (only count COMMITTED
         resources)

Comment 5 Jay Shaughnessy 2015-07-14 13:35:31 UTC
Just mentioning that I reviewed Libor's solution and I could not find any further optimizations.

Comment 7 Libor Zoubek 2015-07-31 17:33:35 UTC
There was an issue with paging. Result pageList did not have totalSize and pageControl set, thatswhy totalSize was equal to number of items inside the list - which was based on pageSize from input criteria.

branch:  master
link:    https://github.com/rhq-project/rhq/commit/abfcbdbdc
time:    2015-07-31 19:28:39 +0200
commit:  abfcbdbdcc0122cfef82a78aea213a5c696195c5
author:  Libor Zoubek - lzoubek
message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to
         display
         PageList which contains in memory joined results *must* contain
         paging info taken from one of original pagelists

Comment 15 Libor Zoubek 2015-08-28 07:58:51 UTC
postgres tests passed on jenkins, although oracle env has some issues, so these tests were skipped. I manuallly tested postgres & oracle anyway.

branch:  master
link:    https://github.com/rhq-project/rhq/commit/03d638722
time:    2015-08-27 12:32:31 +0200
commit:  03d63872248b6eb07173cdb02f2a546bbb02ab9f
author:  Libor Zoubek - lzoubek
message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to
         display groups due to UI timeouts
         Fix generated GROUP BY clauses. On oracle it is not enough to
         use group by resourcegroup, but it need also the field name (so
         it has to look like
         'resourcegroup.id'). Added various tests for 
         findResourceGroupCompositesByCriteria

Comment 16 Libor Zoubek 2015-09-15 12:50:25 UTC
One more attempt to make the queries faster.

branch:  master
link:    https://github.com/rhq-project/rhq/commit/62f0e6bf2
time:    2015-09-15 14:48:35 +0200
commit:  62f0e6bf27f0ee6aeb09a1581795ba15c80f20c1
author:  Libor Zoubek - lzoubek
message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to
         display

         Re-write to make queries even faster. We select only groups
         based on initial criteria, then we select composites
         (implicit/explicit) but only by group ids from first query.
         This should be faster, because database has to order results
         only once - when selecting plain groups. Expensive
         join/composite queries can be unordered.

Comment 18 Libor Zoubek 2015-09-17 08:56:03 UTC
Larry, I was not aware we're using JPA 1.0 in 3.1.2 and even EntityManager.createQuery(String, Class) was something introduced in version 2.

Fixed by bellow commit

branch:  master
link:    https://github.com/rhq-project/rhq/commit/fb177ff5b
time:    2015-09-17 10:53:24 +0200
commit:  fb177ff5b354af14fc05d13b165c5df953070cf0
author:  Libor Zoubek - lzoubek
message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to …
         …display

         Use non-generic (JPA 1.0 compatible) way to create queries

Comment 20 Libor Zoubek 2015-09-22 08:52:03 UTC
branch:  master
link:    https://github.com/rhq-project/rhq/commit/6e881b44f
time:    2015-09-22 10:49:14 +0200
commit:  6e881b44f77beacaa83a1b9154c7439d417fec15
author:  Libor Zoubek - lzoubek
message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to
         display groups due to UI timeouts

         Load group facets to the right ResouceGroupComposite instances

Comment 24 Larry O'Leary 2015-10-06 14:22:48 UTC
This fix was released in server hotfix-10 for JBoss ON 3.1.2.


Note You need to log in before you can comment on or make changes to this bug.