Description of problem: When a JBoss ON system has a large number of groups (100) which have a large number of explicit (100) and implicit (100) members with an inventory that contains 10s of 1000s of resources with a vast availability history, the resource group pages take several seconds to display and in extreme cases will not display at all. Version-Release number of selected component (if applicable): 3.1.2 How reproducible: Frequently Steps to Reproduce: 1. Install, configure, and start JBoss ON 3.1.2 system. 2. Add to inventory 100 EAP 6 servers or other resources that will allow a topology of 100 resource groups made up of 100 resources each with 100 or more child servers/services. 3. Create 100 compatible and recursive groups containing 100 explicit members each. 4. Navigate to the _Compatible Groups Inventory_ page. Actual results: After several seconds a modal dialog appears containing the message "Server returned FAILURE with no error message." The message center contains the following warning: Failed to load group composite data. This occurred because the server is taking a long time to complete this request. Please be aware that the server may still be processing your request and it may complete shortly. You can check the server logs to see if any abnormal errors occurred. Expected results: The list of groups should appear after a couple of seconds. Additional info: This issue is a direct result of Hibernate generating a very poor native query: select * from ( select ( select count(resourceav3_.ID) from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre1_, RHQ_RESOURCE resource2_ inner join RHQ_RESOURCE_AVAIL resourceav3_ on resource2_.ID=resourceav3_.RESOURCE_ID where resourcegr0_.ID=explicitre1_.RESOURCE_GROUP_ID and explicitre1_.RESOURCE_ID=resource2_.ID and resource2_.INVENTORY_STATUS='COMMITTED' ) as col_0_0_, ( select count(resourceav6_.ID) from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre4_, RHQ_RESOURCE resource5_ inner join RHQ_RESOURCE_AVAIL resourceav6_ on resource5_.ID=resourceav6_.RESOURCE_ID where resourcegr0_.ID=explicitre4_.RESOURCE_GROUP_ID and explicitre4_.RESOURCE_ID=resource5_.ID and resource5_.INVENTORY_STATUS='COMMITTED' and resourceav6_.AVAILABILITY_TYPE=0 ) as col_1_0_, ( select count(resourceav9_.ID) from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre7_, RHQ_RESOURCE resource8_ inner join RHQ_RESOURCE_AVAIL resourceav9_ on resource8_.ID=resourceav9_.RESOURCE_ID where resourcegr0_.ID=explicitre7_.RESOURCE_GROUP_ID and explicitre7_.RESOURCE_ID=resource8_.ID and resource8_.INVENTORY_STATUS='COMMITTED' and resourceav9_.AVAILABILITY_TYPE=2 ) as col_2_0_, ( select count(resourceav12_.ID) from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre10_, RHQ_RESOURCE resource11_ inner join RHQ_RESOURCE_AVAIL resourceav12_ on resource11_.ID=resourceav12_.RESOURCE_ID where resourcegr0_.ID=explicitre10_.RESOURCE_GROUP_ID and explicitre10_.RESOURCE_ID=resource11_.ID and resource11_.INVENTORY_STATUS='COMMITTED' and resourceav12_.AVAILABILITY_TYPE=3 ) as col_3_0_, ( select count(resourceav15_.ID) from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre13_, RHQ_RESOURCE resource14_ inner join RHQ_RESOURCE_AVAIL resourceav15_ on resource14_.ID=resourceav15_.RESOURCE_ID where resourcegr0_.ID=implicitre13_.RESOURCE_GROUP_ID and implicitre13_.RESOURCE_ID=resource14_.ID and resource14_.INVENTORY_STATUS='COMMITTED' ) as col_4_0_, ( select count(resourceav18_.ID) from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre16_, RHQ_RESOURCE resource17_ inner join RHQ_RESOURCE_AVAIL resourceav18_ on resource17_.ID=resourceav18_.RESOURCE_ID where resourcegr0_.ID=implicitre16_.RESOURCE_GROUP_ID and implicitre16_.RESOURCE_ID=resource17_.ID and resource17_.INVENTORY_STATUS='COMMITTED' and resourceav18_.AVAILABILITY_TYPE=0 ) as col_5_0_, ( select count(resourceav21_.ID) from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre19_, RHQ_RESOURCE resource20_ inner join RHQ_RESOURCE_AVAIL resourceav21_ on resource20_.ID=resourceav21_.RESOURCE_ID where resourcegr0_.ID=implicitre19_.RESOURCE_GROUP_ID and implicitre19_.RESOURCE_ID=resource20_.ID and resource20_.INVENTORY_STATUS='COMMITTED' and resourceav21_.AVAILABILITY_TYPE=2 ) as col_6_0_, ( select count(resourceav24_.ID) from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre22_, RHQ_RESOURCE resource23_ inner join RHQ_RESOURCE_AVAIL resourceav24_ on resource23_.ID=resourceav24_.RESOURCE_ID where resourcegr0_.ID=implicitre22_.RESOURCE_GROUP_ID and implicitre22_.RESOURCE_ID=resource23_.ID and resource23_.INVENTORY_STATUS='COMMITTED' and resourceav24_.AVAILABILITY_TYPE=3 ) as col_7_0_, resourcegr0_.ID as col_8_0_ from RHQ_RESOURCE_GROUP resourcegr0_ where resourcegr0_.CATEGORY='COMPATIBLE' and resourcegr0_.visible=1 ) where rownum <= 50; Because each column is a sub-select in itself, it is like executing 8 queries per 50 rows. In my own testing the above query took 11.1 seconds to return results. In the user's case which reported this issue, their database has 100s of 1000s more resource and resource availability records then mine. In their case execution time was 29.7 seconds. Using an optimized approach to generate an identical result set using a native query which will work with PostgreSQL and Oracle execution time was reduced. In my case the 50 rows were returned in 0.2 seconds and in the user's case it was 1.1 seconds. It should also be noted that using the optimized query, the number of items per page could be increased from 50 and yield similar query execution times. The optimized query I tested with is: SELECT explicit.totalCount AS col_0_0_, explicit.availDown AS col_1_0_, explicit.availUnknown AS col_2_0_, explicit.availDisabled AS col_3_0_, implicit.totalCount AS col_4_0_, implicit.availDown AS col_5_0_, implicit.availUnknown AS col_6_0_, implicit.availDisabled AS col_7_0_, explicit.resource_group_id AS col_8_0_ FROM ( SELECT COUNT(rhq_resource_avail.resource_id) AS totalCount, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 0 THEN 1 ELSE 0 END) AS availDown, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 2 THEN 1 ELSE 0 END) AS availUnknown, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 3 THEN 1 ELSE 0 END) AS availDisabled, rhq_resource_group.id AS resource_group_id FROM rhq_resource_group LEFT JOIN rhq_resource_group_res_exp_map ON rhq_resource_group.id = rhq_resource_group_res_exp_map.resource_group_id LEFT JOIN rhq_resource ON rhq_resource_group_res_exp_map.resource_id = rhq_resource.id LEFT JOIN rhq_resource_avail ON rhq_resource.id = rhq_resource_avail.resource_id WHERE rhq_resource_group.category = 'COMPATIBLE' AND rhq_resource_group.visible = 1 GROUP BY rhq_resource_group.id ) explicit LEFT JOIN ( SELECT COUNT(rhq_resource_avail.resource_id) AS totalCount, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 0 THEN 1 ELSE 0 END) AS availDown, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 2 THEN 1 ELSE 0 END) AS availUnknown, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 3 THEN 1 ELSE 0 END) AS availDisabled, rhq_resource_group.id AS resource_group_id FROM rhq_resource_group LEFT JOIN rhq_resource_group_res_imp_map ON rhq_resource_group.id = rhq_resource_group_res_imp_map.resource_group_id LEFT JOIN rhq_resource ON rhq_resource_group_res_imp_map.resource_id = rhq_resource.id LEFT JOIN rhq_resource_avail ON rhq_resource.id = rhq_resource_avail.resource_id WHERE rhq_resource_group.category = 'COMPATIBLE' AND rhq_resource_group.visible = 1 GROUP BY rhq_resource_group.id ) implicit ON explicit.resource_group_id = implicit.resource_group_id WHERE rownum <= 50; Please note that the projected column names were used to match that of the Hibernate native query. This was for testing and comparison purposes.
This seems to be related/regression of some upstream bug reports that discuss the same issue. Bug 865698 - Usage of implicitResources in ClusterManagerBean#buildQuery causes performance issue Bug 534293 - (RHQ-1103) slowness on resource group browser Bug 535237 - (RHQ-1955) group rendering is slow Bug 720497 - w/ lots of compat groups defined, scrolling quickly to bottom of table on #Inventory/Groups/AllGroups view, causes gwt RequestTimeoutException However, many of these appear to have resulted in some performance optimization or discussion of such optimizations. I would therefore think this is either a regression or the upstream bugs were closed without actually resulting in any changes.
The code for ResourceGroupManagerBean.findResourceGroupCompositesByCriteria() is specialized in various ways to return the composite data. It's likely these variations are causing an inefficiency. We likely need to carefully examine the criteria query being generated and see if we can optimize. I haven't studied Larry's optimized query above, but if it's a dedicated query just to solve this use case it would require a new call chain, because the current logic utilizes the general-purpose criteria call. As far as I can tell this code is not changed between 3.1.x and 3.3.x. Let me know if you want me to look into this, it's probably code I wrote...
Yeah, it is pretty tricky, because the code combines ResourceGroupCriteria together with JPA queries. I am working on rewriting the original JPA subquery to what Larry suggests. (not sure if it's doable, since I did not get to valid query yet and JPA may not support CASE statements inside SUM statements.
branch: master link: https://github.com/rhq-project/rhq/commit/5cc77db6b time: 2015-07-09 14:34:39 +0200 commit: 5cc77db6b8ec6873a95d596d04376f4dccf08b32 author: Libor Zoubek - lzoubek message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display groups due to UI timeouts Replace subqueries by JOINs. Due to JPA limitation of subqueries (One cannot write select x from (select y from z)). One query has been replaced with 2 (or 3) queries which we have to join in memory. This should not be a perf issue, since we're joining composite instance sets (max count of such set equals to total resource group count in DB). A case when 2 (or 3) subsequent queries could return sets of different sizes (due to underlying group deletion/addition) is handled as well. branch: master link: https://github.com/rhq-project/rhq/commit/0f6a0f949 time: 2015-07-13 16:17:30 +0200 commit: 0f6a0f9498139d0baa49f4cd830204a1b80138b8 author: Libor Zoubek - lzoubek message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display Fix counting total resource count (only count COMMITTED resources)
Just mentioning that I reviewed Libor's solution and I could not find any further optimizations.
There was an issue with paging. Result pageList did not have totalSize and pageControl set, thatswhy totalSize was equal to number of items inside the list - which was based on pageSize from input criteria. branch: master link: https://github.com/rhq-project/rhq/commit/abfcbdbdc time: 2015-07-31 19:28:39 +0200 commit: abfcbdbdcc0122cfef82a78aea213a5c696195c5 author: Libor Zoubek - lzoubek message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display PageList which contains in memory joined results *must* contain paging info taken from one of original pagelists
postgres tests passed on jenkins, although oracle env has some issues, so these tests were skipped. I manuallly tested postgres & oracle anyway. branch: master link: https://github.com/rhq-project/rhq/commit/03d638722 time: 2015-08-27 12:32:31 +0200 commit: 03d63872248b6eb07173cdb02f2a546bbb02ab9f author: Libor Zoubek - lzoubek message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display groups due to UI timeouts Fix generated GROUP BY clauses. On oracle it is not enough to use group by resourcegroup, but it need also the field name (so it has to look like 'resourcegroup.id'). Added various tests for findResourceGroupCompositesByCriteria
One more attempt to make the queries faster. branch: master link: https://github.com/rhq-project/rhq/commit/62f0e6bf2 time: 2015-09-15 14:48:35 +0200 commit: 62f0e6bf27f0ee6aeb09a1581795ba15c80f20c1 author: Libor Zoubek - lzoubek message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display Re-write to make queries even faster. We select only groups based on initial criteria, then we select composites (implicit/explicit) but only by group ids from first query. This should be faster, because database has to order results only once - when selecting plain groups. Expensive join/composite queries can be unordered.
Larry, I was not aware we're using JPA 1.0 in 3.1.2 and even EntityManager.createQuery(String, Class) was something introduced in version 2. Fixed by bellow commit branch: master link: https://github.com/rhq-project/rhq/commit/fb177ff5b time: 2015-09-17 10:53:24 +0200 commit: fb177ff5b354af14fc05d13b165c5df953070cf0 author: Libor Zoubek - lzoubek message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to … …display Use non-generic (JPA 1.0 compatible) way to create queries
branch: master link: https://github.com/rhq-project/rhq/commit/6e881b44f time: 2015-09-22 10:49:14 +0200 commit: 6e881b44f77beacaa83a1b9154c7439d417fec15 author: Libor Zoubek - lzoubek message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display groups due to UI timeouts Load group facets to the right ResouceGroupComposite instances
This fix was released in server hotfix-10 for JBoss ON 3.1.2.