+++ This bug was initially created as a clone of Bug #1240854 +++ Description of problem: When a JBoss ON system has a large number of groups (100) which have a large number of explicit (100) and implicit (100) members with an inventory that contains 10s of 1000s of resources with a vast availability history, the resource group pages take several seconds to display and in extreme cases will not display at all. Version-Release number of selected component (if applicable): 3.1.2 How reproducible: Frequently Steps to Reproduce: 1. Install, configure, and start JBoss ON 3.1.2 system. 2. Add to inventory 100 EAP 6 servers or other resources that will allow a topology of 100 resource groups made up of 100 resources each with 100 or more child servers/services. 3. Create 100 compatible and recursive groups containing 100 explicit members each. 4. Navigate to the _Compatible Groups Inventory_ page. Actual results: After several seconds a modal dialog appears containing the message "Server returned FAILURE with no error message." The message center contains the following warning: Failed to load group composite data. This occurred because the server is taking a long time to complete this request. Please be aware that the server may still be processing your request and it may complete shortly. You can check the server logs to see if any abnormal errors occurred. Expected results: The list of groups should appear after a couple of seconds. Additional info: This issue is a direct result of Hibernate generating a very poor native query: select * from ( select ( select count(resourceav3_.ID) from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre1_, RHQ_RESOURCE resource2_ inner join RHQ_RESOURCE_AVAIL resourceav3_ on resource2_.ID=resourceav3_.RESOURCE_ID where resourcegr0_.ID=explicitre1_.RESOURCE_GROUP_ID and explicitre1_.RESOURCE_ID=resource2_.ID and resource2_.INVENTORY_STATUS='COMMITTED' ) as col_0_0_, ( select count(resourceav6_.ID) from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre4_, RHQ_RESOURCE resource5_ inner join RHQ_RESOURCE_AVAIL resourceav6_ on resource5_.ID=resourceav6_.RESOURCE_ID where resourcegr0_.ID=explicitre4_.RESOURCE_GROUP_ID and explicitre4_.RESOURCE_ID=resource5_.ID and resource5_.INVENTORY_STATUS='COMMITTED' and resourceav6_.AVAILABILITY_TYPE=0 ) as col_1_0_, ( select count(resourceav9_.ID) from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre7_, RHQ_RESOURCE resource8_ inner join RHQ_RESOURCE_AVAIL resourceav9_ on resource8_.ID=resourceav9_.RESOURCE_ID where resourcegr0_.ID=explicitre7_.RESOURCE_GROUP_ID and explicitre7_.RESOURCE_ID=resource8_.ID and resource8_.INVENTORY_STATUS='COMMITTED' and resourceav9_.AVAILABILITY_TYPE=2 ) as col_2_0_, ( select count(resourceav12_.ID) from RHQ_RESOURCE_GROUP_RES_EXP_MAP explicitre10_, RHQ_RESOURCE resource11_ inner join RHQ_RESOURCE_AVAIL resourceav12_ on resource11_.ID=resourceav12_.RESOURCE_ID where resourcegr0_.ID=explicitre10_.RESOURCE_GROUP_ID and explicitre10_.RESOURCE_ID=resource11_.ID and resource11_.INVENTORY_STATUS='COMMITTED' and resourceav12_.AVAILABILITY_TYPE=3 ) as col_3_0_, ( select count(resourceav15_.ID) from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre13_, RHQ_RESOURCE resource14_ inner join RHQ_RESOURCE_AVAIL resourceav15_ on resource14_.ID=resourceav15_.RESOURCE_ID where resourcegr0_.ID=implicitre13_.RESOURCE_GROUP_ID and implicitre13_.RESOURCE_ID=resource14_.ID and resource14_.INVENTORY_STATUS='COMMITTED' ) as col_4_0_, ( select count(resourceav18_.ID) from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre16_, RHQ_RESOURCE resource17_ inner join RHQ_RESOURCE_AVAIL resourceav18_ on resource17_.ID=resourceav18_.RESOURCE_ID where resourcegr0_.ID=implicitre16_.RESOURCE_GROUP_ID and implicitre16_.RESOURCE_ID=resource17_.ID and resource17_.INVENTORY_STATUS='COMMITTED' and resourceav18_.AVAILABILITY_TYPE=0 ) as col_5_0_, ( select count(resourceav21_.ID) from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre19_, RHQ_RESOURCE resource20_ inner join RHQ_RESOURCE_AVAIL resourceav21_ on resource20_.ID=resourceav21_.RESOURCE_ID where resourcegr0_.ID=implicitre19_.RESOURCE_GROUP_ID and implicitre19_.RESOURCE_ID=resource20_.ID and resource20_.INVENTORY_STATUS='COMMITTED' and resourceav21_.AVAILABILITY_TYPE=2 ) as col_6_0_, ( select count(resourceav24_.ID) from RHQ_RESOURCE_GROUP_RES_IMP_MAP implicitre22_, RHQ_RESOURCE resource23_ inner join RHQ_RESOURCE_AVAIL resourceav24_ on resource23_.ID=resourceav24_.RESOURCE_ID where resourcegr0_.ID=implicitre22_.RESOURCE_GROUP_ID and implicitre22_.RESOURCE_ID=resource23_.ID and resource23_.INVENTORY_STATUS='COMMITTED' and resourceav24_.AVAILABILITY_TYPE=3 ) as col_7_0_, resourcegr0_.ID as col_8_0_ from RHQ_RESOURCE_GROUP resourcegr0_ where resourcegr0_.CATEGORY='COMPATIBLE' and resourcegr0_.visible=1 ) where rownum <= 50; Because each column is a sub-select in itself, it is like executing 8 queries per 50 rows. In my own testing the above query took 11.1 seconds to return results. In the user's case which reported this issue, their database has 100s of 1000s more resource and resource availability records then mine. In their case execution time was 29.7 seconds. Using an optimized approach to generate an identical result set using a native query which will work with PostgreSQL and Oracle execution time was reduced. In my case the 50 rows were returned in 0.2 seconds and in the user's case it was 1.1 seconds. It should also be noted that using the optimized query, the number of items per page could be increased from 50 and yield similar query execution times. The optimized query I tested with is: SELECT explicit.totalCount AS col_0_0_, explicit.availDown AS col_1_0_, explicit.availUnknown AS col_2_0_, explicit.availDisabled AS col_3_0_, implicit.totalCount AS col_4_0_, implicit.availDown AS col_5_0_, implicit.availUnknown AS col_6_0_, implicit.availDisabled AS col_7_0_, explicit.resource_group_id AS col_8_0_ FROM ( SELECT COUNT(rhq_resource_avail.resource_id) AS totalCount, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 0 THEN 1 ELSE 0 END) AS availDown, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 2 THEN 1 ELSE 0 END) AS availUnknown, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 3 THEN 1 ELSE 0 END) AS availDisabled, rhq_resource_group.id AS resource_group_id FROM rhq_resource_group LEFT JOIN rhq_resource_group_res_exp_map ON rhq_resource_group.id = rhq_resource_group_res_exp_map.resource_group_id LEFT JOIN rhq_resource ON rhq_resource_group_res_exp_map.resource_id = rhq_resource.id LEFT JOIN rhq_resource_avail ON rhq_resource.id = rhq_resource_avail.resource_id WHERE rhq_resource_group.category = 'COMPATIBLE' AND rhq_resource_group.visible = 1 GROUP BY rhq_resource_group.id ) explicit LEFT JOIN ( SELECT COUNT(rhq_resource_avail.resource_id) AS totalCount, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 0 THEN 1 ELSE 0 END) AS availDown, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 2 THEN 1 ELSE 0 END) AS availUnknown, SUM(CASE WHEN rhq_resource.inventory_status = 'COMMITTED' AND rhq_resource_avail.availability_type = 3 THEN 1 ELSE 0 END) AS availDisabled, rhq_resource_group.id AS resource_group_id FROM rhq_resource_group LEFT JOIN rhq_resource_group_res_imp_map ON rhq_resource_group.id = rhq_resource_group_res_imp_map.resource_group_id LEFT JOIN rhq_resource ON rhq_resource_group_res_imp_map.resource_id = rhq_resource.id LEFT JOIN rhq_resource_avail ON rhq_resource.id = rhq_resource_avail.resource_id WHERE rhq_resource_group.category = 'COMPATIBLE' AND rhq_resource_group.visible = 1 GROUP BY rhq_resource_group.id ) implicit ON explicit.resource_group_id = implicit.resource_group_id WHERE rownum <= 50; Please note that the projected column names were used to match that of the Hibernate native query. This was for testing and comparison purposes. --- Additional comment from Larry O'Leary on 2015-07-07 19:14:56 EDT --- This seems to be related/regression of some upstream bug reports that discuss the same issue. Bug 865698 - Usage of implicitResources in ClusterManagerBean#buildQuery causes performance issue Bug 534293 - (RHQ-1103) slowness on resource group browser Bug 535237 - (RHQ-1955) group rendering is slow Bug 720497 - w/ lots of compat groups defined, scrolling quickly to bottom of table on #Inventory/Groups/AllGroups view, causes gwt RequestTimeoutException However, many of these appear to have resulted in some performance optimization or discussion of such optimizations. I would therefore think this is either a regression or the upstream bugs were closed without actually resulting in any changes. --- Additional comment from Jay Shaughnessy on 2015-07-08 09:43:58 EDT --- The code for ResourceGroupManagerBean.findResourceGroupCompositesByCriteria() is specialized in various ways to return the composite data. It's likely these variations are causing an inefficiency. We likely need to carefully examine the criteria query being generated and see if we can optimize. I haven't studied Larry's optimized query above, but if it's a dedicated query just to solve this use case it would require a new call chain, because the current logic utilizes the general-purpose criteria call. As far as I can tell this code is not changed between 3.1.x and 3.3.x. Let me know if you want me to look into this, it's probably code I wrote... --- Additional comment from Libor Zoubek on 2015-07-08 11:03:21 EDT --- Yeah, it is pretty tricky, because the code combines ResourceGroupCriteria together with JPA queries. I am working on rewriting the original JPA subquery to what Larry suggests. (not sure if it's doable, since I did not get to valid query yet and JPA may not support CASE statements inside SUM statements. --- Additional comment from Libor Zoubek on 2015-07-14 03:40:47 EDT --- branch: master link: https://github.com/rhq-project/rhq/commit/5cc77db6b time: 2015-07-09 14:34:39 +0200 commit: 5cc77db6b8ec6873a95d596d04376f4dccf08b32 author: Libor Zoubek - lzoubek message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display groups due to UI timeouts Replace subqueries by JOINs. Due to JPA limitation of subqueries (One cannot write select x from (select y from z)). One query has been replaced with 2 (or 3) queries which we have to join in memory. This should not be a perf issue, since we're joining composite instance sets (max count of such set equals to total resource group count in DB). A case when 2 (or 3) subsequent queries could return sets of different sizes (due to underlying group deletion/addition) is handled as well. branch: master link: https://github.com/rhq-project/rhq/commit/0f6a0f949 time: 2015-07-13 16:17:30 +0200 commit: 0f6a0f9498139d0baa49f4cd830204a1b80138b8 author: Libor Zoubek - lzoubek message: Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display Fix counting total resource count (only count COMMITTED resources) --- Additional comment from Jay Shaughnessy on 2015-07-14 09:35:31 EDT --- Just mentioning that I reviewed Libor's solution and I could not find any further optimizations.
There is a few commits in master to cherry-pick: https://github.com/rhq-project/rhq/commit/5cc77db6b8ec6873a95d596d04376f4dccf08b32 https://github.com/rhq-project/rhq/commit/0f6a0f9498139d0baa49f4cd830204a1b80138b8 https://github.com/rhq-project/rhq/commit/abfcbdbdcc0122cfef82a78aea213a5c696195c5 https://github.com/rhq-project/rhq/commit/ee0902a5508e16363bbe6b482e423d109f4a0738 https://github.com/rhq-project/rhq/commit/e835d8957e5d0a61d85d67f9755cf56c0f65c655 https://github.com/rhq-project/rhq/commit/03d63872248b6eb07173cdb02f2a546bbb02ab9f https://github.com/rhq-project/rhq/commit/62f0e6bf27f0ee6aeb09a1581795ba15c80f20c1 https://github.com/rhq-project/rhq/commit/fb177ff5b354af14fc05d13b165c5df953070cf0 https://github.com/rhq-project/rhq/commit/6e881b44f77beacaa83a1b9154c7439d417fec15 https://github.com/rhq-project/rhq/commit/8ebc027a5fbc93902f1ff1c4cea4b30a9740deaa
in 3.3.x branch commit d822e24d62af87e5a201194f7ffa4cf854e080f6 Author: Libor Zoubek <lzoubek> Date: Fri Sep 25 16:15:59 2015 +0200 Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to Update copyright (cherry picked from commit 8ebc027a5fbc93902f1ff1c4cea4b30a9740deaa) Signed-off-by: Libor Zoubek <lzoubek> commit 09ae6ab2dee9ed9f3eaaf6d7115173712c68e71f Author: Libor Zoubek <lzoubek> Date: Tue Sep 22 10:49:14 2015 +0200 Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display groups due to UI timeouts Load group facets to the right ResouceGroupComposite instances (cherry picked from commit 6e881b44f77beacaa83a1b9154c7439d417fec15) Signed-off-by: Libor Zoubek <lzoubek> commit bec870df3a5f3dfeba1038108930b3a6befbd266 Author: Libor Zoubek <lzoubek> Date: Thu Sep 17 10:51:02 2015 +0200 Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to … …display Use non-generic (JPA 1.0 compatible) way to create queries (cherry picked from commit fb177ff5b354af14fc05d13b165c5df953070cf0) Signed-off-by: Libor Zoubek <lzoubek> commit ddc8ac6f6489fd9fb82fb247fe2f7d82adac3d0a Author: Libor Zoubek <lzoubek> Date: Tue Sep 15 14:42:16 2015 +0200 Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display Re-write to make queries even faster. We select only groups based on initial criteria, then we select composites (implicit/explicit) but only by group ids from first query. This should be faster, because database has to order results only once - when selecting plain groups. Expensive join/composite queries can be unordered. (cherry picked from commit 62f0e6bf27f0ee6aeb09a1581795ba15c80f20c1) Signed-off-by: Libor Zoubek <lzoubek> commit cc9f5c0d50f5977a792d1aa23b692e00ef1957c1 Author: Libor Zoubek <lzoubek> Date: Thu Aug 27 11:52:17 2015 +0200 Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display groups due to UI timeouts Fix generated GROUP BY clauses. On oracle it is not enough to use group by resourcegroup, but it need also the field name (so it has to look like 'resourcegroup.id'). Added various tests for findResourceGroupCompositesByCriteria (cherry picked from commit 03d63872248b6eb07173cdb02f2a546bbb02ab9f) Signed-off-by: Libor Zoubek <lzoubek> commit a44c9a11f2f240610fd51c54a1469db300b9d6f5 Author: Libor Zoubek <lzoubek> Date: Tue Aug 18 23:17:48 2015 +0200 Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display groups due to UI timeouts CriteriaQueryGenerator is now able to write group by aliases, based on ordering fields. In a specific case, when we alter projection, override FROM clause (whith several JOINs) override GROUP BY clause the resulting query must be grouped by ordering fields - this fix in CriteriaQueryGenerator adds a new flag, which enables this feature - so default behaviour does not change. This fixes issue with ordering of ResourceGroupComposites based on criteria. (cherry picked from commit e835d8957e5d0a61d85d67f9755cf56c0f65c655) Signed-off-by: Libor Zoubek <lzoubek> commit b6639167915dbeb6eb040a475d59b9416100b985 Author: Libor Zoubek <lzoubek> Date: Tue Aug 18 23:15:48 2015 +0200 Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display groups due to UI timeouts use left (outer) joins, so we don't miss empty groups (cherry picked from commit ee0902a5508e16363bbe6b482e423d109f4a0738) Signed-off-by: Libor Zoubek <lzoubek> commit e3fc0ba6e615661feec26a65599479cbb5f0e2ef Author: Libor Zoubek <lzoubek> Date: Fri Jul 31 19:28:39 2015 +0200 Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display PageList which contains in memory joined results *must* contain paging info taken from one of original pagelists (cherry picked from commit abfcbdbdcc0122cfef82a78aea213a5c696195c5) Signed-off-by: Libor Zoubek <lzoubek> commit 152a1085b301f444e6795a69bed59290fbfc80f5 Author: Libor Zoubek <lzoubek> Date: Mon Jul 13 16:17:30 2015 +0200 Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display Fix counting total resource count (only count COMMITTED resources) (cherry picked from commit 0f6a0f9498139d0baa49f4cd830204a1b80138b8) Signed-off-by: Libor Zoubek <lzoubek> commit d013835ff2fad09c548a61df04bd0de386890090 Author: Libor Zoubek <lzoubek> Date: Thu Jul 9 14:34:39 2015 +0200 Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display groups due to UI timeouts Replace subqueries by JOINs. Due to JPA limitation of subqueries (One cannot write select x from (select y from z)). One query has been replaced with 2 (or 3) queries which we have to join in memory. This should not be a perf issue, since we're joining composite instance sets (max count of such set equals to total resource group count in DB). A case when 2 (or 3) subsequent queries could return sets of different sizes (due to underlying group deletion/addition) is handled as well. (cherry picked from commit 5cc77db6b8ec6873a95d596d04376f4dccf08b32) Signed-off-by: Libor Zoubek <lzoubek> commit 5419181f4f88e6e4b4dd4de492598f4acf43c5a0 Author: Michael Burman <miburman> Date: Tue Sep 29 18:49:42 2015 +0300 [BZ 1258870] Expose purgeEventForContext and deleteEventsForContext to remote API (cherry picked from commit fc3849c64739399d95c76c01316775e2b7b044f2) Signed-off-by: Libor Zoubek <lzoubek> commit ee4afd78df30af016539b925de06179827c40773 Author: Libor Zoubek <lzoubek> Date: Tue Sep 8 16:04:11 2015 +0200 Bug 1234912 - Do not authenticate against new storage node when replication_factor of system_auth keyspace is wrong taken from one of original pagelists (cherry picked from commit abfcbdbdcc0122cfef82a78aea213a5c696195c5) Signed-off-by: Libor Zoubek <lzoubek> commit 152a1085b301f444e6795a69bed59290fbfc80f5 Author: Libor Zoubek <lzoubek> Date: Mon Jul 13 16:17:30 2015 +0200 Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display Fix counting total resource count (only count COMMITTED resources) (cherry picked from commit 0f6a0f9498139d0baa49f4cd830204a1b80138b8) Signed-off-by: Libor Zoubek <lzoubek> commit d013835ff2fad09c548a61df04bd0de386890090 Author: Libor Zoubek <lzoubek> Date: Thu Jul 9 14:34:39 2015 +0200 Bug 1240854 - Group inventory pages (compatible, mixed, all) fail to display groups due to UI timeouts Replace subqueries by JOINs. Due to JPA limitation of subqueries (One cannot write select x from (select y from z)). One query has been replaced with 2 (or 3) queries which we have to join in memory. This should not be a perf issue, since we're joining composite instance sets (max count of such set equals to total resource group count in DB). A case when 2 (or 3) subsequent queries could return sets of different sizes (due to underlying group deletion/addition) is handled as well. (cherry picked from commit 5cc77db6b8ec6873a95d596d04376f4dccf08b32) Signed-off-by: Libor Zoubek <lzoubek>
Moving to ON_QA as available to test with the following build: https://brewweb.devel.redhat.com/buildinfo?buildID=460382 *Note: jon-server-patch-3.3.0.GA.zip maps to ER01 build of jon-server-3.3.0.GA-update-04.zip.
Verified on: Version : 3.3.0.GA Update 04 Build Number : 821a526:fa7b1a1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1947.html