534293 – (RHQ-1103) slowness on resource group browser

Summary: slowness on resource group browser

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	RHQ-1103
Product:	RHQ Project
Classification:	Other
Component:	Core UI
Sub Component:
Version:	1.1
Hardware:	All
OS:	All
Priority:	high
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Joseph Marques
QA Contact:	Jeff Weiss
Docs Contact:
URL:	http://jira.rhq-project.org/browse/RH...
Whiteboard:
Depends On:	RHQ-1509
Blocks:

Reported:	2008-11-11 22:32 UTC by Joseph Marques
Modified:	2015-07-07 23:14 UTC
CC List:	1 user
Fixed In Version:	1.2
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:	when running a 6-server HA environment. only 7 groups but membership sizes ranging from 80 to 320, with mixed availability
Last Closed:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1240854	0	unspecified	CLOSED	Group inventory pages (compatible, mixed, all) fail to display groups due to UI timeouts	2021-02-22 00:41:40 UTC

Internal Links: 1240854

Description Joseph Marques 2008-11-11 22:32:00 UTC

significant slowness loading the compatible groups page.

Comment 1 John Mazzitelli 2008-11-12 00:11:03 UTC

We were just discussing this very thing this morning. On Greg's plate to think of a way to fix this.

See: http://jira.rhq-project.org/browse/RHQ-1002

Comment 2 John Mazzitelli 2008-11-12 00:19:13 UTC

I may have jumped the gun - are you saying the "resource hub" page is slow?  i.e. clicking on Browse Resources -> Compatible Groups link?

If so, that other JIRA is probably related but doesn't address that exactly.

reopening just in case

Comment 3 John Mazzitelli 2008-11-12 13:17:42 UTC

from joe:

i think i might have a bead on why we see contention from time to time with queries like

update RHQ_AVAILABILITY set START_TIME=:1, END_TIME=:2, AVAILABILITY_TYPE=:3, RESOURCE_ID=:4 where ID=:5

the suspicion is that the group browser might be taking a long time to load because it takes a long time to find the current availabilities of all resources in the system. i created a recursive dynagroup of "groupby resource.parent.trait[Trait.hostname]", which created 6 groups and put into each one all resources that exist on the corresponding machine. the membership counts were: 16000, 3530, 1864, 653, 653, and 240.

it took 123 seconds to load the group definition view page, which essentially has the same logic as the group browser except that it's filtered by groupDefinitionId. yes, this is an extreme example, but i did it on purpose to test performance and database contention. sure enough, after doing so, i saw a lot of update statements (like the one above) sitting there blocked.

so, i think the availability subquery is hurting us here. after looking at some of the composite queries across the system that pull in the current availability, i see that we're using two different strategies: max(starttime) or endtime is null. we probably want to shy away from null endtime lookups, because there are plenty of dbs out there that can't index a null value (their argument being that you can't index a lack of a value).
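
A minimal sketch of the two lookup strategies named above, run against an in-memory list instead of the real table. The `AvailRow` and `CurrentAvailLookup` names and the field layout are assumptions made for illustration only; they are not RHQ classes, and the sketch says nothing about how either strategy performs in a real database.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for one run-length-encoded availability row.
final class AvailRow {
    final int resourceId;
    final long startTime;
    final Long endTime;  // null while the interval is still open
    final String type;   // "UP" or "DOWN"

    AvailRow(int resourceId, long startTime, Long endTime, String type) {
        this.resourceId = resourceId;
        this.startTime = startTime;
        this.endTime = endTime;
        this.type = type;
    }
}

final class CurrentAvailLookup {
    // Strategy 1: the current row is the one with the greatest start time.
    static Optional<AvailRow> byMaxStartTime(List<AvailRow> rows, int resourceId) {
        return rows.stream()
                .filter(r -> r.resourceId == resourceId)
                .max(Comparator.comparingLong((AvailRow r) -> r.startTime));
    }

    // Strategy 2: the current row is the one whose end time is still null.
    static Optional<AvailRow> byNullEndTime(List<AvailRow> rows, int resourceId) {
        return rows.stream()
                .filter(r -> r.resourceId == resourceId && r.endTime == null)
                .findFirst();
    }
}
```

Both methods should pick the same row as long as each resource has exactly one open interval and that interval is also the most recent one.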

but more than that, i think we should have a direct link from resource to availability. this way, we can use JPQL fragments like "resource.currentAvailability" or something like that. this would enable us to bypass the subquery altogether because joins would do a direct index lookup (as opposed to subquery) to find the availability row in question. however, that requires a little bit extra code in the availability manager to set the current availability on the corresponding resource object when the new RLE data comes across the line.

note, this is what we do for the pluginConfiguration and resourceConfiguration objects too (though, that was done to make navigating the object graph starting from a resource easier so as to make the dynamic JPQL generation for dynagroups simpler).

Comment 4 John Mazzitelli 2008-11-12 13:19:29 UTC

> i think we should have a direct link from resource to availability

I'd be hesitant to add yet another relationship off of the Resource entity - won't that add more contention on an already-very-busy entity? (i.e. will we add additional contention elsewhere in the system that is using the RHQ_RESOURCE table - which is probably used in many more places than the avail data)

Comment 5 Joseph Marques 2008-11-12 18:37:52 UTC

a couple reasons i think we should have a direct link:

* although we have many things that hang off of Resource, it is a relatively static construct
** this entity does not change often nor does the row count (once your inventory has stabilized after import), as opposed to insertion of availability data which needs to insert a new row as well as update an existing row for RLE processing for each member in the availability report
* availability table is several times larger than resource - if each resource in the system has over time gone down and come back up, say, 10 times, then the availability table will be 10x as large as the resource table
* in stable state (resource importing done, uninventory/deletes are infrequent), availability reporting happens much more frequently than the discovery reports

in fact, since availability data is so frequently used, i'm even tempted to consider a denormalized structure where the last known AvailabilityType enum (UP/DOWN) is directly set on the Resource entity.  something like:

@Entity Resource { AvailabilityType currentAvailability; }

this would allow ALL queries that need to serve up availability data with the resource (which is a large majority of the resource queries) to bypass the rhq_availability table altogether.  all the AvailabilityManagerBean needs to do is keep this field up to date when a new entry is inserted into the availability table.  this way, the rhq_resource table is the only thing needed for composite queries, while the rhq_availability table is used for the monitor subtab to show the "xmas tree lights".
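
To make the bookkeeping in this proposal concrete, here is a rough in-memory sketch of RLE processing that also keeps a denormalized "last known" value on the resource. Every name in it (`AvailType`, `Res`, `Interval`, `AvailabilityStore`) is hypothetical, and it is not the AvailabilityManagerBean logic; it only shows the "insert a new row, close the existing row, update the resource" sequence under those assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum AvailType { UP, DOWN }

// Hypothetical simplified resource carrying the denormalized value.
final class Res {
    final int id;
    AvailType currentAvailability; // null until the first report arrives

    Res(int id) { this.id = id; }
}

// Hypothetical run-length-encoded availability interval.
final class Interval {
    final int resourceId;
    final long startTime;
    Long endTime; // null while the run is still open
    final AvailType type;

    Interval(int resourceId, long startTime, AvailType type) {
        this.resourceId = resourceId;
        this.startTime = startTime;
        this.type = type;
    }
}

final class AvailabilityStore {
    final List<Interval> history = new ArrayList<>();
    private final Map<Integer, Interval> open = new HashMap<>();

    // Record a reported state. With RLE, a row is written only on a state change.
    void report(Res resource, long time, AvailType type) {
        Interval last = open.get(resource.id);
        if (last != null && last.type == type) {
            return; // same state: the open run simply continues
        }
        if (last != null) {
            last.endTime = time; // close the previous run (update of an existing row)
        }
        Interval next = new Interval(resource.id, time, type);
        history.add(next);       // start a new run (insert of a new row)
        open.put(resource.id, next);
        resource.currentAvailability = type; // keep the denormalized field in sync
    }
}
```

With this shape, anything that only needs the current state reads `currentAvailability` from the resource and never touches `history`.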

Comment 6 Greg Hinkle 2008-11-12 19:39:42 UTC

What is the subquery that is slow? Have we proven that it's that part that is slowing things down and does it change much switching between max(startTime) and null endTime?

I'm ok with denormalizing if it will improve things, but I don't want to optimize the wrong thing or make it worse. The Resource table is much wider than the avail table and will be significantly more expensive to update in terms of io and cache. This would be worse when we're doing the checkForSuspectAgents task of marking all resources on a box down.

Comment 7 Joseph Marques 2008-11-19 20:58:40 UTC

schema updates
* new table - rhq_current_availability(resource_id, availability_type)
* resource_id fk to rhq_resource(id)
* index on resource_id

db-upgrade
* insert into rhq_current_availability(resource_id, availability_type)
select res.id, (correlated subQ for current avail) from rhq_resource res

biz logic
* upon resource import, insert row into rhq_current_availability too (null availability_type --> unknown)
* upon receipt of availability report, update rhq_current_availability table

ui changes
* update resource group browser queries to hit this rhq_current_availability table, instead of 2 joins with correlated subQ (rhq_resource_group left join rhq_resource left join rhq_availability w/correlated subQ between res and avail where avail has the max starttime or null endtime)
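
The plan above can be sketched with a map standing in for the proposed rhq_current_availability(resource_id, availability_type) table. The class and method names are hypothetical, and treating an unknown (null) member as "not UP" in the group average is an assumption of this sketch, not something the plan specifies.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum AvailState { UP, DOWN }

// In-memory stand-in for the proposed one-row-per-resource table.
final class CurrentAvailabilityTable {
    private final Map<Integer, AvailState> rows = new HashMap<>();

    // Upon resource import: seed a row with a null type, meaning unknown.
    void onImport(int resourceId) {
        if (!rows.containsKey(resourceId)) {
            rows.put(resourceId, null);
        }
    }

    // Upon receipt of an availability report: overwrite that resource's single row.
    void onReport(int resourceId, AvailState state) {
        rows.put(resourceId, state);
    }

    // Group-browser style aggregate that reads only this table.
    // Assumption: unknown (null) members count as not UP.
    double averageAvailability(List<Integer> memberIds) {
        if (memberIds.isEmpty()) {
            return 0.0;
        }
        long up = memberIds.stream()
                .filter(id -> rows.get(id) == AvailState.UP)
                .count();
        return (double) up / memberIds.size();
    }
}
```

The aggregate is a single keyed lookup per member, which is the in-memory analogue of replacing the correlated subquery with a direct join on resource_id.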

Comment 8 Joseph Marques 2008-11-21 04:51:02 UTC

branch FEATURE_PRECOMPUTE_AVAIL:
rev2054 - precompute current / latest resource availability data; 
use precompute data to improve named queries in ResourceGroup entity that display aggregate/average group availability;

Comment 9 Joseph Marques 2008-12-04 17:47:50 UTC

rev2064 - merge branches/FEATURE_PRECOMPUTE_AVAIL back into trunk; 

rev2073 - without setting the resourceId explicitly, this blows up upon insert (because resource.getId() does not match resourceId)

rev2074 - use hibernate postPersist hooks to seed the rhq_resource_avail table during initial persistence of resources, instead of as separate calls to entityManager.persist(ResourceAvailability) in DiscoveryBossBean; 
this isn't just a cleaner methodology, it's critical - without this fix, rhq_resource_avail table will only ever get persisted with the InventoryReport's root resources; 

rev2075 - use surrogate id field for rhq_resource_availability; 
update other things to work in accordance with that; 

rev2076 - additional processing to support updating resource availability during backfilling procedure; 
break ResourceAvailability processing into its own SLSB; 

rev2077 - update monitor tab auto-group queries to use precompute resource availability; 

rev2078 - update monitor tab auto-group children queries to use precompute resource availability; 

rev2079 - yet more monitor tab updates for auto-group and/or auto-group children queries to use precompute resource availability; 

rev2080 - rest of the monitor tab updates for auto-group and/or auto-group children queries to use precompute resource availability; 

rev2081 - LEFT JOIN the resource's availability so the cardinality of the query and countQuery are the same when viewing groups with 0 resource members; 

rev2082 - LEFT JOIN the resource's availability so the cardinality of the query and countQuery are the same when viewing group detail page with 0 resource members; 

rev2083 - update resourceGroup queries that display group membership details to use precompute resource availability; 

rev2084 - update resource browser queries that display platforms / servers / services to use precompute resource availability; 

rev2085 - fix query to update ResourceAvailability entities when back-filling occurs; 

rev2098 - necessary logic to ensure ResourceAvailability already has a record for every COMMITTED resource in inventory; 
see comment in ResourceAvailabilityManagerLocal for details; 

rev2116 - fix availability tests; 

rev2126 - no longer need to explicitly create the ResourceAvailability objects in the SLSB, this is done now as a post-persist hook;
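
The cardinality issue behind rev2081 and rev2082 can be illustrated without a database. The sketch below is only an in-memory analogy with hypothetical names (`GroupJoinSketch` and its methods); the actual named queries are not reproduced here. It shows why a count over the groups alone disagrees with an inner-join result once a group has 0 members, and why left-join semantics bring the two back in line.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupJoinSketch {
    // What a separate count query over the group table alone would report.
    static int countQuery(Map<String, List<Integer>> groups) {
        return groups.size();
    }

    // Inner-join semantics: a group with no members yields no row at all.
    static List<String> innerJoinGroups(Map<String, List<Integer>> groups) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            if (!e.getValue().isEmpty()) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    // Left-join semantics: every group yields a row; empty groups get a count of 0.
    static Map<String, Integer> leftJoinMemberCounts(Map<String, List<Integer>> groups) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            out.put(e.getKey(), e.getValue().size());
        }
        return out;
    }
}
```

If the data query drops empty groups while the count query still includes them, paging totals and displayed rows disagree, which is the mismatch those two revisions describe.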

Comment 10 Jeff Weiss 2009-02-09 13:57:13 UTC

<joseph> jweiss: there are two sides to that coin.  one is concerning the raw functionality.  basically every page that displays availability needs to display it consistently.  it's possible, now that we have availability in two separate tables, that these can become inconsistent.  so, pick a resource, look at its metric graphs which show the xmas tree lights at the top, note the current availability - and confirm that the other availabilities on all ot
 part 2 is about performance, but i presume that ccouch will have covered that for you.
 resource browser, availability status floating in upper-right, child resource on inventory tab, the resource state within the nav tree, resource group details page

Comment 11 Jeff Weiss 2009-02-09 16:26:33 UTC

Verified the resource availability functionality above - found one problem which I will link here.

Comment 12 Red Hat Bugzilla 2009-11-10 20:24:01 UTC

This bug was previously known as http://jira.rhq-project.org/browse/RHQ-1103
This bug is related to RHQ-1002
This bug incorporates RHQ-1213
