Bug 737196

Summary: transaction aborts (times out?) when going to Inventory>Connection Settings subtab for a compat group w/ 1000+ members
Product: [Other] RHQ Project
Reporter: Ian Springer <ian.springer>
Component: Core Server
Assignee: John Mazzitelli <mazz>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Foley <mfoley>
Severity: high
Priority: high
Version: 4.1
CC: ccrouch, hrupp, mazz
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-02-07 19:27:12 UTC Type: ---
Bug Blocks: 620933    
Attachments:
Description Flags
relevant lines from the Server log none

Description Ian Springer 2011-09-09 21:20:17 UTC
The subtab tries to load for about 10 minutes, then the exception occurs and the subtab content never loads. The full stack traces from the Server log are attached.

Comment 1 Ian Springer 2011-09-09 21:20:58 UTC
Created attachment 522407 [details]
relevant lines from the Server log

Comment 2 John Mazzitelli 2011-09-30 19:57:24 UTC
I found several distinct areas that are slow. Speeding them up could alleviate the issue. I had 1020 resources in the group when I ran the tests.

All are in the SLSB ConfigurationManagerBean.

* ensureNoPluginConfigurationUpdatesInProgress(ResourceGroup)

** for (Resource memberResource : compatibleGroup.getExplicitResources()) {
--- the call to getExplicitResources() takes a long time; we are probably asking Hibernate to load the whole collection here. Can we use a JPQL query to make it faster?

** if (isPluginConfigurationUpdateInProgress(this.subjectManager.getOverlord(), memberResource.getId()))
--- this call is very slow

* getPersistedPluginConfigurationsForCompatibleGroup(ResourceGroup compatibleGroup)

** long count = (Long) countQuery.getSingleResult();
--- this gets the count for Configuration.QUERY_GET_PLUGIN_CONFIG_MAP_BY_GROUP_ID and is slow. Specifically, the code is:

        Query countQuery = PersistenceUtility.createCountQuery(entityManager,
            Configuration.QUERY_GET_PLUGIN_CONFIG_MAP_BY_GROUP_ID);
        countQuery.setParameter("resourceGroupId", compatibleGroup.getId());
        long count = (Long) countQuery.getSingleResult();

** It turns out the code is meant to page through the results here, but it does NOT. There is no chunking going on. The "while(true)" loop runs only once and we load everything at once. That is odd, since whoever wrote this commented:

        // Configurations are very expensive to load, so load 'em in chunks to ease the strain on the DB.

so they did at least try to chunk the DB access, but the code isn't doing what they thought it was doing.

Comment 3 John Mazzitelli 2011-10-05 18:48:04 UTC
Wrote a unit test called:

  org.rhq.enterprise.server.configuration.LargeGroupPluginConfigurationTest

to test a large group and plugin config updates. On both PostgreSQL and Oracle, it took about 100s to obtain the plugin configurations for the 1010 resources in the group. Will see if I can speed it up now.

Comment 4 John Mazzitelli 2011-10-05 19:46:59 UTC
(In reply to comment #2)
> * ensureNoPluginConfigurationUpdatesInProgress(ResourceGroup)
> 
> ** for (Resource memberResource : compatibleGroup.getExplicitResources()) {
> --- the call to getExplicitResources() takes a long time - we are probably
> asking hibernate to load this collection here. Can we do a jpql query to make
> it go faster?
> 
> ** if (isPluginConfigurationUpdateInProgress(this.subjectManager.getOverlord(),
> memberResource.getId()))
> --- this call is very slow
> 

I refactored this to use the same kind of query used for "ensureNoResourceConfigurationUpdatesInProgress", and the results were dramatically better: the time went down from about 100s to 4s. We were doing some really rudimentary looping in Java when a simple JPQL query is sufficient.
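
As a rough illustration of why this refactoring helps, the sketch below contrasts checking each member with its own lookup against one set-based check for the whole group. It is not the RHQ implementation: UpdateStore, InProgressCheck, their methods, and the round-trip counter are hypothetical stand-ins for database queries, and the in-memory collections replace the real tables.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for the database; counts simulated round trips. */
final class UpdateStore {
    private final Map<Integer, List<Integer>> membersByGroup = new HashMap<>();
    private final Set<Integer> updatesInProgress = new HashSet<>();
    int roundTrips = 0;

    void addGroup(int groupId, List<Integer> resourceIds) {
        membersByGroup.put(groupId, resourceIds);
    }

    void markInProgress(int resourceId) {
        updatesInProgress.add(resourceId);
    }

    /** One round trip: load the member ids of a group. */
    List<Integer> loadMemberIds(int groupId) {
        roundTrips++;
        return membersByGroup.getOrDefault(groupId, Collections.<Integer>emptyList());
    }

    /** One round trip per call: is an update in progress for this single resource? */
    boolean isUpdateInProgress(int resourceId) {
        roundTrips++;
        return updatesInProgress.contains(resourceId);
    }

    /** One round trip in total: the analogue of a single set-based query for the group. */
    boolean anyUpdateInProgressForGroup(int groupId) {
        roundTrips++;
        for (int id : membersByGroup.getOrDefault(groupId, Collections.<Integer>emptyList())) {
            if (updatesInProgress.contains(id)) {
                return true;
            }
        }
        return false;
    }
}

final class InProgressCheck {
    private InProgressCheck() {}

    /** Slow shape: one lookup per member, so N+1 round trips when nothing is in progress. */
    static boolean perMember(UpdateStore store, int groupId) {
        for (int id : store.loadMemberIds(groupId)) {
            if (store.isUpdateInProgress(id)) {
                return true;
            }
        }
        return false;
    }

    /** Fast shape: the store evaluates the whole group in one call. */
    static boolean singleQuery(UpdateStore store, int groupId) {
        return store.anyUpdateInProgressForGroup(groupId);
    }
}
```

With a 1020-member group and no update in progress, perMember makes 1021 simulated round trips, while singleQuery makes one. The simulation only counts calls; actual timings depend on the database and the mapping.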

Comment 5 John Mazzitelli 2011-10-05 20:50:34 UTC
In getPersistedPluginConfigurationsForCompatibleGroup, we were calling "group.getExplicitResources().size()" to get the size of the group, which was really slow. Instead, we now do this:

        int groupSize = resourceGroupManager.getExplicitGroupMemberCount(compatibleGroup.getId());

which is much faster. I made the same change in the analogous resource configuration method, since it was doing the same thing.

Comment 6 John Mazzitelli 2011-10-06 00:42:36 UTC
(In reply to comment #2)
> ** It turns out we were trying to use paging here but we are NOT. There is no
> chunking going on here. So the "while(true)" loop is only done once and we load
> everything at once.

I fixed this. Doing so exposed another bug, both here AND in the other method used to get the RESOURCE config updates: we weren't using ORDER BY, so the chunking wasn't getting the correct page data anyway. I fixed that too by adding ORDER BY clauses. So we are now paging correctly.
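
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of chunked loading. It is not the RHQ code: PageSource, PagedLoader, and the in-memory table are hypothetical stand-ins for a query executed with an offset and a page size (in JPA, Query.setFirstResult and Query.setMaxResults). The sort inside orderedSource plays the role of ORDER BY: offset-based paging returns each row exactly once only if every page is read in the same deterministic order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a query executed with an offset and a page size. */
interface PageSource<T> {
    List<T> fetch(int firstResult, int maxResults);
}

final class PagedLoader {
    private PagedLoader() {}

    /** Loads all rows in chunks of pageSize, stopping at the first short page. */
    static <T> List<T> loadAll(PageSource<T> source, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<T> all = new ArrayList<>();
        int firstResult = 0;
        while (true) {
            List<T> page = source.fetch(firstResult, pageSize);
            all.addAll(page);
            if (page.size() < pageSize) {
                break; // a short (or empty) page means no rows remain
            }
            firstResult += pageSize; // advance the offset to the next chunk
        }
        return all;
    }

    /** Simulates "SELECT ... ORDER BY id" over an in-memory table of ids. */
    static PageSource<Integer> orderedSource(List<Integer> table) {
        return (firstResult, maxResults) -> {
            List<Integer> sorted = new ArrayList<>(table);
            sorted.sort(Comparator.naturalOrder()); // same order on every call
            int from = Math.min(firstResult, sorted.size());
            int to = Math.min(from + maxResults, sorted.size());
            return new ArrayList<>(sorted.subList(from, to));
        };
    }
}
```

A call such as PagedLoader.loadAll(PagedLoader.orderedSource(ids), 200) fetches the ids 200 at a time. The page size of 200 is an arbitrary choice for the example, not a value taken from the fix.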

Comment 7 John Mazzitelli 2011-10-06 13:51:35 UTC
I think the changes I made drastically reduce the time needed to load this data.

Comment 8 Mike Foley 2011-10-06 17:46:11 UTC
Verified through functional testing around compat groups.

Comment 9 Mike Foley 2012-02-07 19:27:12 UTC
Changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE.