Bug 737196 - transaction aborts (times out?) when going to Inventory>Connection Settings subtab for a compat group w/ 1000+ members
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: John Mazzitelli
QA Contact: Mike Foley
Blocks: rhq-perf
Reported: 2011-09-09 17:20 EDT by Ian Springer
Modified: 2013-08-05 20:40 EDT

Doc Type: Bug Fix
Last Closed: 2012-02-07 14:27:12 EST

Attachments
relevant lines from the Server log (40.25 KB, text/plain)
2011-09-09 17:20 EDT, Ian Springer

Description Ian Springer 2011-09-09 17:20:17 EDT
The subtab tries to load for 10 minutes or so and then the exception occurs and the subtab content never loads. The full stack traces from the Server log are attached.
Comment 1 Ian Springer 2011-09-09 17:20:58 EDT
Created attachment 522407
relevant lines from the Server log
Comment 2 John Mazzitelli 2011-09-30 15:57:24 EDT
I found several distinct areas that are slow. Speeding them up could alleviate the issue. I had 1020 resources in the group when I ran the tests.

All are in the SLSB ConfigurationManagerBean.

* ensureNoPluginConfigurationUpdatesInProgress(ResourceGroup)

** for (Resource memberResource : compatibleGroup.getExplicitResources()) {
--- the call to getExplicitResources() takes a long time - we are probably asking Hibernate to load this collection here. Can we do a JPQL query to make it go faster?

** if (isPluginConfigurationUpdateInProgress(this.subjectManager.getOverlord(), memberResource.getId()))
--- this call is very slow

* getPersistedPluginConfigurationsForCompatibleGroup(ResourceGroup compatibleGroup)

** long count = (Long) countQuery.getSingleResult();
--- this is getting the count of the Configuration.QUERY_GET_PLUGIN_CONFIG_MAP_BY_GROUP_ID and is slow. Specifically the code is:

        Query countQuery = PersistenceUtility.createCountQuery(entityManager,
            Configuration.QUERY_GET_PLUGIN_CONFIG_MAP_BY_GROUP_ID);
        countQuery.setParameter("resourceGroupId", compatibleGroup.getId());
        long count = (Long) countQuery.getSingleResult();

** It turns out the code was meant to page through the results here, but it does NOT. There is no chunking going on, so the "while(true)" loop runs only once and we load everything at once. Which is weird, since whoever wrote this commented:

        // Configurations are very expensive to load, so load 'em in chunks to ease the strain on the DB.

so they did at least try to chunk the DB access. But it isn't doing what they thought it was doing.
Comment 3 John Mazzitelli 2011-10-05 14:48:04 EDT
I wrote a unit test called:

  org.rhq.enterprise.server.configuration.LargeGroupPluginConfigurationTest

to test a large group and plugin config updates. Using both PostgreSQL and Oracle, it took about 100s to obtain the plugin configurations for the 1010 resources in the group. Will see if I can speed it up now.
Comment 4 John Mazzitelli 2011-10-05 15:46:59 EDT
(In reply to comment #2)
> * ensureNoPluginConfigurationUpdatesInProgress(ResourceGroup)
> 
> ** for (Resource memberResource : compatibleGroup.getExplicitResources()) {
> --- the call to getExplicitResources() takes a long time - we are probably
> asking hibernate to load this collection here. Can we do a jpql query to make
> it go faster?
> 
> ** if (isPluginConfigurationUpdateInProgress(this.subjectManager.getOverlord(),
> memberResource.getId()))
> --- this call is very slow
> 

I refactored this to use the same kind of query for "ensureNoResourceConfigurationUpdatesInProgress" and the results were dramatically better. I went down from about 100s to 4s. We were doing some really rudimentary looping in Java when a simple JPQL query is sufficient.
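
A rough illustration of why the per-member loop was slow, not the actual RHQ code: the sketch below uses a made-up in-memory FakeUpdateStore that counts simulated database round trips. The per-member check issues one lookup per resource, while the group-level check issues a single lookup, which is roughly what one JPQL query against the group achieves. Every class and method name here is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the database; NOT an RHQ class.
class FakeUpdateStore {
    private final Set<Integer> inProgressResourceIds = new HashSet<>();
    int roundTrips = 0; // number of simulated queries issued so far

    void markInProgress(int resourceId) {
        inProgressResourceIds.add(resourceId);
    }

    // Simulates one query per resource (the old looping pattern).
    boolean isUpdateInProgress(int resourceId) {
        roundTrips++;
        return inProgressResourceIds.contains(resourceId);
    }

    // Simulates one query covering every member of the group.
    boolean isAnyUpdateInProgress(int[] memberIds) {
        roundTrips++;
        for (int id : memberIds) {
            if (inProgressResourceIds.contains(id)) {
                return true;
            }
        }
        return false;
    }
}

class GroupCheckSketch {
    // Builds member ids 1..count for a hypothetical group.
    static int[] memberIds(int count) {
        int[] ids = new int[count];
        for (int i = 0; i < count; i++) {
            ids[i] = i + 1;
        }
        return ids;
    }

    // Old pattern: loop in Java, one round trip per member.
    static boolean checkPerMember(FakeUpdateStore store, int[] memberIds) {
        for (int id : memberIds) {
            if (store.isUpdateInProgress(id)) {
                return true;
            }
        }
        return false;
    }

    // Refactored pattern: a single round trip for the whole group.
    static boolean checkWholeGroup(FakeUpdateStore store, int[] memberIds) {
        return store.isAnyUpdateInProgress(memberIds);
    }
}
```

With 1020 members and no update in progress, the loop costs 1020 simulated round trips and the group-level check costs one. Both return the same answer.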
Comment 5 John Mazzitelli 2011-10-05 16:50:34 EDT
In getPersistedPluginConfigurationsForCompatibleGroup, we were calling "group.getExplicitResources().size()" to get the size of the group, which was really slow. Now we do this instead:

        int groupSize = resourceGroupManager.getExplicitGroupMemberCount(compatibleGroup.getId());

which makes it far faster. I made the same change in the analogous resource configuration method, since it was doing the same thing.
Comment 6 John Mazzitelli 2011-10-05 20:42:36 EDT
(In reply to comment #2)
> ** It turns out we were trying to use paging here but we are NOT. There is no
> chunking going on here. So the "while(true)" loop is only done once and we load
> everything at once.

I fixed this. It turns out this exposed another bug that was here AND in the other method used to get the RESOURCE config updates - we weren't using ORDER BY so the chunking wasn't even getting the correct page data anyway. I fixed that too by ensuring we add ORDER BY clauses. So we are now paging correctly.
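
To make the paging point concrete, here is a minimal sketch of a chunked load loop. It is not the RHQ implementation: the table is an in-memory array, fetchPage stands in for a query with an ORDER BY clause plus an offset and a limit (in JPA those would typically be Query.setFirstResult and Query.setMaxResults; that this code uses them is an assumption), and all names are hypothetical. Sorting before slicing plays the role of ORDER BY, which is what keeps consecutive pages disjoint and complete.

```java
import java.util.Arrays;

// Hypothetical sketch; not an RHQ class.
class PagedLoadSketch {
    // Stand-in for "SELECT ... ORDER BY id" with an offset and a page size.
    static int[] fetchPage(int[] table, int firstResult, int maxResults) {
        int[] ordered = table.clone();
        Arrays.sort(ordered); // plays the role of ORDER BY: a stable, repeatable order
        int from = Math.min(firstResult, ordered.length);
        int to = Math.min(from + maxResults, ordered.length);
        return Arrays.copyOfRange(ordered, from, to);
    }

    // Loads every row in chunks of pageSize instead of all at once.
    static int[] loadAll(int[] table, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int[] results = new int[0];
        int firstResult = 0;
        while (true) {
            int[] page = fetchPage(table, firstResult, pageSize);
            int oldLength = results.length;
            results = Arrays.copyOf(results, oldLength + page.length);
            System.arraycopy(page, 0, results, oldLength, page.length);
            if (page.length < pageSize) {
                break; // a short (or empty) page means nothing is left
            }
            firstResult += pageSize; // advance to the next chunk
        }
        return results;
    }
}
```

The loop only terminates correctly because the offset advances on every pass. Without a fixed ordering, a real database is free to return rows in a different order per query, so pages could overlap or skip rows.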
Comment 7 John Mazzitelli 2011-10-06 09:51:35 EDT
I think the changes I made drastically reduce the time needed to load this data.
Comment 8 Mike Foley 2011-10-06 13:46:11 EDT
Verified through functional testing around compat groups.
Comment 9 Mike Foley 2012-02-07 14:27:12 EST
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE
