Bug 737196 - transaction aborts (times out?) when going to Inventory>Connection Settings subtab for a compat group w/ 1000+ members
Summary: transaction aborts (times out?) when going to Inventory>Connection Settings subtab for a compat group w/ 1000+ members
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Core Server
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: John Mazzitelli
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: rhq-perf
 
Reported: 2011-09-09 21:20 UTC by Ian Springer
Modified: 2013-08-06 00:40 UTC
CC List: 3 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2012-02-07 19:27:12 UTC
Embargoed:


Attachments
relevant lines from the Server log (40.25 KB, text/plain)
2011-09-09 21:20 UTC, Ian Springer
no flags


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 739090 | 0 | medium | CLOSED | plugin config for a group w/ 1000+ members fails to load due to "ORA-01795: maximum number of expressions in a list is 1... | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla | 789529 | 0 | medium | CLOSED | ClassCastException in ConfigurationManagerBean.ensureNoPluginConfigurationUpdatesInProgress() can occur when visiting th... | 2021-02-22 00:41:40 UTC

Internal Links: 739090 789529

Description Ian Springer 2011-09-09 21:20:17 UTC
The subtab tries to load for 10 minutes or so; then the exception occurs and the subtab content never loads. The full stack traces from the Server log are attached.

Comment 1 Ian Springer 2011-09-09 21:20:58 UTC
Created attachment 522407
relevant lines from the Server log

Comment 2 John Mazzitelli 2011-09-30 19:57:24 UTC
I found several distinct areas that are slow. Speeding them up could alleviate the issue. I had 1020 resources in a group when I ran my tests.

All are in the SLSB ConfigurationManagerBean.

* ensureNoPluginConfigurationUpdatesInProgress(ResourceGroup)

** for (Resource memberResource : compatibleGroup.getExplicitResources()) {
--- the call to getExplicitResources() takes a long time - we are probably asking Hibernate to load the whole collection here. Could we use a JPQL query instead to make it faster?

** if (isPluginConfigurationUpdateInProgress(this.subjectManager.getOverlord(), memberResource.getId()))
--- this call is very slow

* getPersistedPluginConfigurationsForCompatibleGroup(ResourceGroup compatibleGroup)

** long count = (Long) countQuery.getSingleResult();
--- this gets the count for the Configuration.QUERY_GET_PLUGIN_CONFIG_MAP_BY_GROUP_ID query and is slow. Specifically, the code is:

        Query countQuery = PersistenceUtility.createCountQuery(entityManager,
            Configuration.QUERY_GET_PLUGIN_CONFIG_MAP_BY_GROUP_ID);
        countQuery.setParameter("resourceGroupId", compatibleGroup.getId());
        long count = (Long) countQuery.getSingleResult();

** It turns out we were trying to use paging here, but we are NOT actually paging. There is no chunking going on, so the "while(true)" loop executes only once and we load everything at once. This is odd, because whoever wrote this left the comment:

        // Configurations are very expensive to load, so load 'em in chunks to ease the strain on the DB.

so they did at least try to chunk the DB access. But it isn't doing what this person thought it was doing.

Comment 3 John Mazzitelli 2011-10-05 18:48:04 UTC
I wrote a unit test called:

  org.rhq.enterprise.server.configuration.LargeGroupPluginConfigurationTest

to test plugin config updates on a large group. With both Postgres and Oracle, it took about 100s to obtain the plugin configuration for the 1010 resources in the group. I will see if I can speed it up now.

Comment 4 John Mazzitelli 2011-10-05 19:46:59 UTC
(In reply to comment #2)
> * ensureNoPluginConfigurationUpdatesInProgress(ResourceGroup)
> 
> ** for (Resource memberResource : compatibleGroup.getExplicitResources()) {
> --- the call to getExplicitResources() takes a long time - we are probably
> asking hibernate to load this collection here. Can we do a jpql query to make
> it go faster?
> 
> ** if (isPluginConfigurationUpdateInProgress(this.subjectManager.getOverlord(),
> memberResource.getId()))
> --- this call is very slow
> 

I refactored this to use the same kind of query for "ensureNoResourceConfigurationUpdatesInProgress", and the results were dramatically better: the time went down from about 100s to 4s. We were doing rudimentary looping in Java where a single JPQL query suffices.
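
To illustrate why replacing the per-member loop with one query helps, here is a minimal in-memory simulation, not the RHQ code. It counts simulated database round trips: the old shape asks the database once per group member (the classic N+1 pattern, and in the real code getExplicitResources() first has to load the member collection as well), while the new shape asks one question for the whole group. The names SimulatedDb, isUpdateInProgress, countInProgressForGroup and the JPQL string are hypothetical; the actual entity and field names in RHQ may differ, and the JPQL is never executed here.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class InProgressCheckSketch {

    /** Hypothetical in-memory stand-in for the database; it counts round trips. */
    static final class SimulatedDb {
        // Hypothetical JPQL for the single-query approach (not executed here);
        // the entity and field names are assumptions, not RHQ's actual model.
        static final String COUNT_IN_PROGRESS_JPQL =
            "SELECT COUNT(u) FROM PluginConfigurationUpdate u"
                + " WHERE u.status = :inProgress AND u.resource.id IN"
                + " (SELECT r.id FROM ResourceGroup g JOIN g.explicitResources r WHERE g.id = :groupId)";

        private final List<Integer> memberIds; // member list, assumed already known
        private final Set<Integer> inProgress; // resource ids with an update in progress
        int roundTrips = 0;

        SimulatedDb(List<Integer> memberIds, Set<Integer> inProgress) {
            this.memberIds = memberIds;
            this.inProgress = inProgress;
        }

        List<Integer> members() {
            return memberIds;
        }

        /** Old shape: one query per member resource. */
        boolean isUpdateInProgress(int resourceId) {
            roundTrips++;
            return inProgress.contains(resourceId);
        }

        /** New shape: one aggregated query for the whole group. */
        long countInProgressForGroup() {
            roundTrips++;
            return memberIds.stream().filter(inProgress::contains).count();
        }
    }

    /** N+1 pattern: loops over members and asks the database about each one. */
    static boolean anyInProgressLoop(SimulatedDb db) {
        for (int id : db.members()) {
            if (db.isUpdateInProgress(id)) {
                return true; // stops early only when a match is found
            }
        }
        return false;
    }

    /** Single-query pattern: lets the database answer for the whole group. */
    static boolean anyInProgressSingleQuery(SimulatedDb db) {
        return db.countInProgressForGroup() > 0;
    }

    public static void main(String[] args) {
        List<Integer> members = IntStream.rangeClosed(1, 1020).boxed().collect(Collectors.toList());
        SimulatedDb loopDb = new SimulatedDb(members, new HashSet<>());
        SimulatedDb queryDb = new SimulatedDb(members, new HashSet<>());
        System.out.println(anyInProgressLoop(loopDb) + " after " + loopDb.roundTrips + " round trips");
        System.out.println(anyInProgressSingleQuery(queryDb) + " after " + queryDb.roundTrips + " round trip");
    }
}
```

With 1020 members and nothing in progress, main prints "false after 1020 round trips" and then "false after 1 round trip". The common case (no update in progress) is exactly the one where the loop cannot exit early, which is consistent with the loop being so slow for a large group.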

Comment 5 John Mazzitelli 2011-10-05 20:50:34 UTC
In getPersistedPluginConfigurationsForCompatibleGroup, we were calling group.getExplicitResources().size() to get the size of the group, which was really slow. Now we do this instead:

        int groupSize = resourceGroupManager.getExplicitGroupMemberCount(compatibleGroup.getId());

which is much faster. I made this change in the analogous resource configuration method as well as in this plugin configuration method, since it was doing the same thing.
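
The difference between the two ways of sizing a group can be shown with a small simulation, not RHQ code: calling size() on a lazily loaded member collection first materializes every member row as an object, while a count query only returns a number. LazyMemberCollection, materializedRows and the JPQL string are hypothetical; getExplicitGroupMemberCount presumably runs a count query of this general kind, but its exact query is not shown in this bug.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupSizeSketch {

    /** Hypothetical stand-in for a lazily initialized member collection. */
    static final class LazyMemberCollection {
        // Hypothetical JPQL for a count query (not executed here); names are assumptions.
        static final String COUNT_MEMBERS_JPQL =
            "SELECT COUNT(r) FROM ResourceGroup g JOIN g.explicitResources r WHERE g.id = :groupId";

        private final int rowsInDb;   // member rows the simulated database holds
        private List<Object> loaded;  // null until the collection is initialized
        int materializedRows = 0;     // rows turned into objects so far

        LazyMemberCollection(int rowsInDb) {
            this.rowsInDb = rowsInDb;
        }

        /** Mirrors group.getExplicitResources().size(): initializes everything first. */
        int size() {
            if (loaded == null) {
                loaded = new ArrayList<>(rowsInDb);
                for (int i = 0; i < rowsInDb; i++) {
                    loaded.add(new Object()); // each row becomes an entity-like object
                    materializedRows++;
                }
            }
            return loaded.size();
        }

        /** Mirrors a COUNT query: the database counts, nothing is materialized. */
        long countQuery() {
            return rowsInDb;
        }
    }

    public static void main(String[] args) {
        LazyMemberCollection viaCount = new LazyMemberCollection(1020);
        System.out.println(viaCount.countQuery() + " members, " + viaCount.materializedRows + " rows materialized");

        LazyMemberCollection viaSize = new LazyMemberCollection(1020);
        System.out.println(viaSize.size() + " members, " + viaSize.materializedRows + " rows materialized");
    }
}
```

Both calls report 1020 members, but only the size() path materializes 1020 rows; with real entities (and their own associations) that cost grows with group size, whereas the count stays a single scalar result.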

Comment 6 John Mazzitelli 2011-10-06 00:42:36 UTC
(In reply to comment #2)
> ** It turns out we were trying to use paging here but we are NOT. There is no
> chunking going on here. So the "while(true)" loop is only done once and we load
> everything at once.

I fixed this. It turns out this exposed another bug, present here AND in the other method used to get the RESOURCE config updates: we weren't using ORDER BY, so the chunking wasn't even getting the correct page data. I fixed that too by adding ORDER BY clauses, so we now page correctly.
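
A minimal sketch of chunked loading, assuming a plain in-memory list in place of the database: the total comes from a count, each chunk is fetched with an offset and a limit (what setFirstResult/setMaxResults express on a JPA Query), and rows are sorted by a stable key first, which is the role the added ORDER BY plays. Without a deterministic order, a database may return rows in a different order for each separate page query, so a row can land in two pages while another is never returned. Row, fetchPage, loadInChunks and the chunk size of 200 are hypothetical; the chunk size RHQ actually uses is not stated in this bug.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ChunkedLoadSketch {

    /** Hypothetical row: a resource id and its configuration payload. */
    static final class Row {
        final int resourceId;
        final String config;

        Row(int resourceId, String config) {
            this.resourceId = resourceId;
            this.config = config;
        }
    }

    /**
     * Stand-in for one page query: ORDER BY resourceId, then skip `offset` rows
     * and return at most `limit` rows. Sorting on every call models the
     * database applying ORDER BY to each page query independently.
     */
    static List<Row> fetchPage(List<Row> table, int offset, int limit) {
        List<Row> ordered = new ArrayList<>(table);
        ordered.sort(Comparator.comparingInt(r -> r.resourceId));
        int end = Math.min(offset + limit, ordered.size());
        if (offset >= end) {
            return new ArrayList<>();
        }
        return new ArrayList<>(ordered.subList(offset, end));
    }

    /** Issues ceil(count / chunkSize) page queries and returns each chunk. */
    static List<List<Row>> loadInChunks(List<Row> table, long count, int chunkSize) {
        List<List<Row>> chunks = new ArrayList<>();
        for (long offset = 0; offset < count; offset += chunkSize) {
            chunks.add(fetchPage(table, (int) offset, chunkSize));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Rows stored in scrambled order; 7 and 1020 are coprime, so each id 1..1020 appears once.
        List<Row> table = new ArrayList<>();
        for (int i = 0; i < 1020; i++) {
            int id = (i * 7) % 1020 + 1;
            table.add(new Row(id, "config-" + id));
        }
        List<List<Row>> chunks = loadInChunks(table, table.size(), 200);
        int total = chunks.stream().mapToInt(List::size).sum();
        System.out.println(chunks.size() + " page queries, " + total + " rows loaded");
    }
}
```

For 1020 rows and a chunk size of 200 this prints "6 page queries, 1020 rows loaded", and every id appears in exactly one chunk. The loop condition is driven by the count rather than by "while(true)", so it keeps issuing page queries until the offset passes the total.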

Comment 7 John Mazzitelli 2011-10-06 13:51:35 UTC
I think the changes I made drastically reduce the time needed to load this data.

Comment 8 Mike Foley 2011-10-06 17:46:11 UTC
verified through functional testing around compat groups

Comment 9 Mike Foley 2012-02-07 19:27:12 UTC
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE

