The subtab tries to load for 10 minutes or so, then the exception occurs and the subtab content never loads. The full stack traces from the Server log are attached.
Created attachment 522407: relevant lines from the Server log
I found several distinct areas that are slow. If we speed them up, this could alleviate the issue. I had 1020 resources in a group when I ran tests. All are in the SLSB ConfigurationManagerBean.

* ensureNoPluginConfigurationUpdatesInProgress(ResourceGroup)

** for (Resource memberResource : compatibleGroup.getExplicitResources()) {
--- the call to getExplicitResources() takes a long time - we are probably asking hibernate to load this collection here. Can we do a jpql query to make it go faster?

** if (isPluginConfigurationUpdateInProgress(this.subjectManager.getOverlord(), memberResource.getId()))
--- this call is very slow

* getPersistedPluginConfigurationsForCompatibleGroup(ResourceGroup compatibleGroup)

** long count = (Long) countQuery.getSingleResult();
--- this is getting the count of the Configuration.QUERY_GET_PLUGIN_CONFIG_MAP_BY_GROUP_ID and is slow. Specifically the code is:

    Query countQuery = PersistenceUtility.createCountQuery(entityManager, Configuration.QUERY_GET_PLUGIN_CONFIG_MAP_BY_GROUP_ID);
    countQuery.setParameter("resourceGroupId", compatibleGroup.getId());
    long count = (Long) countQuery.getSingleResult();

** It turns out we were trying to use paging here but we are NOT. There is no chunking going on here. So the "while(true)" loop is only done once and we load everything at once. Which is weird, since whoever wrote this commented:

    // Configurations are very expensive to load, so load 'em in chunks to ease the strain on the DB.

so they did at least try to chunk the DB access. But it isn't doing what this person thought it was doing.
Wrote a unit test, org.rhq.enterprise.server.configuration.LargeGroupPluginConfigurationTest, to test plugin config updates on a large group. On both Postgres and Oracle, it took about 100s to obtain the plugin configurations for the 1010 resources in the group. Will see if I can speed it up now.
(In reply to comment #2)
> * ensureNoPluginConfigurationUpdatesInProgress(ResourceGroup)
>
> ** for (Resource memberResource : compatibleGroup.getExplicitResources()) {
> --- the call to getExplicitResources() takes a long time - we are probably
> asking hibernate to load this collection here. Can we do a jpql query to make
> it go faster?
>
> ** if (isPluginConfigurationUpdateInProgress(this.subjectManager.getOverlord(),
> memberResource.getId()))
> --- this call is very slow
>

I refactored this to use the same kind of query as "ensureNoResourceConfigurationUpdatesInProgress" and the results were dramatically better: the time went down from about 100s to 4s. We were doing some really rudimentary looping in Java when a simple JPQL query is sufficient.
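To illustrate why the loop was so costly, here is a minimal, self-contained sketch. It is not the RHQ code: FakeDb, its methods, and the JPQL mentioned in the comment are all hypothetical stand-ins, and the only thing modeled is the number of database round trips. It compares checking each group member individually against asking one aggregate question for the whole group.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class InProgressCheckSketch {

    /** Hypothetical in-memory stand-in for the database; counts simulated round trips. */
    static final class FakeDb {
        private final Set<Integer> inProgressResourceIds = new HashSet<>();
        int roundTrips = 0;

        void markInProgress(int resourceId) {
            inProgressResourceIds.add(resourceId);
        }

        /** Models one query per resource (the old per-member check). */
        boolean isUpdateInProgress(int resourceId) {
            roundTrips++;
            return inProgressResourceIds.contains(resourceId);
        }

        /**
         * Models a single aggregate query for the whole group. In JPQL this would be
         * something like a COUNT over in-progress updates restricted to the group's
         * members (the exact entity and field names are an assumption, not shown here).
         */
        long countInProgress(List<Integer> memberIds) {
            roundTrips++;
            long count = 0;
            for (int id : memberIds) {
                if (inProgressResourceIds.contains(id)) {
                    count++;
                }
            }
            return count;
        }
    }

    /** Old shape: one round trip per group member until a hit is found. */
    static boolean anyInProgressByLooping(FakeDb db, List<Integer> memberIds) {
        for (int id : memberIds) {
            if (db.isUpdateInProgress(id)) {
                return true;
            }
        }
        return false;
    }

    /** New shape: one round trip regardless of group size. */
    static boolean anyInProgressBySingleQuery(FakeDb db, List<Integer> memberIds) {
        return db.countInProgress(memberIds) > 0;
    }

    /** Returns {loop round trips, single-query round trips} for a group with no update in progress. */
    static int[] compareRoundTrips(int groupSize) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= groupSize; i++) {
            ids.add(i);
        }
        FakeDb loopDb = new FakeDb();
        anyInProgressByLooping(loopDb, ids);
        FakeDb queryDb = new FakeDb();
        anyInProgressBySingleQuery(queryDb, ids);
        return new int[] { loopDb.roundTrips, queryDb.roundTrips };
    }

    public static void main(String[] args) {
        int[] trips = compareRoundTrips(1020);
        System.out.println("loop round trips: " + trips[0] + ", single query round trips: " + trips[1]);
    }
}
```

In the common case where nothing is in progress, the loop has to visit every member, so its cost grows with the group size while the single query stays at one round trip. The sketch ignores the separate cost of loading the explicit-resources collection itself.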
In getPersistedPluginConfigurationsForCompatibleGroup, we were calling "group.getExplicitResources().size()" to get the size of the group, which was really slow. Now instead we are going to do this: int groupSize = resourceGroupManager.getExplicitGroupMemberCount(compatibleGroup.getId()); which makes it much faster. I made this change in the analogous resource configuration method in addition to this plugin configuration method, since it was doing the same thing.
(In reply to comment #2)
> ** It turns out we were trying to use paging here but we are NOT. There is no
> chunking going on here. So the "while(true)" loop is only done once and we load
> everything at once.

I fixed this. It turns out this exposed another bug that was here AND in the other method used to get the RESOURCE config updates - we weren't using ORDER BY, so the chunking wasn't even getting the correct page data anyway. I fixed that too by ensuring we add ORDER BY clauses. So we are now paging correctly.
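For reference, here is a minimal sketch of the chunked-loading pattern described above. It is not the actual ConfigurationManagerBean code: the in-memory list stands in for the table, and fetchPage is a hypothetical stand-in for a query that has an ORDER BY clause and is paged with an offset and a page size (with JPA this would typically be Query.setFirstResult and Query.setMaxResults). Termination on a short page is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ChunkedLoadSketch {

    /**
     * Hypothetical stand-in for one paged query. Sorting models the ORDER BY
     * clause; without a stable order, the database is free to return rows in a
     * different order for each page, so pages can overlap or skip rows.
     */
    static List<Integer> fetchPage(List<Integer> table, int offset, int pageSize) {
        List<Integer> ordered = new ArrayList<>(table);
        ordered.sort(Comparator.naturalOrder());
        int from = Math.min(offset, ordered.size());
        int to = Math.min(offset + pageSize, ordered.size());
        return new ArrayList<>(ordered.subList(from, to));
    }

    /** Loads every row, one page at a time, advancing the offset on each pass. */
    static List<Integer> loadAll(List<Integer> table, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Integer> results = new ArrayList<>();
        int offset = 0;
        while (true) {
            List<Integer> page = fetchPage(table, offset, pageSize);
            results.addAll(page);
            if (page.size() < pageSize) {
                break; // a short (or empty) page means there are no more rows
            }
            offset += pageSize; // without this step the loop never really pages
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(5, 3, 1, 4, 2);
        System.out.println(loadAll(ids, 2));
    }
}
```

The two details that matter are the ones called out in this comment: the offset has to advance on every pass, and every page has to be cut from the same ordering.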
I think the changes I made drastically reduce the time needed to load this data.
Verified through functional testing around compatible groups.
changing status of VERIFIED BZs for JON 2.4.2 and JON 3.0 to CLOSED/CURRENTRELEASE