997670 – High CPU usage for InventoryManager.getResourceContainer(); 30% CPU for 10,000 resources

Bug 997670 - High CPU usage for InventoryManager.getResourceContainer(); 30% CPU for 10,000 resources

Summary: High CPU usage for InventoryManager.getResourceContainer(); 30% CPU for 10,00...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	RHQ Project
Classification:	Other
Component:	Agent, Performance
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	RHQ 4.10
Assignee:	Heiko W. Rupp
QA Contact:	Mike Foley
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2013-08-16 00:21 UTC by Elias Ross
Modified:	2014-04-23 12:32 UTC (History)
CC List:	2 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-04-23 12:32:28 UTC
Embargoed:

Attachments	(Terms of Use)
PNG from VisualVM (172.76 KB, image/png) 2013-08-16 00:21 UTC, Elias Ross	no flags	Details
Proposed patch (11.40 KB, patch) 2013-08-16 12:48 UTC, Heiko W. Rupp	no flags	Details \| Diff
Proposed patch (24.70 KB, patch) 2013-08-20 10:09 UTC, Heiko W. Rupp	no flags	Details \| Diff
Show Obsolete (1) View All

Description Elias Ross 2013-08-16 00:21:01 UTC

Created attachment 787119 [details]
PNG from VisualVM

Description of problem:

For agents with 10,000 resources, VisualVM reports about 30% CPU usage for this one method. See attached screenshot.

It's obviously not an issue with much fewer resources, but it's something to consider fixing.

There are two approaches:
1. Add Map<Integer, Resource> in addition.
2. Replace the existing Map<String, Resource> map with <Integer, Resource> instead, and then when querying by UUID, just iterate. Basically the reverse of the existing situation.

How reproducible:

Always. See the Bug 991153 for how I created this many resources.

Comment 1 Heiko W. Rupp 2013-08-16 12:48:12 UTC

Created attachment 787233 [details]
Proposed patch

Elias,

good find.

Attached is a proposed patch, that I am currently also testing on RHEL.
On OS/X I saw getResourceContainer() for ~1600 resources, while afterwards visualvm did not show that method in its sampling anymore.

On RHEL I am on ~50ms / invocation for ~2500 resources - with the patch that seems mostly gone.

Could you test that on your large result set?

A different implementation could also just hold a: 

Map<resourceId,uuid> and then the getComponent would be
String uuid = resourceContainersById.get(resourceId);
ResourceContainer rc = resourceContainers.get(uuid);


I think it may also be beneticial to initialize the HashSet with a larger size
in org.rhq.core.pc.measurement.MeasurementManager#getNextScheduledSet

        Set<ScheduledMeasurementInfo> nextScheduledSet = new HashSet<ScheduledMeasurementInfo>(this.scheduledRequests.size());

Not sure if the complete original size makes sense, but the default is too small and will end up in repeated enlarging of the set.

Comment 2 Heiko W. Rupp 2013-08-16 13:01:11 UTC

( Sampling may not be that scientific, but at the end I also see some 30% of the time spent in getResourceContainer() ). One could try to validate this with a profiler.

Comment 3 Elias Ross 2013-08-16 18:14:20 UTC

I wanted to mention I have roughly about 20,000 metrics per minute per agent, so it's not a typical configuration at all.

So to see this as an issue you might have to increase the number of metrics.

As for the OS, it is RHEL 6 on HP ProLiant DL360 G7 with 32GB of memory.

Comment 4 Heiko W. Rupp 2013-08-16 18:42:49 UTC

Well, the number of calls to getResourceContainer() scales with the number of getValues() calls, but not directly with the number of metrics, as a huge number of them may be gathered in one go and on one resource.

I was running that with only 2.5k resources, but 60k metrics per minute.
As you mentioned before that you have 10k resources, I was thinking if you could test that in said env.

Comment 5 Heiko W. Rupp 2013-08-20 10:09:13 UTC

Created attachment 788404 [details]
Proposed patch

An updated proposed patch, that uses less memory by 
keeping a mapping resource id -> uuid instead of resource id -> container.

getResourceContainer(resourceId) now first tries to look up by resource id.
If that fails, we search through the containers and then add the found 
mapping resource id -> uuid to above map.

Some other adjustments have been made especially in the calls
to check if debug logging is enabled. Also hash set allocation size is now much higher to prevent enlarging the backing map during set.add() calls.

Comment 6 Heiko W. Rupp 2013-12-04 10:42:46 UTC

This has been pushed to master as 81b37ad -- instead of a classical map, we now use a Trove Int->Object map and key on int instead of Integer. That saves around 1MB heap relative to a classical HashMap for 40k resources. With 40k resources the memory usage of this map is ~1.2MB

Comment 7 Heiko W. Rupp 2014-04-23 12:32:28 UTC

Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.

Note You need to log in before you can comment on or make changes to this bug.