Bug 997670
Summary: | High CPU usage for InventoryManager.getResourceContainer(); 30% CPU for 10,000 resources | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Other] RHQ Project | Reporter: | Elias Ross <genman> | ||||||||
Component: | Agent, Performance | Assignee: | Heiko W. Rupp <hrupp> | ||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> | ||||||||
Severity: | unspecified | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | 4.8 | CC: | hrupp, vnguyen | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | RHQ 4.10 | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2014-04-23 12:32:28 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
Description
Elias Ross
2013-08-16 00:21:01 UTC
Created attachment 787233 [details]
Proposed patch
Elias,
good find.
Attached is a proposed patch, that I am currently also testing on RHEL.
On OS/X I saw getResourceContainer() for ~1600 resources, while afterwards visualvm did not show that method in its sampling anymore.
On RHEL I am on ~50ms / invocation for ~2500 resources - with the patch that seems mostly gone.
Could you test that on your large result set?
A different implementation could also just hold a:
Map<resourceId,uuid> and then the getComponent would be
String uuid = resourceContainersById.get(resourceId);
ResourceContainer rc = resourceContainers.get(uuid);
I think it may also be beneticial to initialize the HashSet with a larger size
in org.rhq.core.pc.measurement.MeasurementManager#getNextScheduledSet
Set<ScheduledMeasurementInfo> nextScheduledSet = new HashSet<ScheduledMeasurementInfo>(this.scheduledRequests.size());
Not sure if the complete original size makes sense, but the default is too small and will end up in repeated enlarging of the set.
( Sampling may not be that scientific, but at the end I also see some 30% of the time spent in getResourceContainer() ). One could try to validate this with a profiler. I wanted to mention I have roughly about 20,000 metrics per minute per agent, so it's not a typical configuration at all. So to see this as an issue you might have to increase the number of metrics. As for the OS, it is RHEL 6 on HP ProLiant DL360 G7 with 32GB of memory. Well, the number of calls to getResourceContainer() scales with the number of getValues() calls, but not directly with the number of metrics, as a huge number of them may be gathered in one go and on one resource. I was running that with only 2.5k resources, but 60k metrics per minute. As you mentioned before that you have 10k resources, I was thinking if you could test that in said env. Created attachment 788404 [details]
Proposed patch
An updated proposed patch, that uses less memory by
keeping a mapping resource id -> uuid instead of resource id -> container.
getResourceContainer(resourceId) now first tries to look up by resource id.
If that fails, we search through the containers and then add the found
mapping resource id -> uuid to above map.
Some other adjustments have been made especially in the calls
to check if debug logging is enabled. Also hash set allocation size is now much higher to prevent enlarging the backing map during set.add() calls.
This has been pushed to master as 81b37ad -- instead of a classical map, we now use a Trove Int->Object map and key on int instead of Integer. That saves around 1MB heap relative to a classical HashMap for 40k resources. With 40k resources the memory usage of this map is ~1.2MB Bulk closing of 4.10 issues. If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10. |