Red Hat Bugzilla – Bug 1023451
[perf] Large retained heap during inventory report merge (causing OOMs)
Last modified: 2014-06-30 08:24:42 EDT
Description of problem:
While investigating BZ 1018233, we came across the following situation:
75 agents, each with 25 AS7s in inventory, can take down the RHQ server with an OutOfMemoryError merely by merging their inventories (presumably all of them doing so at the same time, during initial discoveries).
A heap dump taken when the OOM was thrown shows that each call to DiscoveryBossBean.mergeInventoryReport can take up as much as 60MB of memory.
I.e. we'd need more than 4GB of heap to handle the above situation without an OOM, which I think is just not reasonable at all.
One concrete example:
* a single thread consumes over 57MB of memory
* out of this, a whole 50MB (!!!) is consumed by 8643 Resource objects (each holding on to 5863B on average). Of that 50MB, 20MB is consumed by resource types.
While we have no way to reduce the potential size of our resource objects (because our feature set is large and centered around resources), we should think about trying to reduce the above numbers.
One obvious way of changing the impl, even though not trivial to implement, is to approach the problem in a more streaming-like fashion.
Currently, when we obtain the InventoryReport (which is pretty lightweight and contains only the absolutely necessary fields), we first initialize ALL the resources in that report with FULL versions of their resource types (note that we try to cache the resource types during the course of this "hydration", but this can nevertheless still get huge for big inventories).
Once we have "hydrated" all the resources with their resource types, we then proceed to merge the individual resources into the database.
I am not 100% sure whether there is a strong reason for the upfront loading of all resource types, but I am pretty sure that if we managed to avoid any "hydration" of the inventory report as a whole and instead processed its individual elements one by one, freeing things up for GC as soon as possible, we should be able to reduce the memory consumption.
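To make the streaming idea concrete, here is a minimal, purely illustrative sketch (SimpleResource, mergeOne, etc. are hypothetical names, not the RHQ API): each resource carries only its type's natural key, the full type is looked up from a cache on demand, and the resource reference is dropped as soon as it is merged, so it becomes eligible for GC immediately instead of the whole hydrated report being held at once.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: walk the inventory report one resource at a time,
// hydrating each resource's type on demand instead of hydrating the
// whole report upfront. Names are illustrative, not actual RHQ classes.
public class StreamingMerge {

    public static class SimpleResource {
        public final String key;
        public final String typeName; // only the type's key travels in the report
        public SimpleResource(String key, String typeName) {
            this.key = key;
            this.typeName = typeName;
        }
    }

    // Cache of fully loaded types; stands in for the existing type cache.
    private final Map<String, Object> typeCache = new HashMap<>();
    private int merged = 0;

    private Object loadType(String typeName) {
        // stands in for a DB lookup of the full ResourceType
        return typeCache.computeIfAbsent(typeName, n -> new Object());
    }

    private void mergeOne(SimpleResource r) {
        Object fullType = loadType(r.typeName); // hydrate just this resource
        // ... merge r into the database using fullType ...
        merged++;
        // r is garbage-collectable as soon as this method returns
    }

    public int mergeReport(Deque<SimpleResource> report) {
        while (!report.isEmpty()) {
            mergeOne(report.poll()); // poll() drops the reference from the report
        }
        return merged;
    }

    public static void main(String[] args) {
        Deque<SimpleResource> report = new ArrayDeque<>();
        for (int i = 0; i < 100; i++) {
            report.add(new SimpleResource("res-" + i, "type-" + (i % 5)));
        }
        System.out.println(new StreamingMerge().mergeReport(report)); // 100
    }
}
```

The point of the Deque here is that consuming the report destructively keeps the peak live set at one resource plus the type cache, rather than the entire hydrated report.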
Version-Release number of selected component (if applicable):
How reproducible:
Always, with an inventory of the above size.
Steps to Reproduce:
1. Contact Viet for the testsuite set up
Actual results:
OOM after 15 minutes of running the testsuite.
There was a lot of work done to improve the performance (speed) of inventory merging. Optimization is in place to efficiently handle large hierarchies and varying child cardinalities. Changes here must not compromise throughput, so testing must measure throughput as well as memory consumption.
A second HA server would possibly mitigate the issue and may not be unreasonable for such a configuration.
I too am not sure about the Resource Type issue; it seems a caching mechanism here would have been sufficient, but maybe it has issues. Also, I do remember there is code in place that already tries to help out the GC, but perhaps we can do more.
Maybe this doesn't need to be said, but in short, it's very important to have a holistic understanding of this mechanism, including the transactioning, before making further optimizations.
A correction regarding the symptoms of this bug:
1) We limit the number of concurrently processed inventory reports and by default we process at most 5 of them.
2) The heap dump I have at hand is consistent with this finding:
a) it contains 2 threads with abnormally high (compared to the rest) memory usage of > 50MB; the stack traces of these 2 threads show that they are executing inside DiscoveryBossBean.mergeInventory().
b) There is a great number of threads, each using approx 20MB, that appear to be in the process of deserializing mergeInventory requests (judging by the presence of an InventoryReport instance in their heaps).
c) There are altogether 45 threads handling http input that account for the majority of the consumed memory.
So the issue here, I think, is twofold:
1) We consume large chunks of memory while handling inventory reports
2) Large inventories are actually quite expensive to transfer as command requests. Even before we can decide that a given inventory merge request is to be ignored (because of concurrency limits), we can consume as much as 20MB of memory.
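The concurrency limit mentioned above can be pictured as a semaphore-style guard, sketched below with hypothetical names (this is not the actual @LimitedConcurrency implementation): requests over the limit are rejected outright rather than queued. The crucial problem the comment identifies is that, in the current design, the ~20MB deserialization cost is paid before this guard ever runs.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of a @LimitedConcurrency-style guard: at most
// maxConcurrent merges run at once (the comment mentions a default of 5),
// and requests over the limit are discarded without doing any work.
public class ConcurrencyLimiter {
    private final Semaphore permits;

    public ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Returns true if the merge ran, false if it was discarded. */
    public boolean tryMerge(Runnable merge) {
        if (!permits.tryAcquire()) {
            return false; // over the limit: discard immediately
        }
        try {
            merge.run();
            return true;
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        ConcurrencyLimiter limiter = new ConcurrencyLimiter(5);
        boolean ran = limiter.tryMerge(() -> { /* merge work */ });
        System.out.println(ran); // true: only one merge in flight
    }
}
```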
So it seems that rather than trying to minimize memory usage during the actual inventory merge (the current implementation came out of speed optimization efforts, which may have increased memory usage but definitely increased throughput), we should concentrate on changing the remote invocations so that we can stop an invocation even before we need to deserialize its full set of parameters.
One solution that comes to mind is to use our remote streams to pass all the parameters (the question then is what to do with parameters that are streams themselves). We could then decide what to do with the invocation, and only after determining that it can be invoked would we read its params from the remote stream.
Mazz, what do you think about this?
After talking with Mazz I am trying out this simple approach:
1) Change the signature of DiscoveryServerService.mergeInventoryReport so that it accepts the InventoryReport as an input stream rather than as a deserialized object.
The InventoryReport object will travel to the server in the serialized form in an input stream. This means that the communication infrastructure can process the @LimitedConcurrency and other annotations and proceed to actually executing the impl of the above method without the huge InventoryReport taking up (nearly) any memory. The DiscoveryServerServiceImpl will then deserialize the InventoryReport from the provided stream and pass it down to DiscoveryBossBean which does the real work and remains unchanged.
As such, the code changes are rather small, and we gain the ability to deserialize the InventoryReport lazily, only when really needed.
In other words, we decided not to change the implementation of the inventory merge in any way, as that is highly optimized and complex code, even if its memory consumption can be quite high. Instead, we make sure we consume no additional memory for merge inventory requests that are to be discarded anyway. In scenarios like the one described in this BZ, those discarded requests will greatly outnumber the ones actually processed.
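The lazy-deserialization idea can be sketched as follows, assuming standard Java serialization (class and method names here are illustrative, not the actual DiscoveryServerService API): the report travels as serialized bytes, and the server only pays the ObjectInputStream cost after the request has passed its cheap checks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hedged sketch: the report travels as a serialized stream; the server
// decides cheaply whether to accept the request and deserializes only
// after acceptance. Not actual RHQ comm code.
public class LazyReport {

    public static byte[] serialize(Serializable report) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(report);
        }
        return bytes.toByteArray();
    }

    // Server side: decide first, deserialize only if accepted.
    public static Object mergeIfAccepted(InputStream wire, boolean accepted)
            throws IOException, ClassNotFoundException {
        if (!accepted) {
            return null; // discarded request: the big report was never materialized
        }
        try (ObjectInputStream in = new ObjectInputStream(wire)) {
            return in.readObject(); // "hydration" happens here, on demand
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] wireForm = serialize("pretend this is a 20MB InventoryReport");
        Object report = mergeIfAccepted(new ByteArrayInputStream(wireForm), true);
        System.out.println(report);
    }
}
```

A rejected request never constructs the ObjectInputStream at all, which is what saves the ~20MB per discarded merge described above.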
I think this is going to need a more thorough look when we look at the agent-server comms as a whole.
The approach of turning the InventoryReport into an input stream (as suggested) doesn't work, because mergeInventoryReport is a synchronous call. When a synchronous call tries to call back to the agent (i.e. read the provided input stream), the communication deadlocks, because the synchronous call is still in progress and the agent processes calls in a single-threaded manner.
I tried to overcome that by splitting the execution of the merge into the following sequence:
1) DiscoveryServerService: @Asynchronous @LimitedConcurrency void initiateInventoryReportMerge(String handle, Agent agent)
2) DiscoveryAgentService: InventoryReport getCollectedInventoryReport(String handle)
3) DiscoveryAgentService: void finishInventoryReportMerge(String handle, MergeInventoryReportResults results, Exception failure)
The agent generates its inventory report and when it is ready to send it up to the server, it first stores the report into temporary storage and generates a unique "handle" for it. It then initiates the merge by calling the server with the handle. This is an asynchronous call and thus returns immediately on the agent.
On the server, in the initiateInventoryReportMerge method, we call getCollectedInventoryReport to obtain the inventory report from the agent, process it normally, and then report back to the agent about the results of the merge - in case of both success and failure.
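The handle-based sequence above can be sketched roughly like this (all names are hypothetical stand-ins for the proposed methods, and the remote calls are reduced to local ones for illustration): the agent parks the report under a unique handle, the asynchronous initiate call carries only the small handle string, and the server pulls the full report back by handle when it is ready to process it.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the handle-based merge sequence; not actual
// RHQ DiscoveryServerService/DiscoveryAgentService code.
public class HandleMergeSketch {

    // Agent-side temporary storage of generated reports, keyed by handle.
    private static final Map<String, String> reportStore = new ConcurrentHashMap<>();

    // Agent: store the report and generate its unique handle.
    public static String storeReport(String report) {
        String handle = UUID.randomUUID().toString();
        reportStore.put(handle, report);
        return handle;
    }

    // Server (step 2): pull the report from the agent by handle.
    public static String getCollectedInventoryReport(String handle) {
        return reportStore.remove(handle); // remove frees the agent-side storage
    }

    public static void main(String[] args) {
        String handle = storeReport("report-payload");
        // step 1 (initiateInventoryReportMerge) would be an async call
        // carrying only the small handle string, not the report itself
        String report = getCollectedInventoryReport(handle);
        System.out.println(report);                // report-payload
        System.out.println(reportStore.isEmpty()); // true: storage reclaimed
    }
}
```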
Because the agent needs to know the result of the merge, and the merge was originally synchronous, we need to somehow wait for the new workflow to finish (so that we don't have to reimplement large chunks of the agent code). This can be done, but it has a number of edge cases that I feel introduce enough complexity that I am willing to stop trying to fix this problem until we come up with a more comprehensive solution to passing large arguments to remote methods (like streaming the arguments in all cases, or something similar).
The edge cases revolve around one of the parties becoming unavailable during the merge sequence. In order not to block the agent indefinitely, we need to define a timeout on the agent for how long it waits for the server to process the inventory report (and report back by calling the finishInventoryReportMerge() method). What happens if the server processing just took longer than the timeout? How does the rest of the agent cope with the server's inability to merge the inventory "in time"? Wouldn't we potentially overload the server with merge requests that arrive quicker than the server can handle them (due to too low a timeout on the agent)? This approach also opens the door to possibly processing multiple merge requests from a single agent on the server side concurrently - this can be avoided, but again is a complication.
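The timeout edge case is easy to see in a minimal sketch of the agent-side wait (hypothetical names, not actual RHQ code): the agent blocks on a latch until the server's callback arrives, but only up to a timeout, and a callback that arrives after the timeout finds that the agent has already given up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of the agent-side wait discussed above. The agent blocks until
// the server calls back finishInventoryReportMerge(), up to a timeout.
// Hypothetical names; not actual RHQ code.
public class MergeWait {
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile Object results;

    // Called remotely by the server when the merge completes.
    public void finishInventoryReportMerge(Object mergeResults) {
        this.results = mergeResults;
        done.countDown();
    }

    /** Agent side: wait up to timeoutMillis for the server's callback. */
    public Object awaitResults(long timeoutMillis) throws InterruptedException {
        if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return null; // timed out: the server may still finish later (the edge case)
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        MergeWait wait = new MergeWait();
        System.out.println(wait.awaitResults(50)); // null: no callback in time
        wait.finishInventoryReportMerge("merged"); // late callback
        System.out.println(wait.awaitResults(50)); // merged
    }
}
```

As the comment notes, a null here forces a policy decision: retry (risking server overload), or proceed without knowing the merge result.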
So, all in all, I'm in favor of not fixing this until we come up with a more scalable way of passing arguments during synchronous calls. This is not something I would try to do in the scope of JON 3.2.0, though: it would be a fundamental change to how we do comms and thus is very risky.
I am recommending removing the JON 3.2 GA blocker and moving this to JON 3.2.1 for the following reasons:
1) comment #4. Developer feels the change is very risky
2) We are 3 weeks from GA, with all large topology testing behind us.
3) PM-402 ... the documented large topology requirement is minimally met. (True, there are sporadic OOMs after the requirement is met, and true ... the agent heap size needs to be increased to 2GB.)
True ... the agent needs a smaller memory footprint ...
True ... more optimizations and testing are needed for large topology scenarios,
but this doesn't seem practical with the current timeline and scope for GA.
flagging for discussion by the triage team.
mark: let's discuss this at the next triage call.
Removing the JON 3.2 GA blocker as this is RHQ; cloned this into a JON BZ.
Just noting that I am doing some related work on lowering the footprint of inventory sync. It's not a complete redesign and doesn't completely change our comm approach. It focuses on 1) breaking a full platform sync into pieces to lower the memory footprint, and 2) moving the sync data away from a tree structure, to reduce the inefficient Hibernate eager-fetch approach to building the tree and to lighten the objects passed across the wire.
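The two ideas can be illustrated with a small sketch (hypothetical names, not the actual ResourceSyncInfo/PlatformSyncInfo classes): the parent-child sync tree is flattened into a list whose entries carry parent ids, which suits set-oriented DB queries, and that list is then split into fixed-size chunks so each sync batch has a bounded memory footprint.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of flat, chunked sync info; not actual RHQ code.
public class FlatSync {

    public static class SyncEntry {
        public final int id;
        public final Integer parentId; // null for the platform root
        public SyncEntry(int id, Integer parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    /** Split the flat sync list into batches of at most chunkSize entries. */
    public static List<List<SyncEntry>> chunk(List<SyncEntry> all, int chunkSize) {
        List<List<SyncEntry>> chunks = new ArrayList<>();
        for (int i = 0; i < all.size(); i += chunkSize) {
            chunks.add(all.subList(i, Math.min(i + chunkSize, all.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<SyncEntry> flat = new ArrayList<>();
        flat.add(new SyncEntry(1, null)); // platform root
        for (int i = 2; i <= 10; i++) {
            flat.add(new SyncEntry(i, 1)); // children reference parents by id only
        }
        System.out.println(chunk(flat, 4).size()); // 3 chunks: 4 + 4 + 2
    }
}
```

Replacing object references with plain parent ids is what lets the server fetch all entries for a platform in a few bulk queries instead of one eager fetch per tree node.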
The work is in a branch, I'll update the BZ with commit hashes when it moves to master.
I've merged the branch into master. Perhaps this will be sufficient to handle the needs of this BZ? I'm moving to ON_QA and asking Heiko to perform any further triage...
There are several commits involved, from most recent:
Date: Fri Jan 3 16:50:39 2014 -0500
fix merge issue, this class somehow disappeared
Date: Fri Jan 3 16:05:16 2014 -0500
Some tweaks to test code
Date: Thu Dec 19 14:03:24 2013 -0500
Add a little more logging to get a better start/end of a sync.
Date: Wed Dec 18 09:28:48 2013 -0500
This commit builds on the "chunking" work introduced earlier. This work
replaces the tree structure, previously used to pass sync info between
agent and server, with flat collections. This allows us to replace the
costly Hibernate-based tree building approach with more comprehensive
queries that reduce the number of DB round trips dramatically.
- This commit fixed an issue with the previous work regarding the handling of top level services.
- The ResourceSyncInfo class is now lighter weight; parent-child info is
- PlatformSyncInfo is now a POJO.
- The handling of unknown resources (agent side) changed significantly because it had depended on the previous tree structure.
- Mocks again had to be updated.
Bulk closing of 4.10 issues.
If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.