Bug 1023451
| Summary: | [perf] Large retained heap during inventory report merge (causing OOMs) | |||
|---|---|---|---|---|
| Product: | [Other] RHQ Project | Reporter: | Lukas Krejci <lkrejci> | |
| Component: | Core Server | Assignee: | Jay Shaughnessy <jshaughn> | |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Mike Foley <mfoley> | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.9 | CC: | hrupp, jsanda, jshaughn, loleary, mazz, myarboro, rhatlapa, vnguyen | |
| Target Milestone: | --- | |||
| Target Release: | RHQ 4.10 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | Bug Fix | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1029688 (view as bug list) | Environment: | ||
| Last Closed: | 2014-04-23 12:31:30 UTC | Type: | Bug | |
| Bug Blocks: | 1024397, 1029688 | |||
Description
Lukas Krejci
2013-10-25 13:03:11 UTC
There was a lot of work done to improve the performance (speed) of merging inventory. The optimization in place efficiently handles large hierarchies and varying child cardinalities. Changes here must not compromise throughput, so testing must measure throughput as well as memory consumption. A second HA server would possibly mitigate the issue and may not be unreasonable for such a configuration. I too am not sure about the Resource Type issue; it seems that a caching mechanism here would have been sufficient, but maybe it has issues. Also, I do remember that there is code in place that already tries to help out GC, but perhaps we can do more. Maybe this doesn't need to be said, but in short, it's very important to have a holistic understanding of this mechanism, including the transactioning, before making further optimizations.

To recap the symptoms of this bug:

1) We limit the number of concurrently processed inventory reports; by default we process at most 5 of them.
2) The heap dump I have at hand is consistent with this finding:
   a) It contains 2 threads with abnormal (relative to the rest) memory usage of > 50MB. The stack traces of these 2 threads show that they are executing inside DiscoveryBossBean.mergeInventory().
   b) There is a great number of threads with memory usage of approx. 20MB that seem to be in the process of deserializing the requests for mergeInventory (judged by the presence of an InventoryReport instance in their heaps).
   c) There are altogether 45 threads handling HTTP input that account for the majority of the consumed memory.

So the issue here, I think, is twofold:

1) We consume large chunks of memory while handling inventory reports.
2) Large inventories are actually quite expensive to transfer as command requests. Even before we can decide that a given inventory merge request is to be ignored (because of concurrency limits), we can consume as much as 20MB of memory.
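To make the twofold problem concrete, here is a minimal hypothetical sketch (not RHQ code) of a `@LimitedConcurrency`-style gate. The key point from the heap-dump analysis is that the gate only runs *after* the InventoryReport has been deserialized, so even a rejected request has already paid the ~20MB cost:

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of a limited-concurrency gate. All names are
// illustrative; the point is the ordering of deserialization vs. the check.
public class ConcurrencyGateSketch {
    private static final Semaphore PERMITS = new Semaphore(5); // default: 5 concurrent merges

    // Stand-in for the real InventoryReport, which is large once deserialized.
    static final class InventoryReport {}

    static boolean tryMerge(byte[] wirePayload) {
        // The expensive step happens unconditionally: the payload becomes a
        // full object graph before the concurrency check ever runs.
        InventoryReport report = deserialize(wirePayload);
        if (!PERMITS.tryAcquire()) {
            return false; // request discarded, but the memory was already consumed
        }
        try {
            // ... merge 'report' into inventory ...
            return true;
        } finally {
            PERMITS.release();
        }
    }

    static InventoryReport deserialize(byte[] payload) {
        return new InventoryReport(); // stand-in for real Java deserialization
    }
}
```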
So it seems that rather than trying to minimize the memory usage while actually merging the inventories (this came through speed optimization efforts, which might have increased the memory usage but definitely increased throughput), we should concentrate on changing the remote invocations such that we can stop an invocation even before we need to deserialize its full set of parameters. One solution that comes to my mind is to make use of our remote streams to pass all the parameters (the open question being what to do with parameters that are streams themselves). We could then decide what to do with the invocation, and only after we determined that it can be invoked would we read its params from the remote stream. Mazz, what do you think about this?

After talking with Mazz I am trying out this simple approach: change the signature of DiscoveryServerService.mergeInventoryReport from:

MergeInventoryReportResults mergeInventoryReport(InventoryReport)

to:

MergeInventoryReportResults mergeInventoryReport(InputStream)

The InventoryReport object will travel to the server in serialized form in an input stream. This means that the communication infrastructure can process the @LimitedConcurrency and other annotations and proceed to actually executing the impl of the above method without the huge InventoryReport taking up (nearly) any memory. The DiscoveryServerServiceImpl will then deserialize the InventoryReport from the provided stream and pass it down to DiscoveryBossBean, which does the real work and remains unchanged. As such, the code changes are rather small and we gain the ability to deserialize the InventoryReport lazily and only when really needed.

In other words, we decided not to change the implementation of the inventory merge in any manner, as that is highly optimized and complex code, even if the memory consumption can be quite high during it.
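A minimal sketch of the proposed signature change might look like the following. The method names follow the comment above, but the types and stream handling here are simplified assumptions, not the actual RHQ implementation:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch: the server endpoint receives a stream and only deserializes the
// report once it has decided the call will actually run.
public class LazyMergeSketch {

    // Simplified stand-ins for the real RHQ types.
    static final class InventoryReport implements Serializable {
        final int resourceCount;
        InventoryReport(int resourceCount) { this.resourceCount = resourceCount; }
    }

    static final class MergeInventoryReportResults implements Serializable {
        final int merged;
        MergeInventoryReportResults(int merged) { this.merged = merged; }
    }

    // Old shape: mergeInventoryReport(InventoryReport)
    // New shape: the report travels serialized inside the stream, so the
    // concurrency check can run before any deserialization cost is paid.
    static MergeInventoryReportResults mergeInventoryReport(InputStream in)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            InventoryReport report = (InventoryReport) ois.readObject();
            return new MergeInventoryReportResults(report.resourceCount);
        }
    }

    // Agent side: serialize the report into the bytes it sends up.
    static byte[] serialize(InventoryReport report) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(report);
        }
        return baos.toByteArray();
    }
}
```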
Instead, we make sure that we consume no additional memory for merge inventory requests that are to be discarded anyway. In scenarios like the one described in this BZ, those discarded requests will greatly outnumber the actually processed ones.

I think this is going to need a more thorough look when we look at the agent-server comms as a whole.

The approach of turning the InventoryReport into an input stream (as suggested) doesn't work, because mergeInventoryReport is a synchronous call. When a synchronous call tries to call back to the agent (i.e. read the provided input stream), the communication deadlocks, because the synchronous call is still in progress and the agent processes calls in a single-threaded manner. I tried to overcome that by splitting the execution of the merge into the following sequence of new methods:

1) DiscoveryServerService: @Asynchronous @LimitedConcurrency void initiateInventoryReportMerge(String handle, Agent agent)
2) DiscoveryAgentService: InventoryReport getCollectedInventoryReport(String handle)
3) DiscoveryAgentService: void finishInventoryReportMerge(String handle, MergeInventoryReportResults results, Exception failure)

The agent generates its inventory report and, when it is ready to send it up to the server, first stores the report into temporary storage and generates a unique "handle" for it. It then initiates the merge by calling the server with the handle. This is an asynchronous call and thus returns immediately on the agent. On the server, during the initiateInventoryReportMerge method, we call getCollectedInventoryReport to obtain the inventory report from the agent, process it normally, and then report back to the agent about the results of the merge, both in case of success and failure. Because the agent needs to know the result of the merge, and originally the merge was synchronous, we need to somehow wait for the new workflow to finish (so that we don't have to reimplement large chunks of the agent code).
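The handle-based workflow, including the bounded agent-side wait, could be sketched as follows. Method names mirror the comment above; the in-memory store, the latch-based wait, and the `stageReport`/`awaitMerge` helpers are assumptions for illustration, not RHQ code:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the handle-based merge workflow.
public class HandleWorkflowSketch {

    static final class InventoryReport {}
    static final class MergeInventoryReportResults {}

    // Agent-side temporary storage of staged reports, keyed by handle.
    static final Map<String, InventoryReport> PENDING = new ConcurrentHashMap<>();
    // One latch per handle so the agent can wait for the server's callback.
    static final Map<String, CountDownLatch> WAITERS = new ConcurrentHashMap<>();

    // Agent: stage the report and get the unique handle to send to the server.
    static String stageReport(InventoryReport report) {
        String handle = UUID.randomUUID().toString();
        PENDING.put(handle, report);
        WAITERS.put(handle, new CountDownLatch(1));
        return handle;
    }

    // Server callback 2): getCollectedInventoryReport(handle)
    static InventoryReport getCollectedInventoryReport(String handle) {
        return PENDING.get(handle);
    }

    // Server callback 3): finishInventoryReportMerge(handle, results, failure)
    static void finishInventoryReportMerge(String handle,
                                           MergeInventoryReportResults results,
                                           Exception failure) {
        PENDING.remove(handle); // free the staged report on success or failure
        CountDownLatch latch = WAITERS.get(handle);
        if (latch != null) {
            latch.countDown();
        }
    }

    // Agent: bounded wait for the server to finish, so the agent is never
    // blocked indefinitely if the server becomes unavailable.
    static boolean awaitMerge(String handle, long timeout, TimeUnit unit)
            throws InterruptedException {
        CountDownLatch latch = WAITERS.get(handle);
        return latch != null && latch.await(timeout, unit);
    }
}
```

Note that the timeout in `awaitMerge` is exactly where the edge cases discussed in this BZ appear: a timeout cannot distinguish a dead server from a merely slow one.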
This can be done, but it has a number of edge cases that I feel introduce enough complexity that I am willing to stop trying to fix this problem until we come up with a more comprehensive solution to the problem of passing large arguments to remote methods (like streaming the arguments in all cases, or something). The edge cases revolve around one of the parties becoming unavailable during the merge sequence. In order to not block the agent indefinitely, we need to define a timeout on the agent for how long it waits for the server to process the inventory report (and report back by calling the finishInventoryReportMerge() method). What happens if the server processing just took longer than the timeout? How does the rest of the agent cope with the server's inability to merge the inventory "in time"? Wouldn't we potentially overload the server with merge requests that come quicker than the server can handle (due to too low a timeout on the agent)? This approach also opens the door to possibly processing multiple merge requests from a single agent on the server side concurrently; this can be avoided, but again it is a complication. So all in all, I'm in favor of not fixing this until we come up with a more scalable way of passing the arguments during synchronous calls.

But this is something that I would not try to do in the scope of JON 3.2.0. This would be a fundamental change to how we do comms and thus is very risky.

I am recommending removing the JON 3.2 GA Blocker and moving this to JON 3.2.1 for the following reasons:

1) Comment #4: the developer feels the change is very risky.
2) We are 3 weeks from GA, with all large topology testing behind us.
3) PM-402, the documented large topology requirement, is minimally met. (True, there are sporadic OOMs after the requirement is met; true, the agent heap size needs to be increased to 2GB; true, the agent needs a smaller memory footprint; and true, more optimizations and testing are needed for large topology scenarios. But this doesn't seem practical with the current timeline and scope for GA.)

Flagging for discussion by the triage team.

Mark: let's discuss this at the next triage call.

Removing JON 3.2 GA Blocker as this is RHQ. Cloned this into a JON BZ.

Just noting that I am doing some related work on lowering the footprint of inventory sync. It's not a complete redesign and it doesn't completely change our comm approach. It focuses on 1) breaking a full platform sync into pieces to lower the memory footprint and 2) moving the sync data away from a tree structure, to reduce the inefficient Hibernate-eager-fetch approach to building the tree and to lighten the objects passed across the wire. The work is in a branch; I'll update the BZ with commit hashes when it moves to master.
I've merged the branch into master. Perhaps this will be sufficient to handle the needs of this BZ? I'm moving to ON_QA and asking Heiko to perform any further triage...
There are several commits involved, from most recent:
commit 5ef600890aaedcc3a040f5496d18c3c808d73795
Date: Fri Jan 3 16:50:39 2014 -0500
fix merge issue, this class somehow disappeared
commit d0ab1fc5e0163804c1db9179f598e4b8d791e56b
Date: Fri Jan 3 16:05:16 2014 -0500
Some tweaks to test code
commit bd89436f6e6b5db574a32ffd6636e13e1d702d54
Date: Thu Dec 19 14:03:24 2013 -0500
Add a little more logging to get a better start/end of a sync.
commit ad9e7671b663f622cee1dac8e3e0113fe838743f
Date: Wed Dec 18 09:28:48 2013 -0500
This commit builds on the "chunking" work introduced earlier. This work
replaces the tree structure, previously used to pass sync info between
agent and server, with flat collections. This allows us to replace the
costly Hibernate-based tree building approach with more comprehensive
queries that reduce the number of DB round trips dramatically.
Notes:
- This commit fixed an issue with the previous work regarding the handling
of top level services.
- The ResourceSyncInfo class is now lighter weight, parent-child info is
removed.
- PlatformSyncInfo is now a POJO.
- The handling of unknown resources (agent side) changed significantly
because it had depended on the previous tree structure.
- mocks again had to be updated
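The move from a tree to flat collections described in the commit above can be illustrated with a small sketch (not RHQ code; the field names are assumptions). The receiving side can rebuild parent-to-children lookups from the flat collection in a single pass, instead of eagerly fetching a Hibernate-managed tree:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of flat sync info replacing a tree structure.
public class FlatSyncSketch {

    // Lightweight per-resource sync record; no parent-child object references,
    // just ids, which keeps the objects passed across the wire small.
    static final class ResourceSyncInfo {
        final int id;
        final int parentId; // -1 for the platform root in this sketch
        ResourceSyncInfo(int id, int parentId) { this.id = id; this.parentId = parentId; }
    }

    // One pass over the flat collection yields the child index that the old
    // tree structure encoded implicitly.
    static Map<Integer, List<Integer>> childIndex(Collection<ResourceSyncInfo> flat) {
        Map<Integer, List<Integer>> children = new HashMap<>();
        for (ResourceSyncInfo r : flat) {
            children.computeIfAbsent(r.parentId, k -> new ArrayList<>()).add(r.id);
        }
        return children;
    }
}
```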
Bulk closing of 4.10 issues. If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.