After running soak tests with the HotRod client with durations of 8h and 48h, we found that throughput gradually decreases during the test. The following runs show (in their artifacts) various statistics, one of which is throughput (operations/second): https://hudson.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/8/ https://hudson.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/9/ This is a potential memory leak.
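For reference, a minimal sketch of how the downward trend could be quantified from data_csv.txt with a least-squares slope over the throughput column (the column index and comma-separated layout are assumptions, not the actual format of the file):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    public class ThroughputTrend {
        public static void main(String[] args) throws IOException {
            // args[0] = path to the CSV, args[1] = index of the throughput column (assumed layout)
            List<String> lines = Files.readAllLines(Paths.get(args[0]));
            int col = Integer.parseInt(args[1]);
            int n = lines.size() - 1; // skip the header row
            double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
            for (int i = 1; i <= n; i++) {
                double y = Double.parseDouble(lines.get(i).split(",")[col].trim());
                sumX += i; sumY += y; sumXY += i * y; sumXX += (double) i * i;
            }
            // Least-squares slope; a clearly negative value means throughput drifts down over the run
            double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
            System.out.println("throughput slope per iteration: " + slope);
        }
    }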
In which diagram/artifact did you see that decrease exactly? The diagram in https://hudson.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/9/artifact/report/chart-cluster-throughput.png doesn't show much. If you suspect a memory leak, I'd suggest getting a heap dump when the test finishes so that we can inspect it.
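On the heap dump: apart from jmap, one way to grab it programmatically right when the test harness finishes is via the HotSpotDiagnostic MXBean. A minimal sketch (the output path is just an example):

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class HeapDumper {
        public static void main(String[] args) throws Exception {
            // Proxy to the HotSpotDiagnostic MXBean of the current JVM
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            // Write a binary heap dump containing only live objects
            bean.dumpHeap("/tmp/edg-soak-heap.hprof", true);
        }
    }

The command-line equivalent is jmap -dump:live,format=b,file=<path> <pid> against the server process.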
I saw it in the files https://hudson.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/8/artifact/report/data_csv.txt and https://hudson.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/9/artifact/report/data_csv.txt. I also created graphs from these files, which will be included in the performance testing report in a few days. The graphs are at https://svn.devel.redhat.com/repos/jboss-qa/edg/release-testing/EDG6.0.0.ALPHA1/charts/hotrod_soak_8h_duration_vs_throughput.png and https://svn.devel.redhat.com/repos/jboss-qa/edg/release-testing/EDG6.0.0.ALPHA1/charts/hotrod_soak_48h_duration_vs_throughput.png. I have not had time to take a heap dump yet; I will try to do it today or tomorrow.
Looking at the memory charts for that run, I see nothing that would indicate a memory leak (see https://hudson.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/9/artifact/report/chart-cluster-heap.png). Network traffic decreases (but that is expected with decreasing throughput). CPU usage decreases as well, so it's not as if the server is being stressed. Is there anything we can do to see whether too much time is spent in lock contention/acquisition?
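One low-cost way to answer the lock contention question would be to enable thread contention monitoring over JMX and dump per-thread blocked times. A sketch, assuming it runs inside (or attached to) the server JVM and that a one-minute sampling window is enough:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ContentionProbe {
        public static void main(String[] args) throws Exception {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            if (threads.isThreadContentionMonitoringSupported()) {
                threads.setThreadContentionMonitoringEnabled(true);
            }
            Thread.sleep(60_000); // let the workload run for a while
            // Report how long each thread spent blocked waiting on monitors
            for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                System.out.printf("%s blocked %d ms in %d contentions%n",
                        info.getThreadName(), info.getBlockedTime(), info.getBlockedCount());
            }
        }
    }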
I ran this test with the same configuration yesterday. The build was successful, but the tests did not finish: they failed in iteration 32 (i.e. after 32 minutes instead of 8 hours) because the mean response time exceeded the limit (7 sec): https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/13/
I configured SmartFrog to also collect heap statistics on the driver nodes (the HotRod clients run on them) and ran a 2-hour soak test with 11000 clients. The result is here: https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/23/artifact/report/chart-driver-heap.png . So it seems there really is a memory leak in the HotRod clients, but I will need to run an 8-hour soak test to confirm this. I'll do that overnight because our perf lab is occupied during the day.
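For context, the driver-side heap collection essentially boils down to periodically sampling the MemoryMXBean. A rough sketch of the equivalent (not the actual SmartFrog configuration; the sampling period and output format are arbitrary):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    public class HeapSampler {
        public static void main(String[] args) throws Exception {
            // Sample heap usage once per minute and print it as CSV: timestamp, used, committed
            while (true) {
                MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
                System.out.printf("%d,%d,%d%n",
                        System.currentTimeMillis(), heap.getUsed(), heap.getCommitted());
                Thread.sleep(60_000);
            }
        }
    }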
ISPN-1383 could be having an effect. Netty has some inbound buffers on the decoder side that are never pruned: their capacity is only ever increased, never decreased. So, if the size of the stored data varies over time, this capacity growth could result in higher memory consumption.
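To illustrate the pattern (this is not Netty's actual decoder code, just a sketch of grow-only buffering): an accumulator whose capacity only ever grows keeps its memory footprint at the largest payload ever seen, not at the current payload size, so varying value sizes push consumption up over time:

    import java.nio.ByteBuffer;

    public class GrowOnlyAccumulator {
        private ByteBuffer buf = ByteBuffer.allocate(1024);

        // Capacity is only ever increased to fit the largest message seen so far;
        // it is never shrunk back, so memory stays at the historical peak.
        public void append(byte[] chunk) {
            if (buf.remaining() < chunk.length) {
                ByteBuffer bigger = ByteBuffer.allocate(
                        Math.max(buf.capacity() * 2, buf.position() + chunk.length));
                buf.flip();
                bigger.put(buf);
                buf = bigger;
            }
            buf.put(chunk);
        }

        public int capacity() {
            return buf.capacity();
        }
    }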
Following up on my last comment: the memory leak no longer appears to be present, neither in the HotRod server nor in the HotRod clients. Here's the graph for the servers: https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/24/artifact/report/chart-cluster-heap.png. And here's the graph for the HotRod clients: https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-soak-hotrod-size4/24/artifact/report/chart-driver-heap.png. The throughput is more or less constant over the 8-hour run. This was not true with EDG ALPHA1, where the throughput was decreasing.