Bug 745877 (EDG-88)

Summary: hotrod server memory leak suspected
Product: [JBoss] JBoss Data Grid 5
Component: Infinispan
Version: EAP 5.1.0 EDG TP
Target Milestone: ---
Target Release: EAP 5.1.0 EDG TP
Hardware: Unspecified
OS: Unspecified
URL: http://jira.jboss.org/jira/browse/EDG-88
Status: CLOSED NEXTRELEASE
Severity: high
Priority: high
Type: Bug
Doc Type: Bug Fix
Reporter: Michal Linhard <mlinhard>
Assignee: Default User <jbpapp-maint>
CC: dan.berindei, galder.zamarreno, mlinhard, nobody
Last Closed: 2011-07-11 15:35:36 UTC

Description Michal Linhard 2011-04-06 15:09:44 UTC
project_key: EDG

A recent 6-hour soak test of the HotRod server module showed some suspicious results:
https://docspace.corp.redhat.com/docs/DOC-58120

Investigation is needed, because this is a potential memory leak.

Comment 1 Michal Linhard 2011-04-06 15:12:20 UTC
I'm in the process of setting up another soak test run, in which a memory histogram will be taken with the command

{code}
jmap -histo:live <pid>
{code}

every 30 minutes.
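
For reference, a minimal sketch of how such a collection loop could be automated is below (a hypothetical helper class, assuming jmap is on the PATH and the server pid and an output directory are passed in; not the actual harness used for these runs):

{code}
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs "jmap -histo:live <pid>" every 30 minutes
// and stores each histogram in its own file.
public class HistogramCollector {
    public static void main(String[] args) {
        final String pid = args[0];            // pid of the HotRod server JVM
        final File outDir = new File(args[1]); // directory for the histogram files
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    File out = new File(outDir, "histo_" + System.currentTimeMillis() + ".txt");
                    // capture jmap's stdout (and stderr) into the output file
                    new ProcessBuilder("jmap", "-histo:live", pid)
                            .redirectErrorStream(true)
                            .redirectOutput(out)
                            .start()
                            .waitFor();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }, 0, 30, TimeUnit.MINUTES);
    }
}
{code}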

Comment 2 Michal Linhard 2011-04-08 08:29:50 UTC
An 8-hour check soak test with heap dump collection was performed; heap usage can be seen here:
https://docspace.corp.redhat.com/docs/DOC-61819

Heap dumps were taken with:
{code}
jmap -dump:live,format=b,file=<file> <pid>
{code}

The heap dumps are available here:
http://dev39.qa.atl.jboss.com/~mlinhard/soak_tests/

File name structure:
{code}
jmap_dump_HHHHHH_DDDD_DD_DD_DD_DD_DD_CC_MMMM.zip
jmap_dump_perf20_2011_04_07_12_22_23_00_0030.zip
{code}

H - host (the host the dump comes from)
D - date (system time when the dump was taken)
C - counter (order of the dump taken for that particular JVM)
M - minute (how many minutes after start the dump was taken)
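
To make the naming convention concrete, here is a small illustrative parser (hypothetical class, written only to mirror the structure described above):

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that decomposes a dump file name into its fields.
public class DumpFileName {
    private static final Pattern NAME = Pattern.compile(
            "jmap_dump_([^_]+)_(\\d{4}_\\d{2}_\\d{2}_\\d{2}_\\d{2}_\\d{2})_(\\d{2})_(\\d{4})\\.zip");

    public static void main(String[] args) {
        Matcher m = NAME.matcher("jmap_dump_perf20_2011_04_07_12_22_23_00_0030.zip");
        if (m.matches()) {
            System.out.println("host    = " + m.group(1)); // perf20
            System.out.println("date    = " + m.group(2)); // 2011_04_07_12_22_23
            System.out.println("counter = " + m.group(3)); // 00
            System.out.println("minutes = " + m.group(4)); // 0030
        }
    }
}
{code}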



Comment 3 Dan Berindei 2011-04-12 15:44:03 UTC
I loaded the dump files (specifically jmap_dump_perf20_2011_04_07_19_45_17_03_0470.hprof) into Eclipse MAT, and the biggest "dominator" is one instance of org.jgroups.protocols.pbcast.NAKACK that keeps a receiver window with 22870 instances of org.jgroups.Message, for a total of ~25MB.

The other dumps for perf20 have smaller numbers of messages, so I think this is the reason why the memory graph was trending upwards in the chart.

I was able to find the logical name of the server: perf20-59609. It would seem that JGroups is not able to ack messages sent by itself and so accumulates a very large backlog in its NAK receiver window.

Comment 4 Michal Linhard 2011-04-28 15:25:12 UTC
Dan, Galder, do we have any conclusion on this?

Comment 5 Galder Zamarreño 2011-05-09 14:41:57 UTC
Michal, I think Dan's been looking into this, so I'm assigning it to him.

Comment 6 Michal Linhard 2011-05-27 05:49:14 UTC
Link: Added: This issue relates to ISPN-1102


Comment 7 Dan Berindei 2011-07-11 15:35:36 UTC
After investigating with Bela, we realized that this was not a memory leak: the NAKACK receiver window was holding a lot of memory only because our buffers are 500 bytes while the useful information (the part counted by JGroups) is 50 bytes.

Galder created issue ISPN-1102 to use smaller buffers.
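
For illustration only, the kind of change ISPN-1102 is after can be sketched roughly as follows (hypothetical class and method names, not the actual Infinispan or JGroups code): hand the transport a right-sized copy of the payload instead of the whole oversized scratch buffer, so the NAKACK window retains roughly 50 bytes per message rather than 500.

{code}
import java.util.Arrays;

// Hypothetical sketch of the buffer right-sizing idea behind ISPN-1102.
// Names and sizes are illustrative, not the actual Infinispan code.
public class BufferTrim {

    // Serialize the payload into an oversized scratch buffer, then return
    // only the bytes actually written instead of the whole backing array.
    static byte[] marshal(String payload, byte[] scratch) {
        byte[] data = payload.getBytes();   // the useful bytes (~50 in this bug)
        System.arraycopy(data, 0, scratch, 0, data.length);
        return Arrays.copyOfRange(scratch, 0, data.length);
    }

    public static void main(String[] args) {
        byte[] scratch = new byte[500];     // ~500-byte scratch buffer
        byte[] wire = marshal("key=value", scratch);
        // Keeping 'wire' instead of 'scratch' in the retransmission window
        // cuts the retained memory per message by roughly a factor of ten.
        System.out.println("retained per message: " + wire.length + " bytes");
    }
}
{code}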

Comment 8 Dan Berindei 2011-07-11 15:35:36 UTC
Release Notes Docs Status: Added: Not Required


Comment 9 Anne-Louise Tangring 2011-10-11 17:06:13 UTC
Release Notes Docs Status: Removed: Not Required 
Docs QE Status: Removed: ASSIGNED