Bug 745877 (EDG-88) - hotrod server memory leak suspected
Summary: hotrod server memory leak suspected
Keywords:
Status: CLOSED NEXTRELEASE
Alias: EDG-88
Product: JBoss Data Grid 5
Classification: JBoss
Component: Infinispan
Version: EAP 5.1.0 EDG TP
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: EAP 5.1.0 EDG TP
Assignee: Default User
QA Contact:
URL: http://jira.jboss.org/jira/browse/EDG-88
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-04-06 15:09 UTC by Michal Linhard
Modified: 2014-03-17 04:02 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-11 15:35:36 UTC
Type: Bug


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker EDG-88 0 Major Closed hotrod server memory leak suspected 2014-07-15 14:51:54 UTC
Red Hat Issue Tracker ISPN-1102 0 Major Resolved Adaptive marshalling buffer size 2014-07-15 14:51:53 UTC

Description Michal Linhard 2011-04-06 15:09:44 UTC
project_key: EDG

A recent 6-hour soak test of the HotRod server module showed some suspicious results:
https://docspace.corp.redhat.com/docs/DOC-58120

Investigation is needed, because this is a potential memory leak.

Comment 1 Michal Linhard 2011-04-06 15:12:20 UTC
I'm in the process of setting up another soak test run, in which a memory histogram will be taken with the command

{code}
jmap -histo:live <pid>
{code}

every 30 minutes.
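
For illustration, a minimal sketch of how such periodic histogram collection could be driven from Java; the pid handling, output file names and error handling are assumptions, not the actual test harness:

{code}
// Hypothetical sketch: capture a class histogram of a target JVM every 30 minutes
// by shelling out to jmap, as described above. Nothing here is the real test script.
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HistogramCollector {
    public static void main(String[] args) {
        String pid = args[0]; // pid of the HotRod server JVM (assumed to be passed in)
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                String out = "histo_" + System.currentTimeMillis() + ".txt";
                new ProcessBuilder("jmap", "-histo:live", pid)
                        .redirectErrorStream(true)          // merge stderr into stdout
                        .redirectOutput(new File(out))      // one histogram file per run
                        .start()
                        .waitFor();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 30, TimeUnit.MINUTES); // one histogram every 30 minutes, as in the comment
    }
}
{code}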

Comment 2 Michal Linhard 2011-04-08 08:29:50 UTC
An 8hr check soak test with heap dump collection was performed; heap usage can be seen here:
https://docspace.corp.redhat.com/docs/DOC-61819

Heap dumps were taken with
{code}
jmap -dump:live,format=b,file=<file> <pid>
{code}

Heap dumps available here:
http://dev39.qa.atl.jboss.com/~mlinhard/soak_tests/

File name structure:
{code}
jmap_dump_HHHHHH_DDDD_DD_DD_DD_DD_DD_CC_MMMM.zip
jmap_dump_perf20_2011_04_07_12_22_23_00_0030.zip
{code}

H - host (the host the dump comes from)
D - date (system time when the dump was taken)
C - counter (order of the dump taken for that particular JVM)
M - minute (how many minutes after start the dump was taken)
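
For illustration, a minimal sketch (hypothetical, not part of the test scripts) of how these file names can be decoded back into the host/date/counter/minute fields:

{code}
// Hypothetical sketch: decode the dump file names described above.
// The regex and field names mirror the H/D/C/M legend.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DumpNameParser {
    private static final Pattern NAME = Pattern.compile(
            "jmap_dump_(\\w+)_(\\d{4}_\\d{2}_\\d{2}_\\d{2}_\\d{2}_\\d{2})_(\\d{2})_(\\d{4})\\.zip");

    public static void main(String[] args) {
        Matcher m = NAME.matcher("jmap_dump_perf20_2011_04_07_12_22_23_00_0030.zip");
        if (m.matches()) {
            System.out.println("host    = " + m.group(1)); // perf20
            System.out.println("date    = " + m.group(2)); // 2011_04_07_12_22_23
            System.out.println("counter = " + m.group(3)); // 00 (order of dump for this JVM)
            System.out.println("minutes = " + m.group(4)); // 0030 (minutes after start)
        }
    }
}
{code}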



Comment 3 Dan Berindei 2011-04-12 15:44:03 UTC
I loaded the dump files (specifically jmap_dump_perf20_2011_04_07_19_45_17_03_0470.hprof) in Eclipse MAT, and the biggest "dominator" is a single instance of org.jgroups.protocols.pbcast.NAKACK that keeps a receiver window with 22870 instances of org.jgroups.Message, for a total of ~25MB.

The other dumps for perf20 contain smaller numbers of messages, so I think this is why the memory graph was trending upwards in the chart.

I was able to find the logical name of the server: perf20-59609. It would seem that JGroups is not able to ack messages sent by itself and so accumulates a very large backlog in its NAK receiver window.

Comment 4 Michal Linhard 2011-04-28 15:25:12 UTC
Dan, Galder, do we have any conclusion on this?

Comment 5 Galder Zamarreño 2011-05-09 14:41:57 UTC
Michal, I think Dan's been looking into this, so I'm assigning it to him.

Comment 6 Michal Linhard 2011-05-27 05:49:14 UTC
Link: Added: This issue relates to ISPN-1102


Comment 7 Dan Berindei 2011-07-11 15:35:36 UTC
After investigating with Bela, we realized that this was not a memory leak: the NAKACK receiver window was holding a lot of memory only because our buffers are 500 bytes while the useful information (the part that JGroups counts) is only 50 bytes.

Galder created issue ISPN-1102 to use smaller buffers.
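
For illustration, a hypothetical sketch of the idea behind ISPN-1102 (not the actual Infinispan code): trimming a fixed 500-byte marshalling buffer down to the ~50 bytes actually written keeps structures such as the NAKACK window from retaining the unused tail of every buffer:

{code}
// Hypothetical illustration of the buffer overhead described above; this is
// not the Infinispan/ISPN-1102 implementation, only the idea behind it.
import java.util.Arrays;

public class BufferTrimExample {
    static final int FIXED_BUFFER_SIZE = 500; // fixed-size marshalling buffer (assumed)

    // Before: the whole 500-byte array stays reachable from the receiver window,
    // even though only payload.length bytes (about 50 in this bug) are meaningful.
    static byte[] marshalFixed(byte[] payload) {
        byte[] buf = new byte[FIXED_BUFFER_SIZE];
        System.arraycopy(payload, 0, buf, 0, payload.length);
        return buf; // retains 500 bytes per message
    }

    // After: copy only the bytes actually written, so the retained size
    // matches the useful data that JGroups accounts for.
    static byte[] marshalTrimmed(byte[] payload) {
        byte[] buf = new byte[FIXED_BUFFER_SIZE];
        System.arraycopy(payload, 0, buf, 0, payload.length);
        return Arrays.copyOf(buf, payload.length); // retains ~50 bytes per message
    }

    public static void main(String[] args) {
        byte[] payload = new byte[50]; // ~50 bytes of useful information per message
        System.out.println("fixed   : " + marshalFixed(payload).length + " bytes retained");
        System.out.println("trimmed : " + marshalTrimmed(payload).length + " bytes retained");
    }
}
{code}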

Comment 8 Dan Berindei 2011-07-11 15:35:36 UTC
Release Notes Docs Status: Added: Not Required


Comment 9 Anne-Louise Tangring 2011-10-11 17:06:13 UTC
Release Notes Docs Status: Removed: Not Required 
Docs QE Status: Removed: ASSIGNED 


