Bug 745877 (EDG-88)

Summary: hotrod server memory leak suspected
Product: [JBoss] JBoss Data Grid 5
Component: Infinispan
Version: EAP 5.1.0 EDG TP
Target Milestone: ---
Target Release: EAP 5.1.0 EDG TP
Hardware: Unspecified
OS: Unspecified
URL: http://jira.jboss.org/jira/browse/EDG-88
Status: CLOSED NEXTRELEASE
Severity: high
Priority: high
Type: Bug
Doc Type: Bug Fix
Reporter: Michal Linhard <mlinhard>
Assignee: Default User <jbpapp-maint>
CC: dan.berindei, galder.zamarreno, mlinhard, nobody
Last Closed: 2011-07-11 15:35:36 UTC

Description Michal Linhard 2011-04-06 15:09:44 UTC
project_key: EDG

A recent 6-hour soak test of the HotRod server module showed some suspicious results:
https://docspace.corp.redhat.com/docs/DOC-58120

Investigation is needed, because this is a potential memory leak.

Comment 1 Michal Linhard 2011-04-06 15:12:20 UTC
I'm in the process of setting up another soak test run, in which a memory histogram will be taken with the command

{code}
jmap -histo:live <pid>
{code}

every 30 minutes.
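
For reference, a minimal sketch of how such a collection loop could be automated is below (a hypothetical helper class, assuming jmap is on the PATH and the server pid and an output directory are passed in; not the actual harness used for these runs):

{code}
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs "jmap -histo:live <pid>" every 30 minutes
// and stores each histogram in its own file.
public class HistogramCollector {
    public static void main(String[] args) {
        final String pid = args[0];            // pid of the HotRod server JVM
        final File outDir = new File(args[1]); // directory for the histogram files
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    File out = new File(outDir, "histo_" + System.currentTimeMillis() + ".txt");
                    // capture jmap's stdout (and stderr) into the output file
                    new ProcessBuilder("jmap", "-histo:live", pid)
                            .redirectErrorStream(true)
                            .redirectOutput(out)
                            .start()
                            .waitFor();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }, 0, 30, TimeUnit.MINUTES);
    }
}
{code}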

Comment 2 Michal Linhard 2011-04-08 08:29:50 UTC
An 8-hour check soak test with heap dump collection was performed; heap usage can be seen here:
https://docspace.corp.redhat.com/docs/DOC-61819

Heap dumps were taken with:
{code}
jmap -dump:live,format=b,file=<file> <pid>
{code}

The heap dumps are available here:
http://dev39.qa.atl.jboss.com/~mlinhard/soak_tests/

File name structure:
{code}
jmap_dump_HHHHHH_DDDD_DD_DD_DD_DD_DD_CC_MMMM.zip
jmap_dump_perf20_2011_04_07_12_22_23_00_0030.zip
{code}

H - host (the host the dump comes from)
D - date (system time when the dump was taken)
C - counter (order of the dump taken for that particular JVM)
M - minute (how many minutes after start the dump was taken)
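
To make the naming convention concrete, here is a small illustrative parser (hypothetical class, written only to mirror the structure described above):

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that decomposes a dump file name into its fields.
public class DumpFileName {
    private static final Pattern NAME = Pattern.compile(
            "jmap_dump_([^_]+)_(\\d{4}_\\d{2}_\\d{2}_\\d{2}_\\d{2}_\\d{2})_(\\d{2})_(\\d{4})\\.zip");

    public static void main(String[] args) {
        Matcher m = NAME.matcher("jmap_dump_perf20_2011_04_07_12_22_23_00_0030.zip");
        if (m.matches()) {
            System.out.println("host    = " + m.group(1)); // perf20
            System.out.println("date    = " + m.group(2)); // 2011_04_07_12_22_23
            System.out.println("counter = " + m.group(3)); // 00
            System.out.println("minutes = " + m.group(4)); // 0030
        }
    }
}
{code}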



Comment 3 Dan Berindei 2011-04-12 15:44:03 UTC
I loaded the dump files (specifically jmap_dump_perf20_2011_04_07_19_45_17_03_0470.hprof) into Eclipse MAT, and the biggest "dominator" is one instance of org.jgroups.protocols.pbcast.NAKACK that keeps a receiver window with 22870 instances of org.jgroups.Message, for a total of ~25MB.

The other dumps for perf20 have smaller numbers of messages, so I think this is the reason why the memory graph was trending upwards in the chart.

I was able to find the logical name of the server: perf20-59609. It would seem that JGroups is not able to ack messages sent by itself and so accumulates a very large backlog in its NAK receiver window.

Comment 4 Michal Linhard 2011-04-28 15:25:12 UTC
Dan, Galder, do we have any conclusion on this?

Comment 5 Galder Zamarreño 2011-05-09 14:41:57 UTC
Michal, I think Dan's been looking into this, so I'm assigning it to him.

Comment 6 Michal Linhard 2011-05-27 05:49:14 UTC
Link: Added: This issue relates to ISPN-1102


Comment 7 Dan Berindei 2011-07-11 15:35:36 UTC
After investigating with Bela, we realized that this was not a memory leak: the NAKACK receiver window was holding a lot of memory only because our buffers are 500 bytes while the useful information (the part counted by JGroups) is 50 bytes.

Galder created issue ISPN-1102 to use smaller buffers.
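
For illustration only, the kind of change ISPN-1102 is after can be sketched roughly as follows (hypothetical class and method names, not the actual Infinispan or JGroups code): hand the transport a right-sized copy of the payload instead of the whole oversized scratch buffer, so the NAKACK window retains roughly 50 bytes per message rather than 500.

{code}
import java.util.Arrays;

// Hypothetical sketch of the buffer right-sizing idea behind ISPN-1102.
// Names and sizes are illustrative, not the actual Infinispan code.
public class BufferTrim {

    // Serialize the payload into an oversized scratch buffer, then return
    // only the bytes actually written instead of the whole backing array.
    static byte[] marshal(String payload, byte[] scratch) {
        byte[] data = payload.getBytes();   // the useful bytes (~50 in this bug)
        System.arraycopy(data, 0, scratch, 0, data.length);
        return Arrays.copyOfRange(scratch, 0, data.length);
    }

    public static void main(String[] args) {
        byte[] scratch = new byte[500];     // ~500-byte scratch buffer
        byte[] wire = marshal("key=value", scratch);
        // Keeping 'wire' instead of 'scratch' in the retransmission window
        // cuts the retained memory per message by roughly a factor of ten.
        System.out.println("retained per message: " + wire.length + " bytes");
    }
}
{code}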

Comment 8 Dan Berindei 2011-07-11 15:35:36 UTC
Release Notes Docs Status: Added: Not Required


Comment 9 Anne-Louise Tangring 2011-10-11 17:06:13 UTC
Release Notes Docs Status: Removed: Not Required 
Docs QE Status: Removed: ASSIGNED