Bug 745875 (EDG-113)

Summary: Large volumes of garbage when adding a single 1 MB value
Product: [JBoss] JBoss Data Grid 5
Reporter: Richard Achmatowicz <rachmato>
Component: Infinispan
Assignee: Default User <jbpapp-maint>
Status: CLOSED NEXTRELEASE
Severity: high
Priority: high
Version: unspecified
CC: galder.zamarreno, rachmato
Target Release: EAP 5.1.0 EDG TP
Hardware: Unspecified
OS: Unspecified
URL: http://jira.jboss.org/jira/browse/EDG-113
Doc Type: Bug Fix
Type: Bug
Last Closed: 2011-05-03 13:53:40 UTC
Attachments (flags: none):
    onemeg-heapchange.gif
    onemeg-referencedheapobjects.gif
    onemeg-alloccalltree.gif
    onemeg-allochotspot.gif
    onemegvalue-hr-referencedheapobjects-fix.gif
    onemegvalue-hr-recordedobjects-fix.gif
    onemegvalue-hr-allochotspot-fix.gif
    onemegvalue-hr-alloccalltree-fix.gif

Description Richard Achmatowicz 2011-03-30 23:07:34 UTC
project_key: EDG

When a key with a 1 MB value is stored into the grid, a very large amount of garbage is generated - on the order of 30 MB. 

This garbage eventually becomes unreferenced, but it is still a surprisingly large amount: it inflates the heap and increases the need for garbage collection. Some of this memory may be allocated unnecessarily, and even a partial reduction in allocation would probably improve performance.

Steps to reproduce:
1. Start a single node EDG cluster, instrumented for jProfiler
2. Remotely attach the jProfiler GUI to the node, with CPU and memory allocation recording turned off 
3. Once the GUI is attached, start memory allocation recording and CPU recording
4. Use a HotRod or memcached client to put a single key with a 1 MB value to the node (see the client sketch below)
5. Turn off recording

The Recorded Objects memory view shows that roughly 30 MB of data has been allocated (and not solely the first time through, either). The same result can be seen by using jmap -heap to view the overall heap stats.
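
For reference, a minimal sketch of the client side of step 4, assuming the HotRod Java client API (the server address, key name and exact client version are illustrative; the constructors in the 4.2-era client may differ slightly):

    import org.infinispan.client.hotrod.RemoteCache;
    import org.infinispan.client.hotrod.RemoteCacheManager;
    import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

    public class OneMegPut {
        public static void main(String[] args) {
            ConfigurationBuilder cfg = new ConfigurationBuilder();
            cfg.addServer().host("127.0.0.1").port(11222);    // default HotRod port

            RemoteCacheManager rcm = new RemoteCacheManager(cfg.build());
            try {
                RemoteCache<String, byte[]> cache = rcm.getCache();
                byte[] oneMeg = new byte[1024 * 1024];        // the 1 MB value from step 4
                cache.put("key-1", oneMeg);                   // single put, then stop recording
            } finally {
                rcm.stop();
            }
        }
    }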

Comment 1 Richard Achmatowicz 2011-03-30 23:09:36 UTC
Attaching jProfiler screen dumps

Comment 2 Richard Achmatowicz 2011-03-30 23:09:36 UTC
Attachment: Added: onemeg-heapchange.gif
Attachment: Added: onemeg-referencedheapobjects.gif
Attachment: Added: onemeg-alloccalltree.gif
Attachment: Added: onemeg-allochotspot.gif


Comment 3 Richard Achmatowicz 2011-03-30 23:14:55 UTC
The screen dumps taken during the exercise:

1. onemeg-heapchange.gif shows the recorded objects allocated during the put operation. The total allocated memory is 33 MB.
2. onemeg-referencedheapobjects.gif shows the allocated objects after a garbage collection - most of the garbage has been collected and we can see the 1 MB byte[] array.
3. onemeg-alloccalltree.gif shows the methods doing the allocation of the recorded objects.
4. onemeg-allochotspot.gif shows the stack traces of the methods doing the byte[] allocations.



Comment 4 Richard Achmatowicz 2011-03-30 23:17:05 UTC
32 MB of overhead seems a bit over the top for handling 1 MB of data. Could this be reduced?

This would help the results of the data stress tests, whose heaps are under serious pressure. 

Comment 5 Galder Zamarreño 2011-04-07 11:52:14 UTC
Richard, I'm investigating this...

Comment 6 Galder Zamarreño 2011-04-07 14:06:21 UTC
Richard, this is really interesting. It would appear that, in order to read the 1 MB key, it ends up allocating a new 1 MB byte[] 53 times. This is easily reproducible on my machine; a similar test, when profiled, shows 5 allocations of a 1 MB byte[]. I'll get back to you when I know more about this...

Comment 7 Galder Zamarreño 2011-04-07 14:15:41 UTC
Right, I think I know what the problem is. The Netty decoder, as long as it has not received all the data, will carry on re-reading the header. So, I need to put some markers in to make sure that once the key has been read, it is not read again on every pass, since that is what generates all those garbage byte[]. The same needs to happen with the value, which could potentially be big as well. I'll create a JIRA for this in Infinispan.
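
For illustration only (not the actual Infinispan code): a minimal sketch of the marker idea, assuming Netty's ReplayingDecoder API; the state names and wire layout here are hypothetical. Once a checkpoint is recorded after the key has been read, later decode passes resume from that state instead of re-reading the header and the key:

    import java.util.List;

    import io.netty.buffer.ByteBuf;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.handler.codec.ReplayingDecoder;

    // Hypothetical states and payload layout; the real decoder has more of both.
    enum DecodeState { READ_HEADER, READ_KEY, READ_VALUE }

    public class CheckpointedDecoder extends ReplayingDecoder<DecodeState> {

        private byte[] key;    // kept across decode() retries once fully read
        private byte[] value;

        public CheckpointedDecoder() {
            super(DecodeState.READ_HEADER);
        }

        @Override
        protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
            switch (state()) {
                case READ_HEADER:
                    in.readUnsignedByte();             // e.g. the operation code
                    checkpoint(DecodeState.READ_KEY);  // header will not be re-read again
                    // fall through
                case READ_KEY:
                    key = new byte[in.readInt()];
                    in.readBytes(key);                 // the large byte[] is read once
                    checkpoint(DecodeState.READ_VALUE);
                    // fall through
                case READ_VALUE:
                    value = new byte[in.readInt()];
                    in.readBytes(value);
                    out.add(new Object[] { key, value }); // placeholder for the decoded operation
                    checkpoint(DecodeState.READ_HEADER);
                    key = null;
                    value = null;
                    break;
            }
        }
    }

Without the checkpoints, every partial read restarts decode() from the header, allocating the large byte[] again on each pass, which is consistent with the tens of MB of garbage seen in the profiler snapshots.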

Comment 8 Galder Zamarreño 2011-04-07 14:16:10 UTC
Btw Richard, great catch! Fixing this is very important.

Comment 9 Galder Zamarreño 2011-04-07 14:20:47 UTC
Link: Added: This issue depends ISPN-1035


Comment 10 Richard Achmatowicz 2011-04-11 10:24:55 UTC
OK, that's great. The data stress tests showed really poor throughput (time in app / (time in app + GC)), which is why I tried to find out where all the garbage was coming from. Hope this fixes the problems with those tests.


Comment 11 Galder Zamarreño 2011-04-11 14:13:08 UTC
Richard, I've verified it and it will fix it :) - I'm tidying up the code now; it required a fair amount of restructuring in the decoding.

Comment 12 Richard Achmatowicz 2011-04-11 14:15:17 UTC
Wonderful. Let me know when I can grab a snapshot containing the changes and I'll give it a whirl.


Comment 13 Galder Zamarreño 2011-04-12 07:11:19 UTC
Hey Richard, I've just uploaded a snapshot taken from my 4.2.x branch of the ISPN-1035 issue. The id of the snapshot is: infinispan-4.2.1-20110412.064620-20

Give it a go and let's see if it makes a difference :)

Comment 14 Richard Achmatowicz 2011-04-12 12:33:43 UTC
Hi Galder

We rebuilt EDG using the snapshot here: http://hudson.qa.jboss.com/hudson/view/EDG/job/edg-51x-build-alt/14
The console log shows that the snapshot is getting picked up:
     [java] [INFO] snapshot org.infinispan:infinispan-server-memcached:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-group
     [java] [INFO] snapshot org.infinispan:infinispan-server-memcached:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-devel-group
     [java] Downloading: https://repository.jboss.org/nexus/content/groups/developer//org/infinispan/infinispan-server-memcached/4.2.1-SNAPSHOT/infinispan-server-memcached-4.2.1-20110412.064620-22.pom
     [java] 3K downloaded
     [java] [INFO] snapshot org.infinispan:infinispan-server-parent:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-group
     [java] [INFO] snapshot org.infinispan:infinispan-server-parent:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-devel-group
     [java] Downloading: https://repository.jboss.org/nexus/content/groups/developer//org/infinispan/infinispan-server-parent/4.2.1-SNAPSHOT/infinispan-server-parent-4.2.1-20110412.064620-25.pom
     [java] 1K downloaded
     [java] [INFO] snapshot org.infinispan:infinispan-parent:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-group
     [java] [INFO] snapshot org.infinispan:infinispan-parent:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-devel-group
     [java] Downloading: https://repository.jboss.org/nexus/content/groups/developer//org/infinispan/infinispan-parent/4.2.1-SNAPSHOT/infinispan-parent-4.2.1-20110412.064620-30.pom
     [java] 19K downloaded
     [java] [INFO] snapshot org.infinispan:infinispan-server-core:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-group
     [java] [INFO] snapshot org.infinispan:infinispan-server-core:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-devel-group
     [java] Downloading: https://repository.jboss.org/nexus/content/groups/developer//org/infinispan/infinispan-server-core/4.2.1-SNAPSHOT/infinispan-server-core-4.2.1-20110412.064620-25.pom
     [java] 2K downloaded

and the zip file has been placed here:
/qa/home/mlinhard/edg-ispn-4.2.2-SNAPSHOT-20110412.064620.zip


However, when I rerun my 1 MB addition test using the new zip, I still see 31 MB of data being allocated in 2073 instances of byte[].
I also ran a comparative test using one of the data stress tests and saw no differences.

So it looks as though your fix is not getting picked up somehow.
Can you check to see if the new zip tallies with the fix you applied? 



Comment 15 Galder Zamarreño 2011-04-12 15:13:29 UTC
Just talked to Richard online and want to clarify something. In the profiler snapshots he attached, AbstractProtocolDecoder.readKeyParams() is generating over 50 MB from reading the key multiple times. With the fix, AbstractProtocolDecoder.readKey() only generates 1 MB, which is the right thing: the key should only be read once. This should be an easy way to make sure the snapshot contains at least a fix for that issue.

If even with the fix there's still too much byte[] being generated, it needs further investigation.

Comment 16 Galder Zamarreño 2011-04-12 15:17:39 UTC
To be precise, on my local machine AbstractProtocolDecoder.decode() is generating 1.8 MB of data, with 1 MB coming from reading the key and 0.5 MB from CacheDelegate.put(). This compares to Richard's original snapshot showing 57 MB being generated by AbstractProtocolDecoder.decode().

Comment 17 Galder Zamarreño 2011-04-12 16:26:36 UTC
Ok, I've inspected Richard's profiler snapshots. Checkpointing partially solves the problem, but even when reading the key or value we need to do it more efficiently and avoid creating so many useless byte[]. I'm checking with Trustin to find a more efficient way of reading a ranged byte[] (maybe buffering?).
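
One possible shape for that buffering, as a sketch under assumptions rather than the eventual fix: allocate the destination byte[] once when its length is known and fill it incrementally across decode passes, so partially-read bytes are never thrown away. This assumes a decoder that keeps its own partial-read state (for example Netty's ByteToMessageDecoder) instead of ReplayingDecoder, whose replay behaviour discards partial reads; the class and method names are hypothetical:

    import io.netty.buffer.ByteBuf;

    // Hypothetical helper: a decoder-side buffer filled incrementally, so the
    // large byte[] is allocated once and never re-read from scratch.
    final class RangedByteReader {
        private final byte[] dest;  // allocated once, when the declared length is known
        private int filled;         // bytes copied so far

        RangedByteReader(int length) {
            this.dest = new byte[length];
        }

        // Copies whatever has arrived; returns true once 'dest' is complete.
        // Safe to call again on the next decode() pass when more bytes arrive.
        boolean fill(ByteBuf in) {
            int chunk = Math.min(in.readableBytes(), dest.length - filled);
            in.readBytes(dest, filled, chunk);
            filled += chunk;
            return filled == dest.length;
        }

        byte[] result() {
            return dest;
        }
    }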

Comment 18 Richard Achmatowicz 2011-04-14 17:02:05 UTC
The first snapshot which Galder passed on to me reduced the garbage from about 30 MB to 26 MB. 
He then provided a second snapshot which reduced the garbage generated from 30 MB down to 6 MB. Very nice improvement, Galder!

Here are some jProfiler screenshots to look at, resulting from putting one key/value pair with a 1 MB value:

1. onemegvalue-hr-recordedobjects-fix.gif shows the recorded objects allocated during the put operation. The total allocated memory is about 6 MB.
2. onemegvalue-hr-referencedheapobjects-fix.gif shows the allocated objects after a garbage collection - most of the garbage has been collected and we can see the 1 MB byte[] array.
3. onemegvalue-hr-alloccalltree-fix.gif shows the methods doing the allocation of the recorded objects.
4. onemegvalue-hr-allochotspot-fix.gif shows the stack traces of the methods doing the byte[] allocations.

Comment 19 Richard Achmatowicz 2011-04-14 17:03:23 UTC
Attachment: Added: onemegvalue-hr-referencedheapobjects-fix.gif
Attachment: Added: onemegvalue-hr-recordedobjects-fix.gif
Attachment: Added: onemegvalue-hr-allochotspot-fix.gif
Attachment: Added: onemegvalue-hr-alloccalltree-fix.gif


Comment 20 Richard Achmatowicz 2011-04-14 17:04:30 UTC
Going to try using this build in the heap stress tests.

Comment 21 Galder Zamarreño 2011-04-18 07:28:35 UTC
Thx Richard! The screenshots look fine now. I'm gonna resolve the linked ISPN jira now so that it can be incorporated in 5.0.0.BETA2 which should be released later today.

Comment 22 Galder Zamarreño 2011-04-29 15:24:00 UTC
Link: Added: This issue relates to JBPAPP-6411


Comment 23 Galder Zamarreño 2011-05-03 09:48:17 UTC
I consider this to be closed since the associated ISPN JIRA is fixed.

Comment 24 Richard Achmatowicz 2011-05-03 13:53:40 UTC
Was verified as complete in an earlier comment