project_key: EDG

When a key with a value of size 1 MB is stored into the grid, a very large amount of garbage is generated - on the order of 30 MB. This garbage eventually becomes unreferenced, but it is still a surprisingly large amount and it adversely affects the size of the heap and increases the need for garbage collection. It may be that some of this memory is being used unnecessarily, and even reducing the amount of memory used would probably improve performance.

Steps to reproduce:
1. Start a single node EDG cluster, instrumented for jProfiler.
2. Remotely attach the jProfiler GUI to the node, with CPU and memory allocation recording turned off.
3. Once the GUI is attached, start memory allocation recording and CPU recording.
4. Use a HotRod or memcached client to put a single key with a 1 MB value to the node (see the client sketch below).
5. Turn off recording.

The Recorded Objects memory view shows that roughly 30 MB of data have been allocated (not solely the first time through either). The same results can be seen by using jmap -heap to view the overall heap stats.
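For step 4, a minimal HotRod client sketch along these lines can be used (host, port, and key name are placeholders, and the exact RemoteCacheManager constructor may differ between client versions):

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;

public class OneMegPut {
   public static void main(String[] args) {
      // Connect to the single EDG node (host/port are placeholders).
      RemoteCacheManager cacheManager = new RemoteCacheManager("localhost", 11222);
      RemoteCache<String, byte[]> cache = cacheManager.getCache();

      // Put a single key with a 1 MB value, as in step 4 above.
      byte[] oneMeg = new byte[1024 * 1024];
      cache.put("one-meg-key", oneMeg);

      cacheManager.stop();
   }
}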
Attaching jProfiler screen dumps
Attachment: Added: onemeg-heapchange.gif
Attachment: Added: onemeg-referencedheapobjects.gif
Attachment: Added: onemeg-alloccalltree.gif
Attachment: Added: onemeg-allochotspot.gif
The screen dumps taken during the exercise:
1. onemeg-heapchange.gif shows the recorded objects allocated during the put operation. The total allocated memory is 33 MB.
2. onemeg-referencedheapobjects.gif shows the allocated objects after a garbage collection - most of the garbage has been collected and we can see the 1 MB byte[] array.
3. onemeg-alloccalltree.gif shows the methods doing the allocation of the recorded objects.
4. onemeg-allochotspot.gif shows the stack traces of the methods doing the byte[] allocations.
32 MB of overhead seems a bit over the top for handling 1 MB of data. Could this be reduced? This would help the data stress tests, whose heaps are under serious pressure.
Richard, I'm investigating this...
Richard, this is really interesting. It would appear that, in order to read the 1 MB key, it ends up creating a new 1 MB byte[] 53 times. This is easily reproducible on my machine; a similar test, when profiled, shows 5 allocations of a 1 MB byte[]. I'll get back to you when I know more about this...
Right, I think I know what the problem is. The Netty decoder, as long as it has not received all the data, will carry on re-reading from the header. So, I need to put some markers in to make sure that once the key has been read, it is not read again every time, since this is what generates all those garbage byte[]s. The same needs to happen with the value, which could potentially be big as well. I'll create a JIRA for this in Infinispan.
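For context, the re-reading described above is how Netty 3's ReplayingDecoder behaves by default: when the accumulated buffer does not yet contain enough bytes, decode() is replayed from the last checkpoint, so without checkpoints the header and key are decoded again on every new chunk of data. A minimal sketch of the checkpointing idea (the state names and wire format here are illustrative, not the actual Infinispan decoder):

import org.jboss.netty.buffer.ChannelBuffer;
import org.jboss.netty.channel.Channel;
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.handler.codec.replay.ReplayingDecoder;

// Illustrative decoder states; the real protocol decoder has more of them.
enum DecoderState { READ_HEADER, READ_KEY, READ_VALUE }

public class CheckpointingDecoder extends ReplayingDecoder<DecoderState> {

   private byte[] key;
   private byte[] value;

   public CheckpointingDecoder() {
      super(DecoderState.READ_HEADER);
   }

   @Override
   protected Object decode(ChannelHandlerContext ctx, Channel channel,
                           ChannelBuffer buffer, DecoderState state) {
      switch (state) {
         case READ_HEADER:
            buffer.readByte();                   // pretend the header is a single opcode byte
            checkpoint(DecoderState.READ_KEY);   // never re-read the header on replay
            // intentional fall-through
         case READ_KEY:
            key = readRangedBytes(buffer);       // length-prefixed key
            checkpoint(DecoderState.READ_VALUE); // never re-read the key on replay
            // intentional fall-through
         case READ_VALUE:
            value = readRangedBytes(buffer);     // length-prefixed value, possibly 1 MB
            checkpoint(DecoderState.READ_HEADER);
            return new Object[] { key, value };  // hand the operation to the next handler
         default:
            throw new IllegalStateException("Unknown state: " + state);
      }
   }

   private byte[] readRangedBytes(ChannelBuffer buffer) {
      int length = buffer.readInt();
      byte[] data = new byte[length];
      buffer.readBytes(data);                    // single copy into the destination array
      return data;
   }
}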
Btw Richard, great catch! Fixing this is very important.
Link: Added: This issue depends on ISPN-1035
OK, that's great. The data stress tests showed really poor throughput (time in app / (time in app + GC)), so I tried to find out where all the garbage was coming from. I hope this fixes the problems with those tests.
Richard, I've verified it and it will fix it :) - I'm tidying up the code now; it required a fair amount of restructuring of the decoding.
Wonderful. Let me know when I can grab a snapshot containing the changes and I'll give it a whirl.
Hey Richard, I've just uploaded a snapshot taken from my 4.2.x branch for the ISPN-1035 issue. The id of the snapshot is: infinispan-4.2.1-20110412.064620-20. Give it a go and let's see if it makes a difference :)
Hi Galder

We rebuilt EDG using the snapshot here: http://hudson.qa.jboss.com/hudson/view/EDG/job/edg-51x-build-alt/14

The console log shows that the snapshot is getting picked up:

[java] [INFO] snapshot org.infinispan:infinispan-server-memcached:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-group
[java] [INFO] snapshot org.infinispan:infinispan-server-memcached:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-devel-group
[java] Downloading: https://repository.jboss.org/nexus/content/groups/developer//org/infinispan/infinispan-server-memcached/4.2.1-SNAPSHOT/infinispan-server-memcached-4.2.1-20110412.064620-22.pom
[java] 3K downloaded
[java] [INFO] snapshot org.infinispan:infinispan-server-parent:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-group
[java] [INFO] snapshot org.infinispan:infinispan-server-parent:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-devel-group
[java] Downloading: https://repository.jboss.org/nexus/content/groups/developer//org/infinispan/infinispan-server-parent/4.2.1-SNAPSHOT/infinispan-server-parent-4.2.1-20110412.064620-25.pom
[java] 1K downloaded
[java] [INFO] snapshot org.infinispan:infinispan-parent:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-group
[java] [INFO] snapshot org.infinispan:infinispan-parent:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-devel-group
[java] Downloading: https://repository.jboss.org/nexus/content/groups/developer//org/infinispan/infinispan-parent/4.2.1-SNAPSHOT/infinispan-parent-4.2.1-20110412.064620-30.pom
[java] 19K downloaded
[java] [INFO] snapshot org.infinispan:infinispan-server-core:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-group
[java] [INFO] snapshot org.infinispan:infinispan-server-core:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-devel-group
[java] Downloading: https://repository.jboss.org/nexus/content/groups/developer//org/infinispan/infinispan-server-core/4.2.1-SNAPSHOT/infinispan-server-core-4.2.1-20110412.064620-25.pom
[java] 2K downloaded

and the zip file has been placed here: /qa/home/mlinhard/edg-ispn-4.2.2-SNAPSHOT-20110412.064620.zip

However, when I rerun my 1 MB addition test using the new zip, I still see 31 MB of data being allocated in 2073 instances of byte[]. I also ran a comparative test using one of the data stress tests and saw no differences. So it looks as though your fix is not getting picked up somehow. Can you check whether the new zip tallies with the fix you applied?
Just talked to Richard online and want to clarify something. In the profiler snapshots he attached, AbstractProtocolDecoder.readKeyParams() is generating over 50 MB from reading the key multiple times. With the fix, AbstractProtocolDecoder.readKey() only generates 1 MB, which is the right thing: the key should only be read once. This should be an easy way to make sure the snapshot contains at least the fix for that issue. If, even with the fix, there's still too much byte[] being generated, it needs further investigation.
To be precise, on my local machine AbstractProtocolDecoder.decode() is generating 1.8 MB of data, with 1 MB coming from reading the key and 0.5 MB from CacheDelegate.put(). This compares to Richard's original snapshot showing 57 MB being generated by AbstractProtocolDecoder.decode().
OK, I've inspected Richard's profiler snapshots. Checkpointing partially solves the problem, but even when reading the key or the value we need to do it more efficiently and avoid creating so many useless byte[]s. I'm checking with Trustin to find a more efficient way of reading a ranged byte[] (maybe buffering?).
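To illustrate the kind of inefficiency being discussed (this is an assumption about where the extra copies come from, not the actual Infinispan code): reading a length-prefixed byte[] via intermediate buffers creates a throwaway copy per read, whereas reading straight into a preallocated destination array, as in the readRangedBytes helper in the earlier sketch, copies the bytes once.

import org.jboss.netty.buffer.ChannelBuffer;

public final class RangedBytes {

   // Wasteful pattern: readBytes(length) allocates an intermediate ChannelBuffer,
   // and getBytes() then adds a second byte[] copy for the result.
   static byte[] readRangedBytesWasteful(ChannelBuffer buffer) {
      int length = buffer.readInt();
      ChannelBuffer slice = buffer.readBytes(length); // extra intermediate buffer
      byte[] data = new byte[length];
      slice.getBytes(0, data);                        // second copy into the result
      return data;
   }

   // Leaner pattern: allocate the destination once and read directly into it.
   static byte[] readRangedBytes(ChannelBuffer buffer) {
      int length = buffer.readInt();
      byte[] data = new byte[length];
      buffer.readBytes(data);                         // single copy
      return data;
   }
}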
The first snapshot which Galder passed on to me reduced the garbage from about 30 MB to 26 MB. He then provided a second snapshot file which reduces the garbage generated from 30 MB down to 6 MB. Very nice improvement, Galder! Here are some jProfiler screenshots to look at, resulting from putting one key/value pair with a 1 MB value:
1. onemegvalue-hr-recordedobjects-fix.gif shows the recorded objects allocated during the put operation. The total allocated memory is about 6 MB.
2. onemegvalue-hr-referencedheapobjects-fix.gif shows the allocated objects after a garbage collection - most of the garbage has been collected and we can see the 1 MB byte[] array.
3. onemegvalue-hr-alloccalltree-fix.gif shows the methods doing the allocation of the recorded objects.
4. onemegvalue-hr-allochotspot-fix.gif shows the stack traces of the methods doing the byte[] allocations.
Attachment: Added: onemegvalue-hr-referencedheapobjects-fix.gif
Attachment: Added: onemegvalue-hr-recordedobjects-fix.gif
Attachment: Added: onemegvalue-hr-allochotspot-fix.gif
Attachment: Added: onemegvalue-hr-alloccalltree-fix.gif
Going to try using this build in the heap stress tests.
Thx Richard! The screenshots look fine now. I'm gonna resolve the linked ISPN jira now so that it can be incorporated in 5.0.0.BETA2 which should be released later today.
Link: Added: This issue relates to JBPAPP-6411
I consider this to be closed since the associated ISPN JIRA is fixed.
This was verified as complete in an earlier comment.