| Summary: | Large volumes of garbage when adding a single 1 Mb value | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [JBoss] JBoss Data Grid 5 | Reporter: | Richard Achmatowicz <rachmato> | ||||||||||||||||||
| Component: | Infinispan | Assignee: | Default User <jbpapp-maint> | ||||||||||||||||||
| Status: | CLOSED NEXTRELEASE | QA Contact: | |||||||||||||||||||
| Severity: | high | Docs Contact: | |||||||||||||||||||
| Priority: | high | ||||||||||||||||||||
| Version: | unspecified | CC: | galder.zamarreno, rachmato | ||||||||||||||||||
| Target Milestone: | --- | ||||||||||||||||||||
| Target Release: | EAP 5.1.0 EDG TP | ||||||||||||||||||||
| Hardware: | Unspecified | ||||||||||||||||||||
| OS: | Unspecified | ||||||||||||||||||||
| URL: | http://jira.jboss.org/jira/browse/EDG-113 | ||||||||||||||||||||
| Whiteboard: | |||||||||||||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||||||||||||
| Doc Text: | Story Points: | --- | |||||||||||||||||||
| Clone Of: | Environment: | ||||||||||||||||||||
| Last Closed: | 2011-05-03 13:53:40 UTC | Type: | Bug | ||||||||||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||||||||||
| Documentation: | --- | CRM: | |||||||||||||||||||
| Verified Versions: | Category: | --- | |||||||||||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||||||||
| Attachments: |
|
||||||||||||||||||||
|
Description
Richard Achmatowicz
2011-03-30 23:07:34 UTC
Attaching jProfiler screen dumps Attachment: Added: onemeg-heapchange.gif Attachment: Added: onemeg-referencedheapobjects.gif Attachment: Added: onemeg-alloccalltree.gif Attachment: Added: onemeg-allochotspot.gif The screendumps taken during the exercise: 1. onemeg-heapchange.gif shows the recorded objects allocated during the put operation. The total allocated memory is 33 Mb. 2. onemeg-referencedheapobjects.gif shows the allocated objects after a garbage collection - most of the garbage has been collected and we can see the 1 Mb byte[] array 3. onemeg-alloccalltree.gif shows the methods doing the allocation of the recorded objects 4. onemeg-allochotspot.gif shows the stack traces of methods doing the byte[] allocations 32 Mb of overhead seems a bit OTT for handling 1 Mb of data. Could this be reduced? This would help the data stress test results, whose heaps are under serious stress. Richard, I'm investigating this... Richard, this is really interesting. It would appear as if to read the 1mb key, it ends up having to create 53 times a new byte[] of 1mb. This is easily reproducible in my machine; a similar test, when profiled, shows 5 allocations of 1mb byte[]. I'll get back to you when I know more about this... Right, I think I know what the problem is. The netty decoder, as long as it has not received all the data, will carry on re-reading the header. So, I need to put some markers to make sure that if the key is read once, it's not read again all the time, since this generates all those garbage byte[]. The same needs to happen with value, which potentially could be big as well. I'll create a JIRA for this in Infinispan. Btw Richard, great catch! Fixing this is very important. Link: Added: This issue depends ISPN-1035 OK, that's great. The data stress tests showed really poor throughput (time in app / time in app + GC) I tried to find out where all the garbage was coming from. Hope this fixes the problems with those tests. Richard, I've verified it and it will fix it :) - I'm tidying up the code now, it required some decent restructuring in the decoding. Wonderful. Let me know when I can grab a snapshot containing the changes and i'll give it a whirl. Hey Richard, I've just uploaded a snapshot taken from my 4.2.x branch of the ISPN-1035 issue. The id of the snapshot is: infinispan-4.2.1-20110412.064620-20 Give it a go and let's see if it makes a difference :) Hi Galder We rebuilt EDG using the snapshot here: http://hudson.qa.jboss.com/hudson/view/EDG/job/edg-51x-build-alt/14 The console log shows that the snapshot is getting picked up: [java] [INFO] snapshot org.infinispan:infinispan-server-memcached:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-group [java] [INFO] snapshot org.infinispan:infinispan-server-memcached:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-devel-group [java] Downloading: https://repository.jboss.org/nexus/content/groups/developer//org/infinispan/infinispan-server-memcached/4.2.1-SNAPSHOT/infinispan-server-memcached-4.2.1-20110412.064620-22.pom [java] 3K downloaded [java] [INFO] snapshot org.infinispan:infinispan-server-parent:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-group [java] [INFO] snapshot org.infinispan:infinispan-server-parent:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-devel-group [java] Downloading: https://repository.jboss.org/nexus/content/groups/developer//org/infinispan/infinispan-server-parent/4.2.1-SNAPSHOT/infinispan-server-parent-4.2.1-20110412.064620-25.pom [java] 1K downloaded [java] [INFO] snapshot org.infinispan:infinispan-parent:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-group [java] [INFO] snapshot org.infinispan:infinispan-parent:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-devel-group [java] Downloading: https://repository.jboss.org/nexus/content/groups/developer//org/infinispan/infinispan-parent/4.2.1-SNAPSHOT/infinispan-parent-4.2.1-20110412.064620-30.pom [java] 19K downloaded [java] [INFO] snapshot org.infinispan:infinispan-server-core:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-group [java] [INFO] snapshot org.infinispan:infinispan-server-core:4.2.1-SNAPSHOT: checking for updates from jboss-public-repository-devel-group [java] Downloading: https://repository.jboss.org/nexus/content/groups/developer//org/infinispan/infinispan-server-core/4.2.1-SNAPSHOT/infinispan-server-core-4.2.1-20110412.064620-25.pom [java] 2K downloaded and the zip file has been placed here: /qa/home/mlinhard/edg-ispn-4.2.2-SNAPSHOT-20110412.064620.zip However, when I rerun my 1 Mb addition test using the ew zip, I still see 31 Mb of data being allocated in 2073 instances of byte[]. I also ran a comparative test using one of the data stress tests and saew no differences. So it looks as though your fix is not getting picked up somehow. Can you check to see if the new zip tallies with the fix you applied? Just talked to Richard online and want to clarify something. In the profiler snapshots he attached, AbstractProtocolDecoder.readKeyParams() is generating over 50mb from reading the key multiple times. With the fix, AbstractProtocolDecoder.readKey() only generates 1mb which is the right thing, the key should only be read once. This should be an easy way to make sure the snapshot contains at least a fix for that issue. If even with the fix there's still too much byte[] being generated, it needs further investigation. To be precise, in my local machine AsbtractProtocolDecoder.decode() is generating 1.8 mb data with 1mb coming from reading the key and 0.5mb CacheDelegate.put(). This compares to Richard's original snapshot showing 57mb being generated by AsbtractProtocolDecoder.decode() Ok, I've inspected Richard's profiler snapshots and checkpointing partially solves the problem but even when reading the key or value, we need to do it more efficiently and avoid creating some many useless byte[]. I'm checking with Trustin to find a more efficient way of reading a ranged byte[] (maybe buffering?) The first snapshot which Galder passed on to me recuded the garbage from about 30 Mb to 26 Mb. He then provided a second snapshot file which reduces the garbage generated from 30 Mb down to 6 Mb. Very nice improvement, Galder! Here are some jProfiler screenshots to look at, resulting from putting one key value pair with a 1 Mb value: 1. onemegvalue-hr-recordedobjects-fix.gif shows the recorded objects allocated during the put operation. The total allocated memory is about 6 Mb. 2. onemegvalue-hr-referencedheapobjects-fix.gif shows the allocated objects after a garbage collection - most of the garbage has been collected and we can see the 1 Mb byte[] array 3. onemegvalue-hr-alloccalltree-fix.gif shows the methods doing the allocation of the recorded objects 4. onemegvalue-hr-allochotspot-fix.gif shows the stack traces of methods doing the byte[] allocations Attachment: Added: onemegvalue-hr-referencedheapobjects-fix.gif Attachment: Added: onemegvalue-hr-recordedobjects-fix.gif Attachment: Added: onemegvalue-hr-allochotspot-fix.gif Attachment: Added: onemegvalue-hr-alloccalltree-fix.gif Going to try using this build in the heap stress tests. Thx Richard! The screenshots look fine now. I'm gonna resolve the linked ISPN jira now so that it can be incorporated in 5.0.0.BETA2 which should be released later today. Link: Added: This issue relates to JBPAPP-6411 I consider this to be closed since associated ISPN jira is fixed. Was verified as complete in an earlier comment |