Bug 965220 - Regression in library mode PUT performance from JDG 6.1 to Infinispan 5.3 beta 2
Summary: Regression in library mode PUT performance from JDG 6.1 to Infinispan 5.3 beta 2
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Performance
Version: 6.1.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: ---
Assignee: Tristan Tarrant
QA Contact: Martin Gencur
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-05-20 18:02 UTC by Alan Field
Modified: 2013-07-03 14:44 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-07-03 13:49:11 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Log files from each node in the cluster (953.84 KB, application/zip)
2013-05-20 18:02 UTC, Alan Field
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker ISPN-3126 0 Critical Resolved PUT performance degradation in Infinispan 5.3 2013-10-24 13:34:41 UTC

Description Alan Field 2013-05-20 18:02:30 UTC
Created attachment 750689 [details]
Log files from each node in the cluster

Description of problem:

I am running tests with MapReduce using RadarGun. I have a stage that takes a file path and size as parameters, then uses all active cluster nodes to write values to the cache. Using a 100MB file and Infinispan 5.2.4.Final-redhat-1/JDG 6.1 GA, this operation takes less than 8 seconds:

12:23:13,259 DEBUG [org.radargun.Master] (main) Starting 'LoadFileStage' on 8 slave nodes. Details: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/home/hudson/tmp/afield/william-shakespeare-100MB.txt, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
12:23:57,818 INFO  [org.radargun.stages.LoadFileStage] (main) Received responses from all 8 slaves. Durations [0:31.83 seconds, 1:31.96 seconds, 2:31.9 seconds, 3:32.06 seconds, 4:44.53 seconds, 5:44.37 seconds, 6:44.5 seconds, 7:31.83 seconds]
12:23:57,819 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------
12:23:57,820 INFO  [org.radargun.stages.LoadFileStage] (main) Size of file '/home/hudson/tmp/afield/william-shakespeare-100MB.txt' is 100623474 bytes
12:23:57,820 INFO  [org.radargun.stages.LoadFileStage] (main) Value size is '1024' which will produce 98266 keys
12:23:57,821 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 0 wrote 12284 values to the cache with a total size of 12578816 bytes
12:23:57,821 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 1 wrote 12284 values to the cache with a total size of 12577906 bytes
12:23:57,821 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 2 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,822 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 3 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,822 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 4 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,822 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 5 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,823 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 6 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,823 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 7 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,823 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------

Using the same file and nodes with Infinispan 5.3beta2 takes about 8 minutes:

12:24:48,332 DEBUG [org.radargun.Master] (main) Starting 'LoadFileStage' on 8 slave nodes. Details: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/home/hudson/tmp/afield/william-shakespeare-100MB.txt, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
12:46:35,276 INFO  [org.radargun.stages.LoadFileStage] (main) Received responses from all 8 slaves. Durations [0:21.73 minutes, 1:21.76 minutes, 2:21.78 minutes, 3:21.73 minutes, 4:21.73 minutes, 5:21.73 minutes, 6:21.78 minutes, 7:21.67 minutes]
12:46:35,278 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------
12:46:35,278 INFO  [org.radargun.stages.LoadFileStage] (main) Size of file '/home/hudson/tmp/afield/william-shakespeare-100MB.txt' is 100623474 bytes
12:46:35,278 INFO  [org.radargun.stages.LoadFileStage] (main) Value size is '1024' which will produce 98266 keys
12:46:35,279 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 0 wrote 12284 values to the cache with a total size of 12578816 bytes
12:46:35,279 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 1 wrote 12284 values to the cache with a total size of 12577906 bytes
12:46:35,279 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 2 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,279 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 3 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,280 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 4 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,280 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 5 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,280 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 6 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,281 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 7 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,281 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------

The attached logs show the duration for each PUT operation in the cache. On node 0 using Infinispan 5.2 the average duration is 2.35 milliseconds. In Infinispan 5.3beta2 the average duration is 106.05 milliseconds.

Infinispan 5.2 is using this configuration file: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/jdg-60/ispn-configs/infinispan52/dist-udp-no-tx.xml

Infinispan 5.3 is using this configuration file: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/jdg-60/ispn-configs/infinispan53/dist-udp-no-tx.xml

Both versions are using this JGroups configuration file: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/jdg-60/jgroups/jgroups-udp.xml


Version-Release number of selected component (if applicable):
5.2.4.Final-redhat-1 and Infinispan 5.3.0Beta2

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Alan Field 2013-05-20 20:35:13 UTC
I just want to update this report with the correct interpretation of the durations in the log file:

12:23:57,818 INFO  [org.radargun.stages.LoadFileStage] (main) Received responses from all 8 slaves. Durations [0:31.83 seconds, 1:31.96 seconds, 2:31.9 seconds, 3:32.06 seconds, 4:44.53 seconds, 5:44.37 seconds, 6:44.5 seconds, 7:31.83 seconds]

The number before the ":" is the node index, and is not part of the duration. The duration on node 0 was 31.83 seconds. The duration on node 1 was 31.96 seconds, etc.

Comment 2 Alan Field 2013-05-21 13:59:27 UTC
This is the source for the LoadFileStage as well:

https://github.com/radargun/radargun/blob/master/framework/src/main/java/org/radargun/stages/LoadFileStage.java

Comment 3 Alan Field 2013-06-04 13:51:01 UTC
This regression was caused by a stale value in the JGroups configuration file. The previous version of the file had UDP.bundler_type="old". In the new version of the file, the value is UDP.bundler_type="new". Here are the results with the new configuration file in place:

17:42:51,749 DEBUG [org.radargun.Master] (main) Starting 'LoadFileStage' on 8 slave nodes. Details: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-100MB.txt, printWriteStatistics=false, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
17:42:51,759 INFO  [org.radargun.Slave] (pool-1-thread-1) Executing stage: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-100MB.txt, printWriteStatistics=false, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
17:42:51,761 INFO  [org.radargun.stages.LoadFileStage] (pool-1-thread-1) Writing 1024 bytes to cache key: 7-7168 at position 8192
17:43:09,626 INFO  [org.radargun.stages.LoadFileStage] (pool-1-thread-1) Writing 1024 bytes to cache key: 7-40967168 at position 40968192
17:43:17,487 INFO  [org.radargun.stages.LoadFileStage] (pool-1-thread-1) Writing 1024 bytes to cache key: 7-81927168 at position 81928192
17:43:20,856 INFO  [org.radargun.Slave] (pool-1-thread-1) Finished stage: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-100MB.txt, printWriteStatistics=false, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
17:43:20,857 INFO  [org.radargun.Slave] (main) Ack successfully sent to the master
17:43:20,985 INFO  [org.radargun.stages.LoadFileStage] (main) Received responses from all 8 slaves. Durations [0 = 29.16 seconds, 1 = 29.1 seconds, 2 = 29.2 seconds, 3 = 29.09 seconds, 4 = 29.13 seconds, 5 = 29.09 seconds, 6 = 29.22 seconds, 7 = 29.09 seconds]
17:43:20,987 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------
17:43:20,987 INFO  [org.radargun.stages.LoadFileStage] (main) Size of file '/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-100MB.txt' is 100623474 bytes
17:43:20,987 INFO  [org.radargun.stages.LoadFileStage] (main) Value size is '1024' which will produce 98266 keys
17:43:20,987 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 0 wrote 12284 values to the cache with a total size of 12578816 bytes
17:43:20,988 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 1 wrote 12284 values to the cache with a total size of 12577906 bytes
17:43:20,988 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 2 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,988 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 3 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,988 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 4 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,989 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 5 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,989 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 6 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,989 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 7 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,989 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------

These values are consistent with JDG 6.1. Maybe JGroups 3.3.x should print a warning message when UDP.bundler_type="old" is being used?

Comment 4 JBoss JIRA Server 2013-06-04 14:15:59 UTC
Pedro Ruivo <pedroruivo2> made a comment on jira ISPN-3126

In any case, I've sent an email to ispn-dev warning about the use of the new bundler.

Comment 7 JBoss JIRA Server 2013-07-03 13:09:47 UTC
Mircea Markus <mmarkus> made a comment on jira ISPN-3126

Can you please close it once you confirmed that this is not a problem?

Comment 8 Alan Field 2013-07-03 13:49:11 UTC
The performance degradation is not a bug, but JGroups should warn that the old bundler is enabled. I'll enter a new JIRA for that.

Comment 9 JBoss JIRA Server 2013-07-03 14:44:11 UTC
Martin Gencur <mgencur> updated the status of jira ISPN-3126 to Resolved

Comment 10 JBoss JIRA Server 2013-07-03 14:44:11 UTC
Martin Gencur <mgencur> made a comment on jira ISPN-3126

This is not a bug. The performance regression was caused by incorrect JGroups settings.


Note You need to log in before you can comment on or make changes to this bug.