Bug 965220 - Regression in library mode PUT performance from JDG 6.1 to Infinispan 5.3 beta 2
Regression in library mode PUT performance from JDG 6.1 to Infinispan 5.3 beta 2
Status: CLOSED NOTABUG
Product: JBoss Data Grid 6
Classification: JBoss
Component: Performance (Show other bugs)
6.1.0
Unspecified Unspecified
unspecified Severity high
: ---
: ---
Assigned To: Tristan Tarrant
Martin Gencur
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-05-20 14:02 EDT by Alan Field
Modified: 2013-07-03 10:44 EDT (History)
4 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-03 09:49:11 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Log files from each node in the cluster (953.84 KB, application/zip)
2013-05-20 14:02 EDT, Alan Field
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
JBoss Issue Tracker ISPN-3126 Critical Resolved PUT performance degradation in Infinispan 5.3 2013-10-24 09:34:41 EDT

  None (edit)
Description Alan Field 2013-05-20 14:02:30 EDT
Created attachment 750689 [details]
Log files from each node in the cluster

Description of problem:

I am running tests with MapReduce using RadarGun. I have a stage that takes a file path and size as parameters, then uses all active cluster nodes to write values to the cache. Using a 100MB file and Infinispan 5.2.4.Final-redhat-1/JDG 6.1 GA, this operation takes less than 8 seconds:

12:23:13,259 DEBUG [org.radargun.Master] (main) Starting 'LoadFileStage' on 8 slave nodes. Details: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/home/hudson/tmp/afield/william-shakespeare-100MB.txt, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
12:23:57,818 INFO  [org.radargun.stages.LoadFileStage] (main) Received responses from all 8 slaves. Durations [0:31.83 seconds, 1:31.96 seconds, 2:31.9 seconds, 3:32.06 seconds, 4:44.53 seconds, 5:44.37 seconds, 6:44.5 seconds, 7:31.83 seconds]
12:23:57,819 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------
12:23:57,820 INFO  [org.radargun.stages.LoadFileStage] (main) Size of file '/home/hudson/tmp/afield/william-shakespeare-100MB.txt' is 100623474 bytes
12:23:57,820 INFO  [org.radargun.stages.LoadFileStage] (main) Value size is '1024' which will produce 98266 keys
12:23:57,821 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 0 wrote 12284 values to the cache with a total size of 12578816 bytes
12:23:57,821 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 1 wrote 12284 values to the cache with a total size of 12577906 bytes
12:23:57,821 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 2 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,822 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 3 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,822 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 4 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,822 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 5 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,823 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 6 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,823 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 7 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,823 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------

Using the same file and nodes with Infinispan 5.3beta2 takes about 8 minutes:

12:24:48,332 DEBUG [org.radargun.Master] (main) Starting 'LoadFileStage' on 8 slave nodes. Details: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/home/hudson/tmp/afield/william-shakespeare-100MB.txt, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
12:46:35,276 INFO  [org.radargun.stages.LoadFileStage] (main) Received responses from all 8 slaves. Durations [0:21.73 minutes, 1:21.76 minutes, 2:21.78 minutes, 3:21.73 minutes, 4:21.73 minutes, 5:21.73 minutes, 6:21.78 minutes, 7:21.67 minutes]
12:46:35,278 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------
12:46:35,278 INFO  [org.radargun.stages.LoadFileStage] (main) Size of file '/home/hudson/tmp/afield/william-shakespeare-100MB.txt' is 100623474 bytes
12:46:35,278 INFO  [org.radargun.stages.LoadFileStage] (main) Value size is '1024' which will produce 98266 keys
12:46:35,279 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 0 wrote 12284 values to the cache with a total size of 12578816 bytes
12:46:35,279 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 1 wrote 12284 values to the cache with a total size of 12577906 bytes
12:46:35,279 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 2 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,279 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 3 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,280 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 4 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,280 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 5 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,280 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 6 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,281 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 7 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,281 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------

The attached logs show the duration for each PUT operation in the cache. On node 0 using Infinispan 5.2 the average duration is 2.35 milliseconds. In Infinispan 5.3beta2 the average duration is 106.05 milliseconds.

Infinispan 5.2 is using this configuration file: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/jdg-60/ispn-configs/infinispan52/dist-udp-no-tx.xml

Infinispan 5.3 is using this configuration file: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/jdg-60/ispn-configs/infinispan53/dist-udp-no-tx.xml

Both versions are using this JGroups configuration file: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/jdg-60/jgroups/jgroups-udp.xml


Version-Release number of selected component (if applicable):
5.2.4.Final-redhat-1 and Infinispan 5.3.0Beta2

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 Alan Field 2013-05-20 16:35:13 EDT
I just want to update this report with the correct interpretation of the durations in the log file:

12:23:57,818 INFO  [org.radargun.stages.LoadFileStage] (main) Received responses from all 8 slaves. Durations [0:31.83 seconds, 1:31.96 seconds, 2:31.9 seconds, 3:32.06 seconds, 4:44.53 seconds, 5:44.37 seconds, 6:44.5 seconds, 7:31.83 seconds]

The number before the ":" is the node index, and is not part of the duration. The duration on node 0 was 31.83 seconds. The duration on node 1 was 31.96 seconds, etc.
Comment 2 Alan Field 2013-05-21 09:59:27 EDT
This is the source for the LoadFileStage as well:

https://github.com/radargun/radargun/blob/master/framework/src/main/java/org/radargun/stages/LoadFileStage.java
Comment 3 Alan Field 2013-06-04 09:51:01 EDT
This regression was caused by a stale value in the JGroups configuration file. The previous version of the file had UDP.bundler_type="old". In the new version of the file, the value is UDP.bundler_type="new". Here are the results with the new configuration file in place:

17:42:51,749 DEBUG [org.radargun.Master] (main) Starting 'LoadFileStage' on 8 slave nodes. Details: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-100MB.txt, printWriteStatistics=false, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
17:42:51,759 INFO  [org.radargun.Slave] (pool-1-thread-1) Executing stage: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-100MB.txt, printWriteStatistics=false, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
17:42:51,761 INFO  [org.radargun.stages.LoadFileStage] (pool-1-thread-1) Writing 1024 bytes to cache key: 7-7168 at position 8192
17:43:09,626 INFO  [org.radargun.stages.LoadFileStage] (pool-1-thread-1) Writing 1024 bytes to cache key: 7-40967168 at position 40968192
17:43:17,487 INFO  [org.radargun.stages.LoadFileStage] (pool-1-thread-1) Writing 1024 bytes to cache key: 7-81927168 at position 81928192
17:43:20,856 INFO  [org.radargun.Slave] (pool-1-thread-1) Finished stage: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-100MB.txt, printWriteStatistics=false, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
17:43:20,857 INFO  [org.radargun.Slave] (main) Ack successfully sent to the master
17:43:20,985 INFO  [org.radargun.stages.LoadFileStage] (main) Received responses from all 8 slaves. Durations [0 = 29.16 seconds, 1 = 29.1 seconds, 2 = 29.2 seconds, 3 = 29.09 seconds, 4 = 29.13 seconds, 5 = 29.09 seconds, 6 = 29.22 seconds, 7 = 29.09 seconds]
17:43:20,987 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------
17:43:20,987 INFO  [org.radargun.stages.LoadFileStage] (main) Size of file '/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-100MB.txt' is 100623474 bytes
17:43:20,987 INFO  [org.radargun.stages.LoadFileStage] (main) Value size is '1024' which will produce 98266 keys
17:43:20,987 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 0 wrote 12284 values to the cache with a total size of 12578816 bytes
17:43:20,988 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 1 wrote 12284 values to the cache with a total size of 12577906 bytes
17:43:20,988 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 2 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,988 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 3 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,988 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 4 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,989 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 5 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,989 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 6 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,989 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 7 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,989 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------

These values are consistent with JDG 6.1. Maybe JGroups 3.3.x should print a warning message when UDP.bundler_type="old" is being used?
Comment 4 JBoss JIRA Server 2013-06-04 10:15:59 EDT
Pedro Ruivo <pedroruivo2@gmail.com> made a comment on jira ISPN-3126

In any case, I've sent an email to ispn-dev warning about the use of the new bundler.
Comment 7 JBoss JIRA Server 2013-07-03 09:09:47 EDT
Mircea Markus <mmarkus@redhat.com> made a comment on jira ISPN-3126

Can you please close it once you confirmed that this is not a problem?
Comment 8 Alan Field 2013-07-03 09:49:11 EDT
The performance degradation is not a bug, but JGroups should warn that the old bundler is enabled. I'll enter a new JIRA for that.
Comment 9 JBoss JIRA Server 2013-07-03 10:44:11 EDT
Martin Gencur <mgencur@redhat.com> updated the status of jira ISPN-3126 to Resolved
Comment 10 JBoss JIRA Server 2013-07-03 10:44:11 EDT
Martin Gencur <mgencur@redhat.com> made a comment on jira ISPN-3126

This is not a bug. The performance regression was caused by incorrect JGroups settings.

Note You need to log in before you can comment on or make changes to this bug.