Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 965220

Summary: Regression in library mode PUT performance from JDG 6.1 to Infinispan 5.3 beta 2
Product: [JBoss] JBoss Data Grid 6 Reporter: Alan Field <afield>
Component: PerformanceAssignee: Tristan Tarrant <ttarrant>
Status: CLOSED NOTABUG QA Contact: Martin Gencur <mgencur>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.1.0CC: afield, jdg-bugs, mmarkus, myarboro
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-03 13:49:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Log files from each node in the cluster none

Description Alan Field 2013-05-20 18:02:30 UTC
Created attachment 750689 [details]
Log files from each node in the cluster

Description of problem:

I am running tests with MapReduce using RadarGun. I have a stage that takes a file path and size as parameters, then uses all active cluster nodes to write values to the cache. Using a 100MB file and Infinispan 5.2.4.Final-redhat-1/JDG 6.1 GA, this operation takes less than 8 seconds:

12:23:13,259 DEBUG [org.radargun.Master] (main) Starting 'LoadFileStage' on 8 slave nodes. Details: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/home/hudson/tmp/afield/william-shakespeare-100MB.txt, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
12:23:57,818 INFO  [org.radargun.stages.LoadFileStage] (main) Received responses from all 8 slaves. Durations [0:31.83 seconds, 1:31.96 seconds, 2:31.9 seconds, 3:32.06 seconds, 4:44.53 seconds, 5:44.37 seconds, 6:44.5 seconds, 7:31.83 seconds]
12:23:57,819 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------
12:23:57,820 INFO  [org.radargun.stages.LoadFileStage] (main) Size of file '/home/hudson/tmp/afield/william-shakespeare-100MB.txt' is 100623474 bytes
12:23:57,820 INFO  [org.radargun.stages.LoadFileStage] (main) Value size is '1024' which will produce 98266 keys
12:23:57,821 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 0 wrote 12284 values to the cache with a total size of 12578816 bytes
12:23:57,821 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 1 wrote 12284 values to the cache with a total size of 12577906 bytes
12:23:57,821 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 2 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,822 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 3 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,822 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 4 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,822 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 5 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,823 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 6 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,823 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 7 wrote 12283 values to the cache with a total size of 12577792 bytes
12:23:57,823 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------

Using the same file and nodes with Infinispan 5.3beta2 takes about 8 minutes:

12:24:48,332 DEBUG [org.radargun.Master] (main) Starting 'LoadFileStage' on 8 slave nodes. Details: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/home/hudson/tmp/afield/william-shakespeare-100MB.txt, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
12:46:35,276 INFO  [org.radargun.stages.LoadFileStage] (main) Received responses from all 8 slaves. Durations [0:21.73 minutes, 1:21.76 minutes, 2:21.78 minutes, 3:21.73 minutes, 4:21.73 minutes, 5:21.73 minutes, 6:21.78 minutes, 7:21.67 minutes]
12:46:35,278 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------
12:46:35,278 INFO  [org.radargun.stages.LoadFileStage] (main) Size of file '/home/hudson/tmp/afield/william-shakespeare-100MB.txt' is 100623474 bytes
12:46:35,278 INFO  [org.radargun.stages.LoadFileStage] (main) Value size is '1024' which will produce 98266 keys
12:46:35,279 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 0 wrote 12284 values to the cache with a total size of 12578816 bytes
12:46:35,279 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 1 wrote 12284 values to the cache with a total size of 12577906 bytes
12:46:35,279 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 2 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,279 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 3 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,280 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 4 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,280 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 5 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,280 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 6 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,281 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 7 wrote 12283 values to the cache with a total size of 12577792 bytes
12:46:35,281 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------

The attached logs show the duration for each PUT operation in the cache. On node 0 using Infinispan 5.2 the average duration is 2.35 milliseconds. In Infinispan 5.3beta2 the average duration is 106.05 milliseconds.

Infinispan 5.2 is using this configuration file: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/jdg-60/ispn-configs/infinispan52/dist-udp-no-tx.xml

Infinispan 5.3 is using this configuration file: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/jdg-60/ispn-configs/infinispan53/dist-udp-no-tx.xml

Both versions are using this JGroups configuration file: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/jdg-60/jgroups/jgroups-udp.xml


Version-Release number of selected component (if applicable):
5.2.4.Final-redhat-1 and Infinispan 5.3.0Beta2

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Alan Field 2013-05-20 20:35:13 UTC
I just want to update this report with the correct interpretation of the durations in the log file:

12:23:57,818 INFO  [org.radargun.stages.LoadFileStage] (main) Received responses from all 8 slaves. Durations [0:31.83 seconds, 1:31.96 seconds, 2:31.9 seconds, 3:32.06 seconds, 4:44.53 seconds, 5:44.37 seconds, 6:44.5 seconds, 7:31.83 seconds]

The number before the ":" is the node index, and is not part of the duration. The duration on node 0 was 31.83 seconds. The duration on node 1 was 31.96 seconds, etc.

Comment 2 Alan Field 2013-05-21 13:59:27 UTC
This is the source for the LoadFileStage as well:

https://github.com/radargun/radargun/blob/master/framework/src/main/java/org/radargun/stages/LoadFileStage.java

Comment 3 Alan Field 2013-06-04 13:51:01 UTC
This regression was caused by a stale value in the JGroups configuration file. The previous version of the file had UDP.bundler_type="old". In the new version of the file, the value is UDP.bundler_type="new". Here are the results with the new configuration file in place:

17:42:51,749 DEBUG [org.radargun.Master] (main) Starting 'LoadFileStage' on 8 slave nodes. Details: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-100MB.txt, printWriteStatistics=false, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
17:42:51,759 INFO  [org.radargun.Slave] (pool-1-thread-1) Executing stage: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-100MB.txt, printWriteStatistics=false, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
17:42:51,761 INFO  [org.radargun.stages.LoadFileStage] (pool-1-thread-1) Writing 1024 bytes to cache key: 7-7168 at position 8192
17:43:09,626 INFO  [org.radargun.stages.LoadFileStage] (pool-1-thread-1) Writing 1024 bytes to cache key: 7-40967168 at position 40968192
17:43:17,487 INFO  [org.radargun.stages.LoadFileStage] (pool-1-thread-1) Writing 1024 bytes to cache key: 7-81927168 at position 81928192
17:43:20,856 INFO  [org.radargun.Slave] (pool-1-thread-1) Finished stage: LoadFile {bucket=null, exitBenchmarkOnSlaveFailure=false, filePath=/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-100MB.txt, printWriteStatistics=false, runOnAllSlaves=false, slaves=null, useSmartClassLoading=true, valueSize=1024 }
17:43:20,857 INFO  [org.radargun.Slave] (main) Ack successfully sent to the master
17:43:20,985 INFO  [org.radargun.stages.LoadFileStage] (main) Received responses from all 8 slaves. Durations [0 = 29.16 seconds, 1 = 29.1 seconds, 2 = 29.2 seconds, 3 = 29.09 seconds, 4 = 29.13 seconds, 5 = 29.09 seconds, 6 = 29.22 seconds, 7 = 29.09 seconds]
17:43:20,987 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------
17:43:20,987 INFO  [org.radargun.stages.LoadFileStage] (main) Size of file '/qa/services/hudson/static_build_env/jdg/data/william-shakespeare-100MB.txt' is 100623474 bytes
17:43:20,987 INFO  [org.radargun.stages.LoadFileStage] (main) Value size is '1024' which will produce 98266 keys
17:43:20,987 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 0 wrote 12284 values to the cache with a total size of 12578816 bytes
17:43:20,988 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 1 wrote 12284 values to the cache with a total size of 12577906 bytes
17:43:20,988 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 2 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,988 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 3 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,988 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 4 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,989 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 5 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,989 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 6 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,989 INFO  [org.radargun.stages.LoadFileStage] (main) Slave 7 wrote 12283 values to the cache with a total size of 12577792 bytes
17:43:20,989 INFO  [org.radargun.stages.LoadFileStage] (main) --------------------

These values are consistent with JDG 6.1. Maybe JGroups 3.3.x should print a warning message when UDP.bundler_type="old" is being used?

Comment 4 JBoss JIRA Server 2013-06-04 14:15:59 UTC
Pedro Ruivo <pedroruivo2> made a comment on jira ISPN-3126

In any case, I've sent an email to ispn-dev warning about the use of the new bundler.

Comment 7 JBoss JIRA Server 2013-07-03 13:09:47 UTC
Mircea Markus <mmarkus> made a comment on jira ISPN-3126

Can you please close it once you confirmed that this is not a problem?

Comment 8 Alan Field 2013-07-03 13:49:11 UTC
The performance degradation is not a bug, but JGroups should warn that the old bundler is enabled. I'll enter a new JIRA for that.

Comment 9 JBoss JIRA Server 2013-07-03 14:44:11 UTC
Martin Gencur <mgencur> updated the status of jira ISPN-3126 to Resolved

Comment 10 JBoss JIRA Server 2013-07-03 14:44:11 UTC
Martin Gencur <mgencur> made a comment on jira ISPN-3126

This is not a bug. The performance regression was caused by incorrect JGroups settings.