Bug 745908 (EDG-114)

Summary: performance regression AS 7.0.0.Beta3 + ISPN 5.0.0.CR4 -> AS 7.1.0.Alpha1-SNAPSHOT + ISPN 5.0.0.CR8
Product: [JBoss] JBoss Data Grid 6
Reporter: Michal Linhard <mlinhard>
Component: Infinispan
Assignee: Default User <jbpapp-maint>
Status: CLOSED NEXTRELEASE
Severity: high
Priority: high
Version: 6.0.0
CC: galder.zamarreno, jdg-bugs, mlinhard, nobody, paul.ferraro, prabhat.jha, vladimir.blagojevic
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
URL: http://jira.jboss.org/jira/browse/EDG-114
Doc Type: Bug Fix
Last Closed: 2011-07-27 15:36:21 UTC
Type: Bug
Attachments:
config-new-build.txt
config-old-build.txt
diff-CR4.txt
diff-CR8.txt
config-new-build2.txt
config-old-build2.txt
config-old-build2.txt
patch-5.0.0.CR4.txt
patch-5.0.0.CR8.txt
as7-patch.txt
Old-getCache-profiling.png
Patched-getCache-profiling.png

Description Michal Linhard 2011-07-20 13:54:30 UTC
project_key: EDG

We have a regression

Version: AS 7.0.0.Beta3 + ISPN 5.0.0.CR4
----------------------------------------
build: http://www.qa.jboss.com/~mlinhard/edg6-SNAPSHOT-110609-0949-ispn-5.0.0.CR4.zip
config: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/edg-60/stress-r17561/
results: https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/49/artifact/report/log.txt

Version: AS 7.1.0.Alpha1-SNAPSHOT + ISPN 5.0.0.CR8
----------------------------------------
infinispan 5.0.0.CR8
AS 7.1.0.Alpha1-SNAPSHOT - last commit 45a663a8be42af628cd65683befa3703d38da25c, Tue Jul 19 09:03:39 2011 +0200
build: http://www.qa.jboss.com/~mlinhard/edg6-SNAPSHOT-rev0067-ispn-5.0.0.CR8-as7.1.0.Alpha1-SNAPSHOT-110721-1138.zip
config: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/edg-60/stress/
results: https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/53/artifact/report/log.txt

Comment 1 Michal Linhard 2011-07-20 14:22:56 UTC
Beta3 had a JGroups binding problem that caused it to bind to the internal network without being told to, so another run emulates this binding. The performance is better but still poor (3x worse):
https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/51/artifact/report/log.txt

Comment 2 Michal Linhard 2011-07-20 14:31:12 UTC
Actually, the performance is more than 5x worse with respect to max throughput.

Comment 3 Michal Linhard 2011-07-20 15:07:32 UTC
I connected via JMX to each running EDG6 build instance and checked the settings.

jgroups settings differ in these:
{code}
                                 new         old
NAKACK.XmitTableSize             0           3
UDP.MaxBundleTimeout             30          20
{code}

(probably nothing significant)

hotrod server settings are the same
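
For reference, this kind of check can be scripted against the standard javax.management.remote API. A minimal sketch, assuming a plain RMI JMX endpoint and an illustrative JGroups ObjectName pattern (the actual service URL and MBean names depend on how the EDG6 instance exposes JMX):

{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxSettingCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical host/port; the real endpoint depends on the server's JMX setup.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://node01:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Illustrative ObjectName pattern for the NAKACK protocol MBean.
            ObjectName pattern = new ObjectName("jgroups:protocol=NAKACK,*");
            for (ObjectName name : mbsc.queryNames(pattern, null)) {
                // XmitTableSize is one of the attributes compared above.
                System.out.println(name + " XmitTableSize = "
                        + mbsc.getAttribute(name, "XmitTableSize"));
            }
        } finally {
            connector.close();
        }
    }
}
{code}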

Comment 4 Michal Linhard 2011-07-21 12:02:20 UTC
Reformulated for the build containing ISPN 5.0.0.CR8.

Comment 6 Michal Linhard 2011-07-22 10:58:13 UTC
I patched Infinispan so that when creating a DefaultCacheManager it dumps its config to stdout. This is the first iteration of the results; there might still be some issues with this approach, so I have to debug further.
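
(The actual patches are attached later as patch-5.0.0.CR4.txt and patch-5.0.0.CR8.txt. The sketch below only illustrates the idea, assuming the Infinispan 5.0-era DefaultCacheManager API; it is not the attached patch.)

{code}
import org.infinispan.manager.DefaultCacheManager;

public class ConfigDumpingCacheManagerFactory {
    public static DefaultCacheManager create(String configFile) throws Exception {
        DefaultCacheManager manager = new DefaultCacheManager(configFile);
        // toString() output serves as a crude, diff-able snapshot of the effective
        // settings; the attached patch-*.txt files show the actual change.
        System.out.println("Global config: " + manager.getGlobalConfiguration());
        System.out.println("Default cache config: " + manager.getDefaultConfiguration());
        return manager;
    }
}
{code}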

Comment 7 Michal Linhard 2011-07-22 10:58:13 UTC
Attachment: Added: config-new-build.txt
Attachment: Added: config-old-build.txt
Attachment: Added: diff-CR4.txt
Attachment: Added: diff-CR8.txt


Comment 8 Galder Zamarreño 2011-07-22 11:57:36 UTC
The only noticeable thing that has changed since CR4 is ISPN-1217, but no virtual nodes are configured here, or at least none bigger than 1. I'm having a quick look at the snapshots.

Comment 9 Michal Linhard 2011-07-22 12:49:09 UTC
attaching updated configs

Comment 10 Michal Linhard 2011-07-22 12:49:09 UTC
Attachment: Added: config-new-build2.txt
Attachment: Added: config-old-build2.txt


Comment 11 Michal Linhard 2011-07-22 12:49:49 UTC
The configurations of the two instances weren't the same; I'm going to prepare new Hudson runs with updated/synced configs.

Comment 12 Michal Linhard 2011-07-22 13:06:30 UTC
These config properties were different (the value set in the new build tests is in parentheses):

{code}
infinispan.global.transport.strictPeerToPeer = true (new false)
infinispan.configuration.locking.isolationLevel = READ_COMMITTED (new REPEATABLE_READ)
infinispan.configuration.transaction.useSynchronization = false (new true)
infinispan.configuration.eviction.maxEntries = -1 (new 10000)
infinispan.configuration.clustering.stateRetrieval.timeout = 10000 (new 240000)
infinispan.configuration.clustering.hash.rehashWait = 120000 (new 60000)
infinispan.configuration.storeAsBinary.enabled = false (new true)
infinispan.configuration.lazyDeserialization.enabled = false (new true)
{code}

Comment 13 Michal Linhard 2011-07-22 13:09:07 UTC
Attachment: Added: config-old-build2.txt


Comment 14 Michal Linhard 2011-07-22 13:10:06 UTC
The patch*.txt attachments contain the patches that were used to print the configs.

Comment 15 Michal Linhard 2011-07-22 13:10:06 UTC
Attachment: Added: patch-5.0.0.CR4.txt
Attachment: Added: patch-5.0.0.CR8.txt


Comment 16 Michal Linhard 2011-07-25 12:31:06 UTC
Checked the JGroupsTransport.asyncExecutor settings (they differ between the two containers):

AS 7.1.0.Alpha1-SNAPSHOT:
org.jboss.threads.JBossThreadPoolExecutor
corePoolSize 100
keepAliveTime 1000000000
maximumPoolSize	100	

AS 7.0.0.Beta3:
java.util.concurrent.ThreadPoolExecutor
corePoolSize	25	
keepAliveTime	0	
maximumPoolSize	25	

(I'm not sure, however, whether this has any serious impact, or whether the asyncExecutor is even used in DIST_SYNC mode.)
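
For context, the numbers above map onto the standard java.util.concurrent.ThreadPoolExecutor constructor parameters. A minimal sketch of the two pool shapes, assuming a millisecond keep-alive unit and an unbounded queue (both are assumptions; the containers construct their own pool implementations):

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ExecutorShapes {

    // Roughly the AS 7.0.0.Beta3 shape: 25 core/max threads, no keep-alive.
    static ThreadPoolExecutor beta3Style() {
        return new ThreadPoolExecutor(25, 25, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Roughly the AS 7.1.0.Alpha1-SNAPSHOT shape: 100 core/max threads, very long
    // keep-alive. (JBossThreadPoolExecutor is a different class; this only mirrors
    // the sizes reported via JMX.)
    static ThreadPoolExecutor alpha1Style() {
        return new ThreadPoolExecutor(100, 100, 1000000000L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }
}
{code}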

Comment 17 Michal Linhard 2011-07-25 15:00:04 UTC
jprofiler snapshots from one node tests:
NEW BUILD:
http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size1/5/artifact/report/jprofiler-snapshot.jps
OLD BUILD:
http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size1/6/artifact/report/jprofiler-snapshot.jps

The multi-node tests (clusterSize > numOwners) have a problem of non-determinism (on each test run the keys are distributed differently), which can lead to an unfair performance comparison (even more so when profiling just one node).

Comment 18 Michal Linhard 2011-07-25 16:05:46 UTC
This is not a problem of the Hot Rod server only: memcached also degraded between the versions in question.
old build:
http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-memcached-size4/2/artifact/report/log.txt
new build:
http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-memcached-size4/4/artifact/report/log.txt

Comment 20 Michal Linhard 2011-07-25 20:32:21 UTC
storeAsBinary + lazyDeserialization being enabled in the new build configuration has serious performance effects due to ISPN-1267.
(Actually there's no way to configure this in AS7, probably for a good reason; see AS7-1385 for the discussion.)

This manifested when I created a patched AS7 build in which I hardwired storeAsBinary to false.
See the results of the build: http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/62
(build here: http://www.qa.jboss.com/~mlinhard/edg6-SNAPSHOT-rev0068-ispn-5.0.0.CR8-as7.1.0.Alpha1-SNAPSHOT-110725-2203-patched.zip)

Comment 21 Michal Linhard 2011-07-25 20:33:02 UTC
Link: Added: This issue depends on ISPN-1267


Comment 23 Michal Linhard 2011-07-26 09:07:17 UTC
Attachment: Added: as7-patch.txt


Comment 24 Vladimir Blagojevic 2011-07-26 12:57:05 UTC
Michal, can't we run the same configuration described in https://issues.jboss.org/browse/JBPAPP-6865?focusedCommentId=12615897&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12615897? That would give us a lot of clues as to whether we introduced regressions somewhere in our code or whether we should look for clues in the configuration settings.

Comment 25 Michal Linhard 2011-07-26 13:13:57 UTC
btw the interceptor chains are the same now:

7.1.0.Alpha1-SNAPSHOT
---------------------

	>> org.infinispan.interceptors.InvocationContextInterceptor
	>> org.infinispan.interceptors.CacheMgmtInterceptor
	>> org.infinispan.interceptors.DistTxInterceptor
	>> org.infinispan.interceptors.NotificationInterceptor
	>> org.infinispan.interceptors.DistLockingInterceptor
	>> org.infinispan.interceptors.DistributionInterceptor
	>> org.infinispan.interceptors.CallInterceptor

7.0.0.Beta3
-----------

	>> org.infinispan.interceptors.InvocationContextInterceptor
	>> org.infinispan.interceptors.CacheMgmtInterceptor
	>> org.infinispan.interceptors.DistTxInterceptor
	>> org.infinispan.interceptors.NotificationInterceptor
	>> org.infinispan.interceptors.DistLockingInterceptor
	>> org.infinispan.interceptors.DistributionInterceptor
	>> org.infinispan.interceptors.CallInterceptor

Comment 26 Galder Zamarreño 2011-07-26 17:48:33 UTC
Michal, don't forget to check the HR client configurations to see whether there are any differences, and try to provide a couple of snapshots of the client side with the old setup and the new one.

Comment 27 Michal Linhard 2011-07-26 19:06:19 UTC
Yeah, actually I forgot to mention this:

All the test runs are done with the same client-side infrastructure: the config is the same, the SmartFrog code is the same, and the Infinispan versions are the same (5.0.0.CR8).

Comment 28 Galder Zamarreño 2011-07-27 06:25:55 UTC
OK, so assuming the client side is identical in all tests, it must be something on the server side. I don't think the AS has an impact here since the sockets are managed by ourselves. I tried to compare the server-side snapshots of the patched and old configs but saw no noticeable difference. It's also hard to compare them because the number of invocations is different.

A way to narrow down the issue might be to compare single-server setups with the old and patched configs, because we'd be comparing the same number of invocations in both cases. Besides, it could allow us to narrow down the scope of the issue: whether it's core itself, or something clustering related.

Comment 29 Galder Zamarreño 2011-07-27 08:16:37 UTC
Btw, here's the list of issues resolved in between CR5 and CR8: http://goo.gl/WpvQw

I'm looking into the patched JProfiler snapshot more closely now. I still think we should compare standalone behaviour with this test because that might give us some clues.

For example, Michal, comparing the two configs I see different consistent hash classes being used: the old config used DefaultConsistentHash while the patched version uses TopologyAwareConsistentHash, and this is not covered in the differences you mentioned above.

Btw, we should upgrade Hudson to the latest JProfiler version available to make sure we're getting the right data, and to avoid me seeing funny things as shown earlier with the MVI interceptor calls.

Comment 30 Galder Zamarreño 2011-07-27 08:19:45 UTC
Correction: The patched version shows DefaultConsistentHash being used too, so the old and patched version use the same consistent hash class.

Comment 31 Galder Zamarreño 2011-07-27 08:39:16 UTC
So, I've gone through the put/get operation stacktraces in the snapshots and they look almost identical at least at the Infinispan level.

The only difference I see is the retrieval of the cache from the cache manager. Whereas in the old codebase getting the cache simply meant accessing the DefaultCacheManager (pretty fast in the old snapshot: 0.9% of time, 9.7 secs for 297,000 invocations), in the new version it delegates to the central AS-managed cache manager via org.jboss.as.clustering.infinispan.DefaultEmbeddedCacheManager.getCache (which takes around 4.5% of time: 30 secs for 328,011 invocations). I'd find it hard to believe this is what's halving the performance, but it could easily be proved by taking Infinispan 5.0.0.CR8 and running it in AS 7.0.0.Beta3. If the performance is the same as with CR4, then the difference is in the AS and not in Infinispan.
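
To illustrate why the getCache() path matters per operation, here is a hypothetical sketch (CachingRequestHandler is not a class from either codebase) of resolving the cache once and reusing the reference, so any per-call delegation cost inside getCache() is paid only at startup:

{code}
import org.infinispan.Cache;
import org.infinispan.manager.EmbeddedCacheManager;

// Hypothetical illustration only: the cache reference is looked up once,
// instead of going through cacheManager.getCache() on every put/get.
public class CachingRequestHandler {
    private final Cache<byte[], byte[]> cache;

    public CachingRequestHandler(EmbeddedCacheManager cacheManager, String cacheName) {
        this.cache = cacheManager.getCache(cacheName); // resolved once, reused per request
    }

    public byte[] get(byte[] key) {
        return cache.get(key);
    }

    public void put(byte[] key, byte[] value) {
        cache.put(key, value);
    }
}
{code}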

Comment 32 Michal Linhard 2011-07-27 09:55:59 UTC
Hmm, so far I haven't tried that combination (ISPN 5.0.0.CR8 inside AS 7.0.0.Beta3); I'm on it ...

Comment 33 Michal Linhard 2011-07-27 11:35:10 UTC
AS 7.0.0.Beta3 + 5.0.0.CR8 performs very similarly to the old config (AS 7.0.0.Beta3 + 5.0.0.CR4):
http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/65/artifact/report/log.txt

So we've narrowed the problem down to AS7 code, EDG6 Alpha integration code, or configuration issues.


Comment 34 Vladimir Blagojevic 2011-07-27 12:14:24 UTC
Very cool! I was looking into the traces Michal posted and I could not find any regressions in Infinispan! Great work, Galder. We both owe you one!

Comment 35 Michal Linhard 2011-07-27 12:22:59 UTC
I compared the runtime state of the executors again, i.e. ASYNC_NOTIFICATION_EXECUTOR and ASYNC_TRANSPORT_EXECUTOR; the EVICTION_SCHEDULED_EXECUTOR and ASYNC_REPLICATION_QUEUE_EXECUTOR aren't used in our runs.

{code}
7.0.0.Beta3
-----------

ASYNC_NOTIFICATION_EXECUTOR

java.util.concurrent.ThreadPoolExecutor
corePoolSize	1	
keepAliveTime	0	
maximumPoolSize	1	

ASYNC_TRANSPORT_EXECUTOR

java.util.concurrent.ThreadPoolExecutor

corePoolSize	25	
keepAliveTime	0	
maximumPoolSize	25	

7.1.0.Alpha1-SNAPSHOT
---------------------

ASYNC_NOTIFICATION_EXECUTOR

org.jboss.threads.JBossThreadPoolExecutor
corePoolSize	100	
keepAliveTime	1000000000	
maximumPoolSize	100	

ASYNC_TRANSPORT_EXECUTOR

org.jboss.threads.JBossThreadPoolExecutor
corePoolSize	100	
keepAliveTime	1000000000	
maximumPoolSize	100	
{code}

Comment 36 Galder Zamarreño 2011-07-27 13:00:05 UTC
Michal, thanks for trying out the suggested combo; it's a pretty big step forward because it means we can eliminate Infinispan from the equation.

There are at least a couple of listeners working in the Hot Rod server: one that calculates cluster ids for the version and another for detecting crashed members. Both are linked to view changes, so I don't think they should be having an impact here. Paul Ferraro would be the person to ask about whether any other listeners are attached. Btw, I'll attach a couple of screenshots for him to look at wrt the getCache() differences. At first glance, the async transport executor should not be a factor here because the caches are configured with DIST_SYNC.

Tbh, the pools are just bigger in the new AS instance, so I don't think it's an issue.

Btw, any further profiling data should include at least org.jboss.as.* profiling. Gathering profiling data for org.jboss.* might be too much.

Comment 37 Galder Zamarreño 2011-07-27 13:01:17 UTC
Attachment: Added: Old-getCache-profiling.png


Comment 38 Galder Zamarreño 2011-07-27 13:01:37 UTC
Attachment: Added: Patched-getCache-profiling.png


Comment 39 Michal Linhard 2011-07-27 13:04:11 UTC
Btw, we use different JGroups versions in the two builds:

AS 7.0.0.Beta3 uses 2.12.0.Final
AS 7.1.0.Alpha1-SNAPSHOT uses 2.12.1.Final

Comment 40 Galder Zamarreño 2011-07-27 13:12:43 UTC
I was just thinking about that :) - it could potentially be a factor. Something worth trying is putting the same JGroups version + configuration in 7.0.0.Beta3 (+ 5.0.0.CR8) and seeing if the performance is still the expected one.

Comment 41 Michal Linhard 2011-07-27 13:19:44 UTC
The JGroups runtime executor config isn't synced either:

{code}
7.1.0.Alpha1-SNAPSHOT
---------------------

default executor:

org.jboss.threads.JBossThreadPoolExecutor
corePoolSize	100	
keepAliveTime	1000000000	
maximumPoolSize	100	

oob executor:

org.jboss.threads.JBossThreadPoolExecutor
corePoolSize	100	
keepAliveTime	1000000000	
maximumPoolSize	100	

timer executor:

org.jboss.as.threads.ScheduledThreadPoolService$ExecutorImpl
corePoolSize	1	
keepAliveTime	0	
maximumPoolSize	2147483647	

7.0.0.Beta3
-----------

default executor:

org.jgroups.util.ThreadManagerThreadPoolExecutor
corePoolSize	20	
keepAliveTime	5000000000	
maximumPoolSize	200	

oob executor:

org.jgroups.util.ThreadManagerThreadPoolExecutor
corePoolSize	20	
keepAliveTime	1000000000	
maximumPoolSize	200	

timer executor:

org.jgroups.util.ThreadManagerThreadPoolExecutor
corePoolSize	4	
keepAliveTime	5000000000	
maximumPoolSize	10	
{code}

So the next step now is to run with synced executor settings, and then the JGroups ...

Comment 42 Michal Linhard 2011-07-27 13:45:18 UTC
It won't be possible to sync the thread pool configs perfectly: the threads subsystem sets corePoolSize = maximumPoolSize, so I'll set the correct maximum and corePoolSize will stay unsynced.

Comment 43 Paul Ferraro 2011-07-27 13:56:20 UTC
Are your JGroups stacks identical for each run?  Check for the existence of FLUSH in both stacks.
One noteworthy change since 7.0.0.Final was the addition of this cache decoration:
https://github.com/jbossas/jboss-as/blob/master/clustering/infinispan/src/main/java/org/jboss/as/clustering/infinispan/ClassLoaderAwareCache.java
which adds a new interceptor to the top of the interceptor stack as well as cache listener decorations.  This was added on July 14, so I don't know if your snapshot predates it.  This will certainly have some performance implications (due to per-invocation TCCL switching), though I would be surprised if it were solely responsible for the 0.5x performance drop.
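
For readers unfamiliar with the cost Paul is describing: per-invocation TCCL switching means saving, setting and restoring the thread's context classloader around every cache call. A minimal sketch of that pattern (not the actual ClassLoaderAwareCache code):

{code}
import java.util.concurrent.Callable;

// Minimal illustration of per-invocation TCCL switching; not the actual
// ClassLoaderAwareCache implementation, just the pattern that adds per-call cost.
public final class TcclSwitcher {

    public static <T> T withContextClassLoader(ClassLoader loader, Callable<T> action)
            throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);       // switch before the cache invocation
        try {
            return action.call();
        } finally {
            thread.setContextClassLoader(previous); // always restore afterwards
        }
    }
}
{code}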

Comment 45 Michal Linhard 2011-07-27 14:41:46 UTC
the synced executor settings produced the same performance for 7.1.0.Alpha1-SNAPSHOT:
https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/66/artifact/report/log.txt

Comment 46 Galder Zamarreño 2011-07-27 14:43:13 UTC
For future ref, this is what the stack looked like before the synch up: http://pastebin.test.redhat.com/56781

Those JGroups thread pools were resulting in about 1/2 of the performance.

Comment 47 Michal Linhard 2011-07-27 14:48:52 UTC
It's rev 18388 of https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/edg-60/stress/standalone/configuration/standalone.xml
(in case the pastebin expires)

Comment 48 Michal Linhard 2011-07-27 15:36:21 UTC
Closing, as this turned out to be a config issue.

Comment 49 Michal Linhard 2011-07-28 12:59:09 UTC
Link: Added: This issue is related to JBPAPP-6918


Comment 50 Anne-Louise Tangring 2011-10-11 17:09:35 UTC
Docs QE Status: Removed: NEW