| Summary: | performance regression AS 7.0.0.Beta3 + ISPN 5.0.0.CR4 -> AS 7.1.0.Alpha1-SNAPSHOT + ISPN 5.0.0.CR8 | | |
|---|---|---|---|
| Product: | [JBoss] JBoss Data Grid 6 | Reporter: | Michal Linhard <mlinhard> |
| Component: | Infinispan | Assignee: | Default User <jbpapp-maint> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.0.0 | CC: | galder.zamarreno, jdg-bugs, mlinhard, nobody, paul.ferraro, prabhat.jha, vladimir.blagojevic |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| URL: | http://jira.jboss.org/jira/browse/EDG-114 | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-07-27 15:36:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Description
Michal Linhard
2011-07-20 13:54:30 UTC
Beta3 had a JGroups binding problem that caused it to bind to the internal network without reporting it, so another run emulates this binding. The performance is better but still poor (3x worse): https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/51/artifact/report/log.txt

Actually, the performance is more than 5x worse with respect to max throughput.

I connected via JMX to each running EDG6 build instance and checked the settings (a sketch of this kind of JMX check follows the JGroups diff below).

The JGroups settings differ in these:
{code}
                      new  old
NAKACK.XmitTableSize    0    3
UDP.MaxBundleTimeout   30   20
{code}
(probably nothing significant)
The Hot Rod server settings are the same.
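For reference, this kind of runtime settings check can be done with the standard JDK JMX client. The sketch below is illustrative only: the service URL, ObjectName and attribute are placeholders, not the actual ones inspected in these runs.
{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxSettingsCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder service URL -- the real host/port depend on how the node exposes JMX.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://perf17:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Placeholder ObjectName and attribute -- illustrative only.
            ObjectName udp = new ObjectName("jgroups:type=protocol,cluster=\"default\",protocol=UDP");
            Object value = mbsc.getAttribute(udp, "MaxBundleTimeout");
            System.out.println("UDP.MaxBundleTimeout = " + value);
        } finally {
            connector.close();
        }
    }
}
{code}
Reading the same attribute from both running builds and diffing the output is the kind of comparison shown above.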
Reformulated for the build containing ISPN 5.0.0.CR8.

I profiled perf17 for each build.
Snapshot for the old build: http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/58/artifact/report/jprofiler-snapshot.jps
Snapshot for the new build: http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/57/artifact/report/jprofiler-snapshot.jps

I patched Infinispan so that when creating DefaultCacheManager it dumps its config to stdout (a sketch of this approach follows the property diff below). This is the first iteration of the results; there might still be some issues with this approach, so I have to debug further.

Attachment: Added: config-new-build.txt
Attachment: Added: config-old-build.txt
Attachment: Added: diff-CR4.txt
Attachment: Added: diff-CR8.txt

The only noticeable thing that has changed since CR4 is ISPN-1217, but no virtual nodes are configured here, or at least not more than 1. I'm having a quick look at the snapshots.

Attaching updated configs.
Attachment: Added: config-new-build2.txt
Attachment: Added: config-old-build2.txt

The configuration of the two instances wasn't the same; I'm going to prepare new Hudson runs with updated/synced configs. These config properties were different (the value set in the new-build tests is in brackets):
{code}
infinispan.global.transport.strictPeerToPeer = true (new false)
infinispan.configuration.locking.isolationLevel = READ_COMMITTED (new REPEATABLE_READ)
infinispan.configuration.transaction.useSynchronization = false (new true)
infinispan.configuration.eviction.maxEntries = -1 (new 10000)
infinispan.configuration.clustering.stateRetrieval.timeout = 10000 (new 240000)
infinispan.configuration.clustering.hash.rehashWait = 120000 (new 60000)
infinispan.configuration.storeAsBinary.enabled = false (new true)
infinispan.configuration.lazyDeserialization.enabled = false (new true)
{code}
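For reference, here is a minimal sketch of the kind of patch that can produce such a dump, using plain JavaBean introspection over a configuration object. The class and property names are illustrative; this is not the exact code in the patch-5.0.0.CR*.txt attachments below.
{code}
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

// Hypothetical helper: dump every readable bean property of a configuration object
// (e.g. the cache Configuration passed to DefaultCacheManager) so two builds can be diffed.
public final class ConfigDumper {
    public static void dump(String prefix, Object config) throws Exception {
        for (PropertyDescriptor pd : Introspector.getBeanInfo(config.getClass()).getPropertyDescriptors()) {
            if (pd.getReadMethod() == null) {
                continue; // skip write-only properties
            }
            Object value = pd.getReadMethod().invoke(config); // read the current value
            System.out.println(prefix + "." + pd.getName() + " = " + value);
        }
    }
}
{code}
Calling something like this for both the global and the default cache configuration in each build and diffing the stdout gives output comparable to the config-*.txt / diff-*.txt attachments.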
Attachment: Added: config-old-build2.txt

The patch*.txt files contain the patches that were used to print the configs.
Attachment: Added: patch-5.0.0.CR4.txt
Attachment: Added: patch-5.0.0.CR8.txt

I checked the JGroupsTransport.asyncExecutor settings (they differ between the two containers):

AS 7.1.0.Alpha1-SNAPSHOT: org.jboss.threads.JBossThreadPoolExecutor, corePoolSize 100, keepAliveTime 1000000000, maximumPoolSize 100
AS 7.0.0.Beta3: java.util.concurrent.ThreadPoolExecutor, corePoolSize 25, keepAliveTime 0, maximumPoolSize 25

(I'm not sure, however, whether this has any serious impact, or whether asyncExecutor is even in action with DIST_SYNC mode.)

JProfiler snapshots from the one-node tests:
NEW BUILD: http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size1/5/artifact/report/jprofiler-snapshot.jps
OLD BUILD: http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size1/6/artifact/report/jprofiler-snapshot.jps

The multinode tests (clusterSize > numOwners) have a problem of non-determinism (on each test run the keys are distributed differently), which leads to a possibly unfair performance comparison (even more so when profiling just one node).

This is not a problem of the Hot Rod server only: memcached also degraded between the said versions.
old build: http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-memcached-size4/2/artifact/report/log.txt
new build: http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-memcached-size4/4/artifact/report/log.txt

jstack dumps:
new build: http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/60/artifact/report/jstack_dumps/
old build: http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/61/artifact/report/jstack_dumps/

storeAsBinary + lazyDeserialization being enabled in the new-build configuration has serious performance effects due to ISPN-1267 (actually there's no way to configure this in AS7, probably for a good reason; see AS7-1385 for the discussion). This manifested when I created a patched AS7 build where I hardwired storeAsBinary to false. See the results of that build: http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/62 (build here: http://www.qa.jboss.com/~mlinhard/edg6-SNAPSHOT-rev0068-ispn-5.0.0.CR8-as7.1.0.Alpha1-SNAPSHOT-110725-2203-patched.zip)

Link: Added: This issue depends on ISPN-1267

JProfiler snapshot of the patched build run: http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size1/7/artifact/report/jprofiler-snapshot.jps
Attachment: Added: as7-patch.txt

Michal, can't we run the same configuration described in https://issues.jboss.org/browse/JBPAPP-6865?focusedCommentId=12615897&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12615897 ? That would give us a lot of clues as to whether we introduced regressions somewhere in our code or whether we should look for clues in the configuration settings.
Btw, the interceptor chains are the same now:

7.1.0.Alpha1-SNAPSHOT
---------------------
>> org.infinispan.interceptors.InvocationContextInterceptor
>> org.infinispan.interceptors.CacheMgmtInterceptor
>> org.infinispan.interceptors.DistTxInterceptor
>> org.infinispan.interceptors.NotificationInterceptor
>> org.infinispan.interceptors.DistLockingInterceptor
>> org.infinispan.interceptors.DistributionInterceptor
>> org.infinispan.interceptors.CallInterceptor

7.0.0.Beta3
-----------
>> org.infinispan.interceptors.InvocationContextInterceptor
>> org.infinispan.interceptors.CacheMgmtInterceptor
>> org.infinispan.interceptors.DistTxInterceptor
>> org.infinispan.interceptors.NotificationInterceptor
>> org.infinispan.interceptors.DistLockingInterceptor
>> org.infinispan.interceptors.DistributionInterceptor
>> org.infinispan.interceptors.CallInterceptor

Michal, don't forget to check the Hot Rod client configurations to see whether there are any differences, and try to provide a couple of snapshots of the client side with the old setup and the new one.

Yeah, actually I forgot to mention this: all the test runs are done with the same client-side infrastructure. The config is the same, the SmartFrog code is the same, and the Infinispan versions are the same (5.0.0.CR8).

Ok, so assuming the client side is identical in all tests, it must be something on the server side. I don't think the AS has an impact here since the sockets are managed by ourselves. I tried to compare the server-side snapshots of the patched and old configs but saw no noticeable difference. It's also hard to compare them because the number of invocations is different. A way to narrow down the issue might be to compare a single-server setup with the old and patched setups, because we'd be comparing the same number of invocations in both cases. Besides, it could allow us to narrow down the scope of the issue, i.e. whether it's core itself or something clustering related. Btw, here's the list of issues resolved between CR5 and CR8: http://goo.gl/WpvQw

I'm looking into the patched JProfiler snapshot more closely now. I still think we should compare standalone behaviour with this test because that might give us some clues. For example Michal, comparing the two configs, I see different consistent hash classes being used. The old config used DefaultConsistentHash and the patched version uses TopologyAwareConsistentHash, and this is not covered in the differences you mentioned above. Btw, we should upgrade Hudson to the latest JProfiler version available to make sure we're getting the right data, and to avoid me seeing funny things as shown earlier with the MVI interceptor calls.

Correction: the patched version shows DefaultConsistentHash being used too, so the old and patched versions use the same consistent hash class.

So, I've gone through the put/get operation stack traces in the snapshots and they look almost identical, at least at the Infinispan level. The only difference I see is the retrieval of the cache from the cache manager. Whereas in the old codebase getting the cache simply meant accessing the DefaultCacheManager (pretty fast in the old snapshot: 0.9% of time, 9.7 secs for 297,000 invocations), in the new version it delegates to the central AS-managed cache manager via org.jboss.as.clustering.infinispan.DefaultEmbeddedCacheManager.getCache, which takes around 4.5% of time (30 secs for 328,011 invocations).
I'd find it hard to believe this is what's costing half of the performance, but this could easily be proved by taking Infinispan 5.0.0.CR8 and running it in AS 7.0.0.Beta3. If the performance is the same as with CR4, then the difference is in AS and not in Infinispan.

Hmm, so far I haven't tried that: ISPN 5.0.0.CR8 inside of AS 7.0.0.Beta3. I'm on it...

AS 7.0.0.Beta3 + 5.0.0.CR8 performs very similarly to the old config (AS 7.0.0.Beta3 + 5.0.0.CR4): http://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/65/artifact/report/log.txt So we've narrowed the problem down to AS7 code, EDG6 Alpha integration code, or configuration issues.

Very cool! I was looking into the traces Michal posted and I could not find any regressions in Infinispan!

Great work Galder. We both owe you one!

I compared the runtime state of the executors again, namely ASYNC_NOTIFICATION_EXECUTOR and ASYNC_TRANSPORT_EXECUTOR; the EVICTION_SCHEDULED_EXECUTOR and ASYNC_REPLICATION_QUEUE_EXECUTOR aren't used in our runs.
{code}
7.0.0.Beta3
-----------
ASYNC_NOTIFICATION_EXECUTOR
java.util.concurrent.ThreadPoolExecutor
corePoolSize 1
keepAliveTime 0
maximumPoolSize 1
ASYNC_TRANSPORT_EXECUTOR
java.util.concurrent.ThreadPoolExecutor
corePoolSize 25
keepAliveTime 0
maximumPoolSize 25
7.1.0.Alpha1-SNAPSHOT
---------------------
ASYNC_NOTIFICATION_EXECUTOR
org.jboss.threads.JBossThreadPoolExecutor
corePoolSize 100
keepAliveTime 1000000000
maximumPoolSize 100
ASYNC_TRANSPORT_EXECUTOR
org.jboss.threads.JBossThreadPoolExecutor
corePoolSize 100
keepAliveTime 1000000000
maximumPoolSize 100
{code}
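To put the numbers above in java.util.concurrent terms, the two transport pools roughly correspond to the following constructions. This is a sketch for illustration only: the real instances are created by the respective containers, JBossThreadPoolExecutor is approximated here by a plain ThreadPoolExecutor, and the keep-alive unit is assumed to be nanoseconds as reported over JMX.
{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TransportPoolComparison {
    public static void main(String[] args) {
        // AS 7.0.0.Beta3 async transport pool: 25 fixed threads, no keep-alive.
        ThreadPoolExecutor oldPool = new ThreadPoolExecutor(
                25, 25, 0L, TimeUnit.NANOSECONDS, new LinkedBlockingQueue<Runnable>());

        // AS 7.1.0.Alpha1-SNAPSHOT async transport pool: 100 fixed threads,
        // keepAliveTime reported as 1000000000 (assumed nanoseconds, i.e. ~1 second).
        ThreadPoolExecutor newPool = new ThreadPoolExecutor(
                100, 100, 1000000000L, TimeUnit.NANOSECONDS, new LinkedBlockingQueue<Runnable>());

        System.out.println("old pool threads: " + oldPool.getMaximumPoolSize());
        System.out.println("new pool threads: " + newPool.getMaximumPoolSize());
        oldPool.shutdown();
        newPool.shutdown();
    }
}
{code}
Since corePoolSize equals maximumPoolSize in both cases, each pool behaves as a fixed-size pool; the main difference is simply 25 vs 100 threads.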
Michal, thanks for trying out the suggested combo, it's a pretty big step forward because it means we can eliminate Infinispan from the equation. There are at least a couple of listeners active in the Hot Rod server: one that calculates cluster ids for the version and another for detecting crashed members. Both are linked to view changes, so I don't think they should be having an impact here. Paul Ferraro would be the person to ask about whether any other listeners are attached. Btw, I'll attach a couple of screenshots for him to look at wrt getCache() differences. At first glance, the async transport executor should not be a factor here because the caches are configured with DIST_SYNC. Tbh, the pools are just bigger in the new AS instance, so I don't think that's an issue. Btw, any further profiling data should include at least org.jboss.as.* profiling. Gathering profiling data for org.jboss.* might be too much.

Attachment: Added: Old-getCache-profiling.png
Attachment: Added: Patched-getCache-profiling.png

Btw, we use different JGroups versions in the two builds: AS 7.0.0.Beta3 uses 2.12.0.Final, AS 7.1.0.Alpha1-SNAPSHOT uses 2.12.1.Final.

I was just thinking about that :) - it could potentially be a factor. Something worth trying is putting the same JGroups version + configuration in 7.0.0.Beta3 (+ 5.0.0.CR8) and seeing if the performance is still the expected one.

The JGroups runtime executor config isn't synced either:
{code}
7.1.0.Alpha1-SNAPSHOT
---------------------
default executor:
org.jboss.threads.JBossThreadPoolExecutor
corePoolSize 100
keepAliveTime 1000000000
maximumPoolSize 100
oob executor:
org.jboss.threads.JBossThreadPoolExecutor
corePoolSize 100
keepAliveTime 1000000000
maximumPoolSize 100
timer executor:
org.jboss.as.threads.ScheduledThreadPoolService$ExecutorImpl
corePoolSize 1
keepAliveTime 0
maximumPoolSize 2147483647
7.0.0.Beta3
-----------
default executor:
org.jgroups.util.ThreadManagerThreadPoolExecutor
corePoolSize 20
keepAliveTime 5000000000
maximumPoolSize 200
oob executor:
org.jgroups.util.ThreadManagerThreadPoolExecutor
corePoolSize 20
keepAliveTime 1000000000
maximumPoolSize 200
timer executor:
org.jgroups.util.ThreadManagerThreadPoolExecutor
corePoolSize 4
keepAliveTime 5000000000
maximumPoolSize 10
{code}
So the next step is to run with synced executor settings, and then sync the JGroups ones...

It won't be possible to sync the thread pool configs perfectly: the threads subsystem sets corePoolSize = maximumPoolSize, so I'll set the correct maximum and corePoolSize will stay unsynced.

Are your JGroups stacks identical for each run? Check for the existence of FLUSH in both stacks. One noteworthy change since 7.0.0.Final was the addition of this cache decoration: https://github.com/jbossas/jboss-as/blob/master/clustering/infinispan/src/main/java/org/jboss/as/clustering/infinispan/ClassLoaderAwareCache.java which adds a new interceptor to the top of the interceptor stack as well as cache listener decorations. This was added on July 14, so I don't know if your snapshot predates it. It will certainly have some performance implications (due to per-invocation TCCL switching, see the sketch at the end of this thread), though I would be surprised if it was solely responsible for the 0.5x performance drop.

The JGroups stacks are synced; FLUSH is in both.
old one: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/edg-60/stress-r17561/standalone/configuration/jgroups-datagrid-udp.xml
new one: https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/edg-60/stress/standalone/configuration/standalone.xml

The synced executor settings produced the same performance for 7.1.0.Alpha1-SNAPSHOT: https://hudson.qa.jboss.com/hudson/view/EDG6/job/edg-60-stress-client-hotrod-size4/66/artifact/report/log.txt

For future reference, this is what the stack looked like before the sync-up: http://pastebin.test.redhat.com/56781 Those JGroups thread pools were costing about half of the performance. It's rev 18388 of https://svn.devel.redhat.com/repos/jboss-qa/load-testing/etc/edg-60/stress/standalone/configuration/standalone.xml (should the pastebin expire).

Closing, as this turned out to be a config issue.

Link: Added: This issue is related to JBPAPP-6918
Docs QE Status: Removed: NEW
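For reference, the per-invocation TCCL switching mentioned in the ClassLoaderAwareCache comment above generally follows the pattern below. This is a generic sketch of the technique only, not the actual AS7 interceptor code.
{code}
import java.util.concurrent.Callable;

// Generic sketch of per-invocation thread-context-classloader (TCCL) switching.
// Every wrapped invocation pays for two ClassLoader swaps, which is the overhead discussed above.
public final class TcclSwitcher {
    public static <T> T callWithTccl(ClassLoader loader, Callable<T> action) throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);            // switch to the cache's classloader
        try {
            return action.call();                        // run the actual cache operation
        } finally {
            thread.setContextClassLoader(previous);      // always restore the caller's TCCL
        }
    }
}
{code}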