Hide Forgot
Created attachment 1202903 [details] Performance graph We can see a performance drop and clients getting 503 as a response in our stress tests, which test performance of the cluster under increasing load (number of concurrent clients). The issue was hit when using an invalidation cache + a shared cache store. Scenario description: HTTP traffic accessing clustered web application that has replicated sessions (uses a mod_cluster load balancer). Delay between sending a new request after receiving a response is 100 ms (for each client). Session size is 17 KB. Configuration: - 4-node EAP cluster with an invalidation cache + a shared cache store (remote JDG cluster) - 4-node JDG cluster with distributed cache - 4 nodes generating load (number of clients increases from 400 to 6000) - cache mode: SYNC (for both invalidation and distributed caches, also "write-behind" element is set for "remote-store" element accordingly) Links to configuration files: SYNC scenario: EAP http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-stress-remote-jdg-session-invalidation-sync-4nodes-perf17/9/artifact/report/config/jboss-perf18/standalone-ha.xml JDG http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-stress-remote-jdg-session-invalidation-sync-4nodes-perf17/9/artifact/report/config/jboss-perf22/clustered.xml EAP cluster responded normally until the moment when a load of 5600 concurrent clients was reached. After that, these messages were logged by all the EAP servers: [JBossINF] [0m[0m09:41:43,498 INFO [org.apache.tomcat.util] (ajp-perf19/10.16.90.56:8009-Acceptor-0) JBWEB003008: Maximum number of threads (4096) created for connector with address perf19/10.16.90.56 and port 8009 [JBossINF] [0m[33m09:41:44,346 WARN [org.jboss.as.clustering.web.infinispan] (OOB-20286,shared=tcp) JBAS010325: Possible concurrency problem: Replicated version id 1772 is less than or equal to in-memory version for session 8LjrXm+TGZ-a4f-cPOUIvASX [JBossINF] [0m[33m09:42:09,949 WARN [org.jgroups.protocols.UNICAST2] (OOB-21385,shared=tcp) perf19/web: (requester=perf21/web) message perf21/web::14456540 not found in retransmission table of perf21/web: [JBossINF] [14456345 | 14456345 | 14457490] (811 elements, 334 missing) Log: http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-stress-remote-jdg-session-invalidation-sync-4nodes-perf17/9/console-perf19/ From client point of view, jvmRoute started changing around that time: 2016/09/19 09:41:31:166 EDT [INFO ][Runner - 906] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Failover detected, JvmRoute changed. perf20 -> perf18 And later, clients started getting 503. Log: http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-stress-remote-jdg-session-invalidation-sync-4nodes-perf17/9/console-perf29/ Also, approx. one minute later, perf20 started logging this error: [JBossINF] [0m[31m09:43:43,622 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host]] (ajp-perf20/10.16.90.58:8009-2946) JBWEB000233: Exception sending request initialized lifecycle event to listener instance of class org.jboss.weld.servlet.WeldListener: java.lang.RuntimeException: JBAS018060: Exception acquiring ownership of ycRzHga1gII1klBJ1vBW5By1 [JBossINF] at org.jboss.as.web.session.ClusteredSession.access(ClusteredSession.java:494) [jboss-as-web-7.5.8.Final-redhat-2-bz-1370648.jar:7.5.8.Final-redhat-SNAPSHOT] [JBossINF] at org.apache.catalina.connector.Request.doGetSession(Request.java:2638) [jbossweb-7.5.16.Final-redhat-1.jar:7.5.16.Final-redhat-1] [JBossINF] at org.apache.catalina.connector.Request.getSession(Request.java:2382) [jbossweb-7.5.16.Final-redhat-1.jar:7.5.16.Final-redhat-1] [JBossINF] at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:791) [jbossweb-7.5.16.Final-redhat-1.jar:7.5.16.Final-redhat-1] [JBossINF] at org.jboss.weld.context.beanstore.http.LazySessionBeanStore.getSession(LazySessionBeanStore.java:73) [weld-core-1.1.33.Final-redhat-1.jar:1.1.33.Final-redhat-1] [JBossINF] at org.jboss.weld.context.beanstore.http.LazySessionBeanStore.<init>(LazySessionBeanStore.java:59) [weld-core-1.1.33.Final-redhat-1.jar:1.1.33.Final-redhat-1] [JBossINF] at org.jboss.weld.context.http.HttpSessionContextImpl.associate(HttpSessionContextImpl.java:39) [weld-core-1.1.33.Final-redhat-1.jar:1.1.33.Final-redhat-1] [JBossINF] at org.jboss.weld.context.http.HttpSessionContextImpl.associate(HttpSessionContextImpl.java:22) [weld-core-1.1.33.Final-redhat-1.jar:1.1.33.Final-redhat-1] [JBossINF] at org.jboss.weld.servlet.WeldListener.requestInitialized(WeldListener.java:212) [weld-core-1.1.33.Final-redhat-1.jar:1.1.33.Final-redhat-1] [JBossINF] at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139) [jbossweb-7.5.16.Final-redhat-1.jar:7.5.16.Final-redhat-1] [JBossINF] at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97) [jbossweb-7.5.16.Final-redhat-1.jar:7.5.16.Final-redhat-1] [JBossINF] at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102) [jbossweb-7.5.16.Final-redhat-1.jar:7.5.16.Final-redhat-1] [JBossINF] at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344) [jbossweb-7.5.16.Final-redhat-1.jar:7.5.16.Final-redhat-1] [JBossINF] at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:490) [jbossweb-7.5.16.Final-redhat-1.jar:7.5.16.Final-redhat-1] [JBossINF] at org.apache.coyote.ajp.AjpProtocol$AjpConnectionHandler.process(AjpProtocol.java:420) [jbossweb-7.5.16.Final-redhat-1.jar:7.5.16.Final-redhat-1] [JBossINF] at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:926) [jbossweb-7.5.16.Final-redhat-1.jar:7.5.16.Final-redhat-1] [JBossINF] at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_91] [JBossINF] Caused by: org.jboss.as.clustering.lock.TimeoutException: JBAS010213: Cannot acquire lock default-host/clusterbench-offload/ycRzHga1gII1klBJ1vBW5By1 from cluster [JBossINF] at org.jboss.as.clustering.lock.SharedLocalYieldingClusterLockManager.lock(SharedLocalYieldingClusterLockManager.java:455) [JBossINF] at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.acquireSessionOwnership(DistributedCacheManager.java:400) [JBossINF] at org.jboss.as.web.session.ClusteredSession.acquireSessionOwnership(ClusteredSession.java:520) [jboss-as-web-7.5.8.Final-redhat-2-bz-1370648.jar:7.5.8.Final-redhat-SNAPSHOT] [JBossINF] at org.jboss.as.web.session.ClusteredSession.access(ClusteredSession.java:492) [jboss-as-web-7.5.8.Final-redhat-2-bz-1370648.jar:7.5.8.Final-redhat-SNAPSHOT] [JBossINF] ... 16 more Log: http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-stress-remote-jdg-session-invalidation-sync-4nodes-perf17/9/console-perf20/ See the attachment "Performance graph" for graphs with results. Link to full report with extended CPU and memory statistics: http://download.eng.brq.redhat.com/scratch/mvinkler/reports/2016-09-20_15-55-35/stress.html Horizontal axis represents number of concurrent clients generating load. Vertical axis represents throughput = TPS (number of requests processed per second). Red line represents total TPS. Blue line represents number of requests requests processed in less than 3 seconds. In short: After reaching load of ~5600 clients, throughput dropped significantly and clients started logging errors.
Correction: the invalidation cache was not used in the above mentioned run, so the default cache "dist" in the "web" cache container was used. Nevertheless, these issues are still valid.