Seen the following exception while testing 6.2.0.ER7 in Amazon EC2. Likely unrelated to EC2 though, as it seems very similar to BZ 993041. [JBossINF] 09:52:42,723 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host]] (ajp-ec2-54-234-10-125.compute-1.amazonaws.com/10.240.62.148:8009-27) JBWEB000233: Exception sending request initialized lifecycle event to listener instance of class org.jboss.weld.servlet.WeldListener: java.lang.RuntimeException: JBAS018060: Exception acquiring ownership of Ii1200JBAh6THJcuF44u0mbC [JBossINF] at org.jboss.as.web.session.ClusteredSession.access(ClusteredSession.java:494) [jboss-as-web.jar:7.3.0.Final-redhat-10] [JBossINF] at org.apache.catalina.connector.Request.doGetSession(Request.java:2616) [jbossweb.jar:7.2.2.Final-redhat-1] [JBossINF] at org.apache.catalina.connector.Request.getSession(Request.java:2361) [jbossweb.jar:7.2.2.Final-redhat-1] [JBossINF] at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:790) [jbossweb.jar:7.2.2.Final-redhat-1] [JBossINF] at org.jboss.weld.context.beanstore.http.LazySessionBeanStore.getSession(LazySessionBeanStore.java:73) [weld-core.jar:1.1.16.Final-redhat-1] [JBossINF] at org.jboss.weld.context.beanstore.http.LazySessionBeanStore.<init>(LazySessionBeanStore.java:59) [weld-core.jar:1.1.16.Final-redhat-1] [JBossINF] at org.jboss.weld.context.http.HttpSessionContextImpl.associate(HttpSessionContextImpl.java:28) [weld-core.jar:1.1.16.Final-redhat-1] [JBossINF] at org.jboss.weld.context.http.HttpSessionContextImpl.associate(HttpSessionContextImpl.java:16) [weld-core.jar:1.1.16.Final-redhat-1] [JBossINF] at org.jboss.weld.servlet.WeldListener.requestInitialized(WeldListener.java:145) [weld-core.jar:1.1.16.Final-redhat-1] [JBossINF] at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:134) [jbossweb.jar:7.2.2.Final-redhat-1] [JBossINF] at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:97) [jbossweb.jar:7.2.2.Final-redhat-1] [JBossINF] at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:102) [jbossweb.jar:7.2.2.Final-redhat-1] [JBossINF] at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:336) [jbossweb.jar:7.2.2.Final-redhat-1] [JBossINF] at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:488) [jbossweb.jar:7.2.2.Final-redhat-1] [JBossINF] at org.apache.coyote.ajp.AjpProtocol$AjpConnectionHandler.process(AjpProtocol.java:420) [jbossweb.jar:7.2.2.Final-redhat-1] [JBossINF] at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:920) [jbossweb.jar:7.2.2.Final-redhat-1] [JBossINF] at java.lang.Thread.run(Thread.java:679) [rt.jar:1.6.0_24] [JBossINF] Caused by: org.jboss.as.clustering.lock.TimeoutException: JBAS010213: Cannot acquire lock default-host/clusterbench/Ii1200JBAh6THJcuF44u0mbC from cluster [JBossINF] at org.jboss.as.clustering.lock.SharedLocalYieldingClusterLockManager.lock(SharedLocalYieldingClusterLockManager.java:439) [JBossINF] at org.jboss.as.clustering.web.infinispan.DistributedCacheManager.acquireSessionOwnership(DistributedCacheManager.java:400) [JBossINF] at org.jboss.as.web.session.ClusteredSession.acquireSessionOwnership(ClusteredSession.java:520) [jboss-as-web.jar:7.3.0.Final-redhat-10] [JBossINF] at org.jboss.as.web.session.ClusteredSession.access(ClusteredSession.java:492) [jboss-as-web.jar:7.3.0.Final-redhat-10] [JBossINF] ... 16 more The exception is thrown upon taking one of the server in the cluster down. Sessions that belonged to the failed node seem to go in vain with the server: new sessions are created upon attempt to access server using already existing sessions, while it is expected that failover would proceed the request. This implies the client will lose all their session data. The web cache-container default cache is "repl".
There are also other relevant server exceptions. In my opinion they are related to the exception in the bug description so I'm not filing another issue but adding to this one. > 09:52:43,121 WARN [org.jboss.as.clustering.web.infinispan] (Incoming-8,shared=tcp) JBAS010325: Possible concurrency problem: Replicated version id 31 is less than or equal to in-memory version for session jt-Hvt0b9x0YDY088h83XR8h ... > 09:54:26,168 ERROR [org.jboss.as.clustering] (Incoming-1,shared=tcp) JBAS010223: ViewAccepted failed: java.lang.IllegalStateException: JBAS010240: Address 8abf6d5e-45dc-c7dc-2f86-72794f5f87ff not registered in transport layer ... > 09:54:26,687 ERROR [org.jboss.as.clustering] (Incoming-10,shared=tcp) JBAS010223: ViewAccepted failed: java.lang.IllegalStateException: JBAS010240: Address a87af105-2853-f02d-cee2-041c74bd1399 not registered in transport layer ... > 09:54:26,825 WARN [org.jgroups.protocols.TCP] (ajp-ec2-54-234-10-125.compute-1.amazonaws.com/10.240.62.148:8009-39) null: logical address cache didn't contain all physical address, sending up a discovery request ... All the above lines are from config/jboss-ec2-54-234-10-125.compute-1.amazonaws.com/server.log Server logs of all servers participating in the test scenario can be found in the already attached archive. FYI during the test there were 4 servers in a cluster and 200 concurrent clients. One by one servers are killed (kill -9) and then started again. I see failures after 3 of the four server kills. Which is IMO very strange. I'm running the same test dozens of times without being able to reproduce. So having this issue happen 3 out of 4 times in a single test must mean something. I'll be running the test till tomorrow but I'm losing hope we can reproduce. P.S. for easier log reading, here are the times at which each server was killed: > domU-12-31-39-04-39-66_sf.log:2013/11/06 09:55:36:358 EST [DEBUG][TerminatorThread] HOST domU-12-31-39-07-C5-96.compute-1.internal:rootProcess:failoverTestExecution - Issuing command: sh -c kill -9 `ps xfu | grep 'o8POK1sF4CPtw7QU' | grep -v 'grep' | sed -re '1,$s/[ \t]+/ /g' | cut -d ' ' -f 2`; > domU-12-31-39-06-90-08_sf.log:2013/11/06 09:51:49:086 EST [DEBUG][TerminatorThread] HOST domU-12-31-39-07-C5-96.compute-1.internal:rootProcess:failoverTestExecution - Issuing command: sh -c kill -9 `ps xfu | grep 'HLDqw0ark9XcMNCM' | grep -v 'grep' | sed -re '1,$s/[ \t]+/ /g' | cut -d ' ' -f 2`; > ip-10-138-52-150_sf.log:2013/11/06 10:03:05:153 EST [DEBUG][TerminatorThread] HOST domU-12-31-39-07-C5-96.compute-1.internal:rootProcess:failoverTestExecution - Issuing command: sh -c kill -9 `ps xfu | grep 'Hdf06hHdi5gDdU7g' | grep -v 'grep' | sed -re '1,$s/[ \t]+/ /g' | cut -d ' ' -f 2`; > ip-10-151-106-31_sf.log:2013/11/06 09:59:19:216 EST [DEBUG][TerminatorThread] HOST domU-12-31-39-07-C5-96.compute-1.internal:rootProcess:failoverTestExecution - Issuing command: sh -c kill -9 `ps xfu | grep '1aMS9DHcEPZFda3K' | grep -v 'grep' | sed -re '1,$s/[ \t]+/ /g' | cut -d ' ' -f 2`;
Running for extended periods of time I couldn't reproduce the above issue. The only thing is that in 3 runs relatively close to each other in time, I got total of 4 failed samples: > SFCORE_LOG - Error sampling data: <org.jboss.smartfrog.loaddriver.RequestProcessingException: Stale session data received. Expected 140, received 139, Runner: 186> I think this is unrelated though and expected to happen rarely.
*** This bug has been marked as a duplicate of bug 993041 ***