EAP 6.4.6.CP.CR2 Seen intermittently in this failover test scenario: http-session-shutdown-dist-sync Setup: 4 node cluster (perf18, perf19, perf20, perf21), one node at time is shutdown, while standalone clients keep calling the application. At some point in the test, when perf21 was down, the two nodes perf18 and perf19 started logging TimeoutException: timeout sending message to perf20/web while perf20 started logging TimeoutException: timeout sending message to both perf19/web and perf20/web. 10 seconds later, perf21 was started again. During startup, it received several cluster views, in some of them perf20 was missing completely. Perf21 was also suspected by perf20/ejb several times. In the end, perf21 logged this error: [JBossINF] [0m[31m04:20:29,382 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 55) MSC000001: Failed to start service jboss.infinispan.web.dist: org.jboss.msc.service.StartException in service jboss.infinispan.web.dist: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl [JBossINF] at org.jboss.as.clustering.msc.AsynchronousService$1.run(AsynchronousService.java:91) [jboss-as-clustering-common-7.5.6.Final-redhat-2.jar:7.5.6.Final-redhat-2] [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_45] [JBossINF] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_45] [JBossINF] at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_45] [JBossINF] at org.jboss.threads.JBossThread.run(JBossThread.java:122) [jboss-threads-2.1.2.Final-redhat-1.jar:2.1.2.Final-redhat-1] [JBossINF] Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl [JBossINF] at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205) [JBossINF] at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:886) [JBossINF] at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:657) [JBossINF] at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:646) [JBossINF] at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:549) [JBossINF] at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:217) [JBossINF] at org.infinispan.CacheImpl.start(CacheImpl.java:582) [JBossINF] at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:686) [JBossINF] at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:649) [JBossINF] at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:545) [JBossINF] at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:559) [JBossINF] at org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:113) [JBossINF] at org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:104) [JBossINF] at org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:78) [JBossINF] at org.jboss.as.clustering.msc.AsynchronousService$1.run(AsynchronousService.java:86) [jboss-as-clustering-common-7.5.6.Final-redhat-2.jar:7.5.6.Final-redhat-2] [JBossINF] ... 4 more [JBossINF] Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache dist on perf21/web [JBossINF] at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:216) [JBossINF] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.6.0_45] [JBossINF] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [rt.jar:1.6.0_45] [JBossINF] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_45] [JBossINF] at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_45] [JBossINF] at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203) [JBossINF] ... 18 more ..... [JBossINF] [0m[31m04:20:29,396 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) JBAS014612: *Operation ("deploy") failed* - address: ([("deployment" => "clusterbench-ee6.ear")]) - failure description: {"JBAS014671: Failed services" => { [JBossINF] "jboss.infinispan.web.default-host/clusterbench-granular" => "org.jboss.msc.service.StartException in service jboss.infinispan.web.default-host/clusterbench-granular: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl [JBossINF] Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl [JBossINF] Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache default-host/clusterbench-granular on perf21/web", [JBossINF] "jboss.infinispan.web.default-host/clusterbench" => "org.jboss.msc.service.StartException in service jboss.infinispan.web.default-host/clusterbench: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl [JBossINF] Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl [JBossINF] Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache default-host/clusterbench on perf21/web" [JBossINF] }} Since the servers started logging TimeoutException errors, clients did not get any valid data, logging: 2016/01/15 04:19:20:881 EST [WARN ][Runner - 547] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Error sampling data: <org.jboss.smartfrog.loaddriver.RequestProcessingException: IO error: java.net.SocketTimeoutException: Read timed out> org.jboss.smartfrog.loaddriver.RequestProcessingException: IO error: java.net.SocketTimeoutException: Read timed out at org.jboss.smartfrog.loaddriver.http.HttpRequestProcessorFactoryImpl$HttpRequestProcessor.processRequest(HttpRequestProcessorFactoryImpl.java:251) at org.jboss.smartfrog.loaddriver.CompoundRequestProcessorFactoryImpl$CompoundRequestProcessor.processRequest(CompoundRequestProcessorFactoryImpl.java:52) at org.jboss.smartfrog.loaddriver.Runner.run(Runner.java:103) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.jboss.smartfrog.loaddriver.http.HttpRequestProcessorFactoryImpl$HttpRequestProcessor.processRequest(HttpRequestProcessorFactoryImpl.java:146) ... 3 more We were able to hit the bug also with EAP 6.4.5.CP.CR3 release. The issue occurs approx. in 50% of the runs.
Link to the Jenkins run: http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-shutdown-dist-sync/99/ Server logs: http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-shutdown-dist-sync/99/console-perf18/ http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-shutdown-dist-sync/99/console-perf19/ http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-shutdown-dist-sync/99/console-perf20/ http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-shutdown-dist-sync/99/console-perf21/ Client log (size 2.5 GB): http://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-shutdown-dist-sync/99/console-perf17/
Dev-nacking as there is no customer case.
Triage: @Brad to find out if this is happening with customers and please comment.
Triage: Brad to add pm_nack since no customer case