Bug 1300367 - CacheException: Initial state transfer timed out for cache causes Operation ("deploy") to fail during server restart
Summary: CacheException: Initial state transfer timed out for cache causes Operation (...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Clustering
Version: 6.4.6
Hardware: Unspecified
OS: Unspecified
high
unspecified
Target Milestone: ---
: ---
Assignee: jboss-set
QA Contact: Michal Vinkler
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-01-20 15:09 UTC by Michal Vinkler
Modified: 2016-03-07 14:17 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-07 14:17:29 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)

Description Michal Vinkler 2016-01-20 15:09:47 UTC
EAP 6.4.6.CP.CR2
Seen intermittently in this failover test scenario:
http-session-shutdown-dist-sync

Setup: 4 node cluster (perf18, perf19, perf20, perf21), one node at time is shutdown, while standalone clients keep calling the application.

At some point in the test, when perf21 was down, the two nodes perf18 and perf19 started logging TimeoutException: timeout sending message to perf20/web while perf20 started logging TimeoutException: timeout sending message to both perf19/web and perf20/web.

10 seconds later, perf21 was started again. During startup, it received several cluster views, in some of them perf20 was missing completely. Perf21 was also suspected by perf20/ejb several times.

In the end, perf21 logged this error:
[JBossINF] [0m[31m04:20:29,382 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 55) MSC000001: Failed to start service jboss.infinispan.web.dist: org.jboss.msc.service.StartException in service jboss.infinispan.web.dist: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
[JBossINF] 	at org.jboss.as.clustering.msc.AsynchronousService$1.run(AsynchronousService.java:91) [jboss-as-clustering-common-7.5.6.Final-redhat-2.jar:7.5.6.Final-redhat-2]
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_45]
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_45]
[JBossINF] 	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_45]
[JBossINF] 	at org.jboss.threads.JBossThread.run(JBossThread.java:122) [jboss-threads-2.1.2.Final-redhat-1.jar:2.1.2.Final-redhat-1]
[JBossINF] Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
[JBossINF] 	at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
[JBossINF] 	at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:886)
[JBossINF] 	at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:657)
[JBossINF] 	at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:646)
[JBossINF] 	at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:549)
[JBossINF] 	at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:217)
[JBossINF] 	at org.infinispan.CacheImpl.start(CacheImpl.java:582)
[JBossINF] 	at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:686)
[JBossINF] 	at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:649)
[JBossINF] 	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:545)
[JBossINF] 	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:559)
[JBossINF] 	at org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:113)
[JBossINF] 	at org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:104)
[JBossINF] 	at org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:78)
[JBossINF] 	at org.jboss.as.clustering.msc.AsynchronousService$1.run(AsynchronousService.java:86) [jboss-as-clustering-common-7.5.6.Final-redhat-2.jar:7.5.6.Final-redhat-2]
[JBossINF] 	... 4 more
[JBossINF] Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache dist on perf21/web
[JBossINF] 	at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:216)
[JBossINF] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.6.0_45]
[JBossINF] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [rt.jar:1.6.0_45]
[JBossINF] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_45]
[JBossINF] 	at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_45]
[JBossINF] 	at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
[JBossINF] 	... 18 more

.....
[JBossINF] [0m[31m04:20:29,396 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) JBAS014612: *Operation ("deploy") failed* - address: ([("deployment" => "clusterbench-ee6.ear")]) - failure description: {"JBAS014671: Failed services" => {
[JBossINF]     "jboss.infinispan.web.default-host/clusterbench-granular" => "org.jboss.msc.service.StartException in service jboss.infinispan.web.default-host/clusterbench-granular: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
[JBossINF]     Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
[JBossINF]     Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache default-host/clusterbench-granular on perf21/web",
[JBossINF]     "jboss.infinispan.web.default-host/clusterbench" => "org.jboss.msc.service.StartException in service jboss.infinispan.web.default-host/clusterbench: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
[JBossINF]     Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
[JBossINF]     Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache default-host/clusterbench on perf21/web"
[JBossINF] }}


Since the servers started logging TimeoutException errors, clients did not get any valid data, logging:
2016/01/15 04:19:20:881 EST [WARN ][Runner - 547] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Error sampling data:  <org.jboss.smartfrog.loaddriver.RequestProcessingException: IO error: java.net.SocketTimeoutException: Read timed out>
        org.jboss.smartfrog.loaddriver.RequestProcessingException: IO error: java.net.SocketTimeoutException: Read timed out
	at org.jboss.smartfrog.loaddriver.http.HttpRequestProcessorFactoryImpl$HttpRequestProcessor.processRequest(HttpRequestProcessorFactoryImpl.java:251)
	at org.jboss.smartfrog.loaddriver.CompoundRequestProcessorFactoryImpl$CompoundRequestProcessor.processRequest(CompoundRequestProcessorFactoryImpl.java:52)
	at org.jboss.smartfrog.loaddriver.Runner.run(Runner.java:103)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
	at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
	at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
	at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
	at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
	at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
	at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at org.jboss.smartfrog.loaddriver.http.HttpRequestProcessorFactoryImpl$HttpRequestProcessor.processRequest(HttpRequestProcessorFactoryImpl.java:146)
	... 3 more


We were able to hit the bug also with EAP 6.4.5.CP.CR3 release. The issue occurs approx. in 50% of the runs.

Comment 2 Dominik Pospisil 2016-02-03 12:50:51 UTC
Dev-nacking as there is no customer case.

Comment 3 Petr Penicka 2016-02-08 14:24:56 UTC
Triage: @Brad to find out if this is happening with customers and please comment.

Comment 4 Petr Penicka 2016-02-15 14:37:41 UTC
Triage: Brad to add pm_nack since no customer case


Note You need to log in before you can comment on or make changes to this bug.