Bug 1300367 - CacheException: Initial state transfer timed out for cache causes Operation ("deploy") to fail during server restart
CacheException: Initial state transfer timed out for cache causes Operation (...
Status: CLOSED WONTFIX
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Clustering (Show other bugs)
6.4.6
Unspecified Unspecified
high Severity unspecified
: ---
: ---
Assigned To: jboss-set
Michal Vinkler
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2016-01-20 10:09 EST by Michal Vinkler
Modified: 2016-03-07 09:17 EST (History)
4 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-07 09:17:29 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Michal Vinkler 2016-01-20 10:09:47 EST
EAP 6.4.6.CP.CR2
Seen intermittently in this failover test scenario:
http-session-shutdown-dist-sync

Setup: 4 node cluster (perf18, perf19, perf20, perf21), one node at time is shutdown, while standalone clients keep calling the application.

At some point in the test, when perf21 was down, the two nodes perf18 and perf19 started logging TimeoutException: timeout sending message to perf20/web while perf20 started logging TimeoutException: timeout sending message to both perf19/web and perf20/web.

10 seconds later, perf21 was started again. During startup, it received several cluster views, in some of them perf20 was missing completely. Perf21 was also suspected by perf20/ejb several times.

In the end, perf21 logged this error:
[JBossINF] [0m[31m04:20:29,382 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 55) MSC000001: Failed to start service jboss.infinispan.web.dist: org.jboss.msc.service.StartException in service jboss.infinispan.web.dist: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
[JBossINF] 	at org.jboss.as.clustering.msc.AsynchronousService$1.run(AsynchronousService.java:91) [jboss-as-clustering-common-7.5.6.Final-redhat-2.jar:7.5.6.Final-redhat-2]
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_45]
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_45]
[JBossINF] 	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_45]
[JBossINF] 	at org.jboss.threads.JBossThread.run(JBossThread.java:122) [jboss-threads-2.1.2.Final-redhat-1.jar:2.1.2.Final-redhat-1]
[JBossINF] Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
[JBossINF] 	at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205)
[JBossINF] 	at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:886)
[JBossINF] 	at org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:657)
[JBossINF] 	at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:646)
[JBossINF] 	at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:549)
[JBossINF] 	at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:217)
[JBossINF] 	at org.infinispan.CacheImpl.start(CacheImpl.java:582)
[JBossINF] 	at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:686)
[JBossINF] 	at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:649)
[JBossINF] 	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:545)
[JBossINF] 	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:559)
[JBossINF] 	at org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:113)
[JBossINF] 	at org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:104)
[JBossINF] 	at org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:78)
[JBossINF] 	at org.jboss.as.clustering.msc.AsynchronousService$1.run(AsynchronousService.java:86) [jboss-as-clustering-common-7.5.6.Final-redhat-2.jar:7.5.6.Final-redhat-2]
[JBossINF] 	... 4 more
[JBossINF] Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache dist on perf21/web
[JBossINF] 	at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:216)
[JBossINF] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.6.0_45]
[JBossINF] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [rt.jar:1.6.0_45]
[JBossINF] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_45]
[JBossINF] 	at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_45]
[JBossINF] 	at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
[JBossINF] 	... 18 more

.....
[JBossINF] [0m[31m04:20:29,396 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) JBAS014612: *Operation ("deploy") failed* - address: ([("deployment" => "clusterbench-ee6.ear")]) - failure description: {"JBAS014671: Failed services" => {
[JBossINF]     "jboss.infinispan.web.default-host/clusterbench-granular" => "org.jboss.msc.service.StartException in service jboss.infinispan.web.default-host/clusterbench-granular: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
[JBossINF]     Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
[JBossINF]     Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache default-host/clusterbench-granular on perf21/web",
[JBossINF]     "jboss.infinispan.web.default-host/clusterbench" => "org.jboss.msc.service.StartException in service jboss.infinispan.web.default-host/clusterbench: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
[JBossINF]     Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
[JBossINF]     Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache default-host/clusterbench on perf21/web"
[JBossINF] }}


Since the servers started logging TimeoutException errors, clients did not get any valid data, logging:
2016/01/15 04:19:20:881 EST [WARN ][Runner - 547] HOST perf17.mw.lab.eng.bos.redhat.com:rootProcess:c - Error sampling data:  <org.jboss.smartfrog.loaddriver.RequestProcessingException: IO error: java.net.SocketTimeoutException: Read timed out>
        org.jboss.smartfrog.loaddriver.RequestProcessingException: IO error: java.net.SocketTimeoutException: Read timed out
	at org.jboss.smartfrog.loaddriver.http.HttpRequestProcessorFactoryImpl$HttpRequestProcessor.processRequest(HttpRequestProcessorFactoryImpl.java:251)
	at org.jboss.smartfrog.loaddriver.CompoundRequestProcessorFactoryImpl$CompoundRequestProcessor.processRequest(CompoundRequestProcessorFactoryImpl.java:52)
	at org.jboss.smartfrog.loaddriver.Runner.run(Runner.java:103)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
	at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
	at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
	at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
	at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
	at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
	at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at org.jboss.smartfrog.loaddriver.http.HttpRequestProcessorFactoryImpl$HttpRequestProcessor.processRequest(HttpRequestProcessorFactoryImpl.java:146)
	... 3 more


We were able to hit the bug also with EAP 6.4.5.CP.CR3 release. The issue occurs approx. in 50% of the runs.
Comment 2 Dominik Pospisil 2016-02-03 07:50:51 EST
Dev-nacking as there is no customer case.
Comment 3 Petr Penicka 2016-02-08 09:24:56 EST
Triage: @Brad to find out if this is happening with customers and please comment.
Comment 4 Petr Penicka 2016-02-15 09:37:41 EST
Triage: Brad to add pm_nack since no customer case

Note You need to log in before you can comment on or make changes to this bug.