Bug 856101

Summary: Cache cannot shut down when disconnected
Product: [JBoss] JBoss Data Grid 6 Reporter: Radim Vansa <rvansa>
Component: InfinispanAssignee: Tristan Tarrant <ttarrant>
Status: CLOSED NOTABUG QA Contact: Martin Gencur <mgencur>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.0.1CC: gsheldon, jdg-bugs, myarboro
Target Milestone: ---   
Target Release: 6.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
When the cache is disconnected from the network, it may become impossible to shut down. As a result, graceful node shutdown not possible. As a workaround, the process can be killed manually. The node may continue to run if it is not manually killed.
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-09-13 14:31:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Radim Vansa 2012-09-11 08:22:49 UTC
Description of problem:
When the DISCARD protocol is set to discard all messages and then the cache is shut down using cacheManager.stop() an infinite loop is entered throwing these exceptions:

09:59:23,538 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-1,rvansa-31823) Installing new view CacheView{viewId=11, members=[rvansa-31823]} for cache x
09:59:23,539 ERROR [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-1,rvansa-31823) ISPN000172: Failed to prepare view CacheView{viewId=11, members=[rvansa-31823]} for cache  x, rolling back to view CacheView{viewId=10, members=[rvansa-12248, rvansa-31823, rvansa-2915, rvansa-56921]}
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: x: Received cache view prepare request after the local node has already shut down
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
	at java.util.concurrent.FutureTask.get(FutureTask.java:119)
	at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:323)
	at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:250)
	at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:894)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.IllegalStateException: x: Received cache view prepare request after the local node has already shut down
	at org.infinispan.cacheviews.CacheViewsManagerImpl.handlePrepareView(CacheViewsManagerImpl.java:494)
	at org.infinispan.cacheviews.CacheViewsManagerImpl$3.call(CacheViewsManagerImpl.java:314)
	... 5 more
09:59:23,540 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-1,rvansa-31823) x: Rolling back to cache view 10, new view id is 12
09:59:23,540 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-1,rvansa-31823) Rolled back to view CacheView{viewId=10, members=[rvansa-12248, rvansa-31823, rvansa-2915, rvansa-56921]} for cache x
09:59:24,538 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-1,rvansa-31823) Installing new view CacheView{viewId=13, members=[rvansa-31823]} for cache x
09:59:24,539 ERROR [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-1,rvansa-31823) ISPN000172: Failed to prepare view CacheView{viewId=13, members=[rvansa-31823]} for cache  x, rolling back to view CacheView{viewId=12, members=[rvansa-12248, rvansa-31823, rvansa-2915, rvansa-56921]}
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: x: Received cache view prepare request after the local node has already shut down
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
	at java.util.concurrent.FutureTask.get(FutureTask.java:119)
	at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:323)
	at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:250)
	at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:894)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.IllegalStateException: x: Received cache view prepare request after the local node has already shut down
	at org.infinispan.cacheviews.CacheViewsManagerImpl.handlePrepareView(CacheViewsManagerImpl.java:494)
	at org.infinispan.cacheviews.CacheViewsManagerImpl$3.call(CacheViewsManagerImpl.java:314)
	... 5 more
09:59:24,539 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-1,rvansa-31823) x: Rolling back to cache view 12, new view id is 14
09:59:24,540 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-1,rvansa-31823) Rolled back to view CacheView{viewId=12, members=[rvansa-12248, rvansa-31823, rvansa-2915, rvansa-56921]} for cache x

Comment 2 JBoss JIRA Server 2012-09-13 07:11:11 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-2283

I've found that this can occur not only when the node is disconnected but also when the whole cluster is shutting down and therefore the cache view is changed after the node has shut down.

Comment 3 JBoss JIRA Server 2012-09-13 12:43:02 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-2283

In fact the loop is not as infinite as I thought at first: DefaultCacheManager stops all caches and THEN stops the CacheViewManagerImpl. However, when stopping the cache, CacheViewManagerImpl.leave(cacheName) is called, which gets stuck in transport.invokeRemotely(request leave command), until a timeout in RSVP protocol which was set to 10 minutes. I'll check why the timeout is as high.

Comment 4 JBoss JIRA Server 2012-09-13 13:51:35 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-2283

In fact the loop is not as infinite as I thought at first: DefaultCacheManager stops all caches and THEN stops the CacheViewManagerImpl. However, when stopping the cache, CacheViewManagerImpl.leave(cacheName) is called, which gets stuck in transport.invokeRemotely(request leave command), until a transport.distributedSyncTimeout elapsed which was set to 10 minutes. I'll check why the timeout is as high.

Comment 5 JBoss JIRA Server 2012-09-13 14:24:01 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-2283

In fact the loop is not as infinite as I thought at first: DefaultCacheManager stops all caches and THEN stops the CacheViewManagerImpl. However, when stopping the cache, CacheViewManagerImpl.leave(cacheName) is called, which gets stuck in transport.invokeRemotely(request leave command), until a RSVP protocol timeout elapsed which was set to 10 minutes.

Comment 6 JBoss JIRA Server 2012-09-13 14:29:47 UTC
Radim Vansa <rvansa> updated the status of jira ISPN-2283 to Resolved

Comment 7 JBoss JIRA Server 2012-09-13 14:29:47 UTC
Radim Vansa <rvansa> made a comment on jira ISPN-2283

Not a bug - caused by improper configuration of RSVP protocol, was just waiting too long due to configuration.

Comment 8 JBoss JIRA Server 2012-09-13 14:30:13 UTC
Radim Vansa <rvansa> updated the status of jira ISPN-2283 to Closed