Bug 908333

Summary:	DIST: Failed to start rebalance: null: java.lang.InterruptedException at server shutdown
Product:	[JBoss] JBoss Enterprise Application Platform 6	Reporter:	Jitka Kozana <jkudrnac>
Component:	Clustering	Assignee:	Paul Ferraro <paul.ferraro>
Status:	CLOSED DEFERRED	QA Contact:	Jitka Kozana <jkudrnac>
Severity:	medium	Docs Contact:	Russell Dickenson <rdickens>
Priority:	unspecified
Version:	6.1.1, 6.2.0	CC:	cdewolf, jkudrnac, lcosti, lthon, myarboro, rhatlapa, rhusar, rjanik, rsvoboda
Target Milestone:	---	Keywords:	Reopened
Target Release:	TBD EAP 7
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-03-19 11:04:13 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jitka Kozana 2013-02-06 13:04:46 UTC

EAP 6.1.0.DR3

When using DIST cache, we are seeing the following exception at (graceful) server shutdown:

04:15:47,719 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-2) ISPN000082: Stopping the RpcDispatcher
04:15:47,721 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread-11) Failed to start rebalance: null: java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1302) [rt.jar:1.6.0_38]
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:227) [rt.jar:1.6.0_38]
	at java.util.concurrent.FutureTask.get(FutureTask.java:91) [rt.jar:1.6.0_38]
	at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:567)
	at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart(ClusterTopologyManagerImpl.java:410)
	at org.infinispan.topology.ClusterTopologyManagerImpl.startRebalance(ClusterTopologyManagerImpl.java:399)
	at org.infinispan.topology.ClusterTopologyManagerImpl.access$000(ClusterTopologyManagerImpl.java:66)
	at org.infinispan.topology.ClusterTopologyManagerImpl$1.call(ClusterTopologyManagerImpl.java:128)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [rt.jar:1.6.0_38]
	at java.util.concurrent.FutureTask.run(FutureTask.java:138) [rt.jar:1.6.0_38]
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_38]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_38]
	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_38]

04:15:47,724 INFO  [org.jboss.as] (MSC service thread 1-8) JBAS015950: JBoss EAP 6.1.0.DR3 (AS 7.2.0.Alpha1-redhat-3) stopped in 33738ms

See the full log here:
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-Failover/job/eap-6x-failover-ejb-ejbremote-shutdown-dist-sync-tcpStack/2/artifact/report/config/jboss-perf18/server.log

Comment 1 Ladislav Thon 2013-03-01 08:27:38 UTC

Seen something that looks related:

WARN  [org.infinispan.topology.CacheTopologyControlCommand] (transport-thread-9) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=dist, type=REBALANCE_CONFIRM, sender=perf18/ejb, joinInfo=null, topologyId=25, currentCH=null, pendingCH=null, throwable=null, viewId=10}: org.infinispan.CacheException: Received invalid rebalance confirmation from perf18/ejb for cache dist, we don't have a rebalance in progress
[JBossINF] 	at org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:206)
[JBossINF] 	at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:160)
[JBossINF] 	at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137)
[JBossINF] 	at org.infinispan.topology.LocalTopologyManagerImpl$1.call(LocalTopologyManagerImpl.java:281)
[JBossINF] 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [rt.jar:1.6.0_38]
[JBossINF] 	at java.util.concurrent.FutureTask.run(FutureTask.java:138) [rt.jar:1.6.0_38]
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_38]
[JBossINF] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_38]
[JBossINF] 	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_38]

Which also looks similar to https://issues.jboss.org/browse/ISPN-2349

Full log: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-Clustering/view/EAP6-Failover/job/eap-6x-failover-ejb-ejbremote-shutdown-dist-sync-3owners/6/console-perf18/

This is EAP 6.1.0.ER1.

Comment 2 Ladislav Thon 2013-03-01 08:49:38 UTC

And this one looks related too:

ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread-22) Failed to start rebalance: org.infinispan.CacheException: Remote (perf20/ejb) failed unexpectedly: java.util.concurrent.ExecutionException: org.infinispan.CacheException: Remote (perf20/ejb) failed unexpectedly
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232) [rt.jar:1.6.0_38]
	at java.util.concurrent.FutureTask.get(FutureTask.java:91) [rt.jar:1.6.0_38]
	at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:567)
	at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart(ClusterTopologyManagerImpl.java:410)
	at org.infinispan.topology.ClusterTopologyManagerImpl.startRebalance(ClusterTopologyManagerImpl.java:399)
	at org.infinispan.topology.ClusterTopologyManagerImpl.access$000(ClusterTopologyManagerImpl.java:66)
	at org.infinispan.topology.ClusterTopologyManagerImpl$1.call(ClusterTopologyManagerImpl.java:128)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [rt.jar:1.6.0_38]
	at java.util.concurrent.FutureTask.run(FutureTask.java:138) [rt.jar:1.6.0_38]
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_38]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_38]
	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_38]
Caused by: org.infinispan.CacheException: Remote (perf20/ejb) failed unexpectedly
	at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:96)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:541)
	at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:549)
	at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:546)
	... 5 more
Caused by: java.lang.IllegalStateException: transport was closed
	at org.jgroups.blocks.GroupRequest.transportClosed(GroupRequest.java:273)
	at org.jgroups.blocks.RequestCorrelator.stop(RequestCorrelator.java:269)
	at org.jgroups.blocks.MessageDispatcher.stop(MessageDispatcher.java:152)
	at org.jgroups.blocks.MessageDispatcher.channelDisconnected(MessageDispatcher.java:455)
	at org.jgroups.Channel.notifyChannelDisconnected(Channel.java:507)
	at org.jgroups.JChannel.disconnect(JChannel.java:363)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.stop(JGroupsTransport.java:258)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.6.0_38]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [rt.jar:1.6.0_38]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_38]
	at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_38]
	at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203)
	at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:883)
	at org.infinispan.factories.AbstractComponentRegistry.internalStop(AbstractComponentRegistry.java:690)
	at org.infinispan.factories.AbstractComponentRegistry.stop(AbstractComponentRegistry.java:568)
	at org.infinispan.factories.GlobalComponentRegistry.stop(GlobalComponentRegistry.java:260)
	at org.infinispan.manager.DefaultCacheManager.stop(DefaultCacheManager.java:739)
	at org.infinispan.manager.AbstractDelegatingEmbeddedCacheManager.stop(AbstractDelegatingEmbeddedCacheManager.java:179)
	at org.jboss.as.clustering.infinispan.subsystem.EmbeddedCacheManagerService.stop(EmbeddedCacheManagerService.java:76)
	at org.jboss.msc.service.ServiceControllerImpl$StopTask.stopService(ServiceControllerImpl.java:1911) [jboss-msc-1.0.4.GA-redhat-1.jar:1.0.4.GA-redhat-1]
	at org.jboss.msc.service.ServiceControllerImpl$StopTask.run(ServiceControllerImpl.java:1874) [jboss-msc-1.0.4.GA-redhat-1.jar:1.0.4.GA-redhat-1]
	... 3 more

Full log: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EAP6/view/EAP6-Clustering/view/EAP6-Failover/job/eap-6x-failover-ejb-ejbremote-undeploy-dist-async/10/console-perf18/

Again EAP 6.1.0.ER1.

Comment 3 Ladislav Thon 2013-03-20 12:22:54 UTC

Maybe related: https://bugzilla.redhat.com/show_bug.cgi?id=923769

Comment 4 Ladislav Thon 2013-05-14 14:37:38 UTC

Still seeing this with EAP 6.1.0.ER8. For example:

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-undeploy-dist-sync/24/artifact/report/config/jboss-perf21/server.log
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-shutdown-dist-sync/54/artifact/report/config/jboss-perf21/server.log

Comment 5 Ladislav Thon 2013-08-26 09:58:40 UTC

Still seeing this with EAP 6.1.1.ER7. For example:

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-undeploy-dist-sync/28/
https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-shutdown-dist-sync/58/

Comment 6 Richard Janík 2013-12-10 09:50:42 UTC

Seen in 6.2.0.CR3 (GA), this is one of the more common warnings:

14:16:22,320 WARN  [org.infinispan.topology.CacheTopologyControlCommand] (OOB-13,shared=udp) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=default-host/clusterbench, type=REBALANCE_CONFIRM, sender=perf19/web, joinInfo=null, topologyId=26, currentCH=null, pendingCH=null, throwable=null, viewId=9}: org.infinispan.CacheException: Received invalid rebalance confirmation from perf19/web for cache default-host/clusterbench, we don't have a rebalance in progress
	at org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:207)
	at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:169)
	at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:146)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:253)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:220)
	at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:484)
	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:391)
	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:249)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:600)
	at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130)
	at org.jgroups.JChannel.up(JChannel.java:707)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1025)
	at org.jgroups.protocols.RSVP.up(RSVP.java:172)
	at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:245)
	at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:765)
	at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:420)
	at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:645)
	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:147)
	at org.jgroups.protocols.FD.up(FD.java:253)
	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
	at org.jgroups.protocols.MERGE3.up(MERGE3.java:290)
	at org.jgroups.protocols.Discovery.up(Discovery.java:359)
	at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2607)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1260)
	at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1822)
	at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1795)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_45]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_45]
	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_45]

Link to sample build:

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-shutdown-dist-async/67/

Links to server logs:

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-shutdown-dist-async/67/artifact/report/config/jboss-perf18/server.log

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-shutdown-dist-async/67/artifact/report/config/jboss-perf19/server.log

https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-shutdown-dist-async/67/artifact/report/config/jboss-perf21/server.log

Comment 7 Radoslav Husar 2013-12-16 23:26:03 UTC

Graceful shutdown is not support in the affected version of EAP 6. The 
described behavior is to be expected.

The RFE to implement graceful shutdown is 
https://issues.jboss.org/browse/EAP6-7

Issues relating to HTTP can be avoided using mod_cluster and session 
draining prior to shutdown/undeploy.

Comment 10 Paul Ferraro 2014-05-21 13:26:10 UTC

This bug pertains to clean shutdown which is not scheduled to be implemented in EAP 6.x - and is targeted to be addressed in 7.0.  Setting resolution to WONTFIX.

Comment 12 Carlo de Wolf 2015-03-19 11:04:13 UTC

Moved to https://issues.jboss.org/browse/EAP7-86

Comment 13 JBoss JIRA Server 2015-08-11 01:39:55 UTC

Jason Greene <jason.greene> updated the status of jira EAP7-86 to Resolved

Comment 14 JBoss JIRA Server 2015-08-13 12:59:34 UTC

Radim Hatlapatka <rhatlapa> updated the status of jira EAP7-86 to Reopened

Comment 15 JBoss JIRA Server 2015-12-09 16:23:51 UTC

Jason Greene <jason.greene> updated the status of jira EAP7-86 to Resolved

Comment 16 JBoss JIRA Server 2015-12-16 13:13:03 UTC

Radim Hatlapatka <rhatlapa> updated the status of jira EAP7-86 to Reopened

Comment 17 JBoss JIRA Server 2016-02-28 16:47:39 UTC

Jason Greene <jason.greene> updated the status of jira EAP7-86 to Resolved