Bug 923769

Summary: Infinispan tries to transfer 0 segments
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Ladislav Thon <lthon>
Component: Clustering
Assignee: Paul Ferraro <paul.ferraro>
Status: CLOSED NOTABUG
QA Contact: Jitka Kozana <jkudrnac>
Severity: high
Priority: unspecified
Version: 6.1.1
CC: jkudrnac, myarboro, smumford
Target Release: EAP 6.3.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-12-09 14:48:39 UTC

Description Ladislav Thon 2013-03-20 12:00:43 UTC
Found this exception during EAP 6.1.0.ER3 testing:

WARN  [org.infinispan.statetransfer.InboundTransferTask] (transport-thread-10) ISPN000210: Failed to request segments [] of cache org.jboss.test.clusterbench.ejb.stateful.RemoteStatefulSBImpl from node perf21/ejb (node will not be retried): org.infinispan.remoting.RemoteException: ISPN000217: Received exception from perf21/ejb, see cause for remote stack trace
	at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:72)
	at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:102)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:541)
	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:169)
	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:195)
	at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:141)
	at org.infinispan.statetransfer.StateConsumerImpl$1.run(StateConsumerImpl.java:707)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) [rt.jar:1.6.0_43]
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [rt.jar:1.6.0_43]
	at java.util.concurrent.FutureTask.run(FutureTask.java:138) [rt.jar:1.6.0_43]
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_43]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_43]
	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_43]
Caused by: java.lang.IllegalArgumentException: Segments must not be null or empty
	at org.infinispan.statetransfer.OutboundTransferTask.<init>(OutboundTransferTask.java:103)
	at org.infinispan.statetransfer.StateProviderImpl.startOutboundTransfer(StateProviderImpl.java:286)
	at org.infinispan.statetransfer.StateRequestCommand.perform(StateRequestCommand.java:91)
	at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:101)
	at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:122)
	at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:86)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:247)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:220)
	at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:484)
	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:391)
	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:249)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
	at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130)
	at org.jgroups.JChannel.up(JChannel.java:707)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
	at org.jgroups.protocols.RSVP.up(RSVP.java:188)
	at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:245)
	at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:765)
	at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:420)
	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
	at org.jgroups.protocols.FD.up(FD.java:253)
	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
	at org.jgroups.protocols.MERGE3.up(MERGE3.java:290)
	at org.jgroups.protocols.Discovery.up(Discovery.java:359)
	at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2616)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1263)
	at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1825)
	at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1798)
	... 3 more

Seen in https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbremote-jvmkill-dist-sync/35/console-perf19/ and indeed, the perf21 log contains the causing exception ("Segments must not be null or empty").
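A minimal, hypothetical sketch of the guard that produces the remote failure (the real OutboundTransferTask constructor takes more arguments; this only illustrates the validation named in the stack trace):

    import java.util.Set;

    // Hypothetical sketch of the argument check behind the remote
    // IllegalArgumentException; not the actual Infinispan source.
    public class OutboundTransferTaskSketch {

        private final Set<Integer> segments;

        public OutboundTransferTaskSketch(Set<Integer> segments) {
            // A request to transfer zero segments is rejected as a caller error.
            if (segments == null || segments.isEmpty()) {
                throw new IllegalArgumentException("Segments must not be null or empty");
            }
            this.segments = segments;
        }
    }

The ISPN000210 warning shows that perf19 requested the empty segment set [] from perf21, so the provider rejected the request and the IllegalArgumentException was propagated back as a RemoteException.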

Note that this exception is logged immediately after this one:

03:40:18,218 WARN  [org.infinispan.topology.CacheTopologyControlCommand] (OOB-426,shared=udp) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=org.jboss.test.clusterbench.ejb.stateful.RemoteStatefulSBImpl, type=REBALANCE_CONFIRM, sender=perf21/ejb, joinInfo=null, topologyId=32, currentCH=null, pendingCH=null, throwable=null, viewId=12}: org.infinispan.CacheException: Received invalid rebalance confirmation from perf21/ejb for cache org.jboss.test.clusterbench.ejb.stateful.RemoteStatefulSBImpl, we don't have a rebalance in progress
	at org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:206)
	at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:160)
	at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:253)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:220)
	at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:484)
	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:391)
	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:249)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
	at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130)
	at org.jgroups.JChannel.up(JChannel.java:707)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
	at org.jgroups.protocols.RSVP.up(RSVP.java:172)
	at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:245)
	at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:765)
	at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:420)
	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
	at org.jgroups.protocols.FD.up(FD.java:253)
	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
	at org.jgroups.protocols.MERGE3.up(MERGE3.java:290)
	at org.jgroups.protocols.Discovery.up(Discovery.java:359)
	at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2616)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1263)
	at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1825)
	at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1798)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_43]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_43]
	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_43]

This happens while the entire cluster is being brought down as the test finishes. This particular node (perf19) is still running, and the other node mentioned in the exceptions (perf21) is running as well. But note that the exceptions on perf19 start to appear immediately after it receives a new view, because perf18 (the first node being shut down) is already gone.

Comment 3 Paul Ferraro 2013-12-09 14:48:39 UTC
This is expected behavior when shutting down a cluster that uses DIST mode. You need to either:
* stagger the node shutdowns (see the sketch below), or
* ignore these exceptions during shutdown.
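
For illustration, a staggered shutdown could look roughly like the following sketch (the host names, the native management port 9999, and the 30-second delay are assumptions; the same effect can be achieved with jboss-cli):

    import java.net.InetAddress;
    import java.util.concurrent.TimeUnit;

    import org.jboss.as.controller.client.ModelControllerClient;
    import org.jboss.dmr.ModelNode;

    // Hypothetical sketch of a staggered shutdown: stop the servers one at a
    // time through the EAP management API and pause between nodes so the
    // remaining members can finish rebalancing. Host names, the management
    // port, and the delay are illustrative assumptions.
    public class StaggeredShutdown {

        public static void main(String[] args) throws Exception {
            String[] hosts = {"perf18", "perf19", "perf20", "perf21"};
            for (String host : hosts) {
                ModelControllerClient client =
                        ModelControllerClient.Factory.create(InetAddress.getByName(host), 9999);
                try {
                    ModelNode op = new ModelNode();
                    op.get("operation").set("shutdown");
                    op.get("address").setEmptyList();
                    client.execute(op);
                } finally {
                    client.close();
                }
                // Wait before shutting down the next node.
                TimeUnit.SECONDS.sleep(30);
            }
        }
    }

The alternative is simply to treat the ISPN000210/ISPN000071 warnings as benign when the whole cluster is being stopped.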

Comment 4 Scott Mumford 2014-04-23 02:14:07 UTC
Marking for exclusion from the Release Notes documentation, as this is not a bug.