Bug 923769

Summary: Infinispan tries to transfer 0 segments
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Ladislav Thon <lthon>
Component: Clustering
Assignee: Paul Ferraro <paul.ferraro>
Status: CLOSED NOTABUG
QA Contact: Jitka Kozana <jkudrnac>
Severity: high
Priority: unspecified
Version: 6.1.1
CC: jkudrnac, myarboro, smumford
Target Release: EAP 6.3.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-12-09 14:48:39 UTC

Description Ladislav Thon 2013-03-20 12:00:43 UTC
Found this exception during EAP 6.1.0.ER3 testing:

WARN  [org.infinispan.statetransfer.InboundTransferTask] (transport-thread-10) ISPN000210: Failed to request segments [] of cache org.jboss.test.clusterbench.ejb.stateful.RemoteStatefulSBImpl from node perf21/ejb (node will not be retried): org.infinispan.remoting.RemoteException: ISPN000217: Received exception from perf21/ejb, see cause for remote stack trace
	at org.infinispan.remoting.transport.AbstractTransport.checkResponse(AbstractTransport.java:72)
	at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:102)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:541)
	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:169)
	at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:195)
	at org.infinispan.statetransfer.InboundTransferTask.requestSegments(InboundTransferTask.java:141)
	at org.infinispan.statetransfer.StateConsumerImpl$1.run(StateConsumerImpl.java:707)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) [rt.jar:1.6.0_43]
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [rt.jar:1.6.0_43]
	at java.util.concurrent.FutureTask.run(FutureTask.java:138) [rt.jar:1.6.0_43]
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_43]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_43]
	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_43]
Caused by: java.lang.IllegalArgumentException: Segments must not be null or empty
	at org.infinispan.statetransfer.OutboundTransferTask.<init>(OutboundTransferTask.java:103)
	at org.infinispan.statetransfer.StateProviderImpl.startOutboundTransfer(StateProviderImpl.java:286)
	at org.infinispan.statetransfer.StateRequestCommand.perform(StateRequestCommand.java:91)
	at org.infinispan.remoting.InboundInvocationHandlerImpl.handleInternal(InboundInvocationHandlerImpl.java:101)
	at org.infinispan.remoting.InboundInvocationHandlerImpl.handleWithWaitForBlocks(InboundInvocationHandlerImpl.java:122)
	at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:86)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:247)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:220)
	at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:484)
	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:391)
	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:249)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
	at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130)
	at org.jgroups.JChannel.up(JChannel.java:707)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
	at org.jgroups.protocols.RSVP.up(RSVP.java:188)
	at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:245)
	at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:765)
	at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:420)
	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
	at org.jgroups.protocols.FD.up(FD.java:253)
	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
	at org.jgroups.protocols.MERGE3.up(MERGE3.java:290)
	at org.jgroups.protocols.Discovery.up(Discovery.java:359)
	at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2616)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1263)
	at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1825)
	at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1798)
	... 3 more

Seen in https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbremote-jvmkill-dist-sync/35/console-perf19/ and indeed, the perf21 log contains the causing exception ("Segments must not be null or empty").
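A minimal, hypothetical sketch of the guard that produces the remote failure (the real OutboundTransferTask constructor takes more arguments; this only illustrates the validation named in the stack trace):

    import java.util.Set;

    // Hypothetical sketch of the argument check behind the remote
    // IllegalArgumentException; not the actual Infinispan source.
    public class OutboundTransferTaskSketch {

        private final Set<Integer> segments;

        public OutboundTransferTaskSketch(Set<Integer> segments) {
            // A request to transfer zero segments is rejected as a caller error.
            if (segments == null || segments.isEmpty()) {
                throw new IllegalArgumentException("Segments must not be null or empty");
            }
            this.segments = segments;
        }
    }

The ISPN000210 warning shows that perf19 requested the empty segment set [] from perf21, so the provider rejected the request and the IllegalArgumentException was propagated back as a RemoteException.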

Note that this exception is logged immediately after this one:

03:40:18,218 WARN  [org.infinispan.topology.CacheTopologyControlCommand] (OOB-426,shared=udp) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=org.jboss.test.clusterbench.ejb.stateful.RemoteStatefulSBImpl, type=REBALANCE_CONFIRM, sender=perf21/ejb, joinInfo=null, topologyId=32, currentCH=null, pendingCH=null, throwable=null, viewId=12}: org.infinispan.CacheException: Received invalid rebalance confirmation from perf21/ejb for cache org.jboss.test.clusterbench.ejb.stateful.RemoteStatefulSBImpl, we don't have a rebalance in progress
	at org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:206)
	at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:160)
	at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:253)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:220)
	at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:484)
	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:391)
	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:249)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598)
	at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130)
	at org.jgroups.JChannel.up(JChannel.java:707)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
	at org.jgroups.protocols.RSVP.up(RSVP.java:172)
	at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:245)
	at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:765)
	at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:420)
	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606)
	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143)
	at org.jgroups.protocols.FD.up(FD.java:253)
	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288)
	at org.jgroups.protocols.MERGE3.up(MERGE3.java:290)
	at org.jgroups.protocols.Discovery.up(Discovery.java:359)
	at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2616)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1263)
	at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1825)
	at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1798)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_43]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_43]
	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_43]

This happens while the entire cluster is being brought down as the test finishes. This particular node (perf19) is still running, and the other node mentioned in the exceptions (perf21) is running as well. But note that the exceptions on perf19 start to appear immediately after it receives a new view, because perf18 (the first node being shut down) is already gone.

Comment 3 Paul Ferraro 2013-12-09 14:48:39 UTC
This is expected behavior when shutting down a cluster that uses DIST mode. You need to either:
* stagger the node shutdowns (see the sketch below), or
* ignore these exceptions during shutdown.
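
For illustration, a staggered shutdown could look roughly like the following sketch (the host names, the native management port 9999, and the 30-second delay are assumptions; the same effect can be achieved with jboss-cli):

    import java.net.InetAddress;
    import java.util.concurrent.TimeUnit;

    import org.jboss.as.controller.client.ModelControllerClient;
    import org.jboss.dmr.ModelNode;

    // Hypothetical sketch of a staggered shutdown: stop the servers one at a
    // time through the EAP management API and pause between nodes so the
    // remaining members can finish rebalancing. Host names, the management
    // port, and the delay are illustrative assumptions.
    public class StaggeredShutdown {

        public static void main(String[] args) throws Exception {
            String[] hosts = {"perf18", "perf19", "perf20", "perf21"};
            for (String host : hosts) {
                ModelControllerClient client =
                        ModelControllerClient.Factory.create(InetAddress.getByName(host), 9999);
                try {
                    ModelNode op = new ModelNode();
                    op.get("operation").set("shutdown");
                    op.get("address").setEmptyList();
                    client.execute(op);
                } finally {
                    client.close();
                }
                // Wait before shutting down the next node.
                TimeUnit.SECONDS.sleep(30);
            }
        }
    }

The alternative is simply to treat the ISPN000210/ISPN000071 warnings as benign when the whole cluster is being stopped.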

Comment 4 Scott Mumford 2014-04-23 02:14:07 UTC
Marking for exclusion from the Release Notes documentation, as this is not a bug.