Described by Nicolas Filotto:

"A startup issue in cluster mode when we don't fetch the in-memory state. To work around it, I simply enabled fetchInMemoryState, but that is not how we want to configure the cache: it will add significant latency whenever a new node joins an already running cluster. This is the most annoying issue, as we have to configure the cache in an unexpected manner.

I'm facing a very annoying issue with ISPN 5.2.7.Final + synchronous replication + state transfer with fetchInMemoryState set to false + udp. With this particular configuration, which is unfortunately the target configuration of JCR 1.16, I get deadlocks at cache startup that cause errors of the following type on the master:

```
09.04.2014 12:39:27,901 *ERROR* [transport-thread-1] ClusterTopologyManagerImpl: ISPN000230: Failed to start rebalance for cache foo (ClusterTopologyManagerImpl.java, line 132)
org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
	at org.infinispan.util.Util.rewrapAsCacheException(Util.java:542)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
	at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterAsync(ClusterTopologyManagerImpl.java:607)
	at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart(ClusterTopologyManagerImpl.java:405)
	at org.infinispan.topology.ClusterTopologyManagerImpl.startRebalance(ClusterTopologyManagerImpl.java:395)
	at org.infinispan.topology.ClusterTopologyManagerImpl.access$000(ClusterTopologyManagerImpl.java:66)
	at org.infinispan.topology.ClusterTopologyManagerImpl$1.call(ClusterTopologyManagerImpl.java:129)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:680)
Caused by: org.jgroups.TimeoutException: TimeoutException
	at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:145)
	at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:40)
	at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)
	at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:287)
	at org.jgroups.protocols.RSVP.down(RSVP.java:118)
	at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:238)
	at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
	at org.jgroups.JChannel.down(JChannel.java:722)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:616)
	at org.jgroups.blocks.RequestCorrelator.sendUnicastRequest(RequestCorrelator.java:204)
	at org.jgroups.blocks.UnicastRequest.sendRequest(UnicastRequest.java:43)
	at org.jgroups.blocks.Request.execute(Request.java:83)
	at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:370)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
	... 11 more
```

To reproduce, I launch two JVMs on my local machine with -Djava.net.preferIPv4Stack=true -Djgroups.bind_addr=127.0.0.1, both running the following code:

```java
public class StartUpTest extends TestCase {
    public void testStartUp() throws Exception {
        GlobalConfigurationBuilder configBuilder = new GlobalConfigurationBuilder();
        configBuilder.transport().defaultTransport().addProperty("configurationFile", "udp.xml");
        EmbeddedCacheManager manager = new DefaultCacheManager(configBuilder.build());

        ConfigurationBuilder confBuilder = new ConfigurationBuilder();
        confBuilder.clustering().cacheMode(CacheMode.REPL_SYNC).stateTransfer().fetchInMemoryState(false);
        Configuration conf = confBuilder.build();
        manager.defineConfiguration("foo", conf);

        Cache<Object, Object> cache = manager.getCache("foo");
        cache.start();
        System.out.println("Fully Started");
        synchronized (this) {
            wait();
        }
    }
}
```

The first instance starts normally; the second one starts after a pause (which is actually a timeout), and on the first instance we get the stack trace above. I have tested with 5.3.0.Alpha1 and it works normally. I have also tested with ISPN 5.2.8.Final and it fails."
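For reference, the workaround the reporter mentions (enabling state transfer) amounts to flipping the fetchInMemoryState flag in the cache configuration. A sketch of that change against the same Infinispan 5.2 builder API used in the test above; this is a fragment, not a standalone program:

```java
// Workaround sketch (NOT the desired configuration): enabling
// fetchInMemoryState avoids the startup deadlock, at the cost of
// transferring the full in-memory state to every joining node.
ConfigurationBuilder confBuilder = new ConfigurationBuilder();
confBuilder.clustering()
           .cacheMode(CacheMode.REPL_SYNC)
           .stateTransfer()
           .fetchInMemoryState(true); // instead of false
manager.defineConfiguration("foo", confBuilder.build());
```

This is exactly why the reporter calls the workaround unacceptable: it changes the cache's state-transfer behavior rather than fixing the startup deadlock.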
Needed for JBoss Portal 6.2
There is another workaround: setting RSVP.ack_on_delivery=false in the JGroups configuration (or even removing RSVP from the stack completely).
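This workaround is applied in the JGroups stack XML (udp.xml in the reproducer above). A minimal sketch of the relevant protocol entry; any other RSVP attributes in the actual stack would be left as they are:

```xml
<!-- In the protocol stack (e.g. udp.xml): stop RSVP from blocking
     until delivery acks arrive, which is what times out at startup -->
<RSVP ack_on_delivery="false"/>
```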
Nicolas Filotto acknowledged that the setting Dan suggested works for him, so this issue should be rejected.