Bug 1087244 - Infinispan issue: A startup issue in cluster mode when the in-memory state is not fetched
Summary: Infinispan issue: A startup issue in cluster mode when the in-memory state is not fetched
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: Clustering
Version: 6.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ER9
Target Release: EAP 6.3.0
Assignee: Paul Ferraro
QA Contact: Jitka Kozana
Docs Contact: Russell Dickenson
URL:
Whiteboard:
Depends On: 1087264
Blocks:
 
Reported: 2014-04-14 07:28 UTC by Boleslaw Dawidowicz
Modified: 2014-07-07 06:45 UTC
CC: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-07-07 06:45:02 UTC
Type: Bug
Embargoed:



Description Boleslaw Dawidowicz 2014-04-14 07:28:56 UTC
Described by Nicolas Filotto: 

"A startup issue in cluster mode when we don't fetch the in-memory state,
to workaround it, I simply enabled fecthInMemoryState but it is not how we
want to configure the cache, it will add significant latency in case we add
a new node to an already running cluster. This is the most annoying issue
as we have to configure the cache in a non expected manner.


I'm facing a very annoying issue with ISPN 5.2.7.Final + synchronous replication + state transfer with fetchInMemoryState set to false + UDP. With this particular configuration, which is unfortunately the target configuration of JCR 1.16, I get deadlocks at cache startup that cause errors of the following type on the master:

09.04.2014 12:39:27,901 *ERROR* [transport-thread-1] ClusterTopologyManagerImpl: ISPN000230: Failed to start rebalance for cache foo (ClusterTopologyManagerImpl.java, line 132)

org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException
    at org.infinispan.util.Util.rewrapAsCacheException(Util.java:542)
    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
    at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterAsync(ClusterTopologyManagerImpl.java:607)
    at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart(ClusterTopologyManagerImpl.java:405)
    at org.infinispan.topology.ClusterTopologyManagerImpl.startRebalance(ClusterTopologyManagerImpl.java:395)
    at org.infinispan.topology.ClusterTopologyManagerImpl.access$000(ClusterTopologyManagerImpl.java:66)
    at org.infinispan.topology.ClusterTopologyManagerImpl$1.call(ClusterTopologyManagerImpl.java:129)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:680)
Caused by: org.jgroups.TimeoutException: TimeoutException
    at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:145)
    at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:40)
    at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93)
    at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:287)
    at org.jgroups.protocols.RSVP.down(RSVP.java:118)
    at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:238)
    at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
    at org.jgroups.JChannel.down(JChannel.java:722)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:616)
    at org.jgroups.blocks.RequestCorrelator.sendUnicastRequest(RequestCorrelator.java:204)
    at org.jgroups.blocks.UnicastRequest.sendRequest(UnicastRequest.java:43)
    at org.jgroups.blocks.Request.execute(Request.java:83)
    at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:370)
    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
    ... 11 more

To reproduce, I launch 2 JVMs on my local machine with -Djava.net.preferIPv4Stack=true -Djgroups.bind_addr=127.0.0.1; both run the following code:

import junit.framework.TestCase;

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class StartUpTest extends TestCase
{
   public void testStartUp() throws Exception
   {
      // Build a cluster-enabled cache manager using the JGroups udp.xml stack.
      GlobalConfigurationBuilder configBuilder = new GlobalConfigurationBuilder();
      configBuilder.transport().defaultTransport().addProperty("configurationFile", "udp.xml");
      EmbeddedCacheManager manager = new DefaultCacheManager(configBuilder.build());

      // Synchronous replication with in-memory state transfer disabled:
      // the combination that triggers the startup timeout.
      ConfigurationBuilder confBuilder = new ConfigurationBuilder();
      confBuilder.clustering().cacheMode(CacheMode.REPL_SYNC).stateTransfer().fetchInMemoryState(false);
      Configuration conf = confBuilder.build();
      manager.defineConfiguration("foo", conf);

      Cache<Object, Object> cache = manager.getCache("foo");
      cache.start();
      System.out.println("Fully Started");

      // Keep this node alive so the second JVM can join the cluster.
      synchronized (this)
      {
         wait();
      }
   }
}
The first instance starts normally; the second one starts only after a pause (which is actually a timeout), and on the first instance we get the stack trace shown above.
I have tested with 5.3.0.Alpha1 and it works normally. I have also tested with ISPN 5.2.8.Final and it fails."

Comment 1 Boleslaw Dawidowicz 2014-04-14 07:31:48 UTC
Needed for JBoss Portal 6.2

Comment 2 Dan Berindei 2014-04-17 15:52:50 UTC
There is another workaround: setting RSVP.ack_on_delivery=false in the JGroups configuration (or even removing RSVP from the stack completely).
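
For reference, a minimal sketch of that change in the udp.xml stack used by the reproducer; the resend_interval and timeout values shown are illustrative JGroups defaults, not taken from this report:

<RSVP resend_interval="2000" timeout="10000" ack_on_delivery="false"/>

Deleting the <RSVP/> element instead removes the protocol from the stack entirely, at the cost of losing blocking delivery acknowledgments for RSVP-flagged messages.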

Comment 3 Mircea Markus 2014-04-23 15:49:57 UTC
Nicolas Filotto confirmed that the setting Dan suggested works for him, so this should be rejected.

