Description of problem:

All threads in the coordinator's "remote-thread-NN" and "OOB-NN" pools are depleted by org.infinispan.topology.ClusterTopologyManagerImpl.waitForView(). This method waits to receive a new view, but the view update message is queued behind the blocked threads and is never processed, causing a deadlock. Here are two example stacks, one from each pool.

~~~
"remote-thread-172" daemon prio=10 tid=0x00007f0cc48c0000 nid=0x28ca4 in Object.wait() [0x00007f0c5f25b000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at org.infinispan.topology.ClusterTopologyManagerImpl.waitForView(ClusterTopologyManagerImpl.java:357)
	- locked <0x00000000ff3bd900> (a java.lang.Object)
	at org.infinispan.topology.ClusterTopologyManagerImpl.handleJoin(ClusterTopologyManagerImpl.java:123)
	at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:162)
	at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:144)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$4.run(CommandAwareRpcDispatcher.java:276)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)

"OOB-210,shared=udp" prio=10 tid=0x00007f0c90154800 nid=0x28ca3 in Object.wait() [0x00007f0c5f35a000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at org.infinispan.topology.ClusterTopologyManagerImpl.waitForView(ClusterTopologyManagerImpl.java:357)
	- locked <0x00000000ff3bd900> (a java.lang.Object)
	at org.infinispan.topology.ClusterTopologyManagerImpl.handleJoin(ClusterTopologyManagerImpl.java:123)
	at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:162)
	at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:144)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$4.run(CommandAwareRpcDispatcher.java:276)
	at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:2025)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
	at org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl.execute(BlockingTaskAwareExecutorServiceImpl.java:134)
	at org.infinispan.executors.LazyInitializingBlockingTaskAwareExecutorService.execute(LazyInitializingBlockingTaskAwareExecutorService.java:128)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:268)
	at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:209)
	at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:460)
	at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:377)
	at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:250)
	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:677)
	at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:130)
	at org.jgroups.JChannel.up(JChannel.java:733)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1029)
	at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:394)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:383)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:1042)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:234)
	at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1064)
	at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:779)
	at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:426)
	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:652)
	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
	at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:297)
	at org.jgroups.protocols.MERGE3.up(MERGE3.java:288)
	at org.jgroups.protocols.Discovery.up(Discovery.java:291)
	at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2837)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1572)
	at org.jgroups.protocols.TP$SingleMessageHandler.run(TP.java:1823)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
~~~

Version-Release number of selected component (if applicable):

JDG 6.4.0 (infinispan-core-6.2.0.Final-redhat-4.jar). JDG 6.4.1 is probably affected as well, because the implementation of ClusterTopologyManagerImpl.waitForView() is unchanged.

How reproducible:

Hit once in the customer's environment when upgrading to JDG 6.4.0. It is possible to reproduce locally with the following settings, but the reproducibility is not high (about once per 20 trials or less with a 5-node cluster and 400 caches).

Attachment: p2.coord-deadlock.waitForView.zip
./config/clustered.conf
./config/clustered.xml
./server_log/coord.server.log
./server_log/node1.server.log
./server_log/node2.server.log
./server_log/node3.server.log
./server_log/node4.server.log
./stack/coord.log
./stack/node1.log
./stack/node2.log
./stack/node3.log
./stack/node4.log

Steps to Reproduce:

Start one node first so that it becomes the coordinator, then start all the remaining nodes at once. The configuration is as follows:
- System property -Dorg.jboss.server.bootstrap.maxThreads=500
- 400 caches defined
- Tweak the JGroups thread pools as follows:
~~~
<transport type="UDP" socket-binding="jgroups-udp">
    <property name="internal_thread_pool.enabled">true</property>
    <property name="internal_thread_pool.min_threads">20</property>
    <property name="internal_thread_pool.max_threads">20</property>
    <property name="internal_thread_pool.queue_enabled">true</property>
    <property name="internal_thread_pool.queue_max_size">1000000</property>
    <property name="internal_thread_pool.rejection_policy">Abort</property>
    <property name="thread_pool.enabled">true</property>
    <property name="thread_pool.min_threads">20</property>
    <property name="thread_pool.max_threads">20</property>
    <property name="thread_pool.queue_enabled">true</property>
    <property name="thread_pool.queue_max_size">1000000</property>
    <property name="thread_pool.rejection_policy">Abort</property>
    <property name="oob_thread_pool.enabled">true</property>
    <property name="oob_thread_pool.min_threads">210</property>
    <property name="oob_thread_pool.max_threads">210</property>
    <property name="oob_thread_pool.queue_enabled">true</property>
    <property name="oob_thread_pool.queue_max_size">1000000</property>
    <property name="oob_thread_pool.rejection_policy">discard</property>
</transport>
~~~

- Tweak the infinispan-transport thread pool as follows:

~~~
<subsystem xmlns="urn:jboss:domain:threads:1.1">
    <thread-factory name="infinispan-factory" group-name="infinispan" priority="5" thread-name-pattern="%G %f-%t"/>
    <blocking-bounded-queue-thread-pool name="infinispan-transport" allow-core-timeout="true">
        <core-threads count="1025"/>
        <max-threads count="1025"/>
        <queue-length count="1000000"/>
        <keepalive-time time="0" unit="milliseconds"/>
        <thread-factory name="infinispan-factory"/>
    </blocking-bounded-queue-thread-pool>
</subsystem>
~~~

Actual results:

The coordinator's remote-thread and OOB thread pools are all filled by threads blocked in ClusterTopologyManagerImpl.waitForView().

Expected results:

No deadlock.
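For reference, the rejection path visible in the OOB stack above (ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution) can be reproduced in isolation. This is an illustrative sketch, not Infinispan code: it only demonstrates how CallerRunsPolicy makes the submitting thread (named "OOB-1" here for illustration) execute a rejected task itself, which is why an OOB thread ends up blocked inside a remote command instead of delivering further messages.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Standalone sketch (not Infinispan code) of the CallerRunsPolicy frames
// in the OOB stack: when the remote-command pool is saturated, the
// rejected task runs on the submitting "OOB" thread itself.
public class CallerRunsSketch {
    public static String whoRanTheCommand() throws InterruptedException {
        ThreadPoolExecutor remotePool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(),                  // no queue slack
                new ThreadPoolExecutor.CallerRunsPolicy());

        // Saturate the single worker with a blocked "remote command".
        CountDownLatch release = new CountDownLatch(1);
        remotePool.execute(() -> {
            try { release.await(); } catch (InterruptedException ignored) { }
        });

        // A simulated OOB thread submits the next command; the pool is
        // full, so CallerRunsPolicy runs it on the OOB thread itself.
        String[] runnerName = new String[1];
        Thread oob = new Thread(
                () -> remotePool.execute(
                        () -> runnerName[0] = Thread.currentThread().getName()),
                "OOB-1");
        oob.start();
        oob.join();

        release.countDown();
        remotePool.shutdown();
        return runnerName[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(whoRanTheCommand()); // prints "OOB-1"
    }
}
```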
Additional info:

The system property "org.jboss.server.bootstrap.maxThreads" defines the number of MSC threads; its default is 2 * numberOfCpus. Each cache sends a JOIN request from one of these threads at boot time, so a cluster running on many-core CPUs and hosting a large number of caches can hit this issue.
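The pool-depletion mechanism described above can be sketched generically (illustrative code, not Infinispan's implementation): every worker of a fixed pool blocks waiting for a "new view", while the task that would deliver that view is queued on the very same pool and never runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Generic sketch (not Infinispan code) of the waitForView() deadlock:
// every worker blocks waiting for the next view, and the view-update
// task is stuck in the queue of the very same pool.
public class ViewDeadlockSketch {
    public static boolean viewUpdateEverRan() throws InterruptedException {
        int poolSize = 4;
        ExecutorService remotePool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch viewInstalled = new CountDownLatch(1);
        AtomicBoolean updateRan = new AtomicBoolean(false);

        // Each "cache JOIN" handler occupies a worker, blocking in a
        // waitForView()-style wait until the next view is installed.
        for (int i = 0; i < poolSize; i++) {
            remotePool.execute(() -> {
                try { viewInstalled.await(); } catch (InterruptedException ignored) { }
            });
        }

        // The view-update message arrives on the SAME pool: it is queued
        // behind the blocked JOINs and never gets a thread.
        remotePool.execute(() -> {
            updateRan.set(true);
            viewInstalled.countDown();
        });

        Thread.sleep(300);        // ample time; nothing will happen
        remotePool.shutdownNow(); // break the deadlock to end the sketch
        return updateRan.get();   // false: the pool deadlocked on itself
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(viewUpdateEverRan()); // prints "false"
    }
}
```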
Created attachment 1010069 [details] p2.coord-deadlock.waitForView.zip
Copied from JIRA:

The fix for bug 1217380 (ISPN-5106) already fixed the waitForView() problem partially, but it wasn't enough. When the coordinator installed two views in quick succession, the thread updating the cache members list for the first view would block waiting for the CacheTopologyControlCommand(POLICY_GET_STATUS) response from the other members. Then, because the other members got the newer view before sending their join requests, all the remote-executor and OOB threads would block in waitForView(), and there was no way to receive the POLICY_GET_STATUS responses (since processing a response also needs an OOB thread). The solution was to update the cache members asynchronously.

Testing with a limited number of OOB/remote-executor threads also exposed some other deadlocks, and the pull request tries to plug as many of them as possible. However, because the caches will not start in the same order on every node, there is always the possibility of two nodes sending state transfer requests to each other (for different caches) and not being able to process the responses (because the OOB threads are all blocked, waiting for those very responses). So a deadlock is still possible if remote-executor.max-threads + OOB.max_threads < number of caches.

The remote executor configuration is ignored in JDG server (bug 1219417); the pull request includes a fix for that as well. The attached configuration didn't specify anything for the remote executor, so I would suggest defining a blocking-queueless-thread-pool:

~~~
<transport ... remote-command-executor="infinispan-remote"/>
...
<blocking-queueless-thread-pool name="infinispan-remote">
    <max-threads count="400"/>
    <keepalive-time time="0" unit="milliseconds"/>
    <thread-factory name="infinispan-factory"/>
</blocking-queueless-thread-pool>
~~~
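The "update the cache members asynchronously" direction can be illustrated with a small sketch. The names below (handleJoin, handleViewChange, viewFuture) are hypothetical, not Infinispan's actual API: the point is that a join handler registers a continuation on a future tied to the awaited view id and returns immediately, instead of calling Object.wait() on the message-delivery thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch (hypothetical names, not Infinispan's API):
// join handlers chain work onto a future completed when the awaited
// view arrives, so no message-delivery thread ever blocks in wait().
public class AsyncViewHandlerSketch {
    private final ConcurrentMap<Integer, CompletableFuture<Void>> viewFutures =
            new ConcurrentHashMap<>();
    private volatile int currentViewId = 0;

    private CompletableFuture<Void> viewFuture(int viewId) {
        if (viewId <= currentViewId) {
            return CompletableFuture.completedFuture(null);
        }
        return viewFutures.computeIfAbsent(viewId, id -> new CompletableFuture<>());
    }

    // Called on a remote/OOB thread; returns immediately instead of
    // blocking until the joiner's view is installed.
    public CompletableFuture<String> handleJoin(String cacheName, int joinerViewId) {
        return viewFuture(joinerViewId)
                .thenApply(ignored -> "joined " + cacheName + " in view " + currentViewId);
    }

    // Called when a new view is installed; releases all pending joins
    // without tying up the thread that delivered the view message.
    public void handleViewChange(int newViewId) {
        currentViewId = newViewId;
        viewFutures.forEach((viewId, future) -> {
            if (viewId <= newViewId) {
                future.complete(null);
            }
        });
    }

    public static void main(String[] args) {
        AsyncViewHandlerSketch mgr = new AsyncViewHandlerSketch();
        CompletableFuture<String> join = mgr.handleJoin("cacheA", 2);
        System.out.println(join.isDone());  // prints "false": no thread blocks
        mgr.handleViewChange(2);
        System.out.println(join.join());    // prints "joined cacheA in view 2"
    }
}
```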
Ran several tests using the following scenario and JDG 6.5.0 CR1:

1) Start 4 servers hosting 1000 caches
2) Repeat the following 20 times:
   a) Start writing to the caches using 3 clients
   b) Kill all 4 servers
   c) Restart all 4 servers and wait until there are no topology changes or cache rehashes

According to Dan, this scenario should be sufficient to verify this issue.