Bug 1115229 - Loading cache with LevelDB store enabled causes concurrency errors
Summary: Loading cache with LevelDB store enabled causes concurrency errors
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.0.0
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: CR3
Target Release: 6.3.0
Assignee: Pedro Ruivo
QA Contact: Martin Gencur
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-07-02 01:38 UTC by Babak Mozaffari
Modified: 2015-01-26 14:04 UTC
8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The LevelDB expiry queue fills up, and the queue-size parameter was not being parsed. Consequence: Write operations block when the queue is full. Fix: The parser has been fixed, and queue-size can now be configured. Result:
Clone Of:
Environment:
Last Closed: 2015-01-26 14:04:46 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Assets to reproduce leveldb concurrency issue (18.28 KB, application/octet-stream)
2014-07-03 21:58 UTC, Babak Mozaffari

Description Babak Mozaffari 2014-07-02 01:38:31 UTC
Description of problem:

With LevelDB configured as a cache store for a distributed cache and passivation set to false, loading the cache with large amounts of data causes a lock timeout for the Hot Rod client.


Version-Release number of selected component (if applicable):

6.2.1

How reproducible:

Consistent

Steps to Reproduce:
1. Configure distributed cache with leveldb store as follows:

<distributed-cache name="prices" mode="ASYNC" segments="20" owners="2" remote-timeout="30000" start="EAGER" l1-lifespan="0">
    <compatibility enabled="true"/>
    <locking isolation="READ_COMMITTED" acquire-timeout="30000" concurrency-level="1000" striping="true"/>
    <transaction mode="NONE"/>
    <leveldb-store passivation="false"/>
</distributed-cache>

2. The environment typically consists of a cluster of three nodes, but starting a single node is enough to reproduce the issue.

3. Create a large map of entries (e.g., 179180 entries) and use Hot Rod to store all the data in the cache through a putAll call.
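
(For reference, a loader along these lines exercises the same pattern; this is a minimal sketch with assumed host, port, cache name, and data, not the actual reference-architecture client:)

import java.util.HashMap;
import java.util.Map;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class CacheLoaderSketch {
    public static void main(String[] args) {
        // Connect to the JDG server over Hot Rod (host/port are assumptions).
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.addServer().host("10.19.139.101").port(11222);
        RemoteCacheManager rcm = new RemoteCacheManager(builder.build());
        RemoteCache<String, String> cache = rcm.getCache("prices");

        // Build a large map and push it in a single bulk write.
        Map<String, String> entries = new HashMap<>();
        for (int i = 0; i < 179180; i++) {
            entries.put("key-" + i, "value-" + i);
        }
        cache.putAll(entries);

        rcm.stop();
    }
}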

Actual results:

01:23:29,807 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (HotRodServerWorker-67) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [30 seconds] on key [ad6ff7e2-1e26-4962-a92a-210769a043f4_1404238912585] for requestor [Thread[HotRodServerWorker-67,5,main]]! Lock held by [Thread[HotRodServerWorker-65,5,main]]
	at org.infinispan.util.concurrent.locks.LockManagerImpl.lock(LockManagerImpl.java:198) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.util.concurrent.locks.LockManagerImpl.acquireLockNoCheck(LockManagerImpl.java:181) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.locking.AbstractLockingInterceptor.lockKey(AbstractLockingInterceptor.java:149) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.locking.AbstractLockingInterceptor.lockKey(AbstractLockingInterceptor.java:145) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitPutKeyValueCommand(AbstractLockingInterceptor.java:69) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.statetransfer.StateTransferInterceptor.handleNonTxWriteCommand(StateTransferInterceptor.java:219) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.statetransfer.StateTransferInterceptor.visitPutKeyValueCommand(StateTransferInterceptor.java:141) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.CacheMgmtInterceptor.updateStoreStatistics(CacheMgmtInterceptor.java:148) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutKeyValueCommand(CacheMgmtInterceptor.java:134) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.compat.BaseTypeConverterInterceptor.visitPutKeyValueCommand(BaseTypeConverterInterceptor.java:63) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:110) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:73) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.commands.AbstractVisitor.visitPutKeyValueCommand(AbstractVisitor.java:32) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.commands.write.PutKeyValueCommand.acceptVisitor(PutKeyValueCommand.java:71) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:333) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1404) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.CacheImpl.putInternal(CacheImpl.java:882) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.CacheImpl.put(CacheImpl.java:874) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.DecoratedCache.put(DecoratedCache.java:472) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.AbstractDelegatingAdvancedCache.put(AbstractDelegatingAdvancedCache.java:182) [infinispan-core-6.0.3.Final-redhat-3.jar:6.0.3.Final-redhat-3]
	at org.infinispan.server.core.AbstractProtocolDecoder.put(AbstractProtocolDecoder.scala:183) [infinispan.jar:6.0.3.Final-redhat-3]
	at org.infinispan.server.core.AbstractProtocolDecoder.decodeValue(AbstractProtocolDecoder.scala:140) [infinispan.jar:6.0.3.Final-redhat-3]
	at org.infinispan.server.core.AbstractProtocolDecoder.decode(AbstractProtocolDecoder.scala:55) [infinispan.jar:6.0.3.Final-redhat-3]
	at org.infinispan.server.core.AbstractProtocolDecoder.decode(AbstractProtocolDecoder.scala:29) [infinispan.jar:6.0.3.Final-redhat-3]
	at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.infinispan.server.core.AbstractProtocolDecoder.messageReceived(AbstractProtocolDecoder.scala:377) [infinispan.jar:6.0.3.Final-redhat-3]
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.6.6.Final-redhat-1.jar:3.6.6.Final-redhat-1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_40]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_40]
	at java.lang.Thread.run(Thread.java:724) [rt.jar:1.7.0_40]


Expected results:


Additional info:

The default setup per the documentation is JNI, which I assume is the only supported mode; our server schemas also allow the JAVA implementation, but our documentation does not describe it and the code for it is missing. I am also not sure whether the default cache writing behavior is synchronous, and whether changing it to asynchronous (write-behind) would resolve the concurrency issue cited above; our LevelDB documentation lacks any information on the subject, and I have not been able to configure write-behind the way other cache stores are configured (and the way our schema implies).

Comment 2 Mircea Markus 2014-07-02 14:18:53 UTC
Can you please try with lock striping set to "false" instead of "true"? If it still fails, then we should run a further analysis.
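
For reference, that is the following change against the configuration in the description (a one-line sketch):

<locking isolation="READ_COMMITTED" acquire-timeout="30000" concurrency-level="1000" striping="false"/>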

Comment 3 Babak Mozaffari 2014-07-02 17:27:18 UTC
Hi Mircea, it still fails and the stack looks the same.

Comment 4 Pedro Ruivo 2014-07-03 18:16:45 UTC
can you attach test code? (please) 

So far, I'm unable to reproduce it :/

Comment 5 Tristan Tarrant 2014-07-03 20:28:35 UTC
Also, is the choice of cache store actually relevant? Does this happen with the singlefilestore? Without a store altogether?

Comment 6 Babak Mozaffari 2014-07-03 21:45:20 UTC
Tristan:

No, I tried a simple file-store setup instead of leveldb-store and stored my sample data successfully 3 times in a row, while the leveldb-store fails every time:

<file-store name="myfilestore" passivation="false"/>

(In reply to Tristan Tarrant from comment #5)
> Also, is the choice of cache store actually relevant? Does this happen with
> the singlefilestore? Without a store altogether?

Comment 7 Babak Mozaffari 2014-07-03 21:58:56 UTC
Created attachment 914591 [details]
Assets to reproduce leveldb concurrency issue

The tar file includes a required module to set up on JDG and the modified infinispan module to declare a dependency on it. It also includes the updated clustered.xml. Once the modules and configuration file have been updated, start a single node (to make it easy) in clustered mode.

Also included in the tar.gz is a JAR file to load the cache and reproduce the error. Alongside GasShopperClient.jar, create a directory called GasShopperClient_lib and copy the following into it:
* jboss-datagrid-6.2.1-server/client/hotrod/java/*
* jdg/modules/system/layers/base/com/redhat/refarch/jdg/gasshopper/main/gasshopper.jar (from the tar file)
* opencsv-2.3.jar from http://sourceforge.net/projects/opencsv/files/opencsv/2.3/

Also download http://www.essostations.com/poi/garmin_loader.zip and place the CSV files alongside GasShopperClient.jar.

Now use the jar file to populate the cache with sample data, pointing to your JDG server IP address and the CSV files you've downloaded:

java -Djdg.host=10.19.139.101 -jar GasShopperClient.jar load ./esso.csv,./exxon_usa.csv,./mobil_usa.csv

Comment 8 Babak Mozaffari 2014-07-03 22:00:32 UTC
Pedro,

I tried very hard to reproduce the issue using a simple test case with OOTB Java types, but could not. However, it reproduces consistently in my reference architecture environment. I did my best to attach all the assets and instructions to help you recreate the environment, but let me know if you find anything missing.



(In reply to Pedro Ruivo from comment #4)
> can you attach test code? (please) 
> 
> So far, I'm unable to reproduce it :/

Comment 9 Pedro Ruivo 2014-07-04 13:33:49 UTC
Thanks Babak. 
I'm able to reproduce the problem. I'll start investigating it.

Comment 10 Pedro Ruivo 2014-07-04 14:56:08 UTC
Hi all,

I've found the problem.

In LevelDBStore, a blocking queue fills up (https://github.com/infinispan/infinispan/blob/master/persistence/leveldb/src/main/java/org/infinispan/persistence/leveldb/LevelDBStore.java#L467) and it is only drained when the eviction thread is triggered.

I don't have a solution for it yet, but you can try increasing the queue's maximum size (the default value is 10000). Unfortunately, I don't see any way to configure it in the server parser :/

I think the only workaround for now is to run the eviction thread more frequently (the default is 1 minute). You can do this with <expiration interval="<time in milliseconds>"/> (I think).
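
To illustrate the mechanism (an illustrative sketch with hypothetical names, not the actual LevelDBStore code): a bounded queue of expirable entries makes the writing thread block on put() once the queue is at capacity, and it stays blocked, still holding the entry lock, until the periodic expiration task drains the queue. That is what surfaces as the 30-second lock timeout on the other Hot Rod worker threads:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ExpiryQueueSketch {
    // Bounded queue; 10000 mirrors the default size mentioned above.
    private final BlockingQueue<Object> expiryQueue = new ArrayBlockingQueue<>(10000);

    void write(Object key) throws InterruptedException {
        // Called while the entry lock is held. put() blocks once the
        // queue is at capacity, so concurrent writers waiting on the
        // same lock eventually hit the lock-acquisition timeout.
        expiryQueue.put(key);
    }

    void purgeExpired() {
        // Runs on the eviction thread (default: once a minute) and is
        // the only place that makes room in the queue again.
        List<Object> drained = new ArrayList<>();
        expiryQueue.drainTo(drained);
        // ... process the drained expiry entries ...
    }
}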

Comment 11 Divya Mehra 2014-07-06 23:21:04 UTC
Pedro, is this issue only reproducible with LevelDB cache store, or can it potentially be reproduced with another cache store ?

Comment 12 Pedro Ruivo 2014-07-07 09:27:21 UTC
I don't think so, Divya (Babak already tried with file-store and it works fine there). It is a LevelDB implementation issue.

Comment 13 Babak Mozaffari 2014-07-07 22:13:40 UTC
Pedro, is that not the expiration queue? I tried this but it didn't help:

<expiration queue-size="1000000" />

I don't believe the leveldb-store has an expiration interval attribute to let me change the eviction frequency.

Comment 14 Pedro Ruivo 2014-07-08 13:05:51 UTC
No, you don't have access to the LevelDB queue. And that is one problem...

The expiration interval I was talking about is not in the LevelDB store. For example:

<distributed-cache name="prices" mode="ASYNC" segments="20" owners="2" remote-timeout="30000" start="EAGER" l1-lifespan="0">
    <compatibility enabled="true"/>
    <locking isolation="READ_COMMITTED" acquire-timeout="30000" concurrency-level="1000" striping="true"/>
    <transaction mode="NONE"/>
    <leveldb-store passivation="false"/>

    <expiration interval="<time in milliseconds>"/>
</distributed-cache>

Comment 15 Babak Mozaffari 2014-07-08 16:51:19 UTC
Thanks, Pedro, I was expecting expiration to be set as a child of the leveldb-store element but now I see that it's under the cache.

I tried setting it to 1 second and when that didn't work, to 100ms. I still get the same error.

Comment 16 Pedro Ruivo 2014-07-09 15:19:02 UTC
Babak, it really needs a fix. I'm working on it (and I will fix the parser issue)

Comment 17 Pedro Ruivo 2014-07-09 16:14:11 UTC
Just to be clear: the bug is in the parser, which is unable to parse queue-size correctly (I'm trying to find the cause).

Comment 18 Pedro Ruivo 2014-07-09 16:44:04 UTC
Pull Request done.

Now it is possible to properly set the queue size.
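
With the fix, the queue size should be configurable roughly as attempted in comment 13 (a sketch; the exact placement of the expiration element under leveldb-store is an assumption based on the discussion above):

<leveldb-store passivation="false">
    <expiration queue-size="1000000"/>
</leveldb-store>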

Comment 19 Pedro Ruivo 2014-07-10 10:34:08 UTC
(In reply to Babak Mozaffari from comment #15)
> Thanks, Pedro, I was expecting expiration to be set as a child of the
> leveldb-store element but now I see that it's under the cache.
> 
> I tried setting it to 1 second and when that didn't work, to 100ms. I still
> get the same error.

Hi Babak,

I found out why setting the interval didn't work: it is a bug in 6.2.

It is fixed in 6.3, and you should be able to configure both the queue-size and the expiration interval.

Cheers.

