Bug 1112886 - Hot Rod client fails to retrieve data after a few tries
Summary: Hot Rod client fails to retrieve data after a few tries
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Server
Version: 6.2.1
Hardware: All
OS: Linux
unspecified
high
Target Milestone: CR2
: 6.3.0
Assignee: Tristan Tarrant
QA Contact: Martin Gencur
URL:
Whiteboard:
Depends On:
Blocks: 1104639
TreeView+ depends on / blocked
 
Reported: 2014-06-25 00:59 UTC by Babak Mozaffari
Modified: 2015-01-26 14:05 UTC (History)
2 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-01-26 14:05:51 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Test client code that reproduces the issue (3.42 KB, application/java-archive)
2014-06-25 00:59 UTC, Babak Mozaffari
no flags Details
Test client code for follow-up issue with keySet().size() (3.36 KB, application/java-archive)
2014-06-30 18:10 UTC, Babak Mozaffari
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker ISPN-4437 0 Major Resolved L1 cache is enabled by default in server 2017-11-14 06:34:26 UTC
Red Hat Issue Tracker ISPN-4438 0 Major Resolved Entry is not properly unmarshalled by HotRod client in compatibility mode when L1 enabled 2017-11-14 06:34:26 UTC
Red Hat Issue Tracker ISPN-4467 0 Major Resolved keySet operation via HotRod in compatibility mode throws ClassCastException 2017-11-14 06:34:26 UTC

Description Babak Mozaffari 2014-06-25 00:59:40 UTC
Created attachment 911902 [details]
Test client code that reproduces the issue

Description of problem:

In a distributed cache where the number of owners is less than the number of cluster members, the Hot Rod client is only able to get an object from the cache a few times and afterwards retrieves an invalid byte sequence.

Version-Release number of selected component (if applicable):

6.2.1, remote client-server JDG with Hot Rod client

How reproducible:

Consistently and easily when fetching data through Hot Rod from a distributed cache where the number of owners is less than the number of cluster members


Steps to Reproduce:
1. Set up a JDG cluster with at least two nodes. Add the following cache:

<distributed-cache name="test" mode="ASYNC" segments="20" owners="1" remote-timeout="30000" start="EAGER">
    <compatibility enabled="true"/>
    <locking isolation="READ_COMMITTED" acquire-timeout="30000" concurrency-level="1000" striping="true"/>
    <transaction mode="NONE"/>
</distributed-cache>

2. Download the attached TestCase.jar

3. Create a directory called TestCase_lib alongside the above JAR file and copy all the libraries from jboss-datagrid-6.2.1-server/client/hotrod/java to there.

4. Run the test case by pointing it to one of your JDG hosts:

java -Djdg.host=10.19.139.101 -jar TestCase.jar


Actual results:

After a few tries, instead of a proper UUID object, a byte[] is returned:

Inserting UUID object #1

Retrieval of UUID, attempt #1
Got UUID correctly!
Retrieval of UUID, attempt #2
Instead, got [[B@40334c25]


Expected results:

The UUID object that was inserted should always be returned.


Additional info:

The code creates a UUID object, puts it in the cache and gets it from the cache multiple times. By default, it gets each object 5 times and then creates a new object, up to 100 objects in total. It stops as soon as it reproduces the issue. To change the number of puts and gets, pass them as arguments to the JAR:

java -Djdg.host=10.19.139.101 -jar TestCase.jar 100 5

Comment 2 Martin Gencur 2014-06-25 08:54:51 UTC
Hello Babak, thanks for the reproducer. Do you think you could attach sources as well? Right now I'm able to get the same output as yours but I'd really like to see the code and possibly tweak it for debugging purposes. Thanks!

Comment 3 Martin Gencur 2014-06-25 09:59:03 UTC
After some debugging, it seems it has something to do with L1 cache. The returned value is wrong if the cache entry is returned from L1 cache and not from a remote node. Looking into my TRACE logs, it says that L1 is enabled but I don't have L1 enabled in my configs:-/ Looking into this...

Comment 4 Babak Mozaffari 2014-06-25 10:33:13 UTC
Hi Martin,

It makes sense for it to have to do with an L1 cache because once it breaks (starts returning wrong value for a given key), it will no longer work for a period of time but if left alone for some time, it will return the right value again after that!

However I'm surprised to hear that the problem happens when it's the local node and not the remote node. I say that because when I have only one node, or when number of owners is the same as cluster size, the problem does not seem to appear.

The source code for the test is in the same JAR file!

Comment 5 Martin Gencur 2014-06-25 11:22:48 UTC
So there are two problems:
1) L1 cache is enabled even without configuring L1, we will fix that soon
2) It turns our that keys/values are not unmarshalled when L1 is used, we hope to have the fix soon too

Babak,
if you don't need L1 cache, can you try setting l1-lifespan="0" on distributed-cache? This will disable L1 and should fix your problem too. 
If you need L1, then you'll need to wait for the fix for 2)

Comment 6 Babak Mozaffari 2014-06-25 19:21:35 UTC
Workaround confirmed!

Comment 7 Babak Mozaffari 2014-06-25 22:15:36 UTC
I may have spoken too soon. Setting this attribute helps me avoid the error in 90+% of the cases but I still get the same error when I do:

remoteCache.keySet().size()

Comment 8 Martin Gencur 2014-06-26 06:48:46 UTC
Babak,
how can you get the same error when remoteCache.keySet().size() returns just an integer? I don't understand.

Comment 9 JBoss JIRA Server 2014-06-26 07:16:02 UTC
Martin Gencur <mgencur> updated the status of jira ISPN-4438 to Resolved

Comment 10 Babak Mozaffari 2014-06-26 07:21:22 UTC
Martin,

Right, so this time the class cast exception happens in the infinispan code and not my code, when getting the keyset.

Error stack below:

Jun 26, 2014 7:19:03 AM org.infinispan.client.hotrod.impl.protocol.Codec10 readNewTopologyAndHash
INFO: ISPN004006: /10.19.139.101:11222 sent new topology view (id=4) containing 3 addresses: [/10.19.139.101:11222, /10.19.139.102:11222, /10.19.139.103:11222]
Jun 26, 2014 7:19:03 AM org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory updateServers
INFO: ISPN004014: New server added(/10.19.139.103:11222), adding to the pool.
Jun 26, 2014 7:19:03 AM org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory updateServers
INFO: ISPN004014: New server added(/10.19.139.102:11222), adding to the pool.
Jun 26, 2014 7:19:03 AM org.infinispan.client.hotrod.RemoteCacheManager start
INFO: ISPN004021: Infinispan version: 6.0.3.Final-redhat-3
Jun 26, 2014 7:19:08 AM org.infinispan.client.hotrod.impl.protocol.Codec10 checkForErrorsInResponseStatus
WARN: ISPN004005: Error received from the server: java.lang.ClassCastException: java.lang.String cannot be cast to [B
Exception in thread "main" org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for message id[8] returned server error (status=0x85): java.lang.ClassCastException: java.lang.String cannot be cast to [B
	at org.infinispan.client.hotrod.impl.protocol.Codec10.checkForErrorsInResponseStatus(Codec10.java:143)
	at org.infinispan.client.hotrod.impl.protocol.Codec10.readHeader(Codec10.java:99)
	at org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:56)
	at org.infinispan.client.hotrod.impl.operations.BulkGetKeysOperation.executeOperation(BulkGetKeysOperation.java:37)
	at org.infinispan.client.hotrod.impl.operations.BulkGetKeysOperation.executeOperation(BulkGetKeysOperation.java:19)
	at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:46)
	at org.infinispan.client.hotrod.impl.RemoteCacheImpl.keySet(RemoteCacheImpl.java:612)
	at com.redhat.refarch.jdg.gasshopper.Client.main(Client.java:74)


(In reply to Martin Gencur from comment #8)
> Babak,
> how can you get the same error when remoteCache.keySet().size() returns just
> an integer? I don't understand.

Comment 11 Martin Gencur 2014-06-26 08:13:23 UTC
Babak, could you please attach a reproducer? The keySet() operation takes forever on my side so wasn't able to reproduce it. Thanks

As a workaround, does it work for you to just call remoteCache.size() instead of remoteCache.keySet().size()? Or do you need the keySet?

Comment 12 Babak Mozaffari 2014-06-26 17:08:36 UTC
Martin, I will work on it when I get a chance but so far I've only had with my real application.

For a distributed cache, calling cache.size() only returns the number of entries held by the contacted JDG node. cache.keySet().size() gets all keys and therefore returns total cache size. My client code calls both to make a point of the difference between these two.

Comment 13 Babak Mozaffari 2014-06-30 18:10:48 UTC
Created attachment 913516 [details]
Test client code for follow-up issue with keySet().size()

Attached is a test case to reproduce the keySet() issue. Instructions for setup and run are the same. Source code is in the JAR file.

It easily reproduces for me on the first try using default settings:

java -Djdg.host=10.19.139.101 -jar TestCase.jar 

Jun 30, 2014 6:07:40 PM org.infinispan.client.hotrod.impl.protocol.Codec10 readNewTopologyAndHash
INFO: ISPN004006: /10.19.139.101:11222 sent new topology view (id=2) containing 2 addresses: [/10.19.139.101:11222, /10.19.139.102:11222]
Jun 30, 2014 6:07:40 PM org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory updateServers
INFO: ISPN004014: New server added(/10.19.139.102:11222), adding to the pool.
Jun 30, 2014 6:07:40 PM org.infinispan.client.hotrod.RemoteCacheManager start
INFO: ISPN004021: Infinispan version: 6.0.3.Final-redhat-3
Inserted 100 UUID objects
Jun 30, 2014 6:07:40 PM org.infinispan.client.hotrod.impl.protocol.Codec10 checkForErrorsInResponseStatus
WARN: ISPN004005: Error received from the server: java.lang.ClassCastException: java.lang.String cannot be cast to [B
Exception in thread "main" org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for message id[103] returned server error (status=0x85): java.lang.ClassCastException: java.lang.String cannot be cast to [B
	at org.infinispan.client.hotrod.impl.protocol.Codec10.checkForErrorsInResponseStatus(Codec10.java:143)
	at org.infinispan.client.hotrod.impl.protocol.Codec10.readHeader(Codec10.java:99)
	at org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:56)
	at org.infinispan.client.hotrod.impl.operations.BulkGetKeysOperation.executeOperation(BulkGetKeysOperation.java:37)
	at org.infinispan.client.hotrod.impl.operations.BulkGetKeysOperation.executeOperation(BulkGetKeysOperation.java:19)
	at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:46)
	at org.infinispan.client.hotrod.impl.RemoteCacheImpl.keySet(RemoteCacheImpl.java:612)
	at com.redhat.refarch.jdg.bug.Test.main(Test.java:39)

Comment 14 Martin Gencur 2014-07-01 13:59:46 UTC
I was able to reproduce the problem with keySet and found the root cause. JIRA attached.

Comment 15 Martin Gencur 2014-07-08 10:03:20 UTC
The fix is in CR2.


Note You need to log in before you can comment on or make changes to this bug.