Created attachment 911902 [details] Test client code that reproduces the issue Description of problem: In a distributed cache where the number of owners is less than the number of cluster members, the Hot Rod client is only able to get an object from the cache a few times and afterwards retrieves an invalid byte sequence. Version-Release number of selected component (if applicable): 6.2.1, remote client-server JDG with Hot Rod client How reproducible: Consistently and easily when fetching data through Hot Rod from a distributed cache where the number of owners is less than the number of cluster members Steps to Reproduce: 1. Set up a JDG cluster with at least two nodes. Add the following cache: <distributed-cache name="test" mode="ASYNC" segments="20" owners="1" remote-timeout="30000" start="EAGER"> <compatibility enabled="true"/> <locking isolation="READ_COMMITTED" acquire-timeout="30000" concurrency-level="1000" striping="true"/> <transaction mode="NONE"/> </distributed-cache> 2. Download the attached TestCase.jar 3. Create a directory called TestCase_lib alongside the above JAR file and copy all the libraries from jboss-datagrid-6.2.1-server/client/hotrod/java to there. 4. Run the test case by pointing it to one of your JDG hosts: java -Djdg.host=10.19.139.101 -jar TestCase.jar Actual results: After a few tries, instead of a proper UUID object, a byte[] is returned: Inserting UUID object #1 Retrieval of UUID, attempt #1 Got UUID correctly! Retrieval of UUID, attempt #2 Instead, got [[B@40334c25] Expected results: The UUID object that was inserted should always be returned. Additional info: The code creates a UUID object, puts it in the cache and gets it from the cache multiple times. By default, it gets each object 5 times and then creates a new object, up to 100 objects in total. It stops as soon as it reproduces the issue. To change the number of puts and gets, pass them as arguments to the JAR: java -Djdg.host=10.19.139.101 -jar TestCase.jar 100 5
Hello Babak, thanks for the reproducer. Do you think you could attach sources as well? Right now I'm able to get the same output as yours but I'd really like to see the code and possibly tweak it for debugging purposes. Thanks!
After some debugging, it seems it has something to do with L1 cache. The returned value is wrong if the cache entry is returned from L1 cache and not from a remote node. Looking into my TRACE logs, it says that L1 is enabled but I don't have L1 enabled in my configs:-/ Looking into this...
Hi Martin, It makes sense for it to have to do with an L1 cache because once it breaks (starts returning wrong value for a given key), it will no longer work for a period of time but if left alone for some time, it will return the right value again after that! However I'm surprised to hear that the problem happens when it's the local node and not the remote node. I say that because when I have only one node, or when number of owners is the same as cluster size, the problem does not seem to appear. The source code for the test is in the same JAR file!
So there are two problems: 1) L1 cache is enabled even without configuring L1, we will fix that soon 2) It turns our that keys/values are not unmarshalled when L1 is used, we hope to have the fix soon too Babak, if you don't need L1 cache, can you try setting l1-lifespan="0" on distributed-cache? This will disable L1 and should fix your problem too. If you need L1, then you'll need to wait for the fix for 2)
Workaround confirmed!
I may have spoken too soon. Setting this attribute helps me avoid the error in 90+% of the cases but I still get the same error when I do: remoteCache.keySet().size()
Babak, how can you get the same error when remoteCache.keySet().size() returns just an integer? I don't understand.
Martin Gencur <mgencur> updated the status of jira ISPN-4438 to Resolved
Martin, Right, so this time the class cast exception happens in the infinispan code and not my code, when getting the keyset. Error stack below: Jun 26, 2014 7:19:03 AM org.infinispan.client.hotrod.impl.protocol.Codec10 readNewTopologyAndHash INFO: ISPN004006: /10.19.139.101:11222 sent new topology view (id=4) containing 3 addresses: [/10.19.139.101:11222, /10.19.139.102:11222, /10.19.139.103:11222] Jun 26, 2014 7:19:03 AM org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory updateServers INFO: ISPN004014: New server added(/10.19.139.103:11222), adding to the pool. Jun 26, 2014 7:19:03 AM org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory updateServers INFO: ISPN004014: New server added(/10.19.139.102:11222), adding to the pool. Jun 26, 2014 7:19:03 AM org.infinispan.client.hotrod.RemoteCacheManager start INFO: ISPN004021: Infinispan version: 6.0.3.Final-redhat-3 Jun 26, 2014 7:19:08 AM org.infinispan.client.hotrod.impl.protocol.Codec10 checkForErrorsInResponseStatus WARN: ISPN004005: Error received from the server: java.lang.ClassCastException: java.lang.String cannot be cast to [B Exception in thread "main" org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for message id[8] returned server error (status=0x85): java.lang.ClassCastException: java.lang.String cannot be cast to [B at org.infinispan.client.hotrod.impl.protocol.Codec10.checkForErrorsInResponseStatus(Codec10.java:143) at org.infinispan.client.hotrod.impl.protocol.Codec10.readHeader(Codec10.java:99) at org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:56) at org.infinispan.client.hotrod.impl.operations.BulkGetKeysOperation.executeOperation(BulkGetKeysOperation.java:37) at org.infinispan.client.hotrod.impl.operations.BulkGetKeysOperation.executeOperation(BulkGetKeysOperation.java:19) at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:46) at org.infinispan.client.hotrod.impl.RemoteCacheImpl.keySet(RemoteCacheImpl.java:612) at com.redhat.refarch.jdg.gasshopper.Client.main(Client.java:74) (In reply to Martin Gencur from comment #8) > Babak, > how can you get the same error when remoteCache.keySet().size() returns just > an integer? I don't understand.
Babak, could you please attach a reproducer? The keySet() operation takes forever on my side so wasn't able to reproduce it. Thanks As a workaround, does it work for you to just call remoteCache.size() instead of remoteCache.keySet().size()? Or do you need the keySet?
Martin, I will work on it when I get a chance but so far I've only had with my real application. For a distributed cache, calling cache.size() only returns the number of entries held by the contacted JDG node. cache.keySet().size() gets all keys and therefore returns total cache size. My client code calls both to make a point of the difference between these two.
Created attachment 913516 [details] Test client code for follow-up issue with keySet().size() Attached is a test case to reproduce the keySet() issue. Instructions for setup and run are the same. Source code is in the JAR file. It easily reproduces for me on the first try using default settings: java -Djdg.host=10.19.139.101 -jar TestCase.jar Jun 30, 2014 6:07:40 PM org.infinispan.client.hotrod.impl.protocol.Codec10 readNewTopologyAndHash INFO: ISPN004006: /10.19.139.101:11222 sent new topology view (id=2) containing 2 addresses: [/10.19.139.101:11222, /10.19.139.102:11222] Jun 30, 2014 6:07:40 PM org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory updateServers INFO: ISPN004014: New server added(/10.19.139.102:11222), adding to the pool. Jun 30, 2014 6:07:40 PM org.infinispan.client.hotrod.RemoteCacheManager start INFO: ISPN004021: Infinispan version: 6.0.3.Final-redhat-3 Inserted 100 UUID objects Jun 30, 2014 6:07:40 PM org.infinispan.client.hotrod.impl.protocol.Codec10 checkForErrorsInResponseStatus WARN: ISPN004005: Error received from the server: java.lang.ClassCastException: java.lang.String cannot be cast to [B Exception in thread "main" org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for message id[103] returned server error (status=0x85): java.lang.ClassCastException: java.lang.String cannot be cast to [B at org.infinispan.client.hotrod.impl.protocol.Codec10.checkForErrorsInResponseStatus(Codec10.java:143) at org.infinispan.client.hotrod.impl.protocol.Codec10.readHeader(Codec10.java:99) at org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:56) at org.infinispan.client.hotrod.impl.operations.BulkGetKeysOperation.executeOperation(BulkGetKeysOperation.java:37) at org.infinispan.client.hotrod.impl.operations.BulkGetKeysOperation.executeOperation(BulkGetKeysOperation.java:19) at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:46) at org.infinispan.client.hotrod.impl.RemoteCacheImpl.keySet(RemoteCacheImpl.java:612) at com.redhat.refarch.jdg.bug.Test.main(Test.java:39)
I was able to reproduce the problem with keySet and found the root cause. JIRA attached.
The fix is in CR2.