Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1112886

Summary:

Hot Rod client fails to retrieve data after a few tries

Product:

[JBoss] JBoss Data Grid 6

Reporter:

Babak Mozaffari <bmozaffa>

Component:

Server

Assignee:

Tristan Tarrant <ttarrant>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Martin Gencur <mgencur>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

6.2.1

CC:

dstahl, jdg-bugs

Target Milestone:

CR2

Target Release:

6.3.0

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2015-01-26 14:05:51 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1104639

Attachments:

Description	Flags
Test client code that reproduces the issue	none
Test client code for follow-up issue with keySet().size()	none

Description Babak Mozaffari 2014-06-25 00:59:40 UTC

Created attachment 911902 [details]
Test client code that reproduces the issue

Description of problem:

In a distributed cache where the number of owners is less than the number of cluster members, the Hot Rod client is only able to get an object from the cache a few times and afterwards retrieves an invalid byte sequence.

Version-Release number of selected component (if applicable):

6.2.1, remote client-server JDG with Hot Rod client

How reproducible:

Consistently and easily when fetching data through Hot Rod from a distributed cache where the number of owners is less than the number of cluster members


Steps to Reproduce:
1. Set up a JDG cluster with at least two nodes. Add the following cache:

<distributed-cache name="test" mode="ASYNC" segments="20" owners="1" remote-timeout="30000" start="EAGER">
    <compatibility enabled="true"/>
    <locking isolation="READ_COMMITTED" acquire-timeout="30000" concurrency-level="1000" striping="true"/>
    <transaction mode="NONE"/>
</distributed-cache>

2. Download the attached TestCase.jar

3. Create a directory called TestCase_lib alongside the above JAR file and copy all the libraries from jboss-datagrid-6.2.1-server/client/hotrod/java to there.

4. Run the test case by pointing it to one of your JDG hosts:

java -Djdg.host=10.19.139.101 -jar TestCase.jar


Actual results:

After a few tries, instead of a proper UUID object, a byte[] is returned:

Inserting UUID object #1

Retrieval of UUID, attempt #1
Got UUID correctly!
Retrieval of UUID, attempt #2
Instead, got [[B@40334c25]


Expected results:

The UUID object that was inserted should always be returned.


Additional info:

The code creates a UUID object, puts it in the cache and gets it from the cache multiple times. By default, it gets each object 5 times and then creates a new object, up to 100 objects in total. It stops as soon as it reproduces the issue. To change the number of puts and gets, pass them as arguments to the JAR:

java -Djdg.host=10.19.139.101 -jar TestCase.jar 100 5

Comment 2 Martin Gencur 2014-06-25 08:54:51 UTC

Hello Babak, thanks for the reproducer. Do you think you could attach sources as well? Right now I'm able to get the same output as yours but I'd really like to see the code and possibly tweak it for debugging purposes. Thanks!

Comment 3 Martin Gencur 2014-06-25 09:59:03 UTC

After some debugging, it seems it has something to do with L1 cache. The returned value is wrong if the cache entry is returned from L1 cache and not from a remote node. Looking into my TRACE logs, it says that L1 is enabled but I don't have L1 enabled in my configs:-/ Looking into this...

Comment 4 Babak Mozaffari 2014-06-25 10:33:13 UTC

Hi Martin,

It makes sense for it to have to do with an L1 cache because once it breaks (starts returning wrong value for a given key), it will no longer work for a period of time but if left alone for some time, it will return the right value again after that!

However I'm surprised to hear that the problem happens when it's the local node and not the remote node. I say that because when I have only one node, or when number of owners is the same as cluster size, the problem does not seem to appear.

The source code for the test is in the same JAR file!

Comment 5 Martin Gencur 2014-06-25 11:22:48 UTC

So there are two problems:
1) L1 cache is enabled even without configuring L1, we will fix that soon
2) It turns our that keys/values are not unmarshalled when L1 is used, we hope to have the fix soon too

Babak,
if you don't need L1 cache, can you try setting l1-lifespan="0" on distributed-cache? This will disable L1 and should fix your problem too. 
If you need L1, then you'll need to wait for the fix for 2)

Comment 6 Babak Mozaffari 2014-06-25 19:21:35 UTC

Workaround confirmed!

Comment 7 Babak Mozaffari 2014-06-25 22:15:36 UTC

I may have spoken too soon. Setting this attribute helps me avoid the error in 90+% of the cases but I still get the same error when I do:

remoteCache.keySet().size()

Comment 8 Martin Gencur 2014-06-26 06:48:46 UTC

Babak,
how can you get the same error when remoteCache.keySet().size() returns just an integer? I don't understand.

Comment 9 JBoss JIRA Server 2014-06-26 07:16:02 UTC

Martin Gencur <mgencur> updated the status of jira ISPN-4438 to Resolved

Comment 10 Babak Mozaffari 2014-06-26 07:21:22 UTC

Martin,

Right, so this time the class cast exception happens in the infinispan code and not my code, when getting the keyset.

Error stack below:

Jun 26, 2014 7:19:03 AM org.infinispan.client.hotrod.impl.protocol.Codec10 readNewTopologyAndHash
INFO: ISPN004006: /10.19.139.101:11222 sent new topology view (id=4) containing 3 addresses: [/10.19.139.101:11222, /10.19.139.102:11222, /10.19.139.103:11222]
Jun 26, 2014 7:19:03 AM org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory updateServers
INFO: ISPN004014: New server added(/10.19.139.103:11222), adding to the pool.
Jun 26, 2014 7:19:03 AM org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory updateServers
INFO: ISPN004014: New server added(/10.19.139.102:11222), adding to the pool.
Jun 26, 2014 7:19:03 AM org.infinispan.client.hotrod.RemoteCacheManager start
INFO: ISPN004021: Infinispan version: 6.0.3.Final-redhat-3
Jun 26, 2014 7:19:08 AM org.infinispan.client.hotrod.impl.protocol.Codec10 checkForErrorsInResponseStatus
WARN: ISPN004005: Error received from the server: java.lang.ClassCastException: java.lang.String cannot be cast to [B
Exception in thread "main" org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for message id[8] returned server error (status=0x85): java.lang.ClassCastException: java.lang.String cannot be cast to [B
	at org.infinispan.client.hotrod.impl.protocol.Codec10.checkForErrorsInResponseStatus(Codec10.java:143)
	at org.infinispan.client.hotrod.impl.protocol.Codec10.readHeader(Codec10.java:99)
	at org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:56)
	at org.infinispan.client.hotrod.impl.operations.BulkGetKeysOperation.executeOperation(BulkGetKeysOperation.java:37)
	at org.infinispan.client.hotrod.impl.operations.BulkGetKeysOperation.executeOperation(BulkGetKeysOperation.java:19)
	at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:46)
	at org.infinispan.client.hotrod.impl.RemoteCacheImpl.keySet(RemoteCacheImpl.java:612)
	at com.redhat.refarch.jdg.gasshopper.Client.main(Client.java:74)


(In reply to Martin Gencur from comment #8)
> Babak,
> how can you get the same error when remoteCache.keySet().size() returns just
> an integer? I don't understand.

Comment 11 Martin Gencur 2014-06-26 08:13:23 UTC

Babak, could you please attach a reproducer? The keySet() operation takes forever on my side so wasn't able to reproduce it. Thanks

As a workaround, does it work for you to just call remoteCache.size() instead of remoteCache.keySet().size()? Or do you need the keySet?

Comment 12 Babak Mozaffari 2014-06-26 17:08:36 UTC

Martin, I will work on it when I get a chance but so far I've only had with my real application.

For a distributed cache, calling cache.size() only returns the number of entries held by the contacted JDG node. cache.keySet().size() gets all keys and therefore returns total cache size. My client code calls both to make a point of the difference between these two.

Comment 13 Babak Mozaffari 2014-06-30 18:10:48 UTC

Created attachment 913516 [details]
Test client code for follow-up issue with keySet().size()

Attached is a test case to reproduce the keySet() issue. Instructions for setup and run are the same. Source code is in the JAR file.

It easily reproduces for me on the first try using default settings:

java -Djdg.host=10.19.139.101 -jar TestCase.jar 

Jun 30, 2014 6:07:40 PM org.infinispan.client.hotrod.impl.protocol.Codec10 readNewTopologyAndHash
INFO: ISPN004006: /10.19.139.101:11222 sent new topology view (id=2) containing 2 addresses: [/10.19.139.101:11222, /10.19.139.102:11222]
Jun 30, 2014 6:07:40 PM org.infinispan.client.hotrod.impl.transport.tcp.TcpTransportFactory updateServers
INFO: ISPN004014: New server added(/10.19.139.102:11222), adding to the pool.
Jun 30, 2014 6:07:40 PM org.infinispan.client.hotrod.RemoteCacheManager start
INFO: ISPN004021: Infinispan version: 6.0.3.Final-redhat-3
Inserted 100 UUID objects
Jun 30, 2014 6:07:40 PM org.infinispan.client.hotrod.impl.protocol.Codec10 checkForErrorsInResponseStatus
WARN: ISPN004005: Error received from the server: java.lang.ClassCastException: java.lang.String cannot be cast to [B
Exception in thread "main" org.infinispan.client.hotrod.exceptions.HotRodClientException:Request for message id[103] returned server error (status=0x85): java.lang.ClassCastException: java.lang.String cannot be cast to [B
	at org.infinispan.client.hotrod.impl.protocol.Codec10.checkForErrorsInResponseStatus(Codec10.java:143)
	at org.infinispan.client.hotrod.impl.protocol.Codec10.readHeader(Codec10.java:99)
	at org.infinispan.client.hotrod.impl.operations.HotRodOperation.readHeaderAndValidate(HotRodOperation.java:56)
	at org.infinispan.client.hotrod.impl.operations.BulkGetKeysOperation.executeOperation(BulkGetKeysOperation.java:37)
	at org.infinispan.client.hotrod.impl.operations.BulkGetKeysOperation.executeOperation(BulkGetKeysOperation.java:19)
	at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:46)
	at org.infinispan.client.hotrod.impl.RemoteCacheImpl.keySet(RemoteCacheImpl.java:612)
	at com.redhat.refarch.jdg.bug.Test.main(Test.java:39)

Comment 14 Martin Gencur 2014-07-01 13:59:46 UTC

I was able to reproduce the problem with keySet and found the root cause. JIRA attached.

Comment 15 Martin Gencur 2014-07-08 10:03:20 UTC

The fix is in CR2.