Bug 875151 - Storage-only example: HotRodDecoder: NoSuchElementException: key not found: node1/clustered
Summary: Storage-only example: HotRodDecoder: NoSuchElementException: key not found: node1/clustered
Keywords:
Status: VERIFIED
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Server
Version: 6.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ER8
Target Release: 6.1.0
Assignee: Tristan Tarrant
QA Contact: Tomas Sykora
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-11-09 16:28 UTC by Tomas Sykora
Modified: 2014-03-17 04:02 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug


Attachments
TRACE from JDG started with standalone-ha.xml (513.03 KB, text/plain)
2012-11-09 16:28 UTC, Tomas Sykora
TRACE from JDG started with standalone-storage-only.xml (663.14 KB, text/plain)
2012-11-09 16:29 UTC, Tomas Sykora
StorageOnlyConfigExampleTest failure (8.46 KB, text/plain)
2012-12-11 08:59 UTC, Tomas Sykora
"NORMAL" node standalone-ha.xml config file (13.94 KB, text/xml)
2012-12-11 11:21 UTC, Tomas Sykora
"STORAGE-ONLY" node config file (12.05 KB, text/xml)
2012-12-11 11:22 UTC, Tomas Sykora


Links
System ID Priority Status Summary Last Updated
Red Hat Issue Tracker ISPN-2624 Major Resolved JDG: Storage-only example: HotRodDecoder: NSEE: key not found: node1/clustered 2014-04-08 02:42:40 UTC

Description Tomas Sykora 2012-11-09 16:28:56 UTC
Created attachment 641648 [details]
TRACE from JDG started with standalone-ha.xml

This occurs when testing the standalone-storage-only.xml example configuration. The first JDG server is started with the usual standalone-ha.xml configuration and the second JDG server with the provided example configuration.

I am attaching 2 TRACE logs - one is from "main" server and the second is from "storage-only" server.

I will provide any other information if needed.

Comment 1 Tomas Sykora 2012-11-09 16:29:40 UTC
Created attachment 641649 [details]
TRACE from JDG started with standalone-storage-only.xml

Comment 2 Michal Linhard 2012-11-26 16:03:24 UTC
I'm seeing this in resilience tests for 6.1.0.ER4.
I've created a more general JIRA for this.

Comment 3 JBoss JIRA Server 2012-12-04 13:54:28 UTC
Dan Berindei <dberinde@redhat.com> made a comment on jira ISPN-2550

Galder, I think I have a fix for this issue: https://github.com/danberindei/infinispan/commit/3712ffac1ec1503f17b3f9de022bfc98a20b90e1

The problem is that I don't have a test to go with it, so I'm not sure if it really works. So I'm not issuing a PR, but I'm leaving it here for reference.

Comment 4 JBoss JIRA Server 2012-12-05 17:18:36 UTC
Galder Zamarreño <galder.zamarreno@redhat.com> made a comment on jira ISPN-2550

Dan, did you check the functional test Michal's referring to? You might be able to create a test out of that? I'm assigning to you since you're more familiar with these changes.

Comment 5 JBoss JIRA Server 2012-12-05 18:24:18 UTC
Galder Zamarreño <galder.zamarreno@redhat.com> made a comment on jira ISPN-2550

Michal, can you share the test so that we can map it to an Infinispan unit test and verify Dan's fix?

Comment 6 JBoss JIRA Server 2012-12-06 09:47:52 UTC
Michal Linhard <mlinhard@redhat.com> made a comment on jira ISPN-2550

I found this using a resilience test that's implemented in the distributed smartfrog framework that we run in our perflab; I don't have it in a simple test method.

What it does is this:
1. start 4 nodes
2. let them run 5 min
3. kill node2
4. wait for cluster of node1,node3,node4
5. wait 5 min
6. start node2
7. wait for cluster node1 - node4
8. wait 5 min

The exception happens in step 3, right after killing node2.
I also managed to reproduce this locally running 4 nodes on my laptop - that's how I debugged it.
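The numbered steps above amount to a simple membership round-trip, sketched schematically below (a hedged illustration only: `ResilienceScenario` and its helpers are hypothetical names, not part of the actual smartfrog harness):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Schematic model of the resilience scenario: the cluster view the
// surviving nodes should converge to after node2 is killed (steps 3-4),
// and the full view after node2 rejoins (steps 6-7).
public class ResilienceScenario {
    static List<String> afterKill(List<String> cluster, String victim) {
        List<String> view = new ArrayList<>(cluster);
        view.remove(victim);          // step 3: kill node2
        return view;                  // step 4: expect node1, node3, node4
    }

    static List<String> afterRejoin(List<String> view, String node) {
        List<String> rejoined = new ArrayList<>(view);
        rejoined.add(node);           // step 6: restart node2
        return rejoined;              // step 7: expect the full cluster again
    }

    public static void main(String[] args) {
        List<String> full = Arrays.asList("node1", "node2", "node3", "node4");
        List<String> degraded = afterKill(full, "node2");
        System.out.println(degraded); // [node1, node3, node4]
        System.out.println(afterRejoin(degraded, "node2"));
    }
}
```

Per the comment above, the failure fires at the `afterKill` transition, while clients are still receiving the degraded topology.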

Comment 7 JBoss JIRA Server 2012-12-06 10:02:20 UTC
Michal Linhard <mlinhard@redhat.com> made a comment on jira ISPN-2550

And one more important thing: during the whole test, a constant small load from multiple Hot Rod clients is applied. I think I had to have at least 10 clients locally for the bug to appear. It seems to happen when they're receiving the new topology and it fails for some of them...

Comment 8 JBoss JIRA Server 2012-12-10 14:27:51 UTC
Galder Zamarreño <galder.zamarreno@redhat.com> made a comment on jira ISPN-2550

Michal, in the beginning you mentioned something about Tomas finding this in a functional test, that's the test I'm looking for :)

Also, if you can replicate the issue easily, can you try Dan's fix to see if it works?

Comment 9 JBoss JIRA Server 2012-12-10 15:08:04 UTC
Michal Linhard <mlinhard@redhat.com> made a comment on jira ISPN-2550

Right, that's true. I've just spoken with Tomas; he's going to supply the way to test this in his scenario.
I'll try to test Dan's fix as well.

Comment 10 Tomas Sykora 2012-12-10 15:15:37 UTC
Hi Galder,

I experience this problem in our functional test suite for remote mode (server).

preNOTE: you probably don't need to install the Arquillian project as its CR1 is published already.
preNOTE: you need to create an empty directory named "bundles" in edg0/, edg1/, etc.

Please see this doc: https://docspace.corp.redhat.com/docs/DOC-87715
Download our tests from svn and run this specific test. (for storage only example)

Just go to edgTest/trunk/remote and run

 mvn -s ~/programs/eclipseWorkspace/settings_mead_jdg_plus_local.xml  clean verify -Dstack=udp -pl config-examples/standalone-storage-only  -Dnode0.edghome=/home/tsykora/edg0 -Dnode1.edghome=/home/tsykora/edg1 -Dnode2.edghome=/home/tsykora/edg2 -Dmaven.test.failure.ignore=true 

NOTE: this user-specific mvn settings file (-s) points to my "local" repo, which comes with regular ER builds. You can ignore it and simply run this using the MEAD repo settings:

https://svn.devel.redhat.com/repos/jboss-qa/jdg/scripts/settings_mead_jdg.xml

You can obtain latest JDG server from here: http://download.lab.bos.redhat.com/devel/jdg/stage/JDG-6.1.0-ER5/

I hope I didn't forget anything. In case of any problem, let me know.

Comment 11 JBoss JIRA Server 2012-12-10 16:47:00 UTC
Galder Zamarreño <galder.zamarreno@redhat.com> made a comment on jira ISPN-2550

Tomas, I was wondering which of the functional tests you had developed was failing, and where (stacktrace of failure...etc). The idea is to replicate that specific test in the Infinispan codebase. Thanks.

Comment 12 JBoss JIRA Server 2012-12-10 17:19:47 UTC
Michal Linhard <mlinhard@redhat.com> made a comment on jira ISPN-2550

Tomas' tracelog shows exactly the same spot as my scenario: https://bugzilla.redhat.com/attachment.cgi?id=641649 (I'm not sure about his test scenario though)

Comment 13 Tomas Sykora 2012-12-11 08:59:54 UTC
Created attachment 661308 [details]
StorageOnlyConfigExampleTest failure

I attached surefire report from our test suite.
Galder, please, see test: trunk/remote/config-examples/standalone-storage-only/src/test/java/com.jboss.datagrid.test.examples.StorageOnlyConfigExampleTest.java

It is failing on line 73: rc1.put("k", "v"); 

This put caused the attached stack trace.
We are starting one JDG server with standalone-ha.xml and the second JDG with standalone-storage-only.xml, which you can find in jdgServer/docs/examples/configs.

Comment 14 JBoss JIRA Server 2012-12-11 10:38:37 UTC
Michal Linhard <mlinhard@redhat.com> made a comment on jira ISPN-2550

I've run tests locally with Dan's fix and I'm seeing these exceptions:
{code}
11:19:23,919 ERROR [org.infinispan.server.hotrod.HotRodDecoder] (HotRodClientMaster-5) ISPN005009: Unexpected error before any request parameters read
java.lang.IndexOutOfBoundsException: 2
	at scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:44)
	at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:47)
	at org.infinispan.server.hotrod.AbstractTopologyAwareEncoder1x$$anonfun$writeHashTopologyHeader$1$$anonfun$apply$mcVI$sp$1.apply(AbstractTopologyAwareEncoder1x.scala:96)
	at org.infinispan.server.hotrod.AbstractTopologyAwareEncoder1x$$anonfun$writeHashTopologyHeader$1$$anonfun$apply$mcVI$sp$1.apply(AbstractTopologyAwareEncoder1x.scala:92)
	at scala.collection.immutable.Range.foreach(Range.scala:81)
	at org.infinispan.server.hotrod.AbstractTopologyAwareEncoder1x$$anonfun$writeHashTopologyHeader$1.apply$mcVI$sp(AbstractTopologyAwareEncoder1x.scala:92)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
	at org.infinispan.server.hotrod.AbstractTopologyAwareEncoder1x.writeHashTopologyHeader(AbstractTopologyAwareEncoder1x.scala:89)
	at org.infinispan.server.hotrod.AbstractEncoder1x.writeHeader(AbstractEncoder1x.scala:62)
	at org.infinispan.server.hotrod.HotRodEncoder.encode(HotRodEncoder.scala:63)
	at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:67)
	at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:60)
	at org.jboss.netty.channel.Channels.write(Channels.java:712)
	at org.jboss.netty.channel.Channels.write(Channels.java:679)
	at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
	at org.infinispan.server.core.AbstractProtocolDecoder.exceptionCaught(AbstractProtocolDecoder.scala:295)
	at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:533)
	at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:49)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:84)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:472)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:333)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{code}

Comment 16 Tomas Sykora 2012-12-11 11:21:54 UTC
Created attachment 661373 [details]
"NORMAL" node standalone-ha.xml config file

Comment 17 JBoss JIRA Server 2012-12-11 11:21:58 UTC
Galder Zamarreño <galder.zamarreno@redhat.com> made a comment on jira ISPN-2550

Hey [~dan.berindei], can you check that IndexOutOfBoundsException issue? Let's see if Michal can upload TRACE.

[~NadirX], Tomas' issue appears to show a storage-only node (which shouldn't have any endpoints, log ending in 49...) responding to a client request, so the endpoint is somehow active. Can you check the JDG configuration he's using to see if there are any issues there?

Comment 18 Tomas Sykora 2012-12-11 11:22:30 UTC
Created attachment 661374 [details]
"STORAGE-ONLY" node config file

Comment 19 Martin Gencur 2012-12-11 13:00:08 UTC
Just a note about the test: when we create a RemoteCacheManager and pass just one address to it, it does *not* mean that all requests through cache.put/get will go just to this one address; they can go to any node in the cluster. Is that right? AFAIK the HotRod client dynamically obtains information about all clustered nodes and autonomously chooses one of the cluster nodes to send requests to. If my assumption is correct, we would need to use a Memcached or REST client to properly test the storage-only example, not HotRod.

Comment 20 Tristan Tarrant 2012-12-11 13:08:41 UTC
Yes, RCMs get the server list dynamically from the servers. However only the servers with an endpoint should add their address to the list.
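Tristan's point (only servers with an endpoint should add their address to the topology list sent to clients) can be sketched as a simple filter. This is a schematic model with hypothetical names (`TopologyListSketch`, `Node`), not the server's real API:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: the topology update pushed to Hot Rod clients should
// contain only cluster members that expose a Hot Rod endpoint. A
// storage-only node participates in the cache cluster but must be
// filtered out of the client-visible server list; this bug suggests
// that filtering is not happening.
public class TopologyListSketch {
    record Node(String address, boolean hasHotRodEndpoint) {}

    static List<String> clientTopology(List<Node> cluster) {
        List<String> servers = new ArrayList<>();
        for (Node n : cluster) {
            if (n.hasHotRodEndpoint()) {
                servers.add(n.address());
            }
        }
        return servers;
    }
}
```

Under this model, a client holding only the "normal" node's address would never be routed to the storage-only node, which is the behavior the test expects.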

Comment 21 JBoss JIRA Server 2012-12-11 13:14:31 UTC
Dan Berindei <dberinde@redhat.com> made a comment on jira ISPN-2550

The IndexOutOfBoundsException seems to appear because we're generating numOwners (2) "denormalized" hash ids for each segment, but the consistent hash has more than numOwners owners for one segment (3). This can happen during a join, when the write CH is a union of the previous CH and the new, balanced CH.

Tomas, I've updated my branch to use the read CH instead, could you try again?
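Dan's diagnosis above can be illustrated with a small model: during a join the write CH is the union of the old and new owner lists, so a segment can temporarily have 3 owners while numOwners is 2, and emitting one denormalized hash id per owner then overruns a numOwners-sized structure. A schematic sketch only (the real logic lives in AbstractTopologyAwareEncoder1x; the names here are hypothetical):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Model of the failure mode: during a rebalance the write CH owner list
// for a segment is the union of the previous and the new (balanced)
// owners, so it can hold 3 entries even though numOwners is 2. Indexing
// a buffer sized to numOwners with the third owner then throws
// IndexOutOfBoundsException: 2, matching the trace above.
public class UnionChSketch {
    static final int NUM_OWNERS = 2;

    static List<String> unionOwners(List<String> oldOwners, List<String> newOwners) {
        Set<String> union = new LinkedHashSet<>(oldOwners);
        union.addAll(newOwners);
        return List.copyOf(union);
    }

    public static void main(String[] args) {
        // Old owners [A, B], new balanced owners [B, C]: the union is [A, B, C].
        List<String> owners = unionOwners(List.of("A", "B"), List.of("B", "C"));
        System.out.println(owners.size() > NUM_OWNERS); // true: the overflow condition
    }
}
```

Dan's follow-up fix (using the read CH instead of the write CH) sidesteps this because the read CH never contains the transient union.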

Comment 22 JBoss JIRA Server 2012-12-11 14:23:06 UTC
Michal Linhard <mlinhard@redhat.com> made a comment on jira ISPN-2550

Dan, I wanted to try your change, but I don't see any further commit on the branch https://github.com/danberindei/infinispan/tree/t_2550_m

Comment 23 JBoss JIRA Server 2012-12-11 15:34:28 UTC
Michal Linhard <mlinhard@redhat.com> made a comment on jira ISPN-2550

900MB of tasty tracelogs from runs with 5.2.0.Beta5 (resilience tests on hudson / perflab)

http://www.qa.jboss.com/~mlinhard/test_results/serverlogs-trace-ispn2550.zip

njoy!

Comment 24 JBoss JIRA Server 2012-12-12 14:07:45 UTC
Dan Berindei <dberinde@redhat.com> made a comment on jira ISPN-2550

Michal, what is the last commit you had when you ran the test?

Comment 25 JBoss JIRA Server 2012-12-12 14:39:51 UTC
Michal Linhard <mlinhard@redhat.com> made a comment on jira ISPN-2550

The IndexOutOfBoundsException was found when running with https://github.com/danberindei/infinispan/commit/c3325b134704016fa556343529d6a3a5b9a96bcb

btw, now I can see another commit on the t_2550_m branch; would it still be helpful to test with it?

Comment 26 JBoss JIRA Server 2012-12-12 14:46:23 UTC
Galder Zamarreño <galder.zamarreno@redhat.com> made a comment on jira ISPN-2550

Tomas, it seems the config that you provided works fine as storage only.

Can you create a separate issue to track your problem? I don't want to mix it with the node-kill issue.

Also, can you replicate the issue again and provide JDG version information, TRACE logs, etc.? Can you try to replicate the issue on JDG master too?

Comment 27 JBoss JIRA Server 2012-12-13 11:48:21 UTC
Galder Zamarreño <galder.zamarreno@redhat.com> made a comment on jira ISPN-2624

To recap:

- According to the linked BZ, https://bugzilla.redhat.com/attachment.cgi?id=641649 should not have an endpoint started, but it does (see references to the Hot Rod decoder and Netty). Is this some misconfiguration?
- [~NadirX] verified that the storage node only configuration works in JDG master as expected, not starting any endpoints.

So Tomas, can you verify the issue is still present in JDG master?

If it is, can you attach once again configurations of each server and TRACE logs with org.infinispan category to this JIRA?

Comment 28 JBoss JIRA Server 2012-12-13 11:49:03 UTC
Galder Zamarreño <galder.zamarreno@redhat.com> made a comment on jira ISPN-2624

Please assign JIRA back to me when confirmed this is an issue in Infinispan.

Comment 29 JBoss JIRA Server 2012-12-14 16:05:15 UTC
Tomas Sykora <tsykora@redhat.com> made a comment on jira ISPN-2624

2 config files + 2 TRACE logs, from two identical runs: in one, TRACE was enabled on the "NORMAL" node (started with standalone-ha.xml), and in the second run, TRACE was enabled on the "STORAGE-ONLY" node (i.e. the node started with the standalone-storage-only.xml configuration).

These nodes were built from JDG master on December 14.

