Created attachment 641648 [details] TRACE from JDG started with standalone-ha.xml

This occurs when we test the standalone-storage-only.xml example configuration. The first JDG server is started with the usual standalone-ha.xml configuration and the second JDG server with the provided example configuration. I am attaching two TRACE logs: one from the "main" server and one from the "storage-only" server. I will provide any other information if needed.
Created attachment 641649 [details] TRACE from JDG started with standalone-storage-only.xml
I'm seeing this in resilience tests for 6.1.0.ER4; I've created a more general JIRA for this.
Dan Berindei <dberinde> made a comment on jira ISPN-2550 Galder, I think I have a fix for this issue: https://github.com/danberindei/infinispan/commit/3712ffac1ec1503f17b3f9de022bfc98a20b90e1 The problem is that I don't have a test to go with it, so I'm not sure if it really works. So I'm not issuing a PR, but I'm leaving it here for reference.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2550 Dan, did you check the functional test Michal's referring to? You might be able to create a test out of that? I'm assigning to you since you're more familiar with these changes.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2550 Michal, can you share the test so that we can map it to an Infinispan unit test and verify Dan's fix?
Michal Linhard <mlinhard> made a comment on jira ISPN-2550 I found this using a resilience test implemented in the distributed SmartFrog framework that we run in our perflab; I don't have it as a simple test method. What it does is this:
1. start 4 nodes
2. let them run 5 min
3. kill node2
4. wait for a cluster of node1, node3, node4
5. wait 5 min
6. start node2
7. wait for a cluster of node1 - node4
8. wait 5 min
The exception happens in step 3, right after killing node2. I also managed to reproduce this locally, running 4 nodes on my laptop - that's how I debugged it.
Michal Linhard <mlinhard> made a comment on jira ISPN-2550 And one more important thing: during the whole test, a constant small load from multiple Hot Rod clients is applied. I think I needed at least 10 locally for the bug to appear. It seems to happen when they're receiving the new topology and it fails for some of them...
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2550 Michal, in the beginning you mentioned something about Tomas finding this in a functional test, that's the test I'm looking for :) Also, if you can replicate the issue easily, can you try Dan's fix to see if it works?
Michal Linhard <mlinhard> made a comment on jira ISPN-2550 Right, that's true. I've just spoken with Tomas; he's going to describe how to test this in his scenario. I'll try to test Dan's fix as well.
Hi Galder, I experience this problem in our functional test suite for remote mode (server).

NOTE: you probably don't need to install the Arquillian project, as its CR1 is published already.
NOTE: you need to create an empty directory named "bundles" in edg0/, edg1/, etc.

Please see this doc: https://docspace.corp.redhat.com/docs/DOC-87715

Download our tests from svn and run this specific test (for the storage-only example). Just cd to edgTest/trunk/remote and run:

mvn -s ~/programs/eclipseWorkspace/settings_mead_jdg_plus_local.xml clean verify -Dstack=udp -pl config-examples/standalone-storage-only -Dnode0.edghome=/home/tsykora/edg0 -Dnode1.edghome=/home/tsykora/edg1 -Dnode2.edghome=/home/tsykora/edg2 -Dmaven.test.failure.ignore=true

NOTE: this user-specific mvn settings file (-s) points to my "local" repo, which comes with regular ER builds. You can ignore it and simply run this using the MEAD repo settings: https://svn.devel.redhat.com/repos/jboss-qa/jdg/scripts/settings_mead_jdg.xml

You can obtain the latest JDG server from here: http://download.lab.bos.redhat.com/devel/jdg/stage/JDG-6.1.0-ER5/

I hope I didn't forget anything. In case of any problem, anything, let me know.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2550 Tomas, I was wondering which of the functional tests you had developed was failing, and where (stacktrace of failure...etc). The idea is to replicate that specific test in the Infinispan codebase. Thanks.
Michal Linhard <mlinhard> made a comment on jira ISPN-2550 Tomas' tracelog shows exactly the same spot as my scenario: https://bugzilla.redhat.com/attachment.cgi?id=641649 (I'm not sure about his test scenario though)
Created attachment 661308 [details] StorageOnlyConfigExampleTest failure

I attached the surefire report from our test suite. Galder, please see the test: trunk/remote/config-examples/standalone-storage-only/src/test/java/com/jboss/datagrid/test/examples/StorageOnlyConfigExampleTest.java

It is failing on line 73: rc1.put("k", "v"); This put caused the attached stack trace. We are starting one JDG server with standalone-ha.xml and a second JDG server with standalone-storage-only.xml, which you can find in jdgServer/docs/examples/configs.
Michal Linhard <mlinhard> made a comment on jira ISPN-2550 I've run tests locally with Dan's fix and I'm seeing these exceptions:
{code}
11:19:23,919 ERROR [org.infinispan.server.hotrod.HotRodDecoder] (HotRodClientMaster-5) ISPN005009: Unexpected error before any request parameters read
java.lang.IndexOutOfBoundsException: 2
	at scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:44)
	at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:47)
	at org.infinispan.server.hotrod.AbstractTopologyAwareEncoder1x$$anonfun$writeHashTopologyHeader$1$$anonfun$apply$mcVI$sp$1.apply(AbstractTopologyAwareEncoder1x.scala:96)
	at org.infinispan.server.hotrod.AbstractTopologyAwareEncoder1x$$anonfun$writeHashTopologyHeader$1$$anonfun$apply$mcVI$sp$1.apply(AbstractTopologyAwareEncoder1x.scala:92)
	at scala.collection.immutable.Range.foreach(Range.scala:81)
	at org.infinispan.server.hotrod.AbstractTopologyAwareEncoder1x$$anonfun$writeHashTopologyHeader$1.apply$mcVI$sp(AbstractTopologyAwareEncoder1x.scala:92)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
	at org.infinispan.server.hotrod.AbstractTopologyAwareEncoder1x.writeHashTopologyHeader(AbstractTopologyAwareEncoder1x.scala:89)
	at org.infinispan.server.hotrod.AbstractEncoder1x.writeHeader(AbstractEncoder1x.scala:62)
	at org.infinispan.server.hotrod.HotRodEncoder.encode(HotRodEncoder.scala:63)
	at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:67)
	at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:60)
	at org.jboss.netty.channel.Channels.write(Channels.java:712)
	at org.jboss.netty.channel.Channels.write(Channels.java:679)
	at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
	at org.infinispan.server.core.AbstractProtocolDecoder.exceptionCaught(AbstractProtocolDecoder.scala:295)
	at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:533)
	at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:49)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:84)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:472)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:333)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{code}
https://svn.devel.redhat.com/repos/jboss-qa/jdg/jdg-functional-tests/trunk/remote/config-examples/standalone-storage-only/src/test/java/com/jboss/datagrid/test/examples/StorageOnlyConfigExampleTest.java
Created attachment 661373 [details] "NORMAL" node standalone-ha.xml config file
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2550 Hey [~dan.berindei], can you check that IndexOutOfBoundsException issue? Let's see if Michal can upload TRACE. [~NadirX], Tomas' issue appears to show a storage-only node (which shouldn't have any endpoints, log ending in 49...) responding to a client request, so the endpoint is somehow active. Can you check the JDG configuration he's using to see if there are any issues there?
Created attachment 661374 [details] "STORAGE-ONLY" node config file
Just a note about the test: when we create a RemoteCacheManager and pass just one address to it, it does *not* mean that all requests through cache.put/get will go just to this one address; they may go to any node in the cluster. Is that right? AFAIK the Hot Rod client dynamically gets information about all clustered nodes and autonomously chooses one of the cluster nodes to send requests to. If my assumption is correct, we would need to use a Memcached or REST client to properly test the storage-only example, not Hot Rod.
Yes, RCMs (RemoteCacheManagers) get the server list dynamically from the servers. However, only the servers with an endpoint should add their address to the list.
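To illustrate the behaviour described in the two comments above, here is a minimal, self-contained Java sketch. This is *not* the real Hot Rod client API — the class names (TopologyAwareClient, Server) are invented for illustration. It models a client that is bootstrapped with a single address, replaces its server list on every topology update, and only ever sees endpoint-enabled servers, which is why a correctly configured storage-only node should never receive client requests.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: TopologyAwareClient and Server are made-up
// names, not the real Hot Rod client API.
public class TopologyAwareClient {

    static class Server {
        final String address;
        final boolean endpointEnabled;

        Server(String address, boolean endpointEnabled) {
            this.address = address;
            this.endpointEnabled = endpointEnabled;
        }
    }

    private List<String> serverList = new ArrayList<>();

    TopologyAwareClient(String initialAddress) {
        // The client is bootstrapped with a single address...
        serverList.add(initialAddress);
    }

    // ...but each topology update from the cluster replaces the list,
    // keeping only nodes that actually run an endpoint. A storage-only
    // node should therefore never appear in the list.
    void onTopologyUpdate(List<Server> clusterView) {
        List<String> updated = new ArrayList<>();
        for (Server s : clusterView) {
            if (s.endpointEnabled) {
                updated.add(s.address);
            }
        }
        serverList = updated;
    }

    // Requests are routed to any server in the current list, not just
    // the bootstrap address.
    String pickServer(Object key) {
        return serverList.get(Math.abs(key.hashCode() % serverList.size()));
    }

    List<String> servers() {
        return serverList;
    }

    public static void main(String[] args) {
        TopologyAwareClient client = new TopologyAwareClient("node1:11222");
        client.onTopologyUpdate(List.of(
                new Server("node1:11222", true),
                new Server("node2:11222", true),
                new Server("storage-only:11222", false)));
        System.out.println(client.servers());
    }
}
```

Under this model, Tomas' concern holds: passing only one address to the client does not confine traffic to that node, so a Hot Rod client cannot by itself prove a node is storage-only — but if the storage-only node correctly has no endpoint, it should never be advertised in the topology in the first place.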
Dan Berindei <dberinde> made a comment on jira ISPN-2550 The IndexOutOfBoundsException seems to appear because we're generating numOwners (2) "denormalized" hash ids for each segment, but the consistent hash has more than numOwners owners for one segment (3 here). This can happen during a join, when the write CH is a union between the previous CH and the new, balanced, CH. Tomas, I've updated my branch to use the read CH instead, could you try again?
Michal Linhard <mlinhard> made a comment on jira ISPN-2550 Dan, I wanted to try your change, but I don't see any further commit on the branch https://github.com/danberindei/infinispan/tree/t_2550_m
Michal Linhard <mlinhard> made a comment on jira ISPN-2550 900MB of tasty tracelogs from runs with 5.2.0.Beta5 (resilience tests on hudson / perflab) http://www.qa.jboss.com/~mlinhard/test_results/serverlogs-trace-ispn2550.zip njoy!
Dan Berindei <dberinde> made a comment on jira ISPN-2550 Michal, what is the last commit you had when you ran the test?
Michal Linhard <mlinhard> made a comment on jira ISPN-2550 The IndexOutOfBoundsException was found when running with https://github.com/danberindei/infinispan/commit/c3325b134704016fa556343529d6a3a5b9a96bcb BTW, now I can see another commit on the t_2550_m branch; would it still be helpful to test with it?
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2550 Tomas, it seems that the config you provided works fine as storage-only. Can you create a separate issue to track your problem? I don't want to mix it with the node-kill issue. Also, if you can replicate the issue again, can you provide JDG version information, TRACE logs, etc.? Can you try to replicate the issue on JDG master too?
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2624 To recap: - According to the linked BZ, https://bugzilla.redhat.com/attachment.cgi?id=641649 should not have an endpoint started, but it does (see references to the Hot Rod decoder and Netty). Is this some misconfiguration? - [~NadirX] verified that the storage-only node configuration works in JDG master as expected, not starting any endpoints. So Tomas, can you verify the issue is still present in JDG master? If it is, can you attach once again the configurations of each server and TRACE logs with the org.infinispan category to this JIRA?
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2624 Please assign JIRA back to me when confirmed this is an issue in Infinispan.
Tomas Sykora <tsykora> made a comment on jira ISPN-2624 2 config files + 2 TRACE logs. Two identical runs: in one, TRACE was enabled on the "NORMAL" node (started with standalone-ha.xml), and in the second run, TRACE was enabled on the "STORAGE-ONLY" node (i.e. the node started with the standalone-storage-only.xml configuration). These nodes were built from JDG master on December 14.