see https://issues.jboss.org/browse/ISPN-1948
Galder Zamarreño <galder.zamarreno> updated the status of jira ISPN-1948 to Coding In Progress
Michal Linhard <mlinhard> made a comment on jira ISPN-1948 Reappeared in ER6 tests: http://www.qa.jboss.com/~mlinhard/hyperion/run48-elas-dist-size16/hyperion1135.log
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 First of all, thx Michal for getting trace logs for this. I must say that this is quite an odd one. I've posted the relevant bits of the logs to https://gist.github.com/2290090. Basically, the client sends the request, server handles it correctly and encodes a response without any issues, yet the client gets messed up. My gut feeling is that there's either a mix up in sockets/buffers/channels on the server, or in the client of some sort. I'm gonna add some extra logging information to both the client and the server to find out more about these situations. With this info as well, we should be able to get more targeted logging for next round of testing.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 @Michal, what do you know about the circumstances when this happens? It happens when a node is added or removed, right?
Michal Linhard <mlinhard> made a comment on jira ISPN-1948 First occurrence: edg-perf05.log. Occurrences: 1113, first: 15:18:01,998, last: 15:19:00,510, first time: 203.233, last time: 261.745. This started shortly after the node03 join (15:17:56,382) and shortly after the cluster formed (15:18:54,651).
Michal Linhard <mlinhard> made a comment on jira ISPN-1948 Occurrence in the size16 test on hyperion: hyperion1135.log. Occurrences: 9242, first: 08:50:05,420, last: 08:58:01,491, first time: 378.689, last time: 854.760. Begin: shortly after node 12 started (08:49:55,463); end: 08:58:01,259, shortly after node 3 was killed. It definitely happens when the view changes, but I can't connect it to a specific event.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 @Michal, I've added a pull req to get more logging. Once that's in, please enable the following logging for the next tests: TRACE on org.infinispan.client.hotrod on the client, and TRACE on org.infinispan.server on the server.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: CCFR - Manik
Mircea Markus <mmarkus> made a comment on jira ISPN-1948 @Galderz - your pull request with extra logging is now integrated.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1 +1,5 @@ -CCFR - Manik+CCFR + +This is a suspected issue. We haven't completely proven or disproven its existence as yet and this is still work in progress. W have created additional logging in the upstream version for QE to test against and help gather more information on the potential bug. + +Until we have more details I cannot really comment too much on the issue except that it appears to happen under load from Hot Rod clients when connected to certain servers which are then taken offline part way during an operation.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,5 +1,4 @@ -CCFR +This is a suspected issue. We haven't completely proven or disproven its existence as yet and this is still work in progress. We have created additional logging in the upstream version for QE to test against and help gather more information on the potential bug. - +</para> -This is a suspected issue. We haven't completely proven or disproven its existence as yet and this is still work in progress. W have created additional logging in the upstream version for QE to test against and help gather more information on the potential bug. +<para> - +This problem occurs under load from Hot Rod clients when connected to certain servers which are then taken offline part way during an operation.-Until we have more details I cannot really comment too much on the issue except that it appears to happen under load from Hot Rod clients when connected to certain servers which are then taken offline part way during an operation.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,4 +1,4 @@ This is a suspected issue. We haven't completely proven or disproven its existence as yet and this is still work in progress. We have created additional logging in the upstream version for QE to test against and help gather more information on the potential bug. </para> <para> -This problem occurs under load from Hot Rod clients when connected to certain servers which are then taken offline part way during an operation.+This problem occurs under load from Hot Rod clients when connected to certain servers which are then taken offline during an operation.
Please download http://www.dataforte.net/data/jboss-datagrid-server-bz807741.tar.bz2 and unpack it inside an ER6 installation and rerun the tests. This should give us extra logs (trace level obviously).
Michal Linhard <mlinhard> made a comment on jira ISPN-1948 The new tracelog http://www.qa.jboss.com/~mlinhard/hyperion/run61
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 Michal, thx for the link but do you know if there was any error related to "Invalid..." message in the clients? I've looked at all the hyperion* files in the link and they show no error message. And what's the difference between those files and the nodeX log files in http://www.qa.jboss.com/~mlinhard/hyperion/run61/report/ ?
Michal Linhard <mlinhard> made a comment on jira ISPN-1948 In http://www.qa.jboss.com/~mlinhard/hyperion/run61: hyperion1132.log.gz - hyperion1150.log.gz - client trace logs. In http://www.qa.jboss.com/~mlinhard/hyperion/run61/report: node0001.log.gz - node0016.log.gz - server trace logs; hyperion1096.log - hyperion1111.log - sfDaemon logs on the server side; hyperion1112.log - hyperion1131.log - sfDaemon logs on unused nodes; hyperion1151.log - controller sfDaemon log.
Michal Linhard <mlinhard> made a comment on jira ISPN-1948 If you haven't found anything in hyperion1132 - 1150 then it's probably not there. I'd have to try again. I haven't checked them myself yet.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 Indeed, 32 to 51 didn't show anything: {code} [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1132.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1133.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1134.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1135.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1136.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1137.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1138.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1139.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1140.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1141.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1142.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1143.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1144.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1145.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1146.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1147.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1148.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1149.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1150.log [g@dhcp-144-239:~/Go/jira/1948i/1304]% grep Invalid hyperion1151.log{code}
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 Btw, these are huge files, so I unzipped them individually and checked.
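As an aside, the per-file grep in the transcript above can be collapsed into one pass. A sketch, assuming bash brace expansion; the filenames are the same illustrative ones from the log:

```shell
# -l prints only the names of files containing a match, so twenty
# separate greps collapse into one command. Errors for missing files
# are suppressed; if nothing matches, say so explicitly.
grep -l "Invalid" hyperion11{32..51}.log 2>/dev/null || echo "no matches"
```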
So far I haven't reproduced this in elasticity/resilience tests with ER7 up to cluster size 16. I have a couple of 32 node tests in front of me; I'll mark VERIFIED once I've done those.
Unfortunately the 32 node elasticity tests showed this again: http://www.qa.jboss.com/~mlinhard/hyperion/run94-elas-dist-32/report/loganalysis/client/
Michal Linhard <mlinhard> made a comment on jira ISPN-1948 In hyperion1132.log there's also a special kind of GET exception: {code} 08:32:25,567 ERROR [org.jboss.smartfrog.jdg.loaddriver.DriverNode] (Client-18) Error doing GET(key454518) to node node0025 (lastOpTime 50 ms) org.infinispan.client.hotrod.exceptions.TransportException:: Unable to unmarshall byte stream at org.infinispan.client.hotrod.impl.RemoteCacheImpl.bytes2obj(RemoteCacheImpl.java:450) at org.infinispan.client.hotrod.impl.RemoteCacheImpl.get(RemoteCacheImpl.java:341) at org.jboss.qa.jdg.adapter.HotRodAdapter$HotRodRemoteCacheAdapter.get(HotRodAdapter.java:244) at org.jboss.smartfrog.jdg.loaddriver.DriverNodeImpl$ClientThread.makeRequest(DriverNodeImpl.java:233) at org.jboss.smartfrog.jdg.loaddriver.DriverNodeImpl$ClientThread.run(DriverNodeImpl.java:386) Caused by: java.io.IOException: Unsupported protocol version 49 at org.jboss.marshalling.river.RiverUnmarshaller.start(RiverUnmarshaller.java:1184) at org.infinispan.marshall.jboss.AbstractJBossMarshaller.startObjectInput(AbstractJBossMarshaller.java:148) at org.infinispan.marshall.jboss.AbstractJBossMarshaller.objectFromByteBuffer(AbstractJBossMarshaller.java:129) at org.infinispan.marshall.AbstractMarshaller.objectFromByteBuffer(AbstractMarshaller.java:90) at org.infinispan.client.hotrod.impl.RemoteCacheImpl.bytes2obj(RemoteCacheImpl.java:448) ... 4 more {code} I'm not creating a special JIRA for this cause I think this is related to erroneously shifted data in some of the streams.
Per Tristan triage, moved from ER8 to ER9 target release.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 There are more oddities here: {code}[g@:~/Go/jira/1948i]% grep -r "InvalidResponseException:: Invalid message id" . | grep 201326591 Invalid message id. Expected 743598 and received 201326591 Invalid message id. Expected 761419 and received 201326591 Invalid message id. Expected 783396 and received 201326591{code} It seems the client receives 3 messages with the same 201326591 message id (0xBFFFFFF)? It looks more like a different part of the response is treated as the msg id, rather than the actual message id.
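That repeated 201326591 is consistent with a misaligned varint read: Hot Rod encodes message ids as little-endian 7-bit varints, and a run of 0xFF bytes followed by 0x5F decodes to exactly that value. A self-contained sketch; the buffer contents here are hypothetical, chosen to show the effect, not taken from the actual wire capture:

```java
public class VarintMisread {
    // Decode an unsigned little-endian 7-bit varint (the vLong scheme the
    // Hot Rod protocol uses for message ids) starting at offset pos.
    static long readVLong(byte[] buf, int pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    public static void main(String[] args) {
        // Hypothetical buffer: a correctly encoded message id (743598, one
        // of the expected ids in the log) followed by payload bytes that
        // happen to contain a run of 0xFF.
        byte[] buf = {
            (byte) 0xAE, (byte) 0xB1, 0x2D,             // vLong 743598
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, 0x5F // later payload bytes
        };
        System.out.println(readVLong(buf, 0)); // aligned read -> 743598
        System.out.println(readVLong(buf, 3)); // shifted by 3 -> 201326591
    }
}
```

Three different expected ids all colliding on the same received id fits this picture: whatever bytes the client mistook for the message id were identical each time.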
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 Don't know what's causing this yet. hyperion1132.log shows some connection refused errors on the start which I think should be solved with ISPN-1995. Then, the first thing that appears is the "IOException: Unsupported protocol version 49" error. node0025 shows no errors/warns/exceptions but it appears to be in the middle of rehashing ___hotRodTopologyCache, memcachedCache and testCache. I'd suggest repeating the test with an ER version that has ISPN-1995 fixed (in case this is some sort of side effect of that jira but I doubt it), and add some further logging: {code}Server: TRACE on org.infinispan.server.hotrod Client: DEBUG on org.infinispan.client.hotrod{code}
I did these tests with ER8.1 with the fix for Netty and BZ 809631 (uneven requests): Resilience 16 nodes (3 owners, crash2) http://www.qa.jboss.com/~mlinhard/hyperion/run105-resi-dist-16-ER8.1-nettyfix/logs/analysis/client/ Resilience 32 nodes (5 owners, crash4) http://www.qa.jboss.com/~mlinhard/hyperion/run104-resi-dist-32-ER8.1-nettyfix/logs/analysis/client/ Elasticity 8->16->8 http://www.qa.jboss.com/~mlinhard/hyperion/run106-elas-dist-16-ER8.1-nettyfix/logs/analysis/client/ Elasticity 16->32->16 http://www.qa.jboss.com/~mlinhard/hyperion/run107-elas-dist-32-ER8.1-nettyfix/logs/analysis/client/ And couldn't see the "Invalid magic number" issue in any of them. Is it possible that the Netty fix solved it?
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 Maybe... :)
I haven't seen this in these tests with ER9: http://www.qa.jboss.com/~mlinhard/hyperion/run110-resi-dist-32-ER9-own5/ http://www.qa.jboss.com/~mlinhard/hyperion/run111-resi-dist-16-ER9-own3/ http://www.qa.jboss.com/~mlinhard/hyperion/run112-elas-dist-16-ER9/ but I'd rather wait to mark this verified until one full round of testing after https://bugzilla.redhat.com/show_bug.cgi?id=809631 is fixed.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 FYI: ISPN-2052 - I don't think you guys send a remove with force return previous value flag, but the exception is similar.
Appeared again in ER10 32 node resilience tests: http://www.qa.jboss.com/~mlinhard/hyperion/run124-resi-dist-32-own2-ER10/logs/analysis/client/
Decision: Not a blocker for JDG 6. Will list as known issue in Release Notes.
Michal Linhard <mlinhard> made a comment on jira ISPN-1948 recent trace logs: http://www.qa.jboss.com/~mlinhard/test_results/ispn1948tracelog2.zip
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 Michal, I don't have permissions to download that.
Michal Linhard <mlinhard> made a comment on jira ISPN-1948 Fixed, please try again.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 A summary of what I'm seeing in the latest logs. Effectively, what seems to be happening is that the (buffered) input stream in the client is being reset/rewound somehow. It seems to go back in the buffer to re-read host information when it should be reading the value of the get() response. Why this happens? No idea...
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 Well, either there's a reset, or the client is reading from the wrong socket input stream. IOW, it may be reading from a socket stream belonging to a different thread.
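The rewind hypothesis is easy to demonstrate in isolation with java.io's mark/reset. This is only a model of the symptom (the hostnames and framing are made up for illustration), not the actual client code:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class StreamRewind {
    public static void main(String[] args) throws IOException {
        // Fake wire data: host info followed by a get() value.
        byte[] wire = "HOST:node25|VALUE:42".getBytes();
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(wire));

        in.mark(64);                          // something remembers this position
        byte[] first = in.readNBytes(12);     // reads "HOST:node25|"
        in.reset();                           // a stray reset rewinds the stream...
        byte[] second = in.readNBytes(12);    // ...so the host bytes are read again
                                              // where the value was expected
        System.out.println(new String(second)); // prints "HOST:node25|"
    }
}
```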
Manik Surtani <manik.surtani> made a comment on jira ISPN-1948 Can't you detect if the client is reading from the wrong socket stream by logging what it is reading from each time?
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 No sign of thread safety issue. Same transport instance borrowed from the pool and returned for the client thread showing the problem, and no other thread using the same transport instance (buffer input stream is a final instance variable of transport). Attached updated summary.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 @Manik, I'm pretty sure that's not the case here because it uses the same Transport instance throughout sending the request and reading from it, and all socket instance variables in Transport are final, and we're logging what transport is reading each time. See my updated summary.
Dan Berindei <dberinde> made a comment on jira ISPN-1948 Galder, I think it's a concurrency issue after all. The header logged by the client looks like this: {code}A1E1BC030400
01 - topology changed
04 - view id
00
02 - num owners
02 - hash function
FFFFFFFF07 - hash space (MAX_INT)
1F - members size (31)
80 04 - numVirtualNodes (512){code} So the number of servers written by the server is 31, but there are actually 32 server addresses afterwards. I think this is because the topology address cache was modified after the server wrote the size but before it wrote the actual addresses.
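The write-size-then-iterate race Dan describes can be modeled generically. This is an illustration of the failure shape and of the snapshot fix, not the actual server code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class TopologyEncodeRace {

    // Buggy shape: the member count and the members are read from the live
    // view at different times, so a join between the two reads produces a
    // header that claims N members but carries N+1 addresses.
    static List<String> encodeRacy(List<String> liveView, Runnable joinInBetween) {
        List<String> payload = new ArrayList<>();
        payload.add("count=" + liveView.size()); // server writes the size...
        joinInBetween.run();                     // ...a node joins here...
        payload.addAll(liveView);                // ...then writes the longer list
        return payload;
    }

    // Fixed shape: snapshot once, derive both count and members from it.
    static List<String> encodeConsistent(List<String> liveView, Runnable joinInBetween) {
        List<String> snapshot = new ArrayList<>(liveView);
        List<String> payload = new ArrayList<>();
        payload.add("count=" + snapshot.size());
        joinInBetween.run();                     // the join no longer matters
        payload.addAll(snapshot);
        return payload;
    }

    public static void main(String[] args) {
        List<String> cluster = new CopyOnWriteArrayList<>(List.of("node1", "node2"));
        System.out.println(encodeRacy(cluster, () -> cluster.add("node3")));
        // -> [count=2, node1, node2, node3]  (claims 2, carries 3)
    }
}
```

On the wire, a client that trusts the count then treats the extra address bytes as the start of the next response, which matches the "Invalid magic number" symptom.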
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 The confusion came from a lack of logging, on both the server and client side, of how many nodes came in the view. So, I reverted to doing counts rather than decoding the message. I'm adding some clearer messages for the future and will check what's wrong with the concurrency.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 Great... we've got the most reliable set of notifications... not... the following test (https://gist.github.com/2819113) fails with: {code}java.lang.AssertionError: size is: 0 at org.infinispan.notifications.cachelistener.SizeAndListenerTest.test000(SizeAndListenerTest.java:78) {code} How do we expect people to use these correctly if we provide no basic guarantees like cache size? Jeez... It doesn't work with CacheEntryModified either cos the container is updated *after* all notifications have been fired. Wonderful...
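The symptom in the failing test, size() returning 0 inside a creation listener, comes down to notification ordering. A toy model showing both orderings; these are hypothetical classes for illustration, not Infinispan code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class NotifyOrder {

    // Mimics the observed symptom: the creation listener fires before the
    // backing container is updated, so size() inside the listener is still 0.
    static class NotifyBeforeWrite {
        final Map<String, String> store = new HashMap<>();
        Consumer<String> onCreate = k -> { };

        void put(String key, String value) {
            onCreate.accept(key);  // listener fires first...
            store.put(key, value); // ...container updated afterwards
        }
    }

    // The ordering the test expected: update the container, then notify.
    static class WriteBeforeNotify {
        final Map<String, String> store = new HashMap<>();
        Consumer<String> onCreate = k -> { };

        void put(String key, String value) {
            store.put(key, value);
            onCreate.accept(key);
        }
    }

    public static void main(String[] args) {
        NotifyBeforeWrite cache = new NotifyBeforeWrite();
        cache.onCreate = k -> System.out.println("size in listener: " + cache.store.size());
        cache.put("key", "value"); // prints "size in listener: 0"
    }
}
```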
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 There's no easy fix for this in the current architecture based on cache listeners. The idea with the ViewIdUpdater listener was to avoid the need to cache the view id itself after updating the address cache, since this would require two operations to the cache and would require batching or transactions to make it atomic...etc. There is however a method that will provide the guarantees that I want, and that is interceptors, cos they can be injected anywhere we want, and they can be placed after we know the container has been updated, i.e. EntryWrappingInterceptor. I think there's another JIRA on cache size guarantees with listeners, but this really needs to be revisited when JSR-107 cache listeners are implemented.
Galder Zamarreño <galder.zamarreno> updated the status of jira ISPN-1948 to Reopened
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 The issue is not yet fixed. Just added some further logging yesterday.
Dan Berindei <dberinde> made a comment on jira ISPN-1948 Galder, I think saving the members list in the listener itself should work and it should be easier to implement than using an interceptor.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 Dan, I'm not so sure but I'll investigate it. I'm not even sure there are any guarantees that cache.get() will return the contents with the entry added or not from CacheEntryCreated. Thanks for the input.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 Dan, your suggestion won't work because of https://docs.jboss.org/author/pages/viewpage.action?pageId=5832860, and to prove it, here's a pretty comprehensive test I've built: https://gist.github.com/2844743 - All tests fail. The only guarantees you have are wrt event.getKey/event.getValue. So, for the listener solution to still work, I'd need to catch all the events in the cache and maintain a copy in the listener itself that can be guaranteed to be the expected one. Might as well use a notification bus then... ;) I think an interceptor, located in the right place, can provide me with the right guarantees. I'll create a test to verify it.
Dan Berindei <dberinde> made a comment on jira ISPN-1948 Hmmm, you're right, I guess I was relying on the invocation context being thread-local... it may still work with transactional caches, because the listener should reuse the thread's transaction, but it wouldn't be useful for your use case anyway.
Appeared in ER11 elasticity tests, sizes 8->16->8 and 16->32->16.
This bug is nominated for JDG 6 GA Release Notes. Assigning NEEDINFO to assignee to ensure that the technical note text still accurately reflects the problem.
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 Michal, does ER11 have the last proposed fix in it?
Michal Linhard <mlinhard> made a comment on jira ISPN-1948 Just in the process of running the tests with your patch. I'll run at least 5 different tests - similar to those that demonstrated it lately. First one didn't show the magic number problem.
Michal Linhard <mlinhard> made a comment on jira ISPN-1948 ER11 doesn't contain the fix, it's based on 5.1.5.Final
Triaged to post-6.0 release: Rationale - potential fix too hard to verify, and test risk too high this late in the release cycle.
Galder's patch for ER11 worked in 5 tests: 3 elasticity tests with 32 nodes and 2 elasticity tests with 16 nodes. http://www.qa.jboss.com/~mlinhard/hyperion/run164-elas-dist-32-ER11-ISPN-1948-fix/logs/analysis/client/ http://www.qa.jboss.com/~mlinhard/hyperion/run165-elas-dist-32-ER11-ISPN-1948-fix/logs/analysis/client/ http://www.qa.jboss.com/~mlinhard/hyperion/run166-elas-dist-16-ER11-ISPN-1948-fix/logs/analysis/client/ http://www.qa.jboss.com/~mlinhard/hyperion/run167-elas-dist-16-ER11-ISPN-1948-fix/logs/analysis/client/ http://www.qa.jboss.com/~mlinhard/hyperion/run171-elas-dist-32-ER11-ISPN-1948-fix/logs/analysis/client/ In the ER11 test cycle there was one occurrence of ISPN-1948 in both the 32 and 16 node elasticity tests: http://www.qa.jboss.com/~mlinhard/hyperion/run148-elas-dist-32-ER11/logs/analysis/client/ http://www.qa.jboss.com/~mlinhard/hyperion/run157-elas-dist-16-ER11/logs/analysis/client/
Created attachment 590128 [details] ISPN-1948 patch attaching the patch infinispan-server-hotrod-5.1.5.FINAL-redhat-1.jar md5: 0970619a4dfeb20440004be5d3fb02ff
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 So, the patch looking good so far? Do you plan any further testing in the near future?
Michal Linhard <mlinhard> made a comment on jira ISPN-1948 Yes, the patch looks good. The magic number problem doesn't appear, nor does it cause any other problems in elasticity tests. Do you think we should run more tests, or different types of tests?
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-1948 Your elasticity tests were the best tests we had, and we'd easily see the failure if it was there. I'm confident this now works as expected since I was able to reproduce similar scenarios in smaller scale, and with an interceptor, we can avoid this type of issue. I'll make a note in the pull req to get this integrated. Thx for all your help Michal!!!
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,4 +1,3 @@ -This is a suspected issue. We haven't completely proven or disproven its existence as yet and this is still work in progress. We have created additional logging in the upstream version for QE to test against and help gather more information on the potential bug. +The issue is a race condition on the Hot Rod server, which can lead to sending topologies erroneously as a result of addition of a new node to the cluster. When the issue appears, clients will start seeing "Invalid magic number" error messages as a result of the stream containing unexpected data. </para> -<para> +When the issue appears, the safest thing to do is to restart the client, but the client might recover itself once all the unexpected data has been consumed. If the client recovers, the view topology it has will be lacking one of the added nodes, so although it would work relatively Ok, it would lead to some uneven request distribution.-This problem occurs under load from Hot Rod clients when connected to certain servers which are then taken offline during an operation.
Dan Berindei <dberinde> updated the status of jira ISPN-1948 to Resolved
Occurred in CR1 elasticity tests: http://www.qa.jboss.com/~mlinhard/hyperion/run177-elas-dist-32-CR1/logs/analysis/client/
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,3 +1,4 @@ -The issue is a race condition on the Hot Rod server, which can lead to sending topologies erroneously as a result of addition of a new node to the cluster. When the issue appears, clients will start seeing "Invalid magic number" error messages as a result of the stream containing unexpected data. +The issue is a race condition on the Hot Rod server, which can lead to topologies being sent erroneously as a result of addition of a new node to the cluster. When the issue appears, clients will start seeing "Invalid magic number" error messages as a result of unexpected data within the stream. </para> -When the issue appears, the safest thing to do is to restart the client, but the client might recover itself once all the unexpected data has been consumed. If the client recovers, the view topology it has will be lacking one of the added nodes, so although it would work relatively Ok, it would lead to some uneven request distribution.+<para> +When this problem is encountered, the recommended approach is to restart the client. If the client is not restarted, on some occasions, the client may recover after the unexpected data is consumed but this is not guaranteed. If the client recovers with a restart, the view topology displayed does not display one of the nodes added, resulting in uneven request distribution.
Need Galder to look at this again. Was release noted for 6.0.0. We'd like to fix it for 6.0.1 if valid, but is not a regression so can be deferred if necessary.
Relinked to a new Jira to avoid pollution of old issue
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2209 @Tristan, the fix for this is in 5.1.6.FINAL. Can you confirm the issue is present with that version? You've marked 5.1.5 as affects...
Tristan Tarrant <ttarrant> made a comment on jira ISPN-2209 5.1.5.FINAL was the version that was tested by [~mlinhard] (JDG 6.0.0.CR1 which was then renamed to JDG 6.0.0.GA). I need QE to get some runs on the 6.0.1.ER2 builds (which use 5.1.6.FINAL).
Not a blocker for 6.0.1.
Galder Zamarreño <galder.zamarreno> updated the status of jira ISPN-2209 to Resolved
Galder Zamarreño <galder.zamarreno> made a comment on jira ISPN-2209 I have no reason to believe this is an issue any more with the latest versions. If you think otherwise after testing 5.1.6 or later, then re-open :)
ttarrant will add jira links as appropriate.
ttarrant requests re-create attempt using upcoming jdg61-ER5 build.
I've run 2 tests with JDG 6.1.0.ER5 that previously demonstrated this issue and haven't seen it. The ER5 build is burdened with other errors that might affect manifestation of this issue, namely bugzillas 886565, 886549, 887323. This heisenbug wasn't easy to spot even in builds where it was confirmed; however, for this build I think these two tests are enough. If the issue reappears, I'll reopen. The tests I did were elasticity tests 8-16-8 and 16-32-16 in hyperion: http://www.qa.jboss.com/~mlinhard/hyperion3/run0009 http://www.qa.jboss.com/~mlinhard/hyperion3/run0010