Bug 760895 - Reopened: Error detecting crashed member during shutdown of EDG 6.0.0.Beta
Summary: Reopened: Error detecting crashed member during shutdown of EDG 6.0.0.Beta
Keywords:
Status: VERIFIED
Alias: None
Product: JBoss Data Grid 6
Classification: JBoss
Component: Infinispan
Version: 6.0.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ER6
Target Release: 6.1.0
Assignee: Tristan Tarrant
QA Contact: Martin Gencur
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-12-07 09:59 UTC by Ondrej Nevelik
Modified: 2018-09-12 22:30 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Occasionally, when nodes in a cluster are shut down, the following message is reported: "ERROR [org.infinispan.server.hotrod.HotRodServer] ISPN006002: Error detecting crashed member: java.lang.IllegalStateException: Cache '___hotRodTopologyCache' is in 'STOPPING' state and this is an invocation not belonging to an on-going transaction, so it does not accept new invocations. Either restart it or recreate the cache container."
This happens because a node has detected another node's shutdown and is trying to update the topology cache while it is itself shutting down. The message is harmless and will be removed in a future release.
Clone Of:
Environment:
Last Closed: 2012-04-04 12:58:47 UTC
Type: ---




Links
System: Red Hat Issue Tracker
ID: ISPN-2062
Private: 0
Priority: Major
Status: Resolved
Summary: CrashedMemberDetectorListener should check whether invocations are allowed
Last Updated: 2019-11-14 16:11:32 UTC

Description Ondrej Nevelik 2011-12-07 09:59:12 UTC
Description of problem:
Given a cluster of several EDG servers, an error occurs in about 25% of our performance test runs (regardless of the client type tested) when the servers are shut down gracefully and in parallel (kill "edg_pid", without the -9 switch):
ERROR [org.infinispan.server.hotrod.HotRodServer] ISPN006002: Error detecting crashed member: java.lang.IllegalStateException: Cache '___hotRodTopologyCache' is in 'STOPPING' state and this is an invocation not belonging to an on-going transaction, so it does not accept new invocations. Either restart it or recreate the cache container.

The whole log can be found at http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-PERF/job/edg-60-perf-client-stress-test-rest/13/console-edg-perf02/

Comment 1 Tristan Tarrant 2012-03-02 08:22:10 UTC
The log referenced in the above comment is missing. Does this still happen?

Comment 2 mark yarborough 2012-04-04 12:58:47 UTC
Tristan Tarrant indicates this has not been reproduced in recent builds. Reopen if necessary.

Comment 3 Ondrej Nevelik 2012-04-05 06:21:26 UTC
I am seeing this exception again in ER6 (rest client stress test) - see server log of node01: http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-PERF/job/edg-60-perf-client-stress-test-rest/68/artifact/report/size4/serverlogs.zip

Comment 4 Tristan Tarrant 2012-04-20 13:07:10 UTC
I think this ERROR is harmless. Asking Galder.

Comment 5 Galder Zamarreño 2012-05-23 13:39:34 UTC
The error log is noisy but should be harmless. What happens is that node A has detected that node B has gone down, and node A is trying to remove node B from its address cache. However, node A is itself shutting down at the same time, so it can no longer update the address cache. This is harmless because each node performs this removal locally: any node that is still running will still remove node B from its own address cache.

I can probably improve the code in CrashedMemberDetectorListener to check whether invocations are allowed, rather than only checking whether the cache is terminated. I'll add a jira for this.
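To make the proposed change concrete, here is a minimal, self-contained Java sketch contrasting the two guards. It does not use the real Infinispan classes: `Status`, `AddressCache`, and both `onMemberLeft...` methods are hypothetical stand-ins, and the states only mirror the 'STOPPING' state quoted in the error. It illustrates why a guard that only checks for "terminated" still lets the update through while the cache is STOPPING, whereas a guard that checks "accepts invocations" skips it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CrashedMemberGuardSketch {

    // Hypothetical lifecycle states, loosely modeled on a cache component lifecycle.
    enum Status {
        RUNNING, STOPPING, TERMINATED;

        // In this sketch only a RUNNING cache accepts new, non-transactional invocations.
        boolean allowInvocations() { return this == RUNNING; }

        boolean isTerminated() { return this == TERMINATED; }
    }

    // Hypothetical stand-in for the Hot Rod address/topology cache.
    static final class AddressCache {
        volatile Status status = Status.RUNNING;
        final Set<String> members = ConcurrentHashMap.newKeySet();

        void remove(String member) {
            if (!status.allowInvocations()) {
                throw new IllegalStateException("Cache is in '" + status + "' state");
            }
            members.remove(member);
        }
    }

    // Guard described in the comment above: skip only once the cache is fully
    // terminated, so a cache that is still STOPPING makes remove() throw.
    static void onMemberLeftTerminatedCheck(AddressCache cache, String member) {
        if (!cache.status.isTerminated()) {
            cache.remove(member);
        }
    }

    // Proposed guard: skip whenever the cache no longer accepts invocations.
    // Returns true if the departed member was removed.
    static boolean onMemberLeftInvocationCheck(AddressCache cache, String member) {
        if (!cache.status.allowInvocations()) {
            return false; // this node is shutting down too; other live nodes clean up
        }
        cache.remove(member);
        return true;
    }

    public static void main(String[] args) {
        AddressCache cache = new AddressCache();
        cache.members.add("nodeA");
        cache.members.add("nodeB");
        cache.status = Status.STOPPING; // node A is stopping when it sees node B leave

        try {
            onMemberLeftTerminatedCheck(cache, "nodeB");
            System.out.println("terminated check: no error");
        } catch (IllegalStateException e) {
            System.out.println("terminated check: " + e.getMessage());
        }
        System.out.println("invocation check removed member: "
                + onMemberLeftInvocationCheck(cache, "nodeB"));
    }
}
```

The design point is that "not terminated" and "accepts invocations" differ exactly during the STOPPING window, which is when the reported race happens.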

Comment 6 mark yarborough 2012-06-06 13:32:05 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Tristan will provide CCFR or will route to appropriate developer.

Comment 7 Tristan Tarrant 2012-06-12 15:36:12 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1,3 @@
-Tristan will provide CCFR or will route to appropriate developer.
+Occasionally, when shutting down nodes in a cluster, the following message is reported: 
+"ERROR [org.infinispan.server.hotrod.HotRodServer] ISPN006002: Error detecting crashed member: java.lang.IllegalStateException: Cache '___hotRodTopologyCache' is in 'STOPPING' state and this is an invocation not belonging to an on-going transaction, so it does not accept new invocations. Either restart it or recreate the cache container."
+This is due to the fact that a node has detected another node's shutdown and is attempting to update the topology cache while itself is also shutting down. The message is harmless, and it will be removed in a future release

Comment 8 Misha H. Ali 2012-06-12 15:39:07 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,3 +1,4 @@
 Occasionally, when shutting down nodes in a cluster, the following message is reported: 
 "ERROR [org.infinispan.server.hotrod.HotRodServer] ISPN006002: Error detecting crashed member: java.lang.IllegalStateException: Cache '___hotRodTopologyCache' is in 'STOPPING' state and this is an invocation not belonging to an on-going transaction, so it does not accept new invocations. Either restart it or recreate the cache container."
+</para><para>
 This is due to the fact that a node has detected another node's shutdown and is attempting to update the topology cache while itself is also shutting down. The message is harmless, and it will be removed in a future release

Comment 9 mark yarborough 2012-11-14 14:42:32 UTC
ttarrant will add jira links as appropriate.

Comment 10 Michal Linhard 2012-12-18 16:40:58 UTC
Not present in 6.1.0.ER6

tested: 
- start 8 nodes (with hotrod endpoint + some test caches)
- put some values into the test cache
- stop (gracefully) all nodes
- wait until all Java processes exit on their own
- couldn't see any IllegalStateException in any of the server logs
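The last step of this check (scanning every server log for the exception) can be sketched in Java as below. This is a hypothetical helper, not the QA team's actual tooling: the class name and the log paths passed on the command line are assumptions, and the two markers are simply taken from the error message quoted in this report. `Files.readAllLines` assumes UTF-8 log files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ShutdownLogCheck {

    // Markers taken from the error message quoted in this report.
    static final String[] MARKERS = {"ISPN006002", "IllegalStateException"};

    // Returns every line that contains at least one of the markers.
    static List<String> findShutdownErrors(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    hits.add(line);
                    break; // report each line once, even if it matches both markers
                }
            }
        }
        return hits;
    }

    // Hypothetical usage: java ShutdownLogCheck node01.log node02.log ...
    public static void main(String[] args) throws IOException {
        int total = 0;
        for (String arg : args) {
            Path log = Paths.get(arg);
            List<String> hits = findShutdownErrors(Files.readAllLines(log));
            for (String hit : hits) {
                System.out.println(log + ": " + hit);
            }
            total += hits.size();
        }
        System.out.println(total == 0 ? "no shutdown errors found" : total + " matching line(s)");
        System.exit(total == 0 ? 0 : 1); // non-zero exit lets a CI job fail on a match
    }
}
```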

