During EAP 6.1.0.ER3 testing, I've seen the following set of exceptions in server logs. The job is https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-netDown-repl-async/7/ On perf18: 09:46:01,772 INFO [org.jboss.as.clustering] (Incoming-8,shared=udp) JBAS010225: New cluster view for partition ejb (id: 5, delta: 1, merge: true) : [perf18/ejb, perf21/ejb, perf19/ejb] 09:46:01,773 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-8,shared=udp) ISPN000093: Received new, MERGED cluster view: MergeView::[perf18/ejb|5] [perf18/ejb, perf21/ejb, perf19/ejb], subgroups=[perf21/ejb|3] [perf21/ejb, perf20/ejb], [perf19/ejb|0] [perf19/ejb], [perf18/ejb|4] [perf18/ejb, perf20/ejb] 09:46:01,817 ERROR [org.infinispan.topology.ClusterTopologyManagerImpl] (notification-thread-0) ISPN000196: Failed to recover cluster state after the current node became the coordinator: java.lang.IllegalStateException: Trying to set a topology with invalid members for cache org.jboss.test.clusterbench.ejb.stateful.RemoteStatefulSBImpl: members = [perf18/ejb, perf21/ejb, perf19/ejb], topology = CacheTopology{id=10, currentCH=ReplicatedConsistentHash{members=[perf18/ejb, perf20/ejb, perf21/ejb, perf19/ejb]}, pendingCH=null} at org.infinispan.topology.ClusterCacheStatus.updateCacheTopology(ClusterCacheStatus.java:165) at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheStatusAfterMerge(ClusterTopologyManagerImpl.java:315) at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:236) at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.handleViewChange(ClusterTopologyManagerImpl.java:579) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.6.0_43] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [rt.jar:1.6.0_43] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_43] at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_43] at org.infinispan.notifications.AbstractListenerImpl$ListenerInvocation$1.run(AbstractListenerImpl.java:212) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_43] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_43] at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_43] At the same time on perf19: 09:46:01,772 ERROR [org.jboss.as.clustering] (Incoming-20,shared=udp) JBAS010223: ViewAccepted failed: java.lang.IllegalStateException: JBAS010240: Address 36d32c21-a759-5f85-76e6-1fe25f038bc6 not registered in transport layer at org.jboss.as.clustering.impl.CoreGroupCommunicationService$ClusterNodeFactoryImpl.getClusterNode(CoreGroupCommunicationService.java:1350) at org.jboss.as.clustering.impl.CoreGroupCommunicationService.translateAddresses(CoreGroupCommunicationService.java:1010) at org.jboss.as.clustering.impl.CoreGroupCommunicationService$GroupView.<init>(CoreGroupCommunicationService.java:1109) at org.jboss.as.clustering.impl.CoreGroupCommunicationService.processViewChange(CoreGroupCommunicationService.java:945) at org.jboss.as.clustering.impl.CoreGroupCommunicationService$MembershipListenerImpl.viewAccepted(CoreGroupCommunicationService.java:1403) at org.jgroups.blocks.MessageDispatcher.handleUpEvent(MessageDispatcher.java:523) at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:600) at org.jgroups.blocks.mux.MuxUpHandler.passToAllHandlers(MuxUpHandler.java:156) at org.jgroups.blocks.mux.MuxUpHandler.up(MuxUpHandler.java:123) at org.jgroups.JChannel.up(JChannel.java:707) at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020) at org.jgroups.protocols.RSVP.up(RSVP.java:188) at org.jgroups.protocols.FRAG2.up(FRAG2.java:181) at org.jgroups.protocols.FlowControl.up(FlowControl.java:418) at org.jgroups.protocols.FlowControl.up(FlowControl.java:418) at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:638) at org.jgroups.protocols.pbcast.CoordGmsImpl.handleViewChange(CoordGmsImpl.java:251) at org.jgroups.protocols.pbcast.GMS.up(GMS.java:799) at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:245) at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:453) at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:763) at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:574) at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143) at org.jgroups.protocols.FD.up(FD.java:253) at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288) at org.jgroups.protocols.MERGE3.up(MERGE3.java:290) at org.jgroups.protocols.Discovery.up(Discovery.java:359) at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2616) at org.jgroups.protocols.TP.passMessageUp(TP.java:1263) at org.jgroups.protocols.TP$4.run(TP.java:1181) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_43] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_43] at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_43] At the same time, perf20 is separated from the cluster by network failure. At the same time on perf21: 09:46:01,781 INFO [org.jboss.as.clustering] (Incoming-19,shared=udp) JBAS010226: New cluster view for partition ejb: 5 (org.jboss.as.clustering.impl.CoreGroupCommunicationService$GroupView@4e21ce1 delta: 1, merge: true) 09:46:01,784 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-19,shared=udp) ISPN000093: Received new, MERGED cluster view: MergeView::[perf18/ejb|5] [perf18/ejb, perf21/ejb, perf19/ejb], subgroups=[perf21/ejb|3] [perf21/ejb, perf20/ejb], [perf19/ejb|0] [perf19/ejb], [perf18/ejb|4] [perf18/ejb, perf20/ejb] Searching for keywords, bug 924220 and https://issues.jboss.org/browse/ISPN-2752 come up, but I don't really think they are related.
And also seen in https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-netDown-repl-sync/7/console-perf19/ (only the first exception, the second doesn't seem to appear anywhere).
Still seeing this with EAP 6.1.0.ER8. For example: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-netDown-repl-sync/11/artifact/report/config/jboss-perf21/server.log https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-http-session-netDown-repl-async/11/artifact/report/config/jboss-perf21/server.log
The error 02:31:14,431 ERROR [org.jboss.as.clustering] (Incoming-12,shared=tcp) JBAS010223: ViewAccepted failed: java.lang.IllegalStateException: JBAS010240: Address ec1de289-4782-a45c-cb61-d662009b71e7 not registered in transport layer was seen during 6.1.1.ER3 testing here: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/eap-6x-failover-ejb-ejbremote-shutdown-dist-sync-tcpStack/17/artifact/report/config/jboss-perf20/server.log however without the first exception (Failed to recover cluster state after the current node became the coordinator: ...).
Not seen in 6.1.1. Please retest against 6.3.0.GA.
I was instructed to retest against 6.3.0.DR1. Haven't seen it there so I consider it verified.
6.3.0.DR2 I've got a lovely {noformat} ISPN000196: Failed to recover cluster state after the current node became the coordinator: java.lang.NullPointerException {noformat} with a dead-simple scenario, start, deploy, undeploy, shutdown. Two nodes, <distributable/> app. I'm attaching the log.
Created attachment 871331 [details] 6.3.0.DR2_ISPN000196_NPE
Added the upstream counterpart.
I'm sorry, the extend of my attention disorder can be hardly exaggerated. I accidentally swapped logs with the previous test run :-)
Comment on attachment 871331 [details] 6.3.0.DR2_ISPN000196_NPE The [attachment 871331 [details]] is invalid. Ignore it.
Paul, what version was this actually fixed in? Your comment (4) seem to suggest it's been fixed since 6.1.1. Does it need a release note for 6.3.0? If so, can you please supply an outline of the situation using the prompts in the Doc Text field above. Cheers
This does not need a release note since it was fixed in a previous release cycle, even though it was only verified this cycle.
Tristan Tarrant <ttarrant> updated the status of jira ISPN-3395 to Resolved
The original issue in this BZ was fixed as part of ISPN-2935.
Created attachment 1102190 [details] ISPN000196: Failed to recover cluster state after the current node became the coordinator: java.lang.NullPointerException
I hit the issue again with JBoss EAP 6.4.4.GA during a simple mod_cluster failover scenario on 3 nodes.
I believe that what mbabacek is seeing in comment 15 is in fact bug 1283465.
@LaDiSlAv, hmm, yes, it appears to be the case.
When will this bug be fixed in EAP 6.3.2 GA ?
The issue is tracked by bug 1283465.