Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 900667 (JBPAPP6-889)

Summary: HornetQ cluster is not created when network connection is re-established
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Miroslav Novak <mnovak>
Component: HornetQ
Assignee: Jeff Mesnil <jmesnil>
Status: CLOSED NEXTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: 6.0.0
CC: brian.stansberry, jmesnil, mnovak, pgier, sgilda, twells
Target Milestone: ---
Target Release: EAP 6.0.1
Hardware: Unspecified
OS: Unspecified
URL: http://jira.jboss.org/jira/browse/JBPAPP6-889
Whiteboard: eap601candidate
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-11-05 17:11:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Attachments:
Description Flags
logs.zip none
logs.zip none

Description Miroslav Novak 2012-06-26 12:33:09 UTC
Affects: Release Notes
project_key: JBPAPP6

Test scenario:
1. Change standalone-full-ha.xml configuration:
Add:
{code}<reconnect-attempts>5</reconnect-attempts>{code}
To:
{code}
<cluster-connections>
 <cluster-connection name="my-cluster">
 ...
 <reconnect-attempts>5</reconnect-attempts>
 ... 
 </cluster-connection>
</cluster-connections>
{code}

2. Start two EAP 6 servers with the standalone-full-ha.xml profile on two different machines so that they form a (HornetQ) cluster.

3. Disconnect the network cable between those machines.

4. Wait until both machines stop their core bridges.

5. Reconnect the network cable.

Result:
The HornetQ cluster is not re-created. The cluster is re-established only after one of the servers is restarted.

Comment 1 Miroslav Novak 2012-06-26 16:33:09 UTC
Attachment: Added: logs.zip


Comment 2 Jeff Mesnil 2012-07-11 08:08:31 UTC
@miroslav, the logs you attached are from EAP 5.1.2, but the issue is about EAP 6.0.0 CR1.
Did you attach the correct logs?

Comment 3 Rajesh Rajasekaran 2012-07-11 20:28:04 UTC
Labels: Added: eap601candidate


Comment 6 Miroslav Novak 2012-08-06 09:28:55 UTC
Attachment: Added: logs.zip


Comment 7 Jeff Mesnil 2012-08-06 10:27:01 UTC
After looking at the latest attached logs, I think this is a bug related to the storage of deleted topology members.

Reading the logs, if I have not missed anything, the scenario looks like this:

After 5 reconnection attempts, the cluster connection bridge is stopped and the topology member is deleted:

{noformat}
11:06:54,325 WARN  [org.hornetq.core.server.cluster.impl.BridgeImpl] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=b449f8bf-dfa5-11e1-9278-bf78e68d2060-1606257774)) Bridge sf.my-cluster.b6b46983-dfa5-11e1-b4d6-895e08425368 achieved 6 maxattempts=5 it will stop retrying to reconnect
11:06:54,326 DEBUG [org.hornetq.core.server.cluster.impl.ClusterConnectionBridge] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=b449f8bf-dfa5-11e1-9278-bf78e68d2060-1606257774)) Cluster Bridge sf.my-cluster.b6b46983-dfa5-11e1-b4d6-895e08425368 failed, permanently=true
...
11:06:54,331 DEBUG [org.hornetq.core.client.impl.Topology] (Thread-0 (HornetQ-client-global-threads-446181564)) removeMember Topology@5a6b54ef[owner=ClusterConnectionImpl@618055402[nodeUUID=b449f8bf-dfa5-11e1-9278-bf78e68d2060, connector=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=192-168-10-1, address=jms, server=HornetQServerImpl::serverUUID=b449f8bf-dfa5-11e1-9278-bf78e68d2060]] removing nodeID=b6b46983-dfa5-11e1-b4d6-895e08425368, result=null, size = 1: java.lang.Exception: trace
	at org.hornetq.core.client.impl.Topology.removeMember(Topology.java:322) [hornetq-core-2.2.16.Final-redhat-1.jar:2.2.16.Final (HQ_2_2_16_FINAL, 122)]
	at org.hornetq.core.client.impl.ServerLocatorImpl.notifyNodeDown(ServerLocatorImpl.java:1360) [hornetq-core-2.2.16.Final-redhat-1.jar:2.2.16.Final (HQ_2_2_16_FINAL, 122)]
	at org.hornetq.core.client.impl.ClientSessionFactoryImpl$Channel0Handler$2.run(ClientSessionFactoryImpl.java:1507) [hornetq-core-2.2.16.Final-redhat-1.jar:2.2.16.Final (HQ_2_2_16_FINAL, 122)]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.6.0_22]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.6.0_22]
	at java.lang.Thread.run(Thread.java:679) [rt.jar:1.6.0_22]
{noformat}

At this point, the TopologyMember is stored in the topology's getMapDelete() with a uniqueEventID value > 0.

Later on, the cable is plugged back in, the node is discovered again by the discovery groups, and ServerLocatorImpl#connectorsChanged() is called:

{noformat}
TopologyMember member = new TopologyMember(entry.getConnector(), null);
// on this case we set it as zero as any update coming from server should be accepted
topology.updateMember(0, entry.getNodeID(), member);
{noformat}

But the updateMember() call fails because this topology member is already known and stored in getMapDelete() with a uniqueEventID > 0, so the update carrying uniqueEventID 0 is treated as older than the delete:

{noformat}
11:16:41,575 DEBUG [org.hornetq.core.client.impl.Topology] (hornetq-discovery-group-thread-dg-group1) Update uniqueEvent=0, nodeId=b6b46983-dfa5-11e1-b4d6-895e08425368, memberInput=TopologyMember[connector=Pair[a=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=192-168-10-4, b=null]] being rejected as there was a delete done after that
{noformat} 

Since the topology member is not added back to the cluster, the cluster connection will not be started again.
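
The rejection described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `TopologySketch`, its two maps, the `hasMember()` helper, and the event IDs 1 and 2 are hypothetical and are not taken from the HornetQ sources. It only mirrors the behaviour reported in the logs, where an update with uniqueEventID 0 is refused because a delete with a higher uniqueEventID was recorded for the same node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the behaviour described in this comment.
// This is NOT HornetQ's Topology class; names and logic are illustrative assumptions.
final class TopologySketch {
    // nodeID -> connector description of the members currently in the topology
    private final Map<String, String> members = new HashMap<>();
    // nodeID -> uniqueEventID of the last delete recorded for that node
    private final Map<String, Long> mapDelete = new HashMap<>();

    // Accepts the update unless a delete with a higher event ID is on record.
    boolean updateMember(long uniqueEventID, String nodeId, String connector) {
        Long deleteEventID = mapDelete.get(nodeId);
        if (deleteEventID != null && uniqueEventID < deleteEventID) {
            return false; // "being rejected as there was a delete done after that"
        }
        mapDelete.remove(nodeId);
        members.put(nodeId, connector);
        return true;
    }

    // Removes the member and remembers the event ID of the delete.
    void removeMember(long uniqueEventID, String nodeId) {
        members.remove(nodeId);
        mapDelete.put(nodeId, uniqueEventID);
    }

    boolean hasMember(String nodeId) {
        return members.containsKey(nodeId);
    }

    public static void main(String[] args) {
        TopologySketch topology = new TopologySketch();
        String nodeId = "b6b46983-dfa5-11e1-b4d6-895e08425368";

        // The node joins the cluster (event ID 1 is an assumed value).
        topology.updateMember(1L, nodeId, "host=192-168-10-4");
        // The bridge fails permanently and the member is deleted (assumed event ID 2).
        topology.removeMember(2L, nodeId);
        // Discovery finds the node again and submits an update with event ID 0.
        boolean accepted = topology.updateMember(0L, nodeId, "host=192-168-10-4");

        System.out.println("update accepted: " + accepted);                  // false
        System.out.println("member present: " + topology.hasMember(nodeId)); // false
    }
}
```

In this model the member only comes back once an update arrives with an event ID at least as high as the recorded delete, which would be consistent with the observation that the cluster is re-established only after one of the servers is restarted.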



Comment 8 Jeff Mesnil 2012-08-06 10:27:25 UTC
Link: Added: This issue Cloned to JBPAPP-9647


Comment 13 Paul Gier 2012-09-05 19:58:28 UTC
HornetQ 2.2.19.Final has been built for EAP.

Comment 14 Miroslav Novak 2012-10-01 11:21:47 UTC
Based on the comments in HORNETQ-1002, I'm closing this JIRA.

Comment 15 sgilda 2012-10-04 14:28:59 UTC
Does this need a release note for EAP 6.0.1? 
If so, the issue needs to be re-opened and the release notes flags set (Affects Release Notes, Not Yet Documented).
If not, the issue needs to be re-opened and the release notes flag set to "Release notes not required".


Comment 16 Tom WELLS 2012-10-09 07:21:01 UTC
Updating the release notes fields.

Comment 17 Tom WELLS 2012-10-09 07:21:01 UTC
Release Notes Docs Status: Added: Needs More Info


Comment 18 Dana Mison 2012-10-16 07:07:22 UTC
Writer: Added: sgilda


Comment 19 sgilda 2012-10-18 15:22:56 UTC
Miroslav, Brian, or Jeff, does this need a release note?

Comment 20 Dana Mison 2012-10-19 04:40:09 UTC
Affects: Added: Release Notes


Comment 21 Jeff Mesnil 2012-10-19 07:20:05 UTC
Sande, yes, we can add a release note about this.

Comment 22 sgilda 2012-10-19 12:13:23 UTC
Jeff, is this release note correct?

Comment 23 sgilda 2012-10-19 12:13:23 UTC
Release Notes Text: Added: When starting Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later ree-stablished, the cluster node was not recreated and the cluster connection was not started. The topology member is now created and added back to the cluster and the cluster connection is restarted. 


Comment 24 sgilda 2012-10-19 12:14:16 UTC
Release Notes Text: Removed: When starting Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later ree-stablished, the cluster node was not recreated and the cluster connection was not started. The topology member is now created and added back to the cluster and the cluster connection is restarted.  Added: When starting Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later re-established, the cluster node was not created and the cluster connection was not started. The topology member is now created and added back to the cluster and the cluster connection is restarted. 


Comment 25 sgilda 2012-10-19 13:54:24 UTC
Release Notes Text: Removed: When starting Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later re-established, the cluster node was not created and the cluster connection was not started. The topology member is now created and added back to the cluster and the cluster connection is restarted.  Added: When running Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later re-established, the cluster node was not created and the cluster connection was not restarted. The topology member is now created and added back to the cluster and the cluster connection is restarted. 


Comment 26 Jeff Mesnil 2012-10-19 14:46:51 UTC
I couldn't have worded it better :) thanks

Comment 27 sgilda 2012-10-19 15:16:27 UTC
Thanks Jeff!

Comment 28 sgilda 2012-10-19 15:16:27 UTC
Release Notes Docs Status: Removed: Needs More Info Added: Documented as Resolved Issue


Comment 29 Miroslav Novak 2012-11-05 17:11:33 UTC
Closing.

Comment 30 Anne-Louise Tangring 2012-11-13 20:07:44 UTC
Release Notes Docs Status: Removed: Documented as Resolved Issue 
Writer: Removed: sgilda 
Release Notes Text: Removed: When running Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later re-established, the cluster node was not created and the cluster connection was not restarted. The topology member is now created and added back to the cluster and the cluster connection is restarted.  
Docs QE Status: Removed: NEW