
Bug 900667 (JBPAPP6-889)

Summary:

HornetQ cluster is not created when network connection is re-established

Product:

[JBoss] JBoss Enterprise Application Platform 6

Reporter:

Miroslav Novak <mnovak>

Component:

HornetQ

Assignee:

Jeff Mesnil <jmesnil>

Status:

CLOSED NEXTRELEASE

Severity:

high

Priority:

high

Version:

6.0.0

CC:

brian.stansberry, jmesnil, mnovak, pgier, sgilda, twells

Target Release:

EAP 6.0.1

Hardware:

Unspecified

OS:

Unspecified

URL:

http://jira.jboss.org/jira/browse/JBPAPP6-889

Whiteboard:

eap601candidate

Doc Type:

Bug Fix

Last Closed:

2012-11-05 17:11:33 UTC

Type:

Bug

Attachments:

Description	Flags
logs.zip	none
logs.zip	none

Description Miroslav Novak 2012-06-26 12:33:09 UTC

Affects: Release Notes
project_key: JBPAPP6

Test scenario:
1. Change standalone-full-ha.xml configuration:
Add the following element:
{code}<reconnect-attempts>5</reconnect-attempts>{code}
to the cluster connection definition:
{code}
<cluster-connections>
 <cluster-connection name="my-cluster">
 ...
 <reconnect-attempts>5</reconnect-attempts>
 ... 
 </cluster-connection>
</cluster-connections>
{code}

2. Start two EAP 6 servers with the standalone-full-ha.xml profile on two different machines so that they form a HornetQ cluster.

3. Disconnect the network cable between the machines.

4. Wait until both servers stop their core bridges.

5. Reconnect the network cable.

Result:
The HornetQ cluster is not re-created. The cluster is only established again after one of the servers is restarted.

Comment 1 Miroslav Novak 2012-06-26 16:33:09 UTC

Attachment: Added: logs.zip

Comment 2 Jeff Mesnil 2012-07-11 08:08:31 UTC

@miroslav, the logs you attached are about EAP 5.1.2 but the issue is about EAP 6.0.0 CR1.
Did you attach the correct logs?

Comment 3 Rajesh Rajasekaran 2012-07-11 20:28:04 UTC

Labels: Added: eap601candidate

Comment 6 Miroslav Novak 2012-08-06 09:28:55 UTC

Attachment: Added: logs.zip

Comment 7 Jeff Mesnil 2012-08-06 10:27:01 UTC

After looking at the latest attached logs, I think this is a bug related to the storage of deleted topology members.

Reading the logs, if I have not missed anything, the scenario looks like:

After 5 reconnection attempts, the cluster connection bridge is stopped and the topology member is deleted:

{noformat}
11:06:54,325 WARN  [org.hornetq.core.server.cluster.impl.BridgeImpl] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=b449f8bf-dfa5-11e1-9278-bf78e68d2060-1606257774)) Bridge sf.my-cluster.b6b46983-dfa5-11e1-b4d6-895e08425368 achieved 6 maxattempts=5 it will stop retrying to reconnect
11:06:54,326 DEBUG [org.hornetq.core.server.cluster.impl.ClusterConnectionBridge] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=b449f8bf-dfa5-11e1-9278-bf78e68d2060-1606257774)) Cluster Bridge sf.my-cluster.b6b46983-dfa5-11e1-b4d6-895e08425368 failed, permanently=true
...
11:06:54,331 DEBUG [org.hornetq.core.client.impl.Topology] (Thread-0 (HornetQ-client-global-threads-446181564)) removeMember Topology@5a6b54ef[owner=ClusterConnectionImpl@618055402[nodeUUID=b449f8bf-dfa5-11e1-9278-bf78e68d2060, connector=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=192-168-10-1, address=jms, server=HornetQServerImpl::serverUUID=b449f8bf-dfa5-11e1-9278-bf78e68d2060]] removing nodeID=b6b46983-dfa5-11e1-b4d6-895e08425368, result=null, size = 1: java.lang.Exception: trace
	at org.hornetq.core.client.impl.Topology.removeMember(Topology.java:322) [hornetq-core-2.2.16.Final-redhat-1.jar:2.2.16.Final (HQ_2_2_16_FINAL, 122)]
	at org.hornetq.core.client.impl.ServerLocatorImpl.notifyNodeDown(ServerLocatorImpl.java:1360) [hornetq-core-2.2.16.Final-redhat-1.jar:2.2.16.Final (HQ_2_2_16_FINAL, 122)]
	at org.hornetq.core.client.impl.ClientSessionFactoryImpl$Channel0Handler$2.run(ClientSessionFactoryImpl.java:1507) [hornetq-core-2.2.16.Final-redhat-1.jar:2.2.16.Final (HQ_2_2_16_FINAL, 122)]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.6.0_22]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.6.0_22]
	at java.lang.Thread.run(Thread.java:679) [rt.jar:1.6.0_22]
{noformat}
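For readers unfamiliar with the retry limit, here is a minimal, hypothetical Java sketch of the behaviour the log shows with <reconnect-attempts>5</reconnect-attempts>: each failed attempt increments a counter, and once the counter goes past the limit (6 > 5) the bridge stops retrying for good. The class RetryLimitSketch and its methods are illustrative assumptions, not HornetQ's BridgeImpl API.

```java
/**
 * Hypothetical, simplified model of the retry limit seen in the log
 * ("achieved 6 maxattempts=5 it will stop retrying to reconnect").
 * NOT HornetQ's BridgeImpl; all names here are illustrative only.
 */
class RetryLimitSketch {

    private final int maxAttempts; // value of <reconnect-attempts> in the test configuration
    private int attempts;          // failed connection attempts seen so far
    private boolean stopped;       // true once the bridge has given up permanently

    RetryLimitSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Records one failed attempt; returns true if another retry should be scheduled. */
    boolean onConnectFailed() {
        if (stopped) {
            return false;
        }
        attempts++;
        if (attempts > maxAttempts) {
            // Counter went past the limit: stop for good ("failed, permanently=true").
            stopped = true;
            return false;
        }
        return true;
    }

    int attempts() { return attempts; }

    boolean isStopped() { return stopped; }

    public static void main(String[] args) {
        RetryLimitSketch bridge = new RetryLimitSketch(5);
        while (bridge.onConnectFailed()) {
            // the cable is unplugged, so every attempt fails
        }
        System.out.println("attempts=" + bridge.attempts() + " stopped=" + bridge.isStopped());
    }
}
```

Running main() prints attempts=6 stopped=true, which lines up with the "achieved 6 maxattempts=5" message.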

At this point, the TopologyMember is stored in the topology's getMapDelete() map with a uniqueEventID > 0.

Later on, the cable is plugged back in, the node is discovered again by the discovery groups, and ServerLocatorImpl#connectorsChanged() is called:

{noformat}
TopologyMember member = new TopologyMember(entry.getConnector(), null);
// on this case we set it as zero as any update coming from server should be accepted
topology.updateMember(0, entry.getNodeID(), member);
{noformat}

But the updateMember() call fails because this topology member is already known and stored in getMapDelete() with a uniqueEventID > 0:

{noformat}
11:16:41,575 DEBUG [org.hornetq.core.client.impl.Topology] (hornetq-discovery-group-thread-dg-group1) Update uniqueEvent=0, nodeId=b6b46983-dfa5-11e1-b4d6-895e08425368, memberInput=TopologyMember[connector=Pair[a=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=192-168-10-4, b=null]] being rejected as there was a delete done after that
{noformat} 

Since the topology member is not added back to the cluster, the cluster connection will not be started again.
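The rejection can be reproduced with a minimal, hypothetical Java model of the topology: removeMember() records the delete's event ID in a delete map, and updateMember() refuses any update whose event ID is lower than the recorded delete. TopologySketch, its plain-string connectors, the event ID values used in main(), and the exact comparison rule are assumptions for illustration; this is not the HornetQ Topology class.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of the delete-map behaviour described above.
 * Not org.hornetq.core.client.impl.Topology; the comparison rule is an assumption.
 */
class TopologySketch {

    // nodeID -> connector of live members
    private final Map<String, String> members = new HashMap<>();
    // nodeID -> uniqueEventID of the delete that removed it (role of getMapDelete())
    private final Map<String, Long> mapDelete = new HashMap<>();

    /** Removes a member and remembers the event ID of the delete. */
    void removeMember(long uniqueEventID, String nodeID) {
        members.remove(nodeID);
        mapDelete.put(nodeID, uniqueEventID);
    }

    /** Adds or updates a member; returns false if the update is rejected. */
    boolean updateMember(long uniqueEventID, String nodeID, String connector) {
        Long deletedAt = mapDelete.get(nodeID);
        if (deletedAt != null && uniqueEventID < deletedAt) {
            // "being rejected as there was a delete done after that"
            return false;
        }
        mapDelete.remove(nodeID);
        members.put(nodeID, connector);
        return true;
    }

    boolean isMember(String nodeID) {
        return members.containsKey(nodeID);
    }

    public static void main(String[] args) {
        TopologySketch topology = new TopologySketch();
        String node = "b6b46983-dfa5-11e1-b4d6-895e08425368";
        topology.updateMember(1L, node, "192-168-10-4:5445");
        // Bridge gave up after 5 attempts: member deleted with an event ID > 0.
        topology.removeMember(2L, node);
        // Cable plugged back in: discovery calls updateMember with event ID 0.
        boolean accepted = topology.updateMember(0L, node, "192-168-10-4:5445");
        System.out.println("accepted=" + accepted + " member=" + topology.isMember(node));
    }
}
```

Running main() prints accepted=false member=false: the discovery update with event ID 0 is dropped, so nothing triggers a restart of the cluster connection, consistent with the "being rejected" log line.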

Comment 8 Jeff Mesnil 2012-08-06 10:27:25 UTC

Link: Added: This issue Cloned to JBPAPP-9647

Comment 13 Paul Gier 2012-09-05 19:58:28 UTC

HornetQ 2.2.19.Final has been built for EAP.

Comment 14 Miroslav Novak 2012-10-01 11:21:47 UTC

Based on the comments in HORNETQ-1002, I'm closing this JIRA.

Comment 15 sgilda 2012-10-04 14:28:59 UTC

Does this need a release note for EAP 6.0.1? 
If so, the issue needs to be re-opened and the release notes flags set (Affects Release Notes, Not Yet Documented).
If not, the issue needs to be re-opened and the release notes flag set to "Release notes not required".

Comment 16 Tom WELLS 2012-10-09 07:21:01 UTC

Updating the release notes fields.

Comment 17 Tom WELLS 2012-10-09 07:21:01 UTC

Release Notes Docs Status: Added: Needs More Info

Comment 18 Dana Mison 2012-10-16 07:07:22 UTC

Writer: Added: sgilda

Comment 19 sgilda 2012-10-18 15:22:56 UTC

Miroslav, Brian, or Jeff, does this need a release note?

Comment 20 Dana Mison 2012-10-19 04:40:09 UTC

Affects: Added: Release Notes

Comment 21 Jeff Mesnil 2012-10-19 07:20:05 UTC

Sande, yes, we can add a release note about this.

Comment 22 sgilda 2012-10-19 12:13:23 UTC

Jeff, is this release note correct?

Comment 23 sgilda 2012-10-19 12:13:23 UTC

Release Notes Text: Added: When starting Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later ree-stablished, the cluster node was not recreated and the cluster connection was not started. The topology member is now created and added back to the cluster and the cluster connection is restarted.

Comment 24 sgilda 2012-10-19 12:14:16 UTC

Release Notes Text: Removed: When starting Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later ree-stablished, the cluster node was not recreated and the cluster connection was not started. The topology member is now created and added back to the cluster and the cluster connection is restarted.  Added: When starting Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later re-established, the cluster node was not created and the cluster connection was not started. The topology member is now created and added back to the cluster and the cluster connection is restarted.

Comment 25 sgilda 2012-10-19 13:54:24 UTC

Release Notes Text: Removed: When starting Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later re-established, the cluster node was not created and the cluster connection was not started. The topology member is now created and added back to the cluster and the cluster connection is restarted.  Added: When running Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later re-established, the cluster node was not created and the cluster connection was not restarted. The topology member is now created and added back to the cluster and the cluster connection is restarted.

Comment 26 Jeff Mesnil 2012-10-19 14:46:51 UTC

I couldn't have worded it better :) thanks

Comment 27 sgilda 2012-10-19 15:16:27 UTC

Thanks Jeff!

Comment 28 sgilda 2012-10-19 15:16:27 UTC

Release Notes Docs Status: Removed: Needs More Info Added: Documented as Resolved Issue

Comment 29 Miroslav Novak 2012-11-05 17:11:33 UTC

Closing.

Comment 30 Anne-Louise Tangring 2012-11-13 20:07:44 UTC

Release Notes Docs Status: Removed: Documented as Resolved Issue 
Writer: Removed: sgilda 
Release Notes Text: Removed: When running Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later re-established, the cluster node was not created and the cluster connection was not restarted. The topology member is now created and added back to the cluster and the cluster connection is restarted.  
Docs QE Status: Removed: NEW