Bug 900667 (JBPAPP6-889)
| Summary: | HornetQ cluster is not created when network connection is re-established | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [JBoss] JBoss Enterprise Application Platform 6 | Reporter: | Miroslav Novak <mnovak> | ||||||
| Component: | HornetQ | Assignee: | Jeff Mesnil <jmesnil> | ||||||
| Status: | CLOSED NEXTRELEASE | QA Contact: | |||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 6.0.0 | CC: | brian.stansberry, jmesnil, mnovak, pgier, sgilda, twells | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | EAP 6.0.1 | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| URL: | http://jira.jboss.org/jira/browse/JBPAPP6-889 | ||||||||
| Whiteboard: | eap601candidate | ||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2012-11-05 17:11:33 UTC | Type: | Bug | ||||||
Attachment: Added: logs.zip

@miroslav, the logs you attached are about EAP 5.1.2 but the issue is about EAP 6.0.0 CR1. Did you attach the correct logs?

Labels: Added: eap601candidate

Attachment: Added: logs.zip

After looking at the latest attached logs, I think this is a bug related to the storage of the deleted topology member.
Reading the logs, if I have not missed anything, the scenario looks like:
After 5 reconnection attempts, the cluster connection bridge is stopped and the topology member is deleted:
{noformat}
11:06:54,325 WARN [org.hornetq.core.server.cluster.impl.BridgeImpl] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=b449f8bf-dfa5-11e1-9278-bf78e68d2060-1606257774)) Bridge sf.my-cluster.b6b46983-dfa5-11e1-b4d6-895e08425368 achieved 6 maxattempts=5 it will stop retrying to reconnect
11:06:54,326 DEBUG [org.hornetq.core.server.cluster.impl.ClusterConnectionBridge] (Thread-12 (HornetQ-server-HornetQServerImpl::serverUUID=b449f8bf-dfa5-11e1-9278-bf78e68d2060-1606257774)) Cluster Bridge sf.my-cluster.b6b46983-dfa5-11e1-b4d6-895e08425368 failed, permanently=true
...
11:06:54,331 DEBUG [org.hornetq.core.client.impl.Topology] (Thread-0 (HornetQ-client-global-threads-446181564)) removeMember Topology@5a6b54ef[owner=ClusterConnectionImpl@618055402[nodeUUID=b449f8bf-dfa5-11e1-9278-bf78e68d2060, connector=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=192-168-10-1, address=jms, server=HornetQServerImpl::serverUUID=b449f8bf-dfa5-11e1-9278-bf78e68d2060]] removing nodeID=b6b46983-dfa5-11e1-b4d6-895e08425368, result=null, size = 1: java.lang.Exception: trace
at org.hornetq.core.client.impl.Topology.removeMember(Topology.java:322) [hornetq-core-2.2.16.Final-redhat-1.jar:2.2.16.Final (HQ_2_2_16_FINAL, 122)]
at org.hornetq.core.client.impl.ServerLocatorImpl.notifyNodeDown(ServerLocatorImpl.java:1360) [hornetq-core-2.2.16.Final-redhat-1.jar:2.2.16.Final (HQ_2_2_16_FINAL, 122)]
at org.hornetq.core.client.impl.ClientSessionFactoryImpl$Channel0Handler$2.run(ClientSessionFactoryImpl.java:1507) [hornetq-core-2.2.16.Final-redhat-1.jar:2.2.16.Final (HQ_2_2_16_FINAL, 122)]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.6.0_22]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.6.0_22]
at java.lang.Thread.run(Thread.java:679) [rt.jar:1.6.0_22]
{noformat}
At this point, the TopologyMember is stored in the topology's getMapDelete() with a uniqueEventID value > 0.
Later on, the cable is plugged back in, the node is discovered again by the discovery groups, and ServerLocatorImpl#connectorsChanged() is called:
{noformat}
TopologyMember member = new TopologyMember(entry.getConnector(), null);
// on this case we set it as zero as any update coming from server should be accepted
topology.updateMember(0, entry.getNodeID(), member);
{noformat}
However, the updateMember() call is rejected because this topology member is already known and stored in getMapDelete() with a uniqueEventID > 0:
{noformat}
11:16:41,575 DEBUG [org.hornetq.core.client.impl.Topology] (hornetq-discovery-group-thread-dg-group1) Update uniqueEvent=0, nodeId=b6b46983-dfa5-11e1-b4d6-895e08425368, memberInput=TopologyMember[connector=Pair[a=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=192-168-10-4, b=null]] being rejected as there was a delete done after that
{noformat}
Since the topology member is not added back to the cluster, the cluster connection will not be started again.
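The rejection described above can be illustrated with a minimal sketch. This is a simplified, hypothetical model and not the actual HornetQ Topology class: the class name TopologySketch, its method signatures, and the connector string are assumptions made for illustration. It only mirrors the behaviour seen in the logs, where an update carrying uniqueEventID 0 arrives after a delete recorded with a uniqueEventID > 0.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the behaviour described in this issue.
// It is NOT the HornetQ Topology implementation.
class TopologySketch {
    // nodeID -> connector description for members currently in the topology
    private final Map<String, String> members = new HashMap<>();
    // nodeID -> uniqueEventID recorded when the member was removed
    private final Map<String, Long> deletes = new HashMap<>();

    // Returns true if the update is accepted, false if it is rejected
    // because a delete with a newer event ID was recorded for this node.
    boolean updateMember(long uniqueEventID, String nodeID, String connector) {
        Long deletedAt = deletes.get(nodeID);
        if (deletedAt != null && uniqueEventID < deletedAt) {
            return false; // "being rejected as there was a delete done after that"
        }
        deletes.remove(nodeID);
        members.put(nodeID, connector);
        return true;
    }

    // Removes the member and remembers the event ID of the delete.
    void removeMember(long uniqueEventID, String nodeID) {
        members.remove(nodeID);
        deletes.put(nodeID, uniqueEventID);
    }

    boolean contains(String nodeID) {
        return members.containsKey(nodeID);
    }

    public static void main(String[] args) {
        TopologySketch topology = new TopologySketch();
        topology.updateMember(1L, "node-b", "host-b:5445"); // node joins the cluster
        topology.removeMember(2L, "node-b");                // bridge gave up, member deleted
        // Rediscovery passes 0 as the event ID, which is older than the delete.
        boolean accepted = topology.updateMember(0L, "node-b", "host-b:5445");
        System.out.println("rediscovery accepted: " + accepted); // prints "rediscovery accepted: false"
    }
}
```

In this model the rediscovered node stays out of the member map until an update with an event ID at or above the recorded delete arrives, which matches the symptom that the cluster connection is never restarted.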
Link: Added: This issue Cloned to JBPAPP-9647

HornetQ 2.2.19.Final has been built for EAP.

Based on comments in HORNETQ-1002 I'm closing jira.

Does this need a release note for EAP 6.0.1? If so, the issue needs to be re-opened and the release notes flags set (Affects Release Notes, Not Yet Documented). If not, the issue needs to be re-opened and the release notes flag set to "Release notes not required".

Updating the release notes fields.

Release Notes Docs Status: Added: Needs More Info
Writer: Added: sgilda

Miroslav, Brian, or Jeff, does this need a release note?

Affects: Added: Release Notes

Sande, yes, we can add a release note about this

Jeff, is this release note correct?

Release Notes Text: Added: When starting Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later ree-stablished, the cluster node was not recreated and the cluster connection was not started. The topology member is now created and added back to the cluster and the cluster connection is restarted.

Release Notes Text: Removed: When starting Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later ree-stablished, the cluster node was not recreated and the cluster connection was not started. The topology member is now created and added back to the cluster and the cluster connection is restarted.
Added: When starting Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later re-established, the cluster node was not created and the cluster connection was not started. The topology member is now created and added back to the cluster and the cluster connection is restarted.

Release Notes Text: Removed: When starting Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later re-established, the cluster node was not created and the cluster connection was not started. The topology member is now created and added back to the cluster and the cluster connection is restarted.
Added: When running Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later re-established, the cluster node was not created and the cluster connection was not restarted. The topology member is now created and added back to the cluster and the cluster connection is restarted.

I couldn't have worded it better :) thanks

Thanks Jeff!

Release Notes Docs Status: Removed: Needs More Info Added: Documented as Resolved Issue

Closing.

Release Notes Docs Status: Removed: Documented as Resolved Issue
Writer: Removed: sgilda
Release Notes Text: Removed: When running Enterprise Application Platform 6 on two different machines in a HornetQ cluster, if the network connection between the machines was lost and later re-established, the cluster node was not created and the cluster connection was not restarted. The topology member is now created and added back to the cluster and the cluster connection is restarted.
Docs QE Status: Removed: NEW
Affects: Release Notes
project_key: JBPAPP6

Test scenario:
1. Change the standalone-full-ha.xml configuration. Add:
{code}<reconnect-attempts>5</reconnect-attempts>{code}
to:
{code}
<cluster-connections>
   <cluster-connection name="my-cluster">
      ...
      <reconnect-attempts>5</reconnect-attempts>
      ...
   </cluster-connection>
</cluster-connections>
{code}
2. Start two EAP 6 servers with the standalone-full-ha.xml profile on two different machines so that they form a HornetQ cluster.
3. Disconnect the network cable between the machines.
4. Wait until both machines stop their core bridges.
5. Reconnect the network cable.

Result: The HornetQ cluster is not re-created. The cluster is re-established only after one of the servers is restarted.