Bug 959789

Summary: HQ core bridge does not failover
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Miroslav Novak <mnovak>
Component: HornetQ
Assignee: Clebert Suconic <csuconic>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.1.0
CC: anmiller, ataylor, cdewolf, csuconic, dandread, lcosti, myarboro, nziakova, rdickens
Target Milestone: ER5   
Target Release: EAP 6.1.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
In previous versions of JBoss EAP 6, a HornetQ core bridge server would not properly fail over to a backup HornetQ server when the primary HornetQ server became unavailable. This issue occurred because the HornetQ core bridge server would attempt to reconnect to any other server node, rather than to the correct backup HornetQ server. This issue has been fixed in this release of JBoss EAP 6, and a HornetQ core bridge server now always retries the connection to the backup HornetQ server when the primary HornetQ server becomes unavailable.
Last Closed: 2013-09-16 20:27:21 UTC
Type: Bug
Bug Blocks: 994214    
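
The reconnect behaviour described in the Doc Text can be illustrated with a small, self-contained sketch. It does not use the HornetQ API: the class `BridgeFailoverSketch`, the node names, the topology map, and both selection methods are hypothetical and only model the described difference between reconnecting to "any other node" and reconnecting to the backup of the failed server.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: none of these names come from HornetQ.
class BridgeFailoverSketch {

    // Hypothetical topology: live node name -> name of its backup.
    static final Map<String, String> BACKUP_OF = new LinkedHashMap<>();
    static {
        BACKUP_OF.put("live-1", "backup-1");
        BACKUP_OF.put("live-2", "backup-2");
    }

    // Models the old behaviour: pick the first known node that is not the
    // failed one. That node is not necessarily the backup of the failed live.
    static String reconnectToAnyNode(String failedLive, List<String> knownNodes) {
        for (String node : knownNodes) {
            if (!node.equals(failedLive)) {
                return node;
            }
        }
        return null; // no other node known
    }

    // Models the fixed behaviour: always target the backup of the failed live.
    static String reconnectToBackup(String failedLive) {
        return BACKUP_OF.get(failedLive);
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("live-1", "live-2", "backup-1", "backup-2");
        System.out.println("any node: " + reconnectToAnyNode("live-1", known));
        System.out.println("backup:   " + reconnectToBackup("live-1"));
    }
}
```

In this model the first method returns `live-2` for a failed `live-1`, so the bridge would end up on a node that does not hold the forwarding address of the failed pair, while the second returns `backup-1`.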

Description Miroslav Novak 2013-05-05 18:42:22 UTC
HornetQ core bridge does not fail over from live to backup. I wrote this test while trying to verify bz#900764, but this appears to be a slightly different scenario.

Test scenario:
1. Start two EAP 6.1.0.ER6 servers as a HornetQ live/backup pair with OutQueue deployed
2. Start a third EAP 6.1.0.ER6 server with the HQ core bridge and InQueue deployed:
                <bridges>
                    <bridge name="myBridge">
                        <queue-name>jms.queue.InQueue</queue-name>
                        <forwarding-address>jms.queue.OutQueue</forwarding-address>
                        <ha>true</ha>
                        <reconnect-attempts>-1</reconnect-attempts>
                        <use-duplicate-detection>true</use-duplicate-detection>
                        <discovery-group-ref discovery-group-name="dg-group1"/>
                    </bridge>
                </bridges>
3. Start a producer that sends messages to InQueue on the third server
4. Start a consumer that reads messages from OutQueue on the first (live) server
5. Kill the first (live) server
6. Check whether the consumer from step 4 is still receiving messages from OutQueue. This verifies that both the HQ core bridge and the consumer failed over to the backup.
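
The check in step 6 can be sketched as follows. This is a minimal model, not the actual test code: the class `FailoverCheckSketch`, its methods, and the use of a `BlockingQueue` as a stand-in for the consumer on OutQueue are all assumptions made for illustration, and no HornetQ or JMS API is used.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative model only: the queue stands in for a consumer on OutQueue.
class FailoverCheckSketch {

    // Counts messages until the consumer stays silent for idleTimeoutMs.
    static int drain(BlockingQueue<String> consumer, long idleTimeoutMs) {
        int count = 0;
        try {
            while (consumer.poll(idleTimeoutMs, TimeUnit.MILLISECONDS) != null) {
                count++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep interrupt status, stop counting
        }
        return count;
    }

    // The bridge is considered failed over only if every message sent to
    // InQueue after the kill also shows up on OutQueue.
    static boolean bridgeFailedOver(int sentAfterKill, int receivedAfterKill) {
        return sentAfterKill > 0 && receivedAfterKill == sentAfterKill;
    }

    public static void main(String[] args) {
        BlockingQueue<String> outQueue = new LinkedBlockingQueue<>();
        outQueue.add("m1");
        outQueue.add("m2");
        outQueue.add("m3");
        int received = drain(outQueue, 50);
        System.out.println("received=" + received
                + " failedOver=" + bridgeFailedOver(3, received));
    }
}
```

With three messages sent and three received the check passes. A call such as `bridgeFailedOver(3, 0)` models the failure reported here, where the consumer reconnects to the backup but no further messages arrive.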

Result:
After step 6, the consumer failed over to the backup but could not read any more messages. The HQ core bridge did not fail over.

Console log from third server:
20:38:47,765 INFO  [org.hornetq.core.server] (Thread-2 (HornetQ-server-HornetQServerImpl::serverUUID=918186ea-b5b1-11e2-81e1-77319a9992e3-2075307577)) HQ221027: Bridge BridgeImpl@1a349871 [name=myBridge, queue=QueueImpl[name=jms.queue.InQueue, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=918186ea-b5b1-11e2-81e1-77319a9992e3]]@73043027 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=192-168-40-1], discoveryGroupConfiguration=DiscoveryGroupConfiguration{name='dg-group1', refreshTimeout=10000, discoveryInitialWaitTimeout=10000}]] is connected
20:39:29,976 WARN  [org.hornetq.core.server] (Thread-1 (HornetQ-client-global-threads-824748279)) HQ222095: Connection failed with failedOver=false: HornetQException[errorType=INTERNAL_ERROR message=HQ119005: Exception in Netty transport]
	at org.hornetq.core.remoting.impl.netty.HornetQChannelHandler.exceptionCaught(HornetQChannelHandler.java:107) [hornetq-core-client-2.3.0.Final-redhat-1.jar:2.3.0.Final-redhat-1]
	at org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:130) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.exceptionCaught(SimpleChannelUpstreamHandler.java:153) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:525) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.channel.socket.oio.AbstractOioWorker.run(AbstractOioWorker.java:77) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:51) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.util.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:175) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_15]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_15]
	at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_15]
Caused by: java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:189) [rt.jar:1.7.0_15]
	at java.net.SocketInputStream.read(SocketInputStream.java:121) [rt.jar:1.7.0_15]
	at java.net.SocketInputStream.read(SocketInputStream.java:203) [rt.jar:1.7.0_15]
	at java.io.FilterInputStream.read(FilterInputStream.java:83) [rt.jar:1.7.0_15]
	at java.io.PushbackInputStream.read(PushbackInputStream.java:139) [rt.jar:1.7.0_15]
	at org.jboss.netty.channel.socket.oio.OioWorker.process(OioWorker.java:64) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.channel.socket.oio.AbstractOioWorker.run(AbstractOioWorker.java:73) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	at org.jboss.netty.channel.socket.oio.OioWorker.run(OioWorker.java:51) [netty-3.6.2.Final-redhat-1.jar:3.6.2.Final-redhat-1]
	... 4 more

20:39:29,985 WARN  [org.hornetq.jms.server] (Thread-3 (HornetQ-client-global-threads-824748279)) Notified of connection failure in xa discovery, we will retry on the next recovery: HornetQException[errorType=NOT_CONNECTED message=HQ119006: Channel disconnected]
	at org.hornetq.core.client.impl.ClientSessionFactoryImpl.connectionDestroyed(ClientSessionFactoryImpl.java:418) [hornetq-core-client-2.3.0.Final-redhat-1.jar:2.3.0.Final-redhat-1]
	at org.hornetq.core.remoting.impl.netty.NettyConnector$Listener$1.run(NettyConnector.java:882) [hornetq-core-client-2.3.0.Final-redhat-1.jar:2.3.0.Final-redhat-1]
	at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client-2.3.0.Final-redhat-1.jar:2.3.0.Final-redhat-1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_15]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_15]
	at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_15]

Comment 1 Clebert Suconic 2013-05-29 23:59:31 UTC
https://github.com/hornetq/hornetq/pull/1099

Comment 2 Dimitris Andreadis 2013-08-01 11:29:34 UTC
Which HQ release will this fix be part of, so it can be included in EAP 6.1.1?

https://issues.jboss.org/browse/HORNETQ-1218 points to 2.4.0.Alpha2

Comment 3 Francisco Borges 2013-08-01 11:43:40 UTC
(In reply to Dimitris Andreadis from comment #2)
> Which HQ release will this fix be part of, so it can be included in EAP 6.1.1?
> 
> https://issues.jboss.org/browse/HORNETQ-1218 points to 2.4.0.Alpha2

Indeed you have a point. 

If you look at the "Source" tab of https://issues.jboss.org/browse/HORNETQ-1218 you'll notice that commits addressing the issue were applied to 2.2.eap5, 2.2.x, and master but NOT to 2.3.x (which is the branch from which we will create 2.3.3). I'll ask Clebert to verify why there was no commit to 2.3.x.

Comment 4 Clebert Suconic 2013-08-01 13:30:36 UTC
You're right, I made a mistake here. I just cherry-picked it and submitted a PR.

Comment 7 Miroslav Novak 2013-08-16 12:56:25 UTC
Failover of HornetQ core bridge is ok. Verified in EAP 6.1.1.ER6. Nice work!