Bug 963215 - JMS bridge does not failover
JMS bridge does not failover
Status: CLOSED CURRENTRELEASE
Product: JBoss Enterprise Application Platform 6
Classification: JBoss
Component: HornetQ (Show other bugs)
6.1.0
Unspecified Unspecified
unspecified Severity medium
: ER7
: EAP 6.1.1
Assigned To: Yong Hao Gao
:
Depends On:
Blocks: 994214
  Show dependency treegraph
 
Reported: 2013-05-15 08:22 EDT by Miroslav Novak
Modified: 2013-09-16 16:28 EDT (History)
11 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-16 16:28:04 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
configuration.zip (15.51 KB, application/zip)
2013-05-15 08:23 EDT, Miroslav Novak
no flags Details
the configuration of my bridge server (21.68 KB, text/plain)
2013-06-27 06:50 EDT, Yong Hao Gao
no flags Details
server_with_bridge.log (7.03 MB, text/x-log)
2013-08-16 08:15 EDT, Miroslav Novak
no flags Details
reproducer.zip (55.44 KB, application/zip)
2013-08-16 10:03 EDT, Miroslav Novak
no flags Details

  None (edit)
Description Miroslav Novak 2013-05-15 08:22:26 EDT
Test scenario:
1. Start two EAP 6.1.0 servers - live/backup pair in dedicated topology with shared store
2. Start third EAP 6.1.0 server with deployed JMS bridge
   - bridge sends messages from jms/queue/InQueue (on this/third server) to jms/queue/OutQueue to live/first server
3. Kill live server and check whether bridge reconnected to backup

Result:
After step 3. following appears in console log on 3rd server:

13:56:35,566 INFO  [org.jboss.as] (Controller Boot Thread) JBAS015951: Admin console listening on http://192.168.40.3:9990
13:56:35,567 INFO  [org.jboss.as] (Controller Boot Thread) JBAS015874: JBoss EAP 6.1.0.GA (AS 7.2.0.Final-redhat-8) started in 3379ms - Started 149 of 276 services (126 services are passive or on-demand)
14:00:40,092 WARN  [org.hornetq.jms.server] (Thread-3 (HornetQ-client-global-threads-1885690219)) HQ122014: Notified of connection failure in xa recovery connectionFactory for provider ClientSessionFactoryImpl [serverLocator=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=192-168-40-1, TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=192-168-40-2], discoveryGroupConfiguration=null], connectorConfig=TransportConfiguration(name=netty, factory=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&host=192-168-40-1, backupConfig=null] will attempt reconnect on next pass: HornetQException[errorType=NOT_CONNECTED message=HQ119006: Channel disconnected]
	at org.hornetq.core.client.impl.ClientSessionFactoryImpl.connectionDestroyed(ClientSessionFactoryImpl.java:418) [hornetq-core-client-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
	at org.hornetq.core.remoting.impl.netty.NettyConnector$Listener$1.run(NettyConnector.java:882) [hornetq-core-client-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
	at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_15]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_15]
	at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_15]

14:00:40,092 WARN  [org.hornetq.jms.server] (Thread-8 (HornetQ-client-global-threads-1885690219)) Notified of connection failure in xa discovery, we will retry on the next recovery: HornetQException[errorType=NOT_CONNECTED message=HQ119006: Channel disconnected]
	at org.hornetq.core.client.impl.ClientSessionFactoryImpl.connectionDestroyed(ClientSessionFactoryImpl.java:418) [hornetq-core-client-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
	at org.hornetq.core.remoting.impl.netty.NettyConnector$Listener$1.run(NettyConnector.java:882) [hornetq-core-client-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
	at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_15]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_15]
	at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_15]

14:00:41,115 WARN  [org.hornetq.jms.server] (Thread-92) HQ122006: Detected failure on bridge connection
14:00:41,243 WARN  [org.hornetq.jms.server] (pool-4-thread-3) HQ122004: JMS Bridge Will retry after a pause of 1,000 ms

14:00:47,251 WARN  [org.hornetq.jms.server] (pool-4-thread-3) HQ122010: Failed to connect JMS Bridge: javax.naming.NamingException: Failed to connect to any server. Servers tried: [remote://192.168.40.1:4447]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.failOverSequence(HaRemoteNamingStore.java:213) [jboss-remote-naming-1.0.6.Final-redhat-2.jar:1.0.6.Final-redhat-2]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.namingStore(HaRemoteNamingStore.java:144) [jboss-remote-naming-1.0.6.Final-redhat-2.jar:1.0.6.Final-redhat-2]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.namingOperation(HaRemoteNamingStore.java:125) [jboss-remote-naming-1.0.6.Final-redhat-2.jar:1.0.6.Final-redhat-2]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.lookup(HaRemoteNamingStore.java:241) [jboss-remote-naming-1.0.6.Final-redhat-2.jar:1.0.6.Final-redhat-2]
	at org.jboss.naming.remote.client.RemoteContext.lookup(RemoteContext.java:79) [jboss-remote-naming-1.0.6.Final-redhat-2.jar:1.0.6.Final-redhat-2]
	at org.jboss.naming.remote.client.RemoteContext.lookup(RemoteContext.java:83) [jboss-remote-naming-1.0.6.Final-redhat-2.jar:1.0.6.Final-redhat-2]
	at javax.naming.InitialContext.lookup(InitialContext.java:411) [rt.jar:1.7.0_15]
	at org.hornetq.jms.bridge.impl.JNDIFactorySupport.createObject(JNDIFactorySupport.java:58) [hornetq-jms-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JNDIDestinationFactory.createDestination(JNDIDestinationFactory.java:40) [hornetq-jms-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl.setupJMSObjects(JMSBridgeImpl.java:1144) [hornetq-jms-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl.setupJMSObjectsWithRetry(JMSBridgeImpl.java:1375) [hornetq-jms-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl.access$2000(JMSBridgeImpl.java:75) [hornetq-jms-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl$FailureHandler.run(JMSBridgeImpl.java:1896) [hornetq-jms-server-2.3.1.Final-redhat-1.jar:2.3.1.Final-redhat-1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_15]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_15]

Configuration of JMS bridge:
            <jms-bridge name="myBridge">
                <source>
                    <connection-factory name="jms/RemoteConnectionFactory"/>
                    <destination name="jms/queue/InQueue"/>
                </source>
                <target>
                    <connection-factory name="jms/RemoteConnectionFactory"/>
                    <destination name="jms/queue/OutQueue"/>
                    <context>
                        <property key="java.naming.factory.initial" value="org.jboss.naming.remote.client.InitialContextFactory"/>
                        <property key="java.naming.provider.url" value="remote://192.168.40.1:4447"/>
                    </context>
                </target>
                <quality-of-service>AT_MOST_ONCE</quality-of-service>
                <failure-retry-interval>1000</failure-retry-interval>
                <max-retries>-1</max-retries>
                <max-batch-size>10</max-batch-size>
                <max-batch-time>100</max-batch-time>
                <add-messageID-in-header>true</add-messageID-in-header>
            </jms-bridge>

Configuration of all servers attached - configuration.zip
Comment 1 Miroslav Novak 2013-05-15 08:23:29 EDT
Created attachment 748245 [details]
configuration.zip
Comment 2 Yong Hao Gao 2013-06-13 06:59:41 EDT
Hi Mirek,

I think this is a config issue. Your <source> Connection Factory and <target> one points to one same factory, which means the <source> and <target> are in the same server. According to your description, the <target> is at server live, and <source> is at server 'jms-bridge'. Look at the server config the CF points to localhost, which means the 'jms-bridge' server. 

Can you correct this configuration and try again?

Thanks,
Howard
Comment 3 Miroslav Novak 2013-06-20 09:31:18 EDT
Hi Howard,

sorry for late reply. It took me some time to get to this after PTO. I believe the configuration is ok. There is:

<jms-bridge name="myBridge">
                <source>
                    <connection-factory name="jms/RemoteConnectionFactory"/>
                    <destination name="jms/queue/InQueue"/>
                </source>
                <target>
                    <connection-factory name="jms/RemoteConnectionFactory"/>
                    <destination name="jms/queue/OutQueue"/>
                    <context>
                        <property key="java.naming.factory.initial" value="org.jboss.naming.remote.client.InitialContextFactory"/>
                        <property key="java.naming.provider.url" value="remote://192.168.40.1:4447"/>
                    </context>
                </target>
...
</jms-bridge>

the "target CF" is looked up on live server (IP 192.168.40.1). Or is there something I missed?

Thanks,
Mirek
Comment 4 Yong Hao Gao 2013-06-27 06:48:51 EDT
Hi Mirek,

I need some help setting up the test. I run the live backup and 3rd server on one machine. The live and backup started successfully, but when I start the 3rd one, which has the jms bridge deployed I got exception like :

18:42:41,846 INFO  [org.jboss.as.remoting] (MSC service thread 1-1) JBAS017100: Listening on 127.0.0.1:10019
18:42:41,846 INFO  [org.jboss.as.remoting] (MSC service thread 1-2) JBAS017100: Listening on 127.0.0.1:4467
18:42:41,851 INFO  [org.jboss.as.server.deployment.scanner] (MSC service thread 1-6) JBAS015012: Started FileSystemDeploymentService for directory /home/howard/Downloads/EAP6/jboss-eap-6.1-standalone/standalone/deployments
18:42:42,390 INFO  [org.hornetq.core.server] (MSC service thread 1-12) HQ221020: Started Netty Acceptor version 3.6.6.Final-90e1eb2 127.0.0.1:5475 for CORE protocol
18:42:42,392 INFO  [org.hornetq.core.server] (MSC service thread 1-12) HQ221020: Started Netty Acceptor version 3.6.6.Final-90e1eb2 127.0.0.1:5465 for CORE protocol
18:42:42,395 INFO  [org.hornetq.core.server] (MSC service thread 1-12) HQ221007: Server is now live
18:42:42,395 INFO  [org.hornetq.core.server] (MSC service thread 1-12) HQ221001: HornetQ Server version 2.3.2.Snapshot (Wild Hornet, 123) [b92b01f6-ded5-11e2-a4f8-5b342e5ac970] 
18:42:42,426 INFO  [org.jboss.as.messaging] (ServerService Thread Pool -- 59) JBAS011601: Bound messaging object to jndi name java:jboss/exported/jms/RemoteConnectionFactory
18:42:42,428 INFO  [org.hornetq.core.server] (ServerService Thread Pool -- 61) HQ221003: trying to deploy queue jms.queue.queue/InQueue
18:42:42,442 INFO  [org.jboss.as.messaging] (ServerService Thread Pool -- 61) JBAS011601: Bound messaging object to jndi name java:/queue/InQueue
18:42:42,442 INFO  [org.jboss.as.messaging] (ServerService Thread Pool -- 61) JBAS011601: Bound messaging object to jndi name java:jboss/exported/jms/queue/InQueue
18:42:42,444 INFO  [org.jboss.as.messaging] (ServerService Thread Pool -- 60) JBAS011601: Bound messaging object to jndi name java:/ConnectionFactory
18:42:42,490 INFO  [org.jboss.as.connector.deployment] (MSC service thread 1-4) JBAS010406: Registered connection factory java:/JmsXA
18:42:42,529 INFO  [org.hornetq.ra] (MSC service thread 1-4) HornetQ resource adaptor started
18:42:42,529 INFO  [org.jboss.as.connector.services.resourceadapters.ResourceAdapterActivatorService$ResourceAdapterActivator] (MSC service thread 1-4) IJ020002: Deployed: file://RaActivatorhornetq-ra
18:42:42,532 INFO  [org.jboss.as.connector.deployment] (MSC service thread 1-12) JBAS010401: Bound JCA ConnectionFactory [java:/JmsXA]
18:42:46,764 WARN  [org.hornetq.jms.server] (ServerService Thread Pool -- 58) HQ122010: Failed to connect JMS Bridge: javax.naming.NamingException: Failed to connect to any server. Servers tried: [remote://127.0.0.1:4467]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.failOverSequence(HaRemoteNamingStore.java:213) [jboss-remote-naming-1.0.6.Final-redhat-2.jar:1.0.6.Final-redhat-2]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.namingStore(HaRemoteNamingStore.java:144) [jboss-remote-naming-1.0.6.Final-redhat-2.jar:1.0.6.Final-redhat-2]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.namingOperation(HaRemoteNamingStore.java:125) [jboss-remote-naming-1.0.6.Final-redhat-2.jar:1.0.6.Final-redhat-2]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.lookup(HaRemoteNamingStore.java:241) [jboss-remote-naming-1.0.6.Final-redhat-2.jar:1.0.6.Final-redhat-2]
	at org.jboss.naming.remote.client.RemoteContext.lookup(RemoteContext.java:79) [jboss-remote-naming-1.0.6.Final-redhat-2.jar:1.0.6.Final-redhat-2]
	at org.jboss.naming.remote.client.RemoteContext.lookup(RemoteContext.java:83) [jboss-remote-naming-1.0.6.Final-redhat-2.jar:1.0.6.Final-redhat-2]
	at javax.naming.InitialContext.lookup(InitialContext.java:411) [rt.jar:1.7.0_06]
	at org.hornetq.jms.bridge.impl.JNDIFactorySupport.createObject(JNDIFactorySupport.java:63) [hornetq-jms-server-2.3.1.Final-redhat-1.jar:]
	at org.hornetq.jms.bridge.impl.JNDIDestinationFactory.createDestination(JNDIDestinationFactory.java:40) [hornetq-jms-server-2.3.1.Final-redhat-1.jar:]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl.setupJMSObjects(JMSBridgeImpl.java:1142) [hornetq-jms-server-2.3.1.Final-redhat-1.jar:]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl.start(JMSBridgeImpl.java:353) [hornetq-jms-server-2.3.1.Final-redhat-1.jar:]
	at org.jboss.as.messaging.jms.bridge.JMSBridgeService.startBridge(JMSBridgeService.java:92) [jboss-as-messaging-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8]
	at org.jboss.as.messaging.jms.bridge.JMSBridgeService$1.run(JMSBridgeService.java:78) [jboss-as-messaging-7.2.0.Final-redhat-8.jar:7.2.0.Final-redhat-8]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_06]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_06]
	at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_06]
	at org.jboss.threads.JBossThread.run(JBossThread.java:122) [jboss-threads-2.1.0.Final-redhat-1.jar:2.1.0.Final-redhat-1]

So the bridge couldn't start successfully. I don't know what's wrong with my config, can you help take a look at it? Here is my config:

                <jms-bridge name="myBridge">
                   <source>
                      <connection-factory name="jms/RemoteConnectionFactory"/>
                      <destination name="jms/queue/InQueue"/>
                      <context>
                          <property key="java.naming.factory.initial" value="org.jboss.naming.remote.client.InitialContextFactory"/>
                          <property key="java.naming.provider.url" value="remote://127.0.0.1:4467"/>
                      </context>
                   </source>
                   <target>
                      <connection-factory name="jms/RemoteConnectionFactory"/>
                      <destination name="jms/queue/OutQueue"/>
                      <context>
                          <property key="java.naming.factory.initial" value="org.jboss.naming.remote.client.InitialContextFactory"/>
                          <property key="java.naming.provider.url" value="remote://localhost:4447"/>
                      </context>
                   </target>
                   <quality-of-service>AT_MOST_ONCE</quality-of-service>
                   <failure-retry-interval>1000</failure-retry-interval>
                   <max-retries>-1</max-retries>
                   <max-batch-size>10</max-batch-size>
                   <max-batch-time>100</max-batch-time>
                   <add-messageID-in-header>true</add-messageID-in-header>
                </jms-bridge>

You can see from the log the server is listening to port 4467, but yet it failed. I'll attach the full configuration file.

Thanks
Howard
Comment 5 Yong Hao Gao 2013-06-27 06:50:28 EDT
Created attachment 766056 [details]
the configuration of my bridge server
Comment 6 Miroslav Novak 2013-06-27 07:20:00 EDT
Hi Howard,

I can see that there is:
                   <source>
                      <connection-factory name="jms/RemoteConnectionFactory"/>
                      <destination name="jms/queue/InQueue"/>
                      <context>
                          <property key="java.naming.factory.initial" value="org.jboss.naming.remote.client.InitialContextFactory"/>
                          <property key="java.naming.provider.url" value="remote://127.0.0.1:4467"/>
                      </context>
                   </source>

could you try to remove <context>...</context> section, please?

Source queue and connection factory is on 3rd bridge server so it should make in-vm jndi lookup.

Also I guess you're not using port-offset for 3rd/bridge server. There is bad remoting port 4467 in <source><context>...part. It should be 4447 but as I mentioned, look up should work without <context> for <source>.

Thanks,

Mirek
Comment 7 Yong Hao Gao 2013-06-27 22:31:06 EDT
Thanks Mirek, like you pointed out, the problem is gone after I removed <context> from it. 

Now I have some idea about what's happening here. When live is shutdown the bridge connection actually failed over. but still the exception listener in the bridge is called, which triggered the retry operation of the bridge. Each retry will start with looking up the connection factories and destinations whose JNDI context points to the live (which is dead now).

I'm trying to come up with a solution to this. My idea is that when the exception listener gets called, the bridge tries to wait for the failover to happen and base on whether failover happened or not, or failover is successful or failed, to initiate retry or not.

Howard
Comment 9 Dimitris Andreadis 2013-07-31 10:57:30 EDT
BTW, Howard is the correct owner and he is still working on it.
Comment 10 Clebert Suconic 2013-08-02 18:09:08 EDT
I've been working with Howard on this. I believe we have found a solution now. we will perform some final tests on monday.
Comment 13 Clebert Suconic 2013-08-07 09:35:45 EDT
PR sent and merged on hornetq:

https://github.com/hornetq/hornetq/pull/1200


this will be PRed through BZ994214 (https://bugzilla.redhat.com/show_bug.cgi?id=994214)
Comment 14 Miroslav Novak 2013-08-16 08:05:50 EDT
Reopening. Failover of JMS bridges is still not working in EAP 6.1.1.ER6.
Comment 15 Miroslav Novak 2013-08-16 08:15:26 EDT
Created attachment 787229 [details]
server_with_bridge.log
Comment 16 Miroslav Novak 2013-08-16 10:02:13 EDT
There is a reproducer (attached):
1. Download and unzip reprodurcer - next steps run in extracted directory
2. run - sh prepare.sh
3. Start live/backup pair
	- sh start-server1.sh 127.0.0.1
	- sh start-server2.sh <backup_ip>
4. Start third server with deployed bridge
	- sh start-server3.sh <bridge_ip>
5. Start producer
	-  sh start-producer.sh <bridge_ip> jms/queue/InQueue 100000
6. Kill live server
	- kill -9 ... (jps -l to get pid of server3 with deployed bridge)
7. Start consumer on backup server
	- sh start-consumer.sh <backup_ip> jms/queue/OutQueue
Comment 17 Miroslav Novak 2013-08-16 10:03:01 EDT
Created attachment 787274 [details]
reproducer.zip
Comment 18 Clebert Suconic 2013-08-16 12:41:55 EDT
What is not working? messaging not received? or the actual failover?

You need recovery from the TM to recover any messages and there is not such thing as recovery from the TM. You will need to solve issues manually on the XIDs or restart the previous node back to recover messages.
Comment 19 Clebert Suconic 2013-08-16 19:21:56 EDT
I see what happened. Howard didn't fix one of the cases on the bridge and I missed the verification on AT_MOST_ONCE


I have a fix here and we will make some tests.

Is Monday morning good for a release? is there a rush over the weekend? if there is I can do it probably by Sunday Morning.. otherwise it would be monday morning since I will get Howard to help me on this one as well.
Comment 20 Yong Hao Gao 2013-08-19 01:10:32 EDT
Hi Mirek,

Where can I download the EAP 6.1.1.ER6? Can this test run on one machine?


Thanks
Howard
Comment 22 Yong Hao Gao 2013-08-19 03:33:21 EDT
Thanks Carlo.
Comment 23 Miroslav Novak 2013-08-19 04:05:39 EDT
Hi Howard,

reproducer will prepare everything for you (it also downloads EAP 6.1.1.ER6) and it works on one machine. It's necessary to be on vpn. 

As Clebert mentioned there is not working quality-of-service AT_MOST_ONCE and DUPLICATES_OK. Quality-of-service ONCE_AND_ONLY_ONCE appears to be ok.

Thanks,
Mirek
Comment 24 Yong Hao Gao 2013-08-19 04:20:15 EDT
Thanks Mirek. I'll try myself. However I didn't find relevant exception or error messages regarding the issue in the uploaded log file, can you point me at which step the test fails?

Howard
Comment 25 Miroslav Novak 2013-08-19 04:46:24 EDT
Step 6. in reproducer shows the problem. I see I did a typo in parenthesis. It should be:
6. Kill live server
	- kill -9 ... (jps -l to get pid of server1)

The problem is that after kill of live server the JMS bridge is not able to failover to backup and is trying to reconnect indefinitely. Server with deployed bridge repeatedly logs:

10:37:06,798 INFO  [org.jboss.messaging] (ServerService Thread Pool -- 64) JBAS011610: Started JMS Bridge myBridge
10:37:06,798 WARN  [org.hornetq.jms.server] (pool-4-thread-1) HQ122004: JMS Bridge Will retry after a pause of 1,000 ms
10:37:06,980 INFO  [org.jboss.as] (Controller Boot Thread) JBAS015961: Http management interface listening on http://192.168.40.3:9990/management
10:37:06,981 INFO  [org.jboss.as] (Controller Boot Thread) JBAS015951: Admin console listening on http://192.168.40.3:9990
10:37:06,981 INFO  [org.jboss.as] (Controller Boot Thread) JBAS015874: JBoss EAP 6.1.1.GA (AS 7.2.1.Final-redhat-9) started in 9818ms - Started 149 of 272 services (122 services are passive or on-demand)
10:37:07,802 DEBUG [org.xnio.nio] (Remoting "config-based-naming-client-endpoint" read-1) Started channel thread 'Remoting "config-based-naming-client-endpoint" read-1', selector sun.nio.ch.EPollSelectorImpl@2376170d
10:37:07,803 DEBUG [org.xnio.nio] (Remoting "config-based-naming-client-endpoint" write-1) Started channel thread 'Remoting "config-based-naming-client-endpoint" write-1', selector sun.nio.ch.EPollSelectorImpl@3cecc1e1
10:37:12,811 WARN  [org.hornetq.jms.server] (pool-4-thread-1) HQ122010: Failed to connect JMS Bridge: javax.naming.NamingException: Failed to connect to any server. Servers tried: [remote://127.0.0.1:4447]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.failOverSequence(HaRemoteNamingStore.java:213) [jboss-remote-naming-1.0.7.Final-redhat-1.jar:1.0.7.Final-redhat-1]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.namingStore(HaRemoteNamingStore.java:144) [jboss-remote-naming-1.0.7.Final-redhat-1.jar:1.0.7.Final-redhat-1]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.namingOperation(HaRemoteNamingStore.java:125) [jboss-remote-naming-1.0.7.Final-redhat-1.jar:1.0.7.Final-redhat-1]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.lookup(HaRemoteNamingStore.java:241) [jboss-remote-naming-1.0.7.Final-redhat-1.jar:1.0.7.Final-redhat-1]
	at org.jboss.naming.remote.client.RemoteContext.lookup(RemoteContext.java:79) [jboss-remote-naming-1.0.7.Final-redhat-1.jar:1.0.7.Final-redhat-1]
	at org.jboss.naming.remote.client.RemoteContext.lookup(RemoteContext.java:83) [jboss-remote-naming-1.0.7.Final-redhat-1.jar:1.0.7.Final-redhat-1]
	at javax.naming.InitialContext.lookup(InitialContext.java:392) [rt.jar:1.6.0_38]
	at org.hornetq.jms.bridge.impl.JNDIFactorySupport.createObject(JNDIFactorySupport.java:58) [hornetq-jms-server-2.3.4.Final-redhat-1.jar:2.3.4.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JNDIDestinationFactory.createDestination(JNDIDestinationFactory.java:40) [hornetq-jms-server-2.3.4.Final-redhat-1.jar:2.3.4.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl.setupJMSObjects(JMSBridgeImpl.java:1223) [hornetq-jms-server-2.3.4.Final-redhat-1.jar:2.3.4.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl.setupJMSObjectsWithRetry(JMSBridgeImpl.java:1459) [hornetq-jms-server-2.3.4.Final-redhat-1.jar:2.3.4.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl.access$2000(JMSBridgeImpl.java:82) [hornetq-jms-server-2.3.4.Final-redhat-1.jar:2.3.4.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl$FailureHandler.run(JMSBridgeImpl.java:2003) [hornetq-jms-server-2.3.4.Final-redhat-1.jar:2.3.4.Final-redhat-1]
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_38]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_38]
	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_38]

10:37:12,813 INFO  [org.hornetq.jms.server] (pool-4-thread-1) HQ121000: Failed to set up JMS bridge connections. Most probably the source or target servers are unavailable. Will retry after a pause of 1,000 ms
10:37:13,817 DEBUG [org.xnio.nio] (Remoting "config-based-naming-client-endpoint" read-1) Started channel thread 'Remoting "config-based-naming-client-endpoint" read-1', selector sun.nio.ch.EPollSelectorImpl@5810f29f
10:37:13,817 DEBUG [org.xnio.nio] (Remoting "config-based-naming-client-endpoint" write-1) Started channel thread 'Remoting "config-based-naming-client-endpoint" write-1', selector sun.nio.ch.EPollSelectorImpl@b6bf0e6
10:37:18,826 WARN  [org.hornetq.jms.server] (pool-4-thread-1) HQ122010: Failed to connect JMS Bridge: javax.naming.NamingException: Failed to connect to any server. Servers tried: [remote://127.0.0.1:4447]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.failOverSequence(HaRemoteNamingStore.java:213) [jboss-remote-naming-1.0.7.Final-redhat-1.jar:1.0.7.Final-redhat-1]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.namingStore(HaRemoteNamingStore.java:144) [jboss-remote-naming-1.0.7.Final-redhat-1.jar:1.0.7.Final-redhat-1]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.namingOperation(HaRemoteNamingStore.java:125) [jboss-remote-naming-1.0.7.Final-redhat-1.jar:1.0.7.Final-redhat-1]
	at org.jboss.naming.remote.client.HaRemoteNamingStore.lookup(HaRemoteNamingStore.java:241) [jboss-remote-naming-1.0.7.Final-redhat-1.jar:1.0.7.Final-redhat-1]
	at org.jboss.naming.remote.client.RemoteContext.lookup(RemoteContext.java:79) [jboss-remote-naming-1.0.7.Final-redhat-1.jar:1.0.7.Final-redhat-1]
	at org.jboss.naming.remote.client.RemoteContext.lookup(RemoteContext.java:83) [jboss-remote-naming-1.0.7.Final-redhat-1.jar:1.0.7.Final-redhat-1]
	at javax.naming.InitialContext.lookup(InitialContext.java:392) [rt.jar:1.6.0_38]
	at org.hornetq.jms.bridge.impl.JNDIFactorySupport.createObject(JNDIFactorySupport.java:58) [hornetq-jms-server-2.3.4.Final-redhat-1.jar:2.3.4.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JNDIDestinationFactory.createDestination(JNDIDestinationFactory.java:40) [hornetq-jms-server-2.3.4.Final-redhat-1.jar:2.3.4.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl.setupJMSObjects(JMSBridgeImpl.java:1223) [hornetq-jms-server-2.3.4.Final-redhat-1.jar:2.3.4.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl.setupJMSObjectsWithRetry(JMSBridgeImpl.java:1459) [hornetq-jms-server-2.3.4.Final-redhat-1.jar:2.3.4.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl.access$2000(JMSBridgeImpl.java:82) [hornetq-jms-server-2.3.4.Final-redhat-1.jar:2.3.4.Final-redhat-1]
	at org.hornetq.jms.bridge.impl.JMSBridgeImpl$FailureHandler.run(JMSBridgeImpl.java:2003) [hornetq-jms-server-2.3.4.Final-redhat-1.jar:2.3.4.Final-redhat-1]
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_38]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_38]
	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_38]

10:37:18,827 INFO  [org.hornetq.jms.server] (pool-4-thread-1) HQ121000: Failed to set up JMS bridge connections. Most probably the source or target servers are unavailable. Will retry after a pause of 1,000 ms
10:37:19,831 DEBUG [org.xnio.nio] (Remoting "config-based-naming-client-endpoint" read-1) Started channel thread 'Remoting "config-based-naming-client-endpoint" read-1', selector sun.nio.ch.EPollSelectorImpl@2d32e1b6
Comment 26 Yong Hao Gao 2013-08-19 07:43:41 EDT
Thanks Mirek.

Howard
Comment 27 Yong Hao Gao 2013-08-19 07:54:20 EDT
Hi Mirek,

I'm having problem running server2. When I do

sh start-server2.sh 127.0.0.2

I got the following error log:

19:49:46,220 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-5) MSC000001: Failed to start service jboss.network.unsecure: org.jboss.msc.service.StartException in service jboss.network.unsecure: JBAS015810: failed to resolve interface unsecure
	at org.jboss.as.server.services.net.NetworkInterfaceService.start(NetworkInterfaceService.java:97) [jboss-as-server-7.2.1.Final-redhat-9.jar:7.2.1.Final-redhat-9]
	at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1811) [jboss-msc-1.0.4.GA-redhat-1.jar:1.0.4.GA-redhat-1]
	at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1746) [jboss-msc-1.0.4.GA-redhat-1.jar:1.0.4.GA-redhat-1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_06]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_06]
	at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_06]

Please help.

Thanks
Howard
Comment 28 Miroslav Novak 2013-08-19 08:31:30 EDT
I've sent mail to Howard:

the trick with 127.0.0.2 is not working for EAP 6. It's necessary to create IP aliases for this purpose:

ifconfig eth0:1 192.168.1.2 up
ifconfig eth0:2 192.168.1.3 up

eth0 is the name of my network interface when I type "ifconfig" command.

Bound the servers to those IP aliases. Server1 must be bound to 127.0.0.1 because server3 has hardcoded 127.0.0.1 for jndi lookup of target connection factory.
Comment 32 Miroslav Novak 2013-08-19 09:12:01 EDT
As Clebert mentioned there is fix ready for this issue. I'll test it as soon as it gets to me. Most likely in the next test cycle.
Comment 33 Miroslav Novak 2013-08-21 12:01:31 EDT
Setting as Verified in EAP 6.1.1.ER7(HQ 2.3.5). Bridge can failover from live->backup now. Thanks to all involved.

Note You need to log in before you can comment on or make changes to this bug.