The HornetQ resource adapter in EAP 6.1.0.ER6 is not able to fail over from live to backup. This is a regression against EAP 6.1.0.ER5, caused by the upgrade from HornetQ 2.3.0.CR2 to HornetQ 2.3.0.Final. This is a common use case: because of this issue, customers won't be able to use the HornetQ resource adapter in an HA solution, which can lead to an outage of their service.

Test scenario:
1. Start two EAP 6.1.0.ER6 servers (HQ 2.3.0.Final) in a dedicated topology ("live" and "backup") with InQueue and OutQueue deployed.
2. Start a producer which sends 1000 messages to InQueue.
3. Start a 3rd EAP 6.1.0.ER6 server ("mdb server") with a deployed MDB that reads messages from InQueue and sends them to OutQueue. The resource adapter is configured to connect to the live server.
4. While the MDB is processing messages, kill the "live" server (kill -9 ...).
5. Start a consumer receiving messages from OutQueue and check whether the number of received messages equals the number of sent messages (from step 2).

Result: The backup came alive, but the HQ resource adapter on the "mdb" server did not fail over to "backup".
There are lots of exceptions like this in the "mdb" server log:

11:37:08,257 WARN  [com.arjuna.ats.jta] (Thread-11 (HornetQ-client-global-threads-1465287450)) ARJUNA016086: TransactionImple.enlistResource setTransactionTimeout on XAResource < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffff0a2203bf:-38dfe4e:51823321:929, node_name=1, branch_uid=0:ffff0a2203bf:-38dfe4e:51823321:9c8, subordinatenodename=null, eis_name=java:/JmsXA > threw: XAException.XAER_RMERR: javax.transaction.xa.XAException
    at org.hornetq.core.client.impl.ClientSessionImpl.setTransactionTimeout(ClientSessionImpl.java:1696) [hornetq-core-client-2.3.0.Final-redhat-1.jar:2.3.0.Final-redhat-1]
    at org.hornetq.ra.HornetQRAXAResource.setTransactionTimeout(HornetQRAXAResource.java:249)
    at org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.setTransactionTimeout(XAResourceWrapperImpl.java:182)
    at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.enlistResource(TransactionImple.java:611)
    at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.enlistResource(TransactionImple.java:397)
    at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener$TransactionSynchronization.enlist(TxConnectionListener.java:607)
    at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener.enlist(TxConnectionListener.java:265)
    at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:467)
    at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:599)
    at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:467)
    at org.hornetq.ra.HornetQRASessionFactoryImpl.allocateConnection(HornetQRASessionFactoryImpl.java:832)
    at org.hornetq.ra.HornetQRASessionFactoryImpl.createSession(HornetQRASessionFactoryImpl.java:465)
    at org.jboss.qa.hornetq.apps.mdb.MdbWithRemoteOutQueueToContaniner1.onMessage(MdbWithRemoteOutQueueToContaniner1.java:68) [mdb1.jar:]
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) [:1.7.0_15]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_15]
    at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_15]

Steps to use the reproducer (attached reproducer.zip); run the following commands in the reproducer-mdb-failover directory:
1. Run "sh prepare.sh" (downloads and prepares all servers).
2. Start the "live" server: "sh start-server1.sh <ip_live_server>"
3. Start the "backup" server: "sh start-server2.sh <ip_backup_server>"
4. Start a producer which sends messages to jms/queue/InQueue on the "live" server: "sh start-producer.sh <ip_live_server> 1000" (parameters: IP address, number of messages)
5. Start the "mdb" server: "sh start-server3.sh <ip_mdb_server> <ip_live_server> <ip_backup_server>" (parameters: IP address of the "mdb" server, IP address of the "live" server, IP address of the "backup" server)
6. While the MDB is processing messages, kill the "live" server with kill -9 <process_id_live_server> (use "jps -m" to get its process ID).

Link to MDB sources: http://git.app.eng.bos.redhat.com/?p=jbossqe/eap-tests-hornetq.git;a=blob;f=jboss-hornetq-testsuite/src/test/java/org/jboss/qa/hornetq/apps/mdb/MdbWithRemoteOutQueueToContaniner1.java;h=4b0bdbce733fe454f1108673b026c70693e34e1a;hb=HEAD
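Step 5 of the test scenario (the consumer check) compares only message counts, so a lost message and a duplicated one could cancel out and still give the expected total. Below is a minimal Java sketch of a stricter check. It assumes the producer tags every message with a unique ID (for example as a string property) and that the consumer collects those IDs. The class DeliveryCheck and its methods are hypothetical and are not part of the attached reproducer.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper: compares the IDs of messages sent by the producer
 * with the IDs received by the consumer after failover.
 */
final class DeliveryCheck {

    /** Outcome of the comparison; TreeSets keep the output sorted. */
    static final class Result {
        final Set<String> lost;        // sent but never received
        final Set<String> duplicates;  // received more than once

        Result(Set<String> lost, Set<String> duplicates) {
            this.lost = lost;
            this.duplicates = duplicates;
        }

        /** The check passes only if nothing was lost or duplicated. */
        boolean passed() {
            return lost.isEmpty() && duplicates.isEmpty();
        }
    }

    static Result compare(Collection<String> sentIds, Collection<String> receivedIds) {
        // Count how many times each ID was received.
        Map<String, Integer> counts = new HashMap<>();
        for (String id : receivedIds) {
            counts.merge(id, 1, Integer::sum);
        }
        // Any sent ID with no receive count was lost.
        Set<String> lost = new TreeSet<>();
        for (String id : sentIds) {
            if (!counts.containsKey(id)) {
                lost.add(id);
            }
        }
        // Any ID received more than once was duplicated.
        Set<String> duplicates = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                duplicates.add(e.getKey());
            }
        }
        return new Result(lost, duplicates);
    }

    public static void main(String[] args) {
        // Same count (4 sent, 4 received), but m2 is lost and m3 is duplicated.
        List<String> sent = List.of("m1", "m2", "m3", "m4");
        List<String> received = List.of("m1", "m3", "m3", "m4");
        Result r = compare(sent, received);
        System.out.println("lost=" + r.lost + " duplicates=" + r.duplicates + " passed=" + r.passed());
    }
}
```

Running main prints lost=[m2] duplicates=[m3] passed=false, a case that a count-only comparison would report as passing.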
Created attachment 742645 (reproducer.zip)
Making it a blocker. Clebert?
Yes, it's a blocker, unfortunately. We made a late change for this issue in https://bugzilla.redhat.com/show_bug.cgi?id=901137 with this commit: https://github.com/hornetq/hornetq/commit/6eb89a7288fc1f9a569641ee9058df004db24257 The issue is in the outbound connection, probably related to mayAttemptToFailover being disabled. I'm asking Francisco to take a look at this; we will have it fixed soon. Sorry about this: a small late change for BZ901137 caused it.
The git repo with the MDB is: git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git I'm also attaching the source (MdbWithRemoteOutQueueToContaniner1.java).
Created attachment 743034 (MdbWithRemoteOutQueueToContaniner1.java)
*** Bug 959196 has been marked as a duplicate of this bug. ***
Francisco has already sent a PR to the HQ repo.
For reference - https://github.com/hornetq/hornetq/commit/b8d664d0ea8246cc4f4421783b4910cbfc195535
PR to upgrade HQ to 2.3.1.Final - https://github.com/jbossas/jboss-eap/pull/137
Verified in EAP 6.1.0.ER7.