project_key: JBPAPP6

Test scenario:
1. Start two EAP 6 servers - an mdb server and a jms server. The mdb server is configured to connect through remote JCA to the jms server.
2. Stop (ctrl-c) the jms server.
3. Try to stop (ctrl-c) the mdb server.

Result: The mdb server hangs (I waited more than 10 minutes) - see the attached mdb-server-console.log (with a thread dump taken during the hang).

Attached reproducer-shutdown.zip:
1. Download and unzip reproducer-shutdown.zip
2. Prepare the servers - "sh prepare.sh"
3. Start the jms server - "sh start-server1.sh localhost"
4. Start the mdb server - "sh start-server2.sh <some_other_ip>"
5. Shutdown the jms server
6. Try to shutdown the mdb server

The issue seems to be caused by setting <reconnect-attempts>-1</reconnect-attempts> in the "hornetq-ra" connection factory:
{code}
<pooled-connection-factory name="hornetq-ra">
    <transaction mode="xa"/>
    <reconnect-attempts>-1</reconnect-attempts>
    <connectors>
        <connector-ref connector-name="netty-remote"/>
    </connectors>
    <entries>
        <entry name="java:/JmsXA"/>
    </entries>
</pooled-connection-factory>
{code}
When reconnect-attempts is set to 0, the server shuts down immediately.
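As a workaround until a fix lands, bounding the reconnect attempts lets the resource adapter give up instead of retrying forever, so shutdown can proceed. A sketch of the changed configuration (only the reconnect-attempts value differs from the snippet above; 0 is the value the report confirms, a small positive value is an untested alternative):
{code}
<pooled-connection-factory name="hornetq-ra">
    <transaction mode="xa"/>
    <!-- 0 instead of -1: the RA stops retrying, so the mdb server can shut down
         immediately instead of hanging in the reconnect loop -->
    <reconnect-attempts>0</reconnect-attempts>
    <connectors>
        <connector-ref connector-name="netty-remote"/>
    </connectors>
    <entries>
        <entry name="java:/JmsXA"/>
    </entries>
</pooled-connection-factory>
{code}
Note the trade-off: with 0 (or a small bound) the RA will not transparently reconnect to a restarted jms server, which is presumably why -1 was chosen in the first place.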
Attachment: Added: reproducer-shutdown.zip
Attachment: Added: mdb-server-console.log
This issue is listed as Major or below and as such is not targeted for the EAP 6.0.1 release, now that we are in Blocker or Critical issue only mode. Should this be reconsidered, please contact the EAP PM team.
Docs QE Status: Removed: NEW
Link: Added: This issue Cloned to JBPAPP6-1654
This issue is still valid with a slightly different scenario (step 5 is new).

Steps to reproduce:
1. Download and unzip reproducer.zip from the attachment. Execute the next steps in the unzipped "reproducer" directory.
2. Run "sh prepare.sh", which:
   - downloads EAP 6.1.0.DR4
   - creates two directories, server1 and server2
   - copies the jboss-eap-6.1 directory to server1 and server2
   - copies the standalone-full-ha-jms.xml configuration to server1
   - copies the standalone-full-ha-mdb.xml configuration to server2
   - copies mdb1.jar to server2's deployments directory
3. Start the first (jms) server with "sh start-server1.sh localhost"
4. Start the second (mdb) server with "sh start-server2.sh <some_other_ip>"
5. Start the jms producer with "sh start-producer.sh localhost 1000"
6. Shutdown the first (jms) server with ctrl-c
7. Try to shutdown the second (mdb) server -> the server hangs (threaddump.txt attached)
Created attachment 699585 [details] reproducer.zip
Created attachment 699586 [details] thread dump from mdb server (EAP 6.1.0.DR4)
Can you try replacing the jars with the ones from trunk? I believe this is fixed.
The server still hangs with trunk/master. Please check what I did:
- switched to the master branch of the HornetQ git project: "git checkout master; git pull"
- built the HornetQ jars with "mvn -Prelease package"
- copied the built ./hornetq-ra/target/hornetq-ra-2.3.0.CR1.jar to ./server2/jboss-eap-6.1/modules/system/layers/base/org/hornetq/ra/main/hornetq-ra-2.3.0.CR1.jar
- re-ran the last test scenario (from comment 2013-02-19 13:34:49 EST)
Created attachment 699886 [details] threaddump-master.txt
Can you try with the latest CR2?
PR for the hornetq CR2 upgrade: https://github.com/jbossas/jboss-eap/pull/79
I can still hit this problem with HornetQ 2.3.0.CR2. Thread dump from mdb server attached (threaddump_hq230cr2.txt)
Created attachment 731118 [details] threaddump_hq230cr2.txt
@Miroslav: I'm not sure we should fix this... First, the use case is really an edge case: you shut down one server first, then the remote server. It's not even a developer's case. Second, a fix would break other, more important cases for the sake of this edge case. So I would say this is a won't fix. You could even document the case if you wanted, but it is also somewhat obvious. I think we should just close this as won't fix; the risk of breaking other cases is too great. The proper fix here would be to make session.close() a no-op while a failover is in place, and that could break other scenarios that are not considered as edgy as this one.
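To make the proposed fix concrete, here is a minimal, hypothetical sketch (not the actual HornetQ RA code; the class and method names are illustrative only) of what "ignore session.close() while a failover is in place" could look like:

```java
// Hypothetical sketch: a session that skips close() during failover so that
// a shutdown-triggered close does not block on a dead connection.
public class FailoverAwareSession {

    // Set by the reconnect/failover machinery while retries are running.
    private volatile boolean failoverInProgress;

    private volatile boolean closed;

    public void setFailoverInProgress(boolean inProgress) {
        this.failoverInProgress = inProgress;
    }

    public void close() {
        // Proposed behavior: a close() issued while failover is in progress is
        // ignored, so the caller is not left waiting on an unreachable server.
        if (failoverInProgress) {
            return;
        }
        closed = true; // normal close path
    }

    public boolean isClosed() {
        return closed;
    }
}
```

The concern raised above is visible even in this toy version: any caller that legitimately needs close() to take effect during a failover would silently be ignored, which is exactly the kind of regression risk being weighed here.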
I'm also afraid of regressions. The problem is that this is not such an edge case as it appears to be. We're testing this scenario because there were support tickets for it from our customers. Check the comments in the related jira from Jimmy Wilson and Shaun Appleton [1]. There is a high probability that we'll have to fix it anyway. [1] https://issues.jboss.org/browse/JBPAPP-10450?focusedCommentId=12737770&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12737770
This issue is related to: https://bugzilla.redhat.com/show_bug.cgi?id=877277
Fixed on https://github.com/FranciscoBorges/hornetq/tree/shutdownOnReconnect, but I still need to verify it (although the fix is so simple that I am calling it "fixed").
@Francisco: I looked at your fix... Do we really need to still close those sessions? AFAIK a connection.close() will close any session. (Maybe I am missing something in the Resource Adapter?)
BTW, the fix looks good: a simple change, which is great! Thanks man!
No, we do not need those session closes; I left them there for safety's sake until I figured out how to reproduce and verify this case. FWIW, I just tried to verify, and we are now hanging somewhere else, assuming I did everything correctly.
Ok, Miroslav Novak confirmed that the change got us ahead, but the server still did not exit. I made a second change and pushed it; after a while the server will exit. At least it did here for me. On Monday we will do some more thorough verification.
A fix was merged: https://github.com/hornetq/hornetq/commit/6eb89a7288fc1f9a569641ee9058df004db24257
Cannot hit the problem with EAP 6.1.0.ER6 (HQ 2.3.0.Final). Great work, Francisco!