Description of problem:
Sometimes an IllegalStateException is logged after failback from backup to live (in dedicated topology with replicated journal). It appears that the backup server is stopped before the destinations are unbound from JNDI, which causes this error:

...
12:31:18,500 WARN [org.hornetq.core.server] (Thread-103) HQ222015: LIVE IS STOPPING?!? message=STOP_CALLED enabled=true
12:31:18,500 WARN [org.hornetq.core.server] (Thread-103) HQ222015: LIVE IS STOPPING?!? message=STOP_CALLED true
...
12:31:18,571 WARN [org.jboss.messaging] (ServerService Thread Pool -- 69) JBAS011603: Failed to destroy queue: DLQ: java.lang.IllegalStateException: Cannot access JMS Server, core server is not yet active
	at org.hornetq.jms.server.impl.JMSServerManagerImpl.checkInitialised(JMSServerManagerImpl.java:1657) [hornetq-jms-server-2.3.21.Final-redhat-1.jar:2.3.21.Final-redhat-1]
	at org.hornetq.jms.server.impl.JMSServerManagerImpl.access$1100(JMSServerManagerImpl.java:108) [hornetq-jms-server-2.3.21.Final-redhat-1.jar:2.3.21.Final-redhat-1]
	at org.hornetq.jms.server.impl.JMSServerManagerImpl$3.runException(JMSServerManagerImpl.java:820) [hornetq-jms-server-2.3.21.Final-redhat-1.jar:2.3.21.Final-redhat-1]
	at org.hornetq.jms.server.impl.JMSServerManagerImpl.runAfterActive(JMSServerManagerImpl.java:1869) [hornetq-jms-server-2.3.21.Final-redhat-1.jar:2.3.21.Final-redhat-1]
	at org.hornetq.jms.server.impl.JMSServerManagerImpl.removeQueueFromJNDI(JMSServerManagerImpl.java:809) [hornetq-jms-server-2.3.21.Final-redhat-1.jar:2.3.21.Final-redhat-1]
	at org.jboss.as.messaging.jms.JMSQueueService$2.run(JMSQueueService.java:89) [jboss-as-messaging-7.5.0.Final-redhat-9.jar:7.5.0.Final-redhat-9]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_20]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_20]
	at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_20]
	at org.jboss.threads.JBossThread.run(JBossThread.java:122) [jboss-threads-2.1.1.Final-redhat-1.jar:2.1.1.Final-redhat-1]
...

Steps to Reproduce:
1. Start 2 EAP 6.4.0.DR7 servers in dedicated topology with replicated journal
2. Start a producer and a consumer on a queue
3. Kill the "live" server
4. Wait for the clients to fail over, then start the "live" server again
5. Clients fail back to live and the backup stops itself

Actual results:
Sometimes IllegalStateException warnings with "Failed to destroy queue/topic" appear in the log of the backup server.

Expected results:
No exceptions should be thrown.

Additional info:
Adding server.log from backup server.
Created attachment 952466 [details] server.log (backup)
To reproduce the problem, follow these steps:

1. Clone our testsuite from git:
   git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git

2. Go to eap-tests-hornetq/scripts and run the Groovy script PrepareServers.groovy with the -DEAP_VERSION=6.4.0.DR7 parameter:
   groovy -DEAP_VERSION=6.4.0.DR7 PrepareServers.groovy
   (The script prepares 4 servers, server1..4, in your current working directory.)

3. Export these paths to the server directories, plus the directory for the shared journal and the mcast address:
   export JBOSS_HOME_1=$PWD/server1/jboss-eap
   export JBOSS_HOME_2=$PWD/server2/jboss-eap
   export JBOSS_HOME_3=$PWD/server3/jboss-eap
   export JBOSS_HOME_4=$PWD/server4/jboss-eap
   export MCAST_ADDR=235.3.4.5

4. Finally, go to jboss-hornetq-testsuite/ in our testsuite and run:
   mvn clean test -Darquillian.xml=arquillian-4-nodes.xml -Peap6x -Dtest=ReplicatedDedicatedFailoverTestCase#testFailbackTransAckQueue

The test does not fail! The only way to detect the problem is to check the server.log of server2, which is the replicated backup.
This looks like a classic race condition. When org.jboss.as.messaging.jms.JMSQueueService invokes org.hornetq.jms.server.impl.JMSServerManagerImpl.removeQueueFromJNDI the method checks to see if the broker is active (which it is). However, by the time it reaches the next check the broker isn't active anymore and so the exception is thrown. It looks like the JMSQueueService is working in its own thread while another thread has stopped the broker. I'm no expert on the messaging subsystem, but it seems to me these threads should coordinate with each other somehow to avoid this race.
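To illustrate the kind of coordination I mean, here is a minimal sketch. It is not HornetQ or EAP code and not necessarily what any eventual fix does. The Broker class, its lifecycle lock, and the unbindIfActive method are all hypothetical names made up for this example. The idea is that the "is the broker active?" check and the unbind happen under the same lock that stop() takes, so the state cannot change between the check and the action, and an unbind that arrives after stop() is skipped quietly instead of throwing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class FailbackRaceSketch {

    /** Hypothetical stand-in for the broker; NOT JMSServerManagerImpl. */
    public static class Broker {
        // One lock shared by stop() and unbindIfActive() (assumed design).
        private final ReentrantLock lifecycle = new ReentrantLock();
        private final Map<String, Object> jndi = new ConcurrentHashMap<>();
        private boolean active = true; // guarded by lifecycle

        public void bind(String name) {
            jndi.put(name, new Object());
        }

        /**
         * Checks the active flag and unbinds under the same lock, so a
         * concurrent stop() cannot slip in between the two steps.
         * Returns false (no exception) if the broker is already stopped.
         */
        public boolean unbindIfActive(String name) {
            lifecycle.lock();
            try {
                if (!active) {
                    return false; // broker already stopped; nothing to do
                }
                jndi.remove(name);
                return true;
            } finally {
                lifecycle.unlock();
            }
        }

        /** Marks the broker inactive and drops all bindings atomically. */
        public void stop() {
            lifecycle.lock();
            try {
                active = false;
                jndi.clear();
            } finally {
                lifecycle.unlock();
            }
        }

        public boolean isBound(String name) {
            return jndi.containsKey(name);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Broker broker = new Broker();
        broker.bind("DLQ");

        // Two threads, as in the report: one unbinds, one stops the broker.
        Thread unbinder = new Thread(() -> broker.unbindIfActive("DLQ"));
        Thread stopper = new Thread(broker::stop);
        unbinder.start();
        stopper.start();
        unbinder.join();
        stopper.join();

        // Whichever thread wins, the queue ends up unbound and nothing throws.
        System.out.println("DLQ still bound: " + broker.isBound("DLQ"));
    }
}
```

Whether a shared lock, an ordering dependency between the services, or simply tolerating the "not active" case on shutdown is the right approach for the real subsystem is a call for someone who knows the messaging code better than I do.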
PR: https://github.com/jbossas/jboss-eap/pull/2880
I am not seeing the exception anymore with EAP 6.4.13.CP.CR2. Verified.
Released with EAP 6.4.13 on Feb 02 2017.