Bug 1159290

Summary: [GSS](6.4.z) JBAS011603: Failed to destroy queue: DLQ: java.lang.IllegalStateException: Cannot access JMS Server, core server is not yet active...
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Miroslav Novak <mnovak>
Component: JMS
Assignee: Petr Jurak <pjurak>
Status: CLOSED CURRENTRELEASE
QA Contact: Peter Mackay <pmackay>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 6.4.0
CC: ataylor, bmaxwell, csuconic, jbertram, msochure, pjurak, pmackay, rstancel, sappleto
Target Milestone: CR1
Target Release: EAP 6.4.13
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-02-03 16:43:07 UTC
Type: Bug
Bug Depends On:    
Bug Blocks: 1386335, 1387698, 1390788    
Attachments:
server.log (backup) [flags: none]

Description Miroslav Novak 2014-10-31 11:57:28 UTC
Description of problem:

Sometimes an IllegalStateException is thrown after failback from the backup to the live server (in a dedicated topology with a replicated journal). It appears that the backup server is stopped before the destinations are unbound from JNDI, which causes this error:
...
12:31:18,500 WARN  [org.hornetq.core.server] (Thread-103) HQ222015: LIVE IS STOPPING?!? message=STOP_CALLED enabled=true
12:31:18,500 WARN  [org.hornetq.core.server] (Thread-103) HQ222015: LIVE IS STOPPING?!? message=STOP_CALLED true
...
12:31:18,571 WARN  [org.jboss.messaging] (ServerService Thread Pool -- 69) JBAS011603: Failed to destroy queue: DLQ: java.lang.IllegalStateException: Cannot access JMS Server, core server is not yet active
	at org.hornetq.jms.server.impl.JMSServerManagerImpl.checkInitialised(JMSServerManagerImpl.java:1657) [hornetq-jms-server-2.3.21.Final-redhat-1.jar:2.3.21.Final-redhat-1]
	at org.hornetq.jms.server.impl.JMSServerManagerImpl.access$1100(JMSServerManagerImpl.java:108) [hornetq-jms-server-2.3.21.Final-redhat-1.jar:2.3.21.Final-redhat-1]
	at org.hornetq.jms.server.impl.JMSServerManagerImpl$3.runException(JMSServerManagerImpl.java:820) [hornetq-jms-server-2.3.21.Final-redhat-1.jar:2.3.21.Final-redhat-1]
	at org.hornetq.jms.server.impl.JMSServerManagerImpl.runAfterActive(JMSServerManagerImpl.java:1869) [hornetq-jms-server-2.3.21.Final-redhat-1.jar:2.3.21.Final-redhat-1]
	at org.hornetq.jms.server.impl.JMSServerManagerImpl.removeQueueFromJNDI(JMSServerManagerImpl.java:809) [hornetq-jms-server-2.3.21.Final-redhat-1.jar:2.3.21.Final-redhat-1]
	at org.jboss.as.messaging.jms.JMSQueueService$2.run(JMSQueueService.java:89) [jboss-as-messaging-7.5.0.Final-redhat-9.jar:7.5.0.Final-redhat-9]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_20]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_20]
	at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_20]
	at org.jboss.threads.JBossThread.run(JBossThread.java:122) [jboss-threads-2.1.1.Final-redhat-1.jar:2.1.1.Final-redhat-1]
...
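The ordering problem described above can be illustrated with a minimal, self-contained Java model. It assumes, as the `checkInitialised` frame in the trace suggests, that unbinding a queue from JNDI requires an active core server. `ShutdownOrderSketch`, `CoreServer`, `removeQueueFromJndi`, `shutdown`, and the queue name `InQueue` are hypothetical names for illustration, not the HornetQ or EAP API; only the exception message is copied from the log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of teardown ordering; not the HornetQ/EAP implementation. */
public class ShutdownOrderSketch {

    /** Stand-in for the core server: JNDI operations fail once it is inactive. */
    static final class CoreServer {
        private volatile boolean active = true;

        void removeQueueFromJndi(String name) {
            if (!active) {
                throw new IllegalStateException(
                        "Cannot access JMS Server, core server is not yet active");
            }
            // Unbinding `name` from JNDI is omitted in this sketch.
        }

        void stop() {
            active = false;
        }
    }

    /** Removes all queues and stops the server in the given order; returns failures. */
    static List<String> shutdown(CoreServer server, List<String> queues, boolean queuesFirst) {
        List<String> failures = new ArrayList<>();
        if (!queuesFirst) {
            server.stop(); // wrong order: the dependency stops before its dependents
        }
        for (String queue : queues) {
            try {
                server.removeQueueFromJndi(queue);
            } catch (IllegalStateException e) {
                failures.add("Failed to destroy queue: " + queue);
            }
        }
        if (queuesFirst) {
            server.stop(); // correct order: dependents were removed while the server was active
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> queues = Arrays.asList("DLQ", "InQueue");
        // prints [Failed to destroy queue: DLQ, Failed to destroy queue: InQueue]
        System.out.println(shutdown(new CoreServer(), queues, false));
        // prints []
        System.out.println(shutdown(new CoreServer(), queues, true));
    }
}
```

In the second call every unbind runs while `active` is still true, so no failure is recorded; the first call reproduces the shape of the JBAS011603 warnings.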

Steps to Reproduce:
1. Start two EAP 6.4.0.DR7 servers in a dedicated topology with a replicated journal
2. Start a producer and a consumer on a queue
3. Kill the "live" server
4. Wait for the clients to fail over, then start the "live" server again
5. The clients fail back to the live server and the backup stops itself

Actual results:
Sometimes IllegalStateExceptions with "Failed to destroy queue/topic" appear in the log of the backup server.

Expected results:
No exceptions should be thrown.

Additional info:
Attaching server.log from the backup server.

Comment 1 Miroslav Novak 2014-10-31 11:57:52 UTC
Created attachment 952466 [details]
server.log (backup)

Comment 2 Miroslav Novak 2014-10-31 12:07:24 UTC
To reproduce the problem, follow these steps.
Clone our test suite from git:
git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git

Go to eap-tests-hornetq/scripts and run the Groovy script PrepareServers.groovy with the -DEAP_VERSION=6.4.0.DR7 parameter:
groovy -DEAP_VERSION=6.4.0.DR7 PrepareServers.groovy

(The script prepares four servers, server1 to server4, in the current directory.)

Export the paths to the server directories, the directory for the shared journal, and the multicast address:
export JBOSS_HOME_1=$PWD/server1/jboss-eap
export JBOSS_HOME_2=$PWD/server2/jboss-eap
export JBOSS_HOME_3=$PWD/server3/jboss-eap
export JBOSS_HOME_4=$PWD/server4/jboss-eap
export MCAST_ADDR=235.3.4.5

Finally, go to jboss-hornetq-testsuite/ in our test suite and run:
mvn clean test -Darquillian.xml=arquillian-4-nodes.xml -Peap6x -Dtest=ReplicatedDedicatedFailoverTestCase#testFailbackTransAckQueue

The test does not fail! The only way to detect the problem is to check server.log of server2, which is the replicated backup.

Comment 4 Justin Bertram 2016-06-17 18:12:38 UTC
This looks like a classic race condition. When org.jboss.as.messaging.jms.JMSQueueService invokes org.hornetq.jms.server.impl.JMSServerManagerImpl.removeQueueFromJNDI, the method checks whether the broker is active (it is). However, by the time it reaches the next check, the broker is no longer active, so the exception is thrown. It looks like JMSQueueService is working in its own thread while another thread stops the broker. I'm no expert on the messaging subsystem, but it seems to me these threads should coordinate with each other somehow to avoid this race.
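The check-then-act race described above, and one generic way to make the threads coordinate, can be sketched in Java. This is an illustration under stated assumptions, not the fix from the linked PR (which this report does not describe): `CheckThenActSketch`, `RacyServer`, `CoordinatedServer`, `removeQueue`, and the `Runnable` hook used to force the bad interleaving are all hypothetical. The coordinated variant swaps in a `ReentrantReadWriteLock`: unbind operations hold the read lock, and `stop()` needs the write lock, so it waits for in-flight operations.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical illustration of a check-then-act race and a lock-based fix. */
public class CheckThenActSketch {

    /** Racy: two separate `active` checks with no coordination against stop(). */
    static final class RacyServer {
        volatile boolean active = true;

        void removeQueue(Runnable betweenChecks) {
            if (!active) return;   // first check: the broker is active
            betweenChecks.run();   // window in which another party may call stop()
            if (!active) {         // second check: can now fail
                throw new IllegalStateException("core server is not yet active");
            }
        }

        void stop() { active = false; }
    }

    /** Coordinated: operations hold the read lock; stop() needs the write lock. */
    static final class CoordinatedServer {
        private final ReadWriteLock lock = new ReentrantReadWriteLock();
        private boolean active = true; // guarded by lock

        boolean removeQueue(Runnable duringWork) {
            lock.readLock().lock();
            try {
                if (!active) return false; // already stopped: skip instead of throwing
                duringWork.run();          // stop() cannot complete while this read lock is held
                return true;
            } finally {
                lock.readLock().unlock();
            }
        }

        void stop() {
            lock.writeLock().lock();       // blocks until in-flight removeQueue calls finish
            try {
                active = false;
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Racy version: force the bad interleaving by stopping between the checks.
        RacyServer racy = new RacyServer();
        try {
            racy.removeQueue(racy::stop);
            System.out.println("racy: no exception");
        } catch (IllegalStateException e) {
            System.out.println("racy: " + e.getMessage());
        }

        // Coordinated version: a stopper thread starts mid-operation but has to wait.
        CoordinatedServer safe = new CoordinatedServer();
        CountDownLatch stopperStarted = new CountDownLatch(1);
        Thread stopper = new Thread(() -> {
            stopperStarted.countDown();
            safe.stop();
        });
        boolean removed = safe.removeQueue(() -> {
            stopper.start();
            try {
                stopperStarted.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        stopper.join();
        System.out.println("coordinated: removed=" + removed
                + ", later call=" + safe.removeQueue(() -> { }));
    }
}
```

Running `main` should print `racy: core server is not yet active` followed by `coordinated: removed=true, later call=false`. The read/write lock lets many unbind operations run concurrently while still guaranteeing that `stop()` waits for them; an operation that starts after `stop()` sees `active == false` once, under the lock, and skips the unbind rather than failing halfway through.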

Comment 6 Petr Jurak 2016-11-15 08:20:20 UTC
PR: https://github.com/jbossas/jboss-eap/pull/2880

Comment 7 Peter Mackay 2017-01-17 14:32:28 UTC
I am not seeing the exception anymore with EAP 6.4.13.CP.CR2. Verified.

Comment 8 Petr Penicka 2017-02-03 16:43:07 UTC
Released with EAP 6.4.13 on Feb 02 2017.