Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 912369

Summary: Replicated journal - Errors during live server clean shutdown
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Miroslav Novak <mnovak>
Component: HornetQ
Assignee: Chao Wang <chaowan>
Status: CLOSED NOTABUG
QA Contact:
Severity: medium
Priority: unspecified
Version: 6.1.0
CC: chaowan, myarboro, rsvoboda
Target Milestone: ER3
Target Release: EAP 6.1.1
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-07-26 08:48:27 UTC
Attachments:
- conf for live server (flags: none)
- conf for backup server (flags: none)

Description Miroslav Novak 2013-02-18 14:37:25 UTC
Description of problem:
When the live server is cleanly shut down, the following errors are logged:

15:24:01,269 WARN  [org.hornetq.core.client] (Thread-2 (HornetQ-client-global-threads-219066074)) HQ212107: Connection failure has been detected: HQ119035: The connection was disconnected because of server shutdown [code=DISCONNECTED]
15:24:01,302 INFO  [org.hornetq.core.server] (Thread-7 (HornetQ-server-HornetQServerImpl::serverUUID=487cb2d8-79d6-11e2-aa14-8351b510082b-2142988662)) HQ221034: stopped bridge sf.my-cluster.32ded993-6928-11e2-b0ad-f0def199b2cf
15:24:01,800 WARN  [org.hornetq.core.client] (hornetq-discovery-group-thread-dg-group1) HQ212050: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=487cb2d8-79d6-11e2-aa14-8351b510082b
15:24:11,387 WARN  [org.hornetq.core.server] (MSC service thread 1-4) HQ222003: Timed out waiting for pool to terminate java.util.concurrent.ThreadPoolExecutor@18e2c931. Interrupting all its threads!
15:24:11,387 INFO  [org.hornetq.core.server] (MSC service thread 1-4) HQ221004: HornetQ Server version 2.3.0.CR1 (buzzzzz!, 122) [487cb2d8-79d6-11e2-aa14-8351b510082b] stopped
15:24:11,388 ERROR [org.hornetq.core.client] (Thread-2 (HornetQ-server-HornetQServerImpl::serverUUID=487cb2d8-79d6-11e2-aa14-8351b510082b-2142988662)) HQ214075: Caught unexpected Throwable: java.lang.IllegalStateException: Server locator is closed (maybe it was garbage collected)
	at org.hornetq.core.client.impl.ServerLocatorImpl.assertOpen(ServerLocatorImpl.java:1955) [hornetq-core-client-2.3.0.CR1.jar:]
	at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:772) [hornetq-core-client-2.3.0.CR1.jar:]
	at org.hornetq.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:614) [hornetq-core-client-2.3.0.CR1.jar:]
	at org.hornetq.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:598) [hornetq-core-client-2.3.0.CR1.jar:]
	at org.hornetq.core.client.impl.ServerLocatorImpl$4.run(ServerLocatorImpl.java:1617) [hornetq-core-client-2.3.0.CR1.jar:]
	at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client-2.3.0.CR1.jar:]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.6.0_24]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.6.0_24]
	at java.lang.Thread.run(Thread.java:679) [rt.jar:1.6.0_24]

15:24:11,388 ERROR [org.hornetq.core.client] (Thread-9 (HornetQ-server-HornetQServerImpl::serverUUID=487cb2d8-79d6-11e2-aa14-8351b510082b-2142988662)) HQ214011: Failed to stop discovery group: org.hornetq.api.core.HornetQInterruptedException: java.lang.InterruptedException
	at org.hornetq.core.cluster.DiscoveryGroup.stop(DiscoveryGroup.java:167) [hornetq-core-client-2.3.0.CR1.jar:]
	at org.hornetq.core.client.impl.ServerLocatorImpl.doClose(ServerLocatorImpl.java:1369) [hornetq-core-client-2.3.0.CR1.jar:]
	at org.hornetq.core.client.impl.ServerLocatorImpl.close(ServerLocatorImpl.java:1342) [hornetq-core-client-2.3.0.CR1.jar:]
	at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl.closeLocator(ClusterConnectionImpl.java:513) [hornetq-server-2.3.0.CR1.jar:]
	at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl.access$100(ClusterConnectionImpl.java:83) [hornetq-server-2.3.0.CR1.jar:]
	at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$1.run(ClusterConnectionImpl.java:497) [hornetq-server-2.3.0.CR1.jar:]
	at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:106) [hornetq-core-client-2.3.0.CR1.jar:]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.6.0_24]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.6.0_24]
	at java.lang.Thread.run(Thread.java:679) [rt.jar:1.6.0_24]
Caused by: java.lang.InterruptedException
	at java.lang.Object.wait(Native Method) [rt.jar:1.6.0_24]
	at java.lang.Thread.join(Thread.java:1211) [rt.jar:1.6.0_24]
	at org.hornetq.core.cluster.DiscoveryGroup.stop(DiscoveryGroup.java:159) [hornetq-core-client-2.3.0.CR1.jar:]
	... 9 more

How reproducible:
Start an EAP 6.1.0.DR4 live/backup pair in a dedicated topology with a replicated journal, then cleanly shut down the live server (ctrl-c).

Configurations for the live and backup servers are in the attachments:
standalone-full-ha-live.xml
standalone-full-ha-backup.xml

Expected results:
No errors in log.

Comment 1 Miroslav Novak 2013-02-18 14:38:20 UTC
Created attachment 698910 [details]
conf for live server

Comment 2 Miroslav Novak 2013-02-18 14:38:54 UTC
Created attachment 698911 [details]
conf for backup server

Comment 3 Chao Wang 2013-03-19 06:52:52 UTC
Hey @Miroslav, I tried a couple of times to reproduce the failure, but I did not see any errors logged to the terminal or server.log. This is how I configured the two servers:


Download EAP 6.1.0 DR4.
Configure two servers, standalone-live and standalone-backup, with the attached conf files:
cp -r standalone/ standalone-live
cp -r standalone/ standalone-backup
cp standalone-full-ha-live.xml standalone-live/configuration/
cp standalone-full-ha-backup.xml standalone-backup/configuration/

Start the live server:
./bin/standalone.sh -c standalone-full-ha-live.xml -Djboss.server.base.dir=standalone-live -Djboss.server.config.dir=standalone-live/configuration/ -Djboss.server.name=node1

Start the backup server:
./bin/standalone.sh -c standalone-full-ha-backup.xml -Djboss.server.base.dir=standalone-backup/ -Djboss.server.config.dir=standalone-backup/configuration/ -Djboss.server.name=node2 -Djboss.socket.binding.port-offset=100

Comment 4 Miroslav Novak 2013-03-20 09:25:29 UTC
Could you try not using a port offset and instead binding the second server to another IP address, please?

Comment 5 Chao Wang 2013-03-21 03:05:25 UTC
I switched to using 127.0.1.2 for the backup server:

ifconfig lo:2 127.0.1.2 netmask 255.0.0.0 up

I modified the related interface settings in standalone-full-ha-backup.xml, then started both servers:

./bin/standalone.sh -c standalone-full-ha-live.xml -Djboss.server.base.dir=standalone-live -Djboss.server.config.dir=standalone-live/configuration/ -Djboss.server.name=live

./bin/standalone.sh -c standalone-full-ha-backup.xml -Djboss.server.base.dir=standalone-backup/ -Djboss.server.config.dir=standalone-backup/configuration/ -Djboss.server.name=backup

After shutting down the live server, no error messages emerged.
Does this failure show up consistently?

Comment 6 Miroslav Novak 2013-03-21 09:45:09 UTC
I tried your way of configuring and starting the servers and could NOT hit those exceptions.

The way I configured and started the servers is:
1. Download EAP 6.1.0 ER3 (our last build of EAP 6.1 at this moment)
2. Unzip it into two directories - server1 and server2
3. Copy standalone-full-ha-live.xml to server1/jboss-eap-6.1/standalone/configuration
4. Copy standalone-full-ha-backup.xml to server2/jboss-eap-6.1/standalone/configuration
5. Start first server:
- go to server1/jboss-eap-6.1/bin 
- sh standalone.sh -c standalone-full-ha-live.xml -b <first_ip>
6. Start second server: 
- go to server2/jboss-eap-6.1/bin 
- sh standalone.sh -c standalone-full-ha-backup.xml -b <second_ip>

Now when server1 (HQ configured as live) is cleanly shut down (ctrl-c), you will see the exceptions.

Comment 7 Chao Wang 2013-03-26 06:32:28 UTC
Hey Miroslav, sorry for the delay.
I tried your latest configuration (starting from server1 and server2), but I still can't reproduce the exceptions.

Comment 8 Miroslav Novak 2013-03-26 12:01:30 UTC
Hi Chao, I tried my steps on another machine and did not get the exceptions. It looks like the environment on my workstation is somehow unique. I've also noticed that this does not happen every time. I really hate those kinds of issues. I can only suggest trying to start/stop the server many times in a row.
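The start/stop-many-times suggestion could be scripted roughly as below. This is only a sketch, not from the report: EAP_HOME, the boot wait, and the iteration count are hypothetical placeholders, and the grep pattern targets the error codes seen in the description (HQ214075, HQ214011).

```shell
#!/bin/sh
# Hypothetical stress loop for the repeated clean-shutdown suggestion.
# EAP_HOME, the 60s boot wait, and ITERATIONS are placeholder assumptions.
EAP_HOME=${EAP_HOME:-server1/jboss-eap-6.1}
LOG="$EAP_HOME/standalone/log/server.log"
ITERATIONS=${ITERATIONS:-20}

# Return 0 if the log contains the shutdown-time errors from the description
# (HQ214075 "Caught unexpected Throwable", HQ214011 "Failed to stop discovery group").
log_has_shutdown_errors() {
    grep -Eq 'ERROR .*(HQ214075|HQ214011)' "$1"
}

# Only run the loop when a server installation is actually present.
if [ -x "$EAP_HOME/bin/standalone.sh" ]; then
    i=1
    while [ "$i" -le "$ITERATIONS" ]; do
        : > "$LOG"                        # truncate the previous run's log
        sh "$EAP_HOME/bin/standalone.sh" -c standalone-full-ha-live.xml -b "${LIVE_IP:-127.0.0.1}" &
        pid=$!
        sleep 60                          # let the server finish booting
        kill -INT "$pid"                  # clean shutdown, same as ctrl-c
        wait "$pid"
        if log_has_shutdown_errors "$LOG"; then
            echo "shutdown errors reproduced on iteration $i"
            break
        fi
        i=$((i + 1))
    done
fi
```

Note that SIGINT is sent to the standalone.sh wrapper process, which may not propagate to the JVM in every environment; running the server in the foreground and pressing ctrl-c, as in the report, remains the authoritative reproduction step.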

Comment 9 Chao Wang 2013-03-26 12:05:33 UTC
OK, I'll try my luck and see if I can find something suspicious in the log from the description.

Comment 10 Rostislav Svoboda 2013-07-08 11:37:37 UTC
Will be checked with 6.1.1 ER3

Comment 12 Miroslav Novak 2013-07-26 08:48:27 UTC
Can't reproduce this issue again with EAP 6.1.1.ER3. Closing.