Bug 1099730

Summary: Add CLI operation to restart HornetQ server without restarting JBoss server instance
Product: [JBoss] JBoss Enterprise Application Platform 6
Reporter: Masafumi Miura <mmiura>
Component: HornetQ
Assignee: Clebert Suconic <csuconic>
Status: CLOSED WONTFIX
QA Contact: Miroslav Novak <mnovak>
Severity: high
Docs Contact: Russell Dickenson <rdickens>
Priority: urgent
Version: 6.2.2
CC: ataylor, bbaranow, brian.stansberry, cdewolf, csuconic, dandread, jawilson, jbertram, jmesnil, kkhan, msvehla, myarboro, rsvoboda
Target Milestone: ---
Target Release: EAP 6.4.0
Keywords: Reopened
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2015-01-27 12:20:50 UTC

Description Masafumi Miura 2014-05-21 05:07:46 UTC
Description of problem:

At the moment we are forced to restart the EAP 6 instance with the HQ backup after a failback so that failover can happen again.

However, we do not have a CLI operation to restart only the HornetQ server, so we have to restart the whole JBoss server instance. When HornetQ is configured as co-located live-backup, the HornetQ live server has to be restarted together with the HornetQ backup server. This is a problem.

Please add a CLI operation to restart the HornetQ server without restarting the JBoss server instance.

A similar issue was discussed in BZ#1013536. That BZ added only the "max-saved-replicated-journal-size" option; it did not add a CLI operation to restart just the HornetQ server.
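
For illustration only, the requested operation might look something like this in the JBoss CLI (a hypothetical sketch: the hornetq-server resource path exists in EAP 6, but the stop/start operations do not, and the server name "backup" is just an example):

   # hypothetical operations -- not implemented in EAP 6
   /subsystem=messaging/hornetq-server=backup:stop
   /subsystem=messaging/hornetq-server=backup:start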

Comment 1 Miroslav Novak 2014-05-22 08:12:48 UTC
Setting severity "high" as there is a customer waiting for this.

Comment 2 Clebert Suconic 2014-05-29 22:28:08 UTC
@Jeff: we have the stop/start method on the Server, but he's saying it's not part of the CLI. Can you make sure it's possible to include it?

Comment 3 Clebert Suconic 2014-05-29 22:28:30 UTC
As a workaround the customer can use JMX for this operation. The stop/start is exposed there.

Comment 4 Masafumi Miura 2014-05-29 22:52:27 UTC
(In reply to Clebert Suconic from comment #3)
> As a workaround the customer can use JMX for this operation. The stop/start
> is exposed there.

What is the name of the MBean? I cannot find it. I just started EAP 6 with standalone-full.xml and connected to the instance via jconsole. I checked the MBeans under jboss.as:subsystem=messaging and jboss.as:extension=org.jboss.as.messaging, but there are no start or stop operations.

Comment 5 Clebert Suconic 2014-05-29 22:57:29 UTC
You have to enable JMX on the messaging model through the standalone config on WildFly / EAP.

I don't remember the exact name now... (You can post it here for future reference.)

But we certainly have a method to start/stop the server through the management and internal APIs. Jeff knows it well.

Comment 6 Masafumi Miura 2014-05-30 02:04:35 UTC
(In reply to Clebert Suconic from comment #5)
> You have to enable JMX on the messaging model through the standalone config
> on WildFly / EAP.
> 
> I don't remember the exact name now... (You can post it here for future
> reference.)
> 
> But we certainly have a method to start/stop the server through the
> management and internal APIs. Jeff knows it well.

I enabled JMX in the messaging subsystem via JBoss CLI. For example:

   /subsystem=messaging/hornetq-server=default:write-attribute(name=jmx-management-enabled,value=true)
   :reload

The above CLI adds the following to the messaging subsystem:

    <jmx-management-enabled>true</jmx-management-enabled>

Then I can see "org.hornetq" in jconsole, but there's still no start/stop operation...

Comment 7 Clebert Suconic 2014-05-30 04:00:45 UTC
It would have been on JMSServerControl, but the method is not exposed.


We could expose it through JMX for the user while the CLI operation is being implemented.

Comment 9 Justin Bertram 2014-07-21 18:01:41 UTC
Does it make sense to add a "start" method to the JMX interface here? Unless I'm mistaken, the only time the MBean will be available is when the server is started. As soon as the server is stopped, the MBean will disappear. Exposing only "stop" makes sense to me at this point.

I can understand adding both start and stop to the CLI, since the EAP management layer will still be available even if HornetQ is stopped.

Thoughts?

Comment 10 Clebert Suconic 2014-07-21 18:17:23 UTC
Can't you change the behaviour to keep the server's MBean registered in JMX if stop is called?


I don't see an issue with keeping it.


It could maybe be an issue in our testsuite, as we stop/start servers and that might leak into the tests, but I don't see an issue in production.


We would need to double-check this. I think the tests are restarting a new JMX server every time one is needed.

Comment 11 Justin Bertram 2014-07-21 18:38:21 UTC
In a standalone (i.e. not in an application server) use-case, won't the JVM exit once the server is stopped? If so, there will be nothing to "keep."

As I said, I can see where start/stop on the EAP CLI would be useful. However, in that case I would expect the messaging subsystem to be able to do everything it needs to do without any modifications to the JMSServerControl.

Comment 12 Clebert Suconic 2014-07-21 19:58:53 UTC
I don't see any issues with keeping the ServerControl in JMX. As you said, it will go away in standalone anyway.

Comment 13 Jeff Mesnil 2014-11-07 16:03:05 UTC
Is this RFE still valid?

With HornetQ 2.4.5.Final, a backup server that allows failback is automatically restarted after the live comes back:

[Server:server-two] 16:57:59,740 INFO  [org.hornetq.core.server] (Thread-110) HQ221002: HornetQ Server version 2.4.5.FINAL (Wild Hornet, 124) [32f4f857-6696-11e4-b015-0f7bb9743b30] stopped
[Server:server-two] 16:57:59,740 INFO  [org.hornetq.core.server] (Thread-110) HQ221039: Restarting as Replicating backup server after live restart
...
[Server:server-two] 16:57:59,902 INFO  [org.hornetq.core.server] (HQ119000: Activation for server HornetQServerImpl::serverUUID=null) HQ221109: HornetQ Backup Server version 2.4.5.FINAL (Wild Hornet, 124) [null] started, waiting live to fail before it gets active
...
[Server:server-two] 16:58:00,015 INFO  [org.hornetq.core.server] (Thread-18 (HornetQ-client-netty-threads-290979250)) HQ221024: Backup server HornetQServerImpl::serverUUID=32f4f857-6696-11e4-b015-0f7bb9743b30 is synchronized with live-server.


When was this behaviour introduced?

Comment 14 Miroslav Novak 2014-11-10 07:50:45 UTC
Hi Jeff, 

The problem here is that by default failback can happen only twice. The reason for this is that after each failback the backup synchronizes with the live again and creates new journal directories for it. The number of saved replicated journals is limited to 2 by default, so only 2 failbacks can happen.
After the 3rd failback the backup refuses to synchronize with the live again and must be restarted. But in a colocated topology this means that the live (from the 2nd live/backup pair) is restarted as well, because at the moment we can only restart the whole EAP server. This puts us in the same situation with the 2nd live/backup pair, and we're in a never-ending cycle.
The solution for this is to add a CLI operation which stops/starts just the HQ backup server.

Thanks,
Mirek
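
For reference, a minimal sketch of checking this limit from the CLI, assuming the max-saved-replicated-journal-size attribute added by BZ#1013536 is exposed on the hornetq-server resource:

   /subsystem=messaging/hornetq-server=default:read-attribute(name=max-saved-replicated-journal-size)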

Comment 15 Jeff Mesnil 2014-11-13 15:42:53 UTC
@Mirek, we can then increase the value of max-saved-replicated-journal-size, which governs the number of failbacks that are allowed, right?

In that case, a failed-back server should have an operation to clean up any saved replicated journals so that it can fail back without limit.

Adding a start/stop operation is not a solution. It would still require performing a management operation to clean up the replicated journals (I don't see that addressed in the HornetQ documentation). If we have this clean-up operation, we don't need to manually start and stop the HornetQ server.
Besides, it also has strong consequences for the HornetQ dependencies (e.g. if a deployment depends on the backup server, stopping the server would undeploy the archive).

It is unlikely that we will add start/stop commands to HornetQ before implementing graceful shutdown (if we ever add them).
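
For illustration, raising the limit would be a one-line CLI change, while the clean-up operation proposed above is sketched as a second, hypothetical command (no such operation exists):

   # raise the failback limit (the value 10 is an arbitrary example)
   /subsystem=messaging/hornetq-server=default:write-attribute(name=max-saved-replicated-journal-size,value=10)

   # hypothetical clean-up operation -- not implemented
   /subsystem=messaging/hornetq-server=default:clean-saved-replicated-journals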

Comment 16 Miroslav Novak 2014-11-13 16:23:50 UTC
I think that increasing max-saved-replicated-journal-size just postpones the problem; it just takes more failover->failback cycles.

The reason why start/stop is required is that it resets the "replicated-journal counter" so the backup can synchronize with the live again, no matter how many journals it stored before the restart.

Perhaps an option could be to rewrite/remove the oldest journal when max-saved-replicated-journal-size is reached. Then no human interaction would be necessary. Wdyt?

Comment 23 Dimitris Andreadis 2014-11-20 15:13:58 UTC
This is essentially a new feature offering a workaround to a real issue that has existed for some time. Removing blocker? It would be handled in a CP.

Comment 25 Clebert Suconic 2014-11-20 17:42:44 UTC
I'm ok with deferring this to EAP7.

Comment 30 JBoss JIRA Server 2015-01-27 12:14:25 UTC
Bartosz Baranowski <bbaranow> updated the status of jira JBPAPP-11212 to Resolved

Comment 32 JBoss JIRA Server 2015-01-27 12:48:57 UTC
Bartosz Baranowski <bbaranow> updated the status of jira JBPAPP-11212 to Closed

Comment 33 JBoss JIRA Server 2015-01-27 12:50:56 UTC
Carlo de Wolf <cdewolf> updated the status of jira JBPAPP-11212 to Reopened

Comment 34 JBoss JIRA Server 2015-01-27 12:51:30 UTC
Carlo de Wolf <cdewolf> updated the status of jira JBPAPP-11212 to Closed

Comment 35 JBoss JIRA Server 2015-07-03 08:41:58 UTC
Jeff Mesnil <jmesnil> updated the status of jira WFLY-207 to Closed

Comment 36 JBoss JIRA Server 2016-01-26 01:17:51 UTC
Brian Stansberry <brian.stansberry> updated the status of jira JBEAP-95 to Closed