Description of problem: At the moment we are forced to restart the EAP 6 instance hosting the HornetQ backup after failback so that failover can happen again. However, there is no CLI operation to restart the HornetQ server only, so we have to restart the whole JBoss server instance. When HornetQ is configured as a co-located live-backup pair, the HornetQ live server gets restarted together with the HornetQ backup server, and this is a problem. Please add a CLI operation to restart the HornetQ server without restarting the JBoss server instance. A similar issue was discussed in BZ#1013536, but that BZ only added the "max-saved-replicated-journal-size" option; it did not add a CLI operation to restart just the HornetQ server.
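Today the only way to do this from the CLI is to restart the whole EAP instance, for example (a sketch assuming a managed domain; the host and server names are placeholders):

  /host=master/server-config=server-two:stop
  /host=master/server-config=server-two:start

With a co-located topology this of course also restarts the live HornetQ server running in the same instance, not just the backup.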
Setting severity "high" as there is a customer waiting for this.
@Jeff: we have the stop/start method on the Server, but he's saying it's not part of the CLI. Can you make sure it's possible to include it?
As a workaround the customer can use JMX for this operation. The stop/start is exposed there.
(In reply to Clebert Suconic from comment #3)
> As a workaround the customer can use JMX for this operation. The stop/start
> is exposed there.

What is the name of the MBean? I cannot find it. I just started EAP 6 with standalone-full.xml and connected to the instance via jconsole. I checked the MBeans under jboss.as:subsystem=messaging and jboss.as:extension=org.jboss.as.messaging, but there are no start or stop operations.
You have to enable JMX on the messaging model through the standalone config on WildFly / EAP. I don't remember the exact name now (you can post it here for future reference). But we certainly have a method to start/stop the server through the management and internal APIs. Jeff knows it well.
(In reply to Clebert Suconic from comment #5)
> You have to enable JMX on the messaging model through the standalone config
> on WildFly / EAP.
>
> I don't remember the exact name now (you can post it here for future
> reference).
>
> But we certainly have a method to start/stop the server through the
> management and internal APIs. Jeff knows it well.

I enabled JMX in the messaging subsystem via the JBoss CLI. For example:

  /subsystem=messaging/hornetq-server=default:write-attribute(name=jmx-management-enabled,value=true)
  :reload

The above CLI adds the following to the messaging subsystem:

  <jmx-management-enabled>true</jmx-management-enabled>

Then I can see "org.hornetq" in jconsole, but there is still no start/stop operation...
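For future reference, once jmx-management-enabled is set the HornetQ control MBeans are registered under the org.hornetq domain; from memory the object names look roughly like this (an assumption to verify in jconsole, they may differ between versions):

  org.hornetq:module=Core,type=Server   (HornetQServerControl)
  org.hornetq:module=JMS,type=Server    (JMSServerControl)

Neither of these currently exposes a start or stop operation.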
It would have been on JMSServerControl, but the method is not exposed. We could expose it through JMX for the user while the CLI operation is implemented.
Does it make sense to add a "start" method here to the JMX interface? Unless I'm mistaken the only time the MBean will be available is when the server is started. As soon as the server is stopped the MBean will disappear. Exposing only "stop" makes sense to me at this point. I can understand adding both start and stop to the CLI since the EAP management layer will still be available even if HornetQ is stopped. Thoughts?
Can't you change the behaviour to keep the server registered in JMX if stop is called? I don't see an issue with keeping it. It could maybe be an issue in our testsuite, as we stop/start servers and that might leak in the tests, but I don't see an issue in production. We would need to double-check this; I think the tests start a new JMX server every time one is needed.
In a standalone (i.e. not in an application server) use case, won't the JVM exit once the server is stopped? If so, there will be nothing to "keep." As I said, I can see where start/stop on the EAP CLI would be useful. However, in that case I would expect the messaging subsystem to be able to do everything it needs to do without any modifications to JMSServerControl.
I don't see any issue with keeping the ServerControl registered in JMX. As you said, it will go away in standalone anyway.
Is this RFE still valid? With HornetQ 2.4.5.Final a backup server that allows failback is automatically restarted after the live comes back:

[Server:server-two] 16:57:59,740 INFO [org.hornetq.core.server] (Thread-110) HQ221002: HornetQ Server version 2.4.5.FINAL (Wild Hornet, 124) [32f4f857-6696-11e4-b015-0f7bb9743b30] stopped
[Server:server-two] 16:57:59,740 INFO [org.hornetq.core.server] (Thread-110) HQ221039: Restarting as Replicating backup server after live restart
...
[Server:server-two] 16:57:59,902 INFO [org.hornetq.core.server] (HQ119000: Activation for server HornetQServerImpl::serverUUID=null) HQ221109: HornetQ Backup Server version 2.4.5.FINAL (Wild Hornet, 124) [null] started, waiting live to fail before it gets active
...
[Server:server-two] 16:58:00,015 INFO [org.hornetq.core.server] (Thread-18 (HornetQ-client-netty-threads-290979250)) HQ221024: Backup server HornetQServerImpl::serverUUID=32f4f857-6696-11e4-b015-0f7bb9743b30 is synchronized with live-server.

When was this behaviour introduced?
Hi Jeff, the problem here is that by default failback can happen only twice. The reason is that after failback the backup synchronizes with the live again and creates new journal directories for this. The number of saved journals is limited to 2 by default, so only 2 failbacks can happen. After the 3rd failback the backup refuses to synchronize with the live again and must be restarted. But in a colocated topology this means that the live server (from the 2nd live/backup pair) is restarted as well, because at the moment we can only restart the whole EAP server. We then end up in the same situation with the 2nd live/backup pair, and we're in a never-ending cycle. The solution is to add a CLI operation which stops/starts just the HQ backup server. Thanks, Mirek
@Mirek, we can then increase the value of max-saved-replicated-journal-size, which governs the number of failbacks that are allowed, right? In that case, a failed-back server should have an operation to clean up any saved replicated journals so that it can fail back without limit. Adding a start/stop operation is not a solution: it would still require a management operation to clean up the replicated journals (I don't see that addressed in the HornetQ documentation). If we have this clean-up operation, we don't need to manually start and stop the HornetQ server. Besides, start/stop also has strong consequences for the HornetQ dependencies (e.g. if a deployment depends on the backup server, stopping the server would undeploy the archive). It is unlikely that we will add start/stop commands to HornetQ before implementing graceful shutdown (if we ever add them).
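For reference, bumping the limit from the CLI would look something like this (a sketch assuming the management attribute name mirrors the max-saved-replicated-journal-size element added by BZ#1013536 and that the hornetq-server is named "default"):

  /subsystem=messaging/hornetq-server=default:write-attribute(name=max-saved-replicated-journal-size,value=5)
  :reload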
I think that increasing max-saved-replicated-journal-size just postpones the problem; it just takes a few more failover->failback cycles to hit it again. The reason why start/stop is required is that it resets the "replicated-journal counter", so the backup can synchronize with the live again no matter how many journals it stored before the restart. Another option could be to rewrite/remove the oldest journal when max-saved-replicated-journal-size is reached; then no human interaction would be necessary. wdyt?
This is essentially a new feature offering a workaround to a real issue that has existed for some time. Removing blocker? It would be handled in a CP.
I'm ok with deferring this to EAP7.
Bartosz Baranowski <bbaranow> updated the status of jira JBPAPP-11212 to Resolved
Bartosz Baranowski <bbaranow> updated the status of jira JBPAPP-11212 to Closed
Carlo de Wolf <cdewolf> updated the status of jira JBPAPP-11212 to Reopened
Carlo de Wolf <cdewolf> updated the status of jira JBPAPP-11212 to Closed
Jeff Mesnil <jmesnil> updated the status of jira WFLY-207 to Closed
Brian Stansberry <brian.stansberry> updated the status of jira JBEAP-95 to Closed