With a live/backup pair using a replicated journal, it should be sufficient to set the "check-for-live-server" attribute in the messaging subsystem on the backup only, in order to force the backup server to shut down when the live server comes back up. This does not happen. Failback succeeds (the backup shuts itself down) only when "check-for-live-server" is set on the live server. The HornetQ project documentation does not make clear where "check-for-live-server" should be set: http://docs.jboss.org/hornetq/2.3.0.CR1/docs/user-manual/html_single/index.html#ha.allow-fail-back This issue was hit with EAP 6.1.0.DR2 (HQ 2.3.0.CR1).
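For illustration, here is a minimal sketch of where the attribute would sit on the live server. This fragment is an assumption written for this report and is not copied from the attached configuration; it shows only the elements relevant to the issue, and the rest of the messaging subsystem is omitted.

```xml
<!-- Live server: fragment of the messaging subsystem (illustrative sketch only) -->
<hornetq-server>
    <!-- Replicated journal, so the shared store is disabled (assumed setting) -->
    <shared-store>false</shared-store>
    <!-- Per this report, failback only succeeded once this was set on the live server -->
    <check-for-live-server>true</check-for-live-server>
    <!-- connectors, acceptors, and the rest of the configuration are omitted -->
</hornetq-server>
```

With this set only on the backup and not on the live server, the behaviour described above was observed: the backup did not shut down when the live server came back.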
Created attachment 693833: configuration for live server
Created attachment 693834: configuration for backup server
Francisco Borges <francisco.borges> made a comment on jira AS7-6460 Hi, I just sent a PR improving the documentation of this option. Commit is this one https://github.com/FranciscoBorges/hornetq/commit/60381397aeacba97e95f10df4647a494785468fa
Miroslav Novak <mnovak> made a comment on jira AS7-6460 There is one more thing we could mention. I'm not sure if it's really a problem now. Imagine this scenario:
1. There is a live/backup pair in a dedicated topology with a replicated journal.
2. The live server is killed and the backup takes over its role (so everything fails over to the backup).
3. Now, for whatever reason, the backup is shut down or killed too.
4. The administrator comes and starts the live server first. Then he starts the backup server.
The live server is not up to date, and the backup will also corrupt its journal when it synchronizes with the "old" live server. I guess we could add a warning to the doc not to do this, if this is really an issue.
Francisco Borges <francisco.borges> made a comment on jira AS7-6460 I assume that the backup having newer data in a case like this is a given. @AndyTaylor and @Clebert, do you guys have any opinions? Notice that when the backup "restarts" as a backup, it will move its data to a "side" directory. So its "exclusive" data won't get deleted that easily.
@Francisco +1
PR for HornetQ doc: https://github.com/FranciscoBorges/hornetq/commit/8d828124365a5f4de236d6228a114dffa13d2d08 Moving bz to ON_QA status for later verification.
After discussion with dev, it was agreed that this is a feature. Can it be documented in the EAP 6 doc? The feature was documented for HornetQ in this pull request: https://github.com/FranciscoBorges/hornetq/commit/8d828124365a5f4de236d6228a114dffa13d2d08 Assigning bz to doc team.
This BZ will be added to the list for review once the EAP 6.1 PRD commitments are complete.
Hi David, thanks a lot for adding all those attributes. I did a review in another bz [1] which is related to this one (it's in comment #4). Can you coordinate with Nichola to take updates, please? [1] https://bugzilla.redhat.com/show_bug.cgi?id=1099809#c4
Hi David, reading this BZ, there is one more thing to be documented here. Can you add a new paragraph to chapter 18.10.4, HornetQ Message Replication? It is for administrators, to help them avoid a dangerous situation: "To get back to the original state after failover, it is necessary to start the live server again and wait until it is fully synchronized with the backup. Only then can you shut down the backup so that the original live server activates again (this happens automatically when the allow-failback attribute is set to true)." Thanks, Mirek
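As a rough sketch of the backup side of that paragraph, the fragment below shows where allow-failback would be set. It is an assumption for illustration and is not taken from the attached backup configuration or from the guide; only the elements relevant here are shown.

```xml
<!-- Backup server: fragment of the messaging subsystem (illustrative sketch only) -->
<hornetq-server>
    <!-- Marks this server as the backup of the pair -->
    <backup>true</backup>
    <!-- Replicated journal, so the shared store is disabled (assumed setting) -->
    <shared-store>false</shared-store>
    <!-- Lets the backup hand control back once the restarted live server has synchronized -->
    <allow-failback>true</allow-failback>
    <!-- connectors, acceptors, and the rest of the configuration are omitted -->
</hornetq-server>
```

As noted at the top of this BZ, the live server also needs check-for-live-server set for the failback to actually take place.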
The book has not been rebuilt since this BZ was moved to MODIFIED so it should not be in ON_QA state.
This can be verified on DocStage here: http://documentation-devel.engineering.redhat.com/site/documentation/en-US/JBoss_Enterprise_Application_Platform/6.3/html-single/Administration_and_Configuration_Guide/index.html#HornetQ_Message_Replication
Thanks David! Setting as verified.