Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1079763

Summary: Some backups do not enter passive mode when multiple live-backup pairs (e.g. 2 live and 2 backup) are configured with data replication
Product: [JBoss] JBoss Enterprise Application Platform 6 Reporter: Masafumi Miura <mmiura>
Component: HornetQ, Documentation Assignee: eap-docs <eap-docs>
Status: CLOSED WONTFIX QA Contact: Miroslav Novak <mnovak>
Severity: high Docs Contact: Russell Dickenson <rdickens>
Priority: unspecified    
Version: 6.2.1 CC: csuconic, msvehla, tanabe.yoshimasa, twells
Target Milestone: ---   
Target Release: EAP 6.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-11-12 14:56:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
config files none

Description Masafumi Miura 2014-03-23 23:44:32 UTC
Description of problem:

Some backups do not enter passive mode when multiple live-backup pairs (e.g. 2 live and 2 backup) are configured with data replication.

4 HornetQ instances have been configured to form an HA live-backup group. The HA mode is data replication. All 4 nodes belong to the same backup group "fish".

 node1: live1
 node2: live2
 node3: backup1
 node4: backup2

If the two backup nodes are started before the two live nodes, all backup nodes become passive and form live-backup pairs. But when the instances are started with the following steps, one backup node does not become passive and keeps waiting for a live node:

1. Start 2 live nodes (node1 and node2)
2. Start 1 backup node (node3) -> This node becomes passive successfully. For example, node1 and node3 become a live-backup pair. node2 remains without a backup.
3. Start another backup node (node4) -> This node does not become passive and keeps waiting for a live node. Though node2 remains without a backup, node4 does not form a live-backup pair with node2.


Version-Release number of selected component (if applicable):

- HornetQ 2.3.12.Final in EAP 6.2.0 
- HornetQ 2.3.14.Final in EAP 6.2.1.

Steps to Reproduce:
1. Start 2 live nodes (node1 and node2) 
2. Start 1 backup node (node3)
3. Start another backup node (node4)


Actual results:
node4 does not enter passive mode and keeps waiting for a live node. Though one live node remains without a backup, node4 does not form a live-backup pair with it.


Expected results:
node4 enters passive mode. All backup nodes become passive and all instances form live-backup pairs.

Additional info:

Comment 1 Masafumi Miura 2014-03-23 23:45:58 UTC
Created attachment 877908 [details]
config files

Comment 2 Miroslav Novak 2014-03-24 09:07:56 UTC
Hi,

I've checked the config and see that there is same backup-group-name "fish" for live1 and live2. Each live/backup pair should have unique backup-group-name. Can you modify the config and try?

Thanks,
Mirek

Comment 3 Masafumi Miura 2014-03-24 12:37:43 UTC
(In reply to Miroslav Novak from comment #2)
> 
> I've checked the config and see that there is same backup-group-name "fish"
> for live1 and live2. Each live/backup pair should have unique
> backup-group-name. Can you modify the config and try?
> 

Hi, the HornetQ documentation[1] has the following note:

~~~
A backup-group-name example: suppose you have 5 live servers and 6 backup servers:
 - live1, live2, live3: with backup-group-name=fish
 - live4, live5: with backup-group-name=bird
 - backup1, backup2, backup3, backup4: with backup-group-name=fish
 - backup5, backup6: with backup-group-name=bird

After joining the cluster the backups with backup-group-name=fish will search for live servers with backup-group-name=fish to pair with. Since there is one backup too many, the fish will remain with one spare backup.

The 2 backups with backup-group-name=bird (backup5 and backup6) will pair with live servers live4 and live5.
~~~
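
As a rough illustration of the arithmetic in that note, here is a toy model in Python. It is not HornetQ code, and the function name `spare_backups` is made up for this sketch. It only counts lives and backups per group, which is all the note describes:

```python
from collections import Counter

def spare_backups(lives, backups):
    """Return {group: number of backups left without a live server}.

    lives and backups each map a server name to its backup-group-name.
    This models only the rule stated in the documentation note.
    """
    live_count = Counter(lives.values())
    backup_count = Counter(backups.values())
    return {
        group: max(backup_count[group] - live_count[group], 0)
        for group in sorted(set(live_count) | set(backup_count))
    }

lives = {"live1": "fish", "live2": "fish", "live3": "fish",
         "live4": "bird", "live5": "bird"}
backups = {"backup1": "fish", "backup2": "fish", "backup3": "fish",
           "backup4": "fish", "backup5": "bird", "backup6": "bird"}

print(spare_backups(lives, backups))  # {'bird': 0, 'fish': 1}
```

By the same counting, my setup (2 lives and 2 backups, all in "fish") should leave no spare backup.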

This indicates that multiple live and backup servers can share the same backup-group-name.

In addition, as I noted in the previous comment#1, all backup nodes become passive and form live-backup pairs if the two backup nodes are started before the two live nodes.

Therefore, I think "Each live/backup pair should have unique backup-group-name" is not correct.

[1] http://docs.jboss.org/hornetq/2.3.0.Final/docs/user-manual/html/ha.html#ha.mode.replicated

Comment 4 Clebert Suconic 2014-03-24 13:38:07 UTC
That's related to shared storage. With replication you can only have one live node and one backup. This is as planned.


With shared storage you can have a lock among the nodes. That's not possible with replication.


I would close this and fix the documentation.

Comment 5 Miroslav Novak 2014-03-24 13:56:44 UTC
I just tried scenario:
1. Start 2 live nodes (node1 and node2) 
2. Start 1 backup node (node3)
3. Start another backup node (node4)
4. Stop node4

and I see the same issue. node4 did not become a backup for live node1. node4 does not activate even when live node1 is killed.

I can see how this happens. When node3 is started, it becomes the backup for live node1. When node4 is started, it seems to start replicating the journal not from live node1 but from backup node3, which is already the backup for live node1. So replication goes in the order node1 -> node3 -> node4. Because backup node4 replicates data directly from backup node3, it does not become a backup for live node1. I'm not sure whether node4 can actually become the backup for node1 when node3 is stopped, because there could be journal inconsistencies. I don't know the implementation details here.

My apologies, you're right about the community documentation. It says exactly what you describe. The EAP 6 documentation does not mention this.

We should discuss whether having multiple backups for one live server is supported with a replicated journal. This appears to be a tricky, non-intuitive thing.

Comment 6 Clebert Suconic 2014-03-24 14:00:47 UTC
Implementing this would be a feature... and it won't be easy... I'm not sure we would fix it.

Comment 7 Miroslav Novak 2014-03-24 14:26:27 UTC
I agree with Clebert. This would be a new feature. I suggest updating the documentation as follows:
a) we support only one backup per live server
b) each live/backup pair must have a unique backup group name, and it must be specified (this means we will not support random live/backup pairing when backup-group-name is not specified)
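
To make rules a) and b) concrete, here is a minimal sketch of a topology check in Python. It is only an illustration of the proposed documentation rules, not part of HornetQ or EAP. The function name `check_pairing` and the tuple layout are assumptions made for this example, and it treats "one pair per group" as exactly one live and one backup per backup-group-name:

```python
from collections import defaultdict

def check_pairing(servers):
    """Check a planned topology against rules a) and b).

    servers: list of (name, role, backup_group_name) tuples, where role is
    'live' or 'backup' (any other role raises KeyError).
    Returns a list of problems; an empty list means the rules are met.
    """
    problems = []
    groups = defaultdict(lambda: {"live": [], "backup": []})
    for name, role, group in servers:
        if not group:
            # Rule b): the group name must be specified.
            problems.append(f"{name}: backup-group-name is not specified")
            continue
        groups[group][role].append(name)
    for group, members in sorted(groups.items()):
        for role in ("live", "backup"):
            count = len(members[role])
            if count != 1:
                # Rules a) and b): one live and one backup per group name.
                problems.append(
                    f"group {group!r}: expected exactly 1 {role}, found {count}"
                )
    return problems

# Topology from this report: all four nodes share the group "fish".
reported = [("node1", "live", "fish"), ("node2", "live", "fish"),
            ("node3", "backup", "fish"), ("node4", "backup", "fish")]
# Same nodes with one group name per pair (names chosen for illustration).
proposed = [("node1", "live", "fish"), ("node3", "backup", "fish"),
            ("node2", "live", "bird"), ("node4", "backup", "bird")]

for problem in check_pairing(reported):
    print(problem)
print(check_pairing(proposed))  # []
```

The reported topology yields two problems (two lives and two backups in "fish"), while the per-pair naming passes.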

Comment 8 Miroslav Novak 2014-05-21 08:57:14 UTC
Adding flags and assigning to doc team.

Comment 9 Miroslav Novak 2014-07-07 13:44:54 UTC
*** Bug 1079765 has been marked as a duplicate of this bug. ***