Bug 1015207 - RFE: Cross datacenter replication - update documentation on how to configure CDR with multiple site masters
Status: CLOSED CURRENTRELEASE
Product: JBoss Data Grid 6
Classification: JBoss
Component: Documentation
Version: 6.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: GA
Target Release: 6.2.0
Assigned To: Misha H. Ali
Depends On: 1018058
Blocks:
Reported: 2013-10-03 12:11 EDT by Divya Mehra
Modified: 2014-01-15 19:03 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-15 19:03:06 EST
Type: Bug


Description Divya Mehra 2013-10-03 12:11:46 EDT
Document URL: 
Admin & Config Guide

Section Number and Name: 

Describe the issue: 

Suggestions for improvement: 

Additional information: 
[1] results in a major performance improvement in Cross datacenter replication (CDR). Please update the section in which we Configure CDR to include information on how to configure multiple site-masters and why.

[1] https://issues.jboss.org/browse/JGRP-1649
Comment 2 Misha H. Ali 2013-10-03 18:20:02 EDT
The scope for this should be 1-2 topics expanding on the current x-site content to elaborate on multiple site masters (why is this useful and how to configure it). Scoped for GA.
Comment 4 Misha H. Ali 2013-10-04 01:58:49 EDT
Added the following topics (indicated with +) for this bug:

Section: Configure Multiple Site Managers
 Configure Multiple Site Managers (Remote Client-Server Mode)
 Configure Multiple Site Managers (Library Mode)

Drafting introductory topic, emailed Bela, Radim and Tristan for configuration specifics.
Comment 5 Misha H. Ali 2013-10-09 02:05:37 EDT
Status update:

+ indicates a newly added topic.

26.5. Configure Multiple Site Managers --> Draft complete.
+ 26.5.1. Multiple Site Manager Operations --> Draft complete.
26.5.2. Configure Multiple Site Managers (Remote Client-Server Mode) --> Incomplete draft. Sent follow-up question to SMEs.
26.5.3. Configure Multiple Site Managers (Library Mode) --> Sent question to SMEs.

Possibly add an additional topic about programmatic configuration (if possible). Asked SMEs about that as well.
Comment 7 Misha H. Ali 2013-10-15 20:58:32 EDT
26.5. Configure Multiple Site Managers --> Draft complete.
26.5.1. Multiple Site Manager Operations --> Draft complete.
26.5.2. Configure Multiple Site Managers (Remote Client-Server Mode)
26.5.3. Configure Multiple Site Managers (Library Mode) --> Sent question to Radim.
26.5.4. Configure Multiple Site Managers Programmatically (Draft completed)

Need information about a file called jgroups-configuration. Sent questions to Radim. Will continue with this bug when he has the time to respond.
Comment 8 Misha H. Ali 2013-10-18 01:38:48 EDT
Updated topic about library mode configuration for normal CDR. Waiting for Radim to ACK and then I'll replicate the change to the 6.2 Beta and 6.2 docs as a patch.
Comment 9 Misha H. Ali 2013-10-19 03:00:42 EDT
(In reply to Misha H. Ali from comment #8)
> Updated topic about library mode configuration for normal CDR. Waiting for
> Radim to ACK and then I'll replicate the change to the 6.2 Beta and 6.2 docs
> as a patch.

Radim's feedback:

* when speaking about relay and as the particular protocol (written 
monospaced upper-case), we should rather use relay.RELAY2 rather than 
RELAY - that may confuse someone
* fix indentation in 1.b
* infinispan-core-version.jar: the "version" should be slanted or 
otherwise differentiated as it's just the placeholder - in JDG 6.2 GA 
this will be something like infinispan-core-6.0.0.Final-redhat-1.jar. If 
it does not fit, use simply infinispan-core.jar
* 5.: I'd rephrase the "Move all the created files to the classpath 
before using the new configurations. " to "Make sure all the created 
files are on the classpath before using the new configurations." as 
classpath is not any specific directory
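
A minimal sketch of how a user might verify the last point before starting a node, using only the JDK. The class name ClasspathCheck is hypothetical, and the file name jgroups-with-relay.xml is the one suggested later in this thread; substitute whatever files the configuration actually uses.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/** Reports which of the given resource names cannot be found on the classpath. */
final class ClasspathCheck {

    /** Returns the names that the given class loader cannot resolve. */
    static List<String> missingResources(ClassLoader loader, String... names) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            // getResource returns null when no classpath entry contains the name.
            URL location = loader.getResource(name);
            if (location == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed file name, taken from the suggestion later in this bug.
        List<String> missing = missingResources(
                ClassLoader.getSystemClassLoader(), "jgroups-with-relay.xml");
        System.out.println(missing.isEmpty()
                ? "All configuration files are on the classpath."
                : "Not on the classpath: " + missing);
    }
}
```

Because the lookup goes through the class loader rather than a fixed directory, it matches the reworded instruction: the files only need to be reachable from some classpath entry.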

Applying this now before sending back to Radim for a final review.
Comment 10 Misha H. Ali 2013-10-19 03:18:54 EDT
(In reply to Misha H. Ali from comment #9)
> Radim's feedback:
> 
> * when speaking about relay and as the particular protocol (written 
> monospaced upper-case), we should rather use relay.RELAY2 rather than 
> RELAY - that may confuse someone

OK, fixed.

> * fix indentation in 1.b

This is just a weird styling thing from the brand applied to the books, I'm afraid. I've logged a bug for this behavior, but it is not intentional and it is nothing I can fix.

> * infinispan-core-version.jar: the "version" should be slanted or 
> otherwise differentiated as it's just the placeholder - in JDG 6.2 GA 
> this will be something like infinispan-core-6.0.0.Final-redhat-1.jar. If 
> it does not fit, use simply infinispan-core.jar

Sounds good, added {VERSION} to make it clear that this value is a stand-in for an actual version.

> * 5.: I'd rephrase the "Move all the created files to the classpath 
> before using the new configurations. " to "Make sure all the created 
> files are on the classpath before using the new configurations." as 
> classpath is not any specific directory

Changed as suggested.

> 
> Applying this now before sending back to Radim for a final review.
Comment 11 Misha H. Ali 2013-10-22 17:40:51 EDT
Tomas has reviewed and ACKed the following topics for this bug:

* Configure Multiple Site Managers
* Multiple Site Manager Operations
* Configure Multiple Site Managers (Remote Client-Server Mode)
Comment 12 Misha H. Ali 2013-10-22 19:30:33 EDT
Set programmatic configuration and library mode configuration on the QE and SME pads. Waiting for review from either.
Comment 13 Misha H. Ali 2013-10-28 20:44:39 EDT
Received feedback from Tristan about some CDR topics (and applied feedback).

Topics passed on to QE:

* Configure Multiple Site Masters
* Multiple Site Masters Operations

Topics still waiting for SME review:

* Configure Multiple Site Managers (Remote Client-Server Mode)
* Configure Multiple Site Managers (Library Mode)
* Configure Multiple Site Managers Programmatically
Comment 15 Misha H. Ali 2013-10-28 22:11:44 EDT
Tristan has reviewed and ACKed the ones we moved to QE. The ones he hasn't gotten to yet are still waiting for his review. It has more to do with his availability than the content. Eventually, all topics get SME and then QE review.
Comment 16 Misha H. Ali 2013-10-29 20:28:21 EDT
Tristan has reviewed more topics. 

The following are now passed to QE:

* Configure Multiple Site Managers (Remote Client-Server Mode)
* Configure Multiple Site Managers Programmatically

The following topics require further information or clarification before further review:

* Configure Multiple Site Managers (Library Mode)

Additionally, Martin has reviewed the topics queued for QE review. The following topics are now QE passed:

* Configure Multiple Site Masters
* Multiple Site Masters Operations
Comment 25 Misha H. Ali 2013-11-19 23:38:05 EST
Hi Radim,

I'm afraid I'm really confused now. Can we just start from scratch to clear this up? This is what I have understood (I am pointing to existing code for each). Please confirm or correct me where required:

XML Config (Lib)

* Setting Up Cross Datacenter Replication:

-------

Step 1: 26.3.2 Step 1a (no changes needed)

---------

Step 2: 26.3.2 Step 1b (no changes needed)

------

Step 3: Not in topic at the moment. Should be exactly this:

Configure the cache in site LON to backup to the sites NYC and SFO:
<infinispan>
   <global>      
      <site local="LON" />
      ...   
   </global>
   ...
   <namedCache name="lon">
      <sites>
         <backups>
            <backup site="NYC" strategy="SYNC" backupFailurePolicy="WARN" timeout="12000" />
            <backup site="SFO" strategy="ASYNC" backupFailurePolicy="IGNORE" timeout="10000" />
         </backups>
      </sites>
   </namedCache>
</infinispan>

--------

Step 4: Not in topic at the moment. Should be exactly this:

Part A) Configure the cache in site NYC to receive backup data from LON:
<infinispan>
   <global>      
      <site local="NYC" />
      ...   
   </global>
   ...
   <namedCache name="lonBackup">
      <sites>
         <backupFor remoteSite="LON" remoteCache="lon" />
      </sites>
   </namedCache>
</infinispan>

Part B) Configure the cache in site NYC to receive backup data from SFO:

<infinispan>
   <global>      
      <site local="NYC" />
      ...   
   </global>
   ...
   <namedCache name="sfoBackup">
      <sites>
         <backupFor remoteSite="SFO" remoteCache="sfo" />
      </sites>
   </namedCache>
</infinispan>

----------

Step 5: Exactly as stated in 26.3.2 Step 2

----------

Step 6: Exactly as stated in 26.3.2 Step 4

---------

*** END OF LIB MODE CONFIG FOR CDR ***

* XML Config for Multiple Masters

Would it be sufficient to say that as a prerequisite, perform all the steps in the CDR library mode config (above) and then set max_site_masters=16?

** END OF LIB MODE CONFIG FOR CDR MULTIPLE MASTERS **

* Programmatic config for CDR

Step 1: Declare which site the node belongs to. Only need the following code snippet:

globalConfiguration.site().localSite("LON");

-------

Step 2: Configure Infinispan to use a JGroups configuration with relay. The code is missing; was it the following?

globalConfiguration.transport().addProperty("configurationFile", "{JGROUPS_CONFIGURATION_FILE}");

-------

Step 3: setup infinispan caches to replicate to remote site. Use this as it is:

ConfigurationBuilder lon = new ConfigurationBuilder();
lon.sites().addBackup()
      .site("NYC")
      .backupFailurePolicy(BackupFailurePolicy.WARN)
      .strategy(BackupConfiguration.BackupStrategy.SYNC)
      .replicationTimeout(12000)
      .sites().addInUseBackupSite("NYC")
    .sites().addBackup()
      .site("SFO")
      .backupFailurePolicy(BackupFailurePolicy.IGNORE)
      .strategy(BackupConfiguration.BackupStrategy.ASYNC)
      .sites().addInUseBackupSite("SFO");

--------

Step 4: setup infinispan caches to receive the replicated data from other site. Use this as-is:

ConfigurationBuilder cb = new ConfigurationBuilder();
cb.sites().backupFor().remoteCache("users").remoteSite("LON");

---------

** END OF PROG CONFIG FOR CDR **

Radim, could you please confirm or correct the above?
Comment 26 Radim Vansa 2013-11-20 03:44:02 EST
Basically: yes. My remarks:

Setting Up Cross Datacenter Replication

Step 3.: please remove the `timeout="10000"` attribute (to keep in sync with programmatic config)

Step 4, Part B - we're backing up data from LON to both NYC and SFO, not backing up from SFO to NYC
------------------
Part B) Configure the cache in site SFO to receive backup data from LON:

<infinispan>
   <global>      
      <site local="SFO" />
      ...   
   </global>
   ...
   <namedCache name="lonBackup">
      <sites>
         <backupFor remoteSite="LON" remoteCache="lon" />
      </sites>
   </namedCache>
</infinispan>
------------------

Step 5: Exactly as stated in 26.3.2 Step 2
Step 6: Exactly as stated in 26.3.2 Step 3 (you missed this one)
Step 7: Exactly as stated in 26.3.2 Step 4


Programmatic config for CDR, Step 4 - let's keep things synced with XML config, therefore, use
------------------
ConfigurationBuilder lonBackup = new ConfigurationBuilder();
lonBackup.sites().backupFor().remoteCache("lon").remoteSite("LON");
-----------------

IMPORTANT: Then, you have to do the steps 5 - 7 from XML configuration as well. But I guess we don't want to copy-paste them.
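
To make the relationship between Step 3 and Step 4 concrete, here is a rough JDK-only sketch that checks the XML files against each other: every site named in a <backup> element of the LON configuration should have a configuration whose <site local="..."> matches and whose <backupFor> points back at LON and the lon cache. The class and method names are hypothetical, the "..." placeholders from the snippets above are dropped so that the sample parses, and nothing here calls Infinispan itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Cross-checks a backing-up site's XML against the XML of a receiving site. */
final class BackupTopologyCheck {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Site names listed in the <backup site="..."/> elements of the source configuration. */
    static List<String> backupSites(Document source) {
        List<String> sites = new ArrayList<>();
        NodeList backups = source.getElementsByTagName("backup");
        for (int i = 0; i < backups.getLength(); i++) {
            sites.add(((Element) backups.item(i)).getAttribute("site"));
        }
        return sites;
    }

    /** True if target is local to the given site and has a backupFor for remoteSite/remoteCache. */
    static boolean receivesFrom(Document target, String local, String remoteSite, String remoteCache) {
        NodeList siteNodes = target.getElementsByTagName("site");
        if (siteNodes.getLength() == 0
                || !local.equals(((Element) siteNodes.item(0)).getAttribute("local"))) {
            return false;
        }
        NodeList receivers = target.getElementsByTagName("backupFor");
        for (int i = 0; i < receivers.getLength(); i++) {
            Element receiver = (Element) receivers.item(i);
            if (remoteSite.equals(receiver.getAttribute("remoteSite"))
                    && remoteCache.equals(receiver.getAttribute("remoteCache"))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Document lon = parse("<infinispan><global><site local=\"LON\"/></global>"
                + "<namedCache name=\"lon\"><sites><backups>"
                + "<backup site=\"NYC\" strategy=\"SYNC\" backupFailurePolicy=\"WARN\" timeout=\"12000\"/>"
                + "<backup site=\"SFO\" strategy=\"ASYNC\" backupFailurePolicy=\"IGNORE\"/>"
                + "</backups></sites></namedCache></infinispan>");
        Document nyc = parse("<infinispan><global><site local=\"NYC\"/></global>"
                + "<namedCache name=\"lonBackup\"><sites>"
                + "<backupFor remoteSite=\"LON\" remoteCache=\"lon\"/>"
                + "</sites></namedCache></infinispan>");
        System.out.println("LON backs up to: " + backupSites(lon));
        System.out.println("NYC receives from LON: " + receivesFrom(nyc, "NYC", "LON", "lon"));
    }
}
```

Run against the original Part B from comment 25 (local site NYC, remoteSite SFO), the same check would report that nothing in SFO receives the LON data, which is the mismatch pointed out above.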
Comment 27 Misha H. Ali 2013-11-20 04:44:32 EST
Thanks, Radim. This makes it much clearer. I'll work out a draft to show you tomorrow.
Comment 29 Martin Gencur 2013-11-28 08:04:32 EST
Misha,
after looking at the XML and programmatic configuration, I'd suggest these changes (they also incorporate Radim's latest proposals):

* 26.3.3. Step 2 

Change the configuration to the following:
globalConfiguration.transport().addProperty("configurationFile", "jgroups-with-relay.xml");

...here we specify the file name explicitly, and that name matches the name we used in XML configuration earlier as well as
what we suggest later in 26.3.3 Step 5, no need to use a "placeholder" here

* 26.3.2 Step 5

We already include the configuration of multiple site masters here. But we have a specific chapter for this later. Let's remove 
can_become_site_master="true" max_site_masters="16" from this topic.

* 26.3.3. Step 4

Mimic the approach from XML configuration - configure both sites: NYC and SFO:

My Suggestion:
--------------------------------------
4. Configure the Back Up Caches

JBoss Data Grid implicitly replicates data to a cache with same name on the remote site.
If a back up cache on the remote site has a different name, users must specify a backupFor cache to 
ensure data is replicated to the right cache.

a. Configure the cache in site NYC to receive back up data from LON:
ConfigurationBuilder NYCbackupOfLon = new ConfigurationBuilder();
NYCbackupOfLon.sites().backupFor().remoteCache("lon").remoteSite("LON");

b. Configure the cache in site SFO to receive back up data from LON:
ConfigurationBuilder SFObackupOfLon = new ConfigurationBuilder();
SFObackupOfLon.sites().backupFor().remoteCache("lon").remoteSite("LON");
-----------------------------------------

...so this whole step is optional, and needed only if the remote site's caches
have different names than the original ones

* 26.4.4

The text should be changed to the following:

A site can be taken offline in Red Hat JBoss Data Grid using the JBoss Operations Network Operations. For a list of the metrics, see Section 21.7.2, “JBoss Operations Network Plugin Operations”.

(change "Metrics" -> "Operations" and point to a bit different chapter)

* 26.5.3

AFAIK, there's only one way to configure multiple site masters in library mode, regardless of XML or programmatic configuration.
Out of chapters 26.5.3 and 26.5.4, only one step should remain (26.5.3 Step 2). All the other steps and the whole of chapter 26.5.4 can be removed.
It makes more sense to advise users to perform all the steps in the CDR library mode config (above) and add 26.5.3 Step 2. I.e. just add
can_become_site_master="true" max_site_masters="16" to the RELAY2 protocol in jgroups-with-relay.xml file.
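
As an illustration of how small that change is, the sketch below applies it to an in-memory copy of a JGroups stack using only the JDK's XML APIs. The class name SiteMasterConfig, the one-element sample stack, and the config="relay.xml" value are assumptions made for this example; a real jgroups-with-relay.xml contains the full protocol list, and most users would simply edit the file by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Adds the two multiple-site-master attributes to the relay.RELAY2 element of a stack. */
final class SiteMasterConfig {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Sets can_become_site_master and max_site_masters on the single relay.RELAY2 element. */
    static Element enableMultipleSiteMasters(Document stack, int maxSiteMasters) {
        NodeList relays = stack.getElementsByTagName("relay.RELAY2");
        if (relays.getLength() != 1) {
            throw new IllegalArgumentException(
                    "Expected exactly one relay.RELAY2 element, found " + relays.getLength());
        }
        Element relay = (Element) relays.item(0);
        relay.setAttribute("can_become_site_master", "true");
        relay.setAttribute("max_site_masters", Integer.toString(maxSiteMasters));
        return relay;
    }

    /** Serializes the document back to a string without the XML declaration. */
    static String toXml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize the stack", e);
        }
    }

    public static void main(String[] args) {
        // Simplified stand-in for jgroups-with-relay.xml; attribute values are assumed.
        Document stack = parse(
                "<config><relay.RELAY2 site=\"LON\" config=\"relay.xml\"/></config>");
        enableMultipleSiteMasters(stack, 16);
        System.out.println(toXml(stack));
    }
}
```

The existing site and config attributes are left untouched, which mirrors the point above: the base CDR setup stays as it is and only these two attributes are added.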


Other notes:
----------
Now there are these chapters:

26.3.2. Configure Cross-Datacentre Replication (Library Mode) 
26.3.3. Configure Cross-Datacentre Replication Programmatically
These two chapters should *not* be at the same level. The configuration examples in 26.3.2 are XML configuration for Library Mode
The configuration examples in 26.3.3 are actually programmatic configuration but for Library Mode too

...so it should rather look like this:
26.3.2. Configure Cross-Datacentre Replication (Library Mode) 
26.3.2.1 Configure Cross-Datacentre Replication through XML (declaratively)
26.3.2.2 Configure Cross-Datacentre Replication Programmatically
---------


The rest of the chapter looks good now.
Comment 30 Misha H. Ali 2013-12-01 22:49:17 EST
(In reply to Martin Gencur from comment #29)
> Misha,
> after looking at the XML and programmatic configuration, I'd suggest these
> changes (comprise also Radim's latest proposals):
> 
> * 26.3.3. Step 2 
> 
> Change the configuration to the following:
> globalConfiguration.transport().addProperty("configurationFile",
> "jgroups-with-relay.xml");
> 
> ...here we specify the file name explicitly, and that name matches the name
> we used in XML configuration earlier as well as
> what we suggest later in 26.3.3 Step 5, no need to use a "placeholder" here

ACK, change implemented.

> * 26.3.2 Step 5
> 
> We already include the configuration of multiple site masters here. But we
> have a specific chapter for this later. Let's remove 
> can_become_site_master="true" max_site_masters="16" from this topic.

ACK, removed.

> * 26.3.3. Step 4
> 
> Mimic the approach from XML configuration - configure both sites: NYC and
> SFO:
> 
> My Suggestion:
> --------------------------------------
> 4. Configure the Back Up Caches
> 
> JBoss Data Grid implicitly replicates data to a cache with same name on the
> remote site.
> If a back up cache on the remote site has a different name, users must
> specify a backupFor cache to 
> ensure data is replicated to the right cache.
> 
> a. Configure the cache in site NYC to receive back up data from LON:
> ConfigurationBuilder NYCbackupOfLon = new ConfigurationBuilder();
> NYCbackupOfLon.sites().backupFor().remoteCache("lon").remoteSite("LON");
> 
> b. Configure the cache in site SFO to receive back up data from LON:
> ConfigurationBuilder SFObackupOfLon = new ConfigurationBuilder();
> SFObackupOfLon.sites().backupFor().remoteCache("lon").remoteSite("LON");
> -----------------------------------------
> 
> ...so this whole step is optional, and needed only if the remote site's
> caches
> have different names than the original ones
> 

ACK, fixed as suggested.

> * 26.4.4
> 
> The text should be changed to the following:
> 
> A site can be taken offline in Red Hat JBoss Data Grid using the JBoss
> Operations Network Operations. For a list of the metrics, see Section
> 21.7.2, “JBoss Operations Network Plugin Operations”.
> 
> (change "Metrics" -> "Operations" and point to a bit different chapter)
> 

ACK, changed.

> * 26.5.3
> 
> AFAIK, there's only one way to configure multiple site masters in library
> mode, regardless of XML or programmatic configuration.
> Out of chapters 26.5.3 and 26.5.4, only one step should remain (26.5.3 Step
> 2). All the other steps and whole chapter 26.5.4 can be removed.

OK, reduced 26.5.3 to one step and removed 26.5.4.

> It makes more sense to advise users to perform all the steps in the CDR library
> mode config (above) and add 26.5.3 Step 2. I.e. just add
> can_become_site_master="true" max_site_masters="16" to the RELAY2 protocol
> in jgroups-with-relay.xml file.

Fixed now.

> 
> Other notes:
> ----------
> Now there are these chapters:
> 
> 26.3.2. Configure Cross-Datacentre Replication (Library Mode) 
> 26.3.3. Configure Cross-Datacentre Replication Programmatically
> These two chapters should *not* be at the same level. The configuration
> examples in 26.3.2 are XML configuration for Library Mode
> The configuration examples in 26.3.3 are actually programmatic configuration
> but for Library Mode too
> 
> ...so it should rather look like this:
> 26.3.2. Configure Cross-Datacentre Replication (Library Mode) 
> 26.3.2.1 Configure Cross-Datacentre Replication through XML (declaratively)
> 26.3.2.2 Configure Cross-Datacentre Replication Programmatically
> ---------

OK, rearranged. Retained "Declaratively" instead of "through XML" in keeping with the style used for other such configs.
Comment 31 Radim Vansa 2013-12-02 03:05:48 EST
Almost OK now :) There's residue from the refactoring of the docs: the line

"Set the max_site_masters value to the number of nodes in the cluster to make all nodes masters."

in procedures 26.2 and 26.3, in steps 5) after the code snippet.

Thanks
Comment 32 Misha H. Ali 2013-12-02 03:31:05 EST
Hi Radim,

I've removed the line from Procedure 26.2 because, as you pointed out, it no longer makes sense there. However, Procedure 26.3 does have that attribute in the config, so either both that line and the attribute in the code should be removed, or both should stay, right?
Comment 33 Radim Vansa 2013-12-02 03:36:59 EST
You're right - remove the two attributes (max_site_masters and can_become_site_master) from code snippet in 26.3 as well.
Comment 34 Misha H. Ali 2013-12-02 03:42:01 EST
Thanks, Radim.

Done.

Also, I have made one minor change to 26.5.3. Configure Multiple Site Masters (Library Mode) Step 1: it now links to both the programmatic and declarative configs. Formerly it linked only to the declarative one, but I think we can give the user a choice?
Comment 35 Misha H. Ali 2014-01-15 19:03:06 EST
The fix for this bug is now generally released and available here:

https://access.redhat.com/site/documentation/en-US/Red_Hat_JBoss_Data_Grid/6.2/index.html
