Bug 1033943 - RFE: Allow setting of DCAwareRoundRobinPolicy for supporting multiple datacenters
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: RHQ Project
Classification: Other
Component: Storage Node
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: RHQ 4.10
Assignee: John Sanda
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-11-24 15:21 UTC by Elias Ross
Modified: 2014-04-23 12:31 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-04-23 12:31:54 UTC
Embargoed:


Attachments
Patch to master (3.86 KB, application/mbox)
2013-12-02 19:17 UTC, Elias Ross
Patch to master c39bcd8571 (3.86 KB, patch)
2013-12-02 19:20 UTC, Elias Ross

Description Elias Ross 2013-11-24 15:21:38 UTC
Description of problem:

The default load balancing policy is RoundRobinPolicy, which does not work well when using multiple datacenters.

This is an ask to support DCAwareRoundRobinPolicy by setting the system property:

+    private static final String DC_PROP = "rhq.storage.dc";

Version-Release number of selected component (if applicable): 4.9

Comment 1 Elias Ross 2013-12-02 19:17:51 UTC
Created attachment 831749
Patch to master

Applied to master branch.

I've tested this on RHQ 4.9 only. I'm not certain whether it works with the settings changes that are in master.

Another enhancement that may make sense is allowing the policy class itself to be specified instead. For example, this policy may work as a better default:

http://www.datastax.com/drivers/java/1.0/apidocs/com/datastax/driver/core/policies/LatencyAwarePolicy.html

(I'm not sure which version this is in, but it is not part of the driver RHQ uses.)

Comment 2 Elias Ross 2013-12-02 19:20:16 UTC
Created attachment 831750
Patch to master c39bcd8571

Comment 3 John Sanda 2014-01-09 19:51:37 UTC
Elias, what other changes did you make for multi-data center support? It probably makes sense to track whatever additional work that will be involved to fully support multiple data centers in a separate bug, but it would be great to hear about what you have done.

Comment 4 Elias Ross 2014-01-09 21:18:11 UTC
For multi-DC work, the other issues are:
1032199 (Storage Node) - RHQ may overwrite storage node (Cassandra) replication settings
1032192 (Core Server, Performance) - RFE: Optimize RHQ server for remote database connection (~100ms latency)

I did some manual steps to install the new Cassandra node because of a problem with adding more than one node at a time. I used Puppet to create the cassandra.yaml, which led me to this problem:

1032308 - Storage node should output a warning from RhqInternodeAuthenticator if node not found

Currently, I can't get the RHQ 4.9 UI to show the storage nodes in the other data center. This may be due to network ACLs. It was very slow before anyway, with just 4 nodes. (I noticed some fixes in master for this.) I may retry with 4.10 at some point.

I did have RHQ running in the secondary datacenter for about a week, but again due to network ACLs, the RHQ server in the secondary DC could not talk to agents in the other DC, although it does currently work the other way. This goes back to the issue of how RHQ must connect outbound to all agents. Obviously, getting the storage nodes installed requires network access as well.

Comment 5 John Sanda 2014-02-11 03:14:01 UTC
Changes have been committed to master. This adds some initial support for configuring the load balancing policy. Define the rhq.storage.client.load-balancing property in rhq-server.properties. It currently recognizes two values - RoundRobin and DCAwareRoundRobin. If the latter is specified, then the rhq.storage.dc property also needs to be set; otherwise, RoundRobinPolicy will be used.
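
The selection rule described above can be sketched roughly as follows. This is not the code from the commit: the class name, the Policy enum (a stand-in for the driver's policy classes), and the handling of blank or unrecognized values are assumptions made for illustration. Only the two property names and the two recognized values come from this bug.

```java
import java.util.Properties;

public class LoadBalancingConfig {
    // Property names as given in this bug report.
    public static final String LB_PROP = "rhq.storage.client.load-balancing";
    public static final String DC_PROP = "rhq.storage.dc";

    /** Hypothetical stand-in for the driver's policy classes. */
    public enum Policy { ROUND_ROBIN, DC_AWARE_ROUND_ROBIN }

    /**
     * Picks DC_AWARE_ROUND_ROBIN only when it is requested and a data center
     * name is set; everything else falls back to ROUND_ROBIN.
     * Assumption: a blank data center name and an unrecognized policy name
     * are both treated like a missing value.
     */
    public static Policy resolve(Properties props) {
        String name = props.getProperty(LB_PROP);
        String dc = props.getProperty(DC_PROP);
        boolean hasDc = dc != null && !dc.trim().isEmpty();
        if ("DCAwareRoundRobin".equals(name) && hasDc) {
            return Policy.DC_AWARE_ROUND_ROBIN;
        }
        return Policy.ROUND_ROBIN;
    }

    public static void main(String[] args) {
        // Equivalent to these lines in rhq-server.properties
        // ("dc1" is a made-up data center name):
        //   rhq.storage.client.load-balancing=DCAwareRoundRobin
        //   rhq.storage.dc=dc1
        Properties props = new Properties();
        props.setProperty(LB_PROP, "DCAwareRoundRobin");
        props.setProperty(DC_PROP, "dc1");
        System.out.println(resolve(props));
    }
}
```

With DCAwareRoundRobin requested but rhq.storage.dc left unset, resolve returns ROUND_ROBIN, which matches the fallback described in this comment.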

I would like to add support for the token aware and latency aware policies as well. They both wrap another load balancing policy which will make configuring them more difficult since the configuration right now is pretty simple using system properties.
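
To illustrate why wrapping policies complicates a flat property-based configuration, here is a minimal sketch. It uses a hypothetical Policy interface rather than the driver's own types, and the idea of passing an ordered list of wrapper names is an assumption made for illustration, not a proposed design.

```java
import java.util.Arrays;
import java.util.List;

public class WrappedPolicySketch {
    /** Hypothetical stand-in for the driver's load balancing policy type. */
    public interface Policy {
        String describe();
    }

    /** A base policy that wraps nothing, e.g. RoundRobin. */
    public static Policy base(String name) {
        return () -> name;
    }

    /** A wrapping policy that delegates to a child policy. */
    public static Policy wrap(String wrapper, Policy child) {
        return () -> wrapper + "(" + child.describe() + ")";
    }

    /** Applies the wrappers in order, so the last one listed is outermost. */
    public static Policy build(String baseName, List<String> wrappers) {
        Policy policy = base(baseName);
        for (String wrapper : wrappers) {
            policy = wrap(wrapper, policy);
        }
        return policy;
    }

    public static void main(String[] args) {
        Policy policy = build("DCAwareRoundRobin", Arrays.asList("TokenAware"));
        System.out.println(policy.describe()); // prints TokenAware(DCAwareRoundRobin)
    }
}
```

A single property naming one policy is no longer enough here: the configuration has to express both the base policy and the order of any wrappers around it.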

master commit hash: a78ef9f

Comment 6 Heiko W. Rupp 2014-04-23 12:31:54 UTC
Bulk closing of 4.10 issues.

If an issue is not solved for you, please open a new BZ (or clone the existing one) with a version designator of 4.10.

