Bug 801638 - As a JON user I want to use an example CLI script which will join an EAP6 instance into an existing cluster and then restart the instance
Alias: None
Product: RHQ Project
Classification: Other
Component: Plugins
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
: JON 3.1.0
Assignee: Lukas Krejci
QA Contact: Mike Foley
Depends On: 817631 815447 818673 829944 830865
Blocks: as7-plugin
Reported: 2012-03-09 04:27 UTC by Charles Crouch
Modified: 2015-02-01 23:27 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2013-09-03 15:11:50 UTC

Attachments
test web application (2.75 KB, application/octet-stream)
2012-05-03 18:56 UTC, Lukas Krejci

System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 813066 0 urgent CLOSED [as7] Expose multicast group/port of a ha-server as trait on server level 2021-02-22 00:41:40 UTC

Internal Links: 813066

Description Charles Crouch 2012-03-09 04:27:32 UTC
- The concept of what it actually means to be in a "cluster" has always been complex in JBAS; e.g. it has been possible to cluster instances for replicating HTTP sessions and EJB SFSBs independently of one another. The scope of this feature is to allow instances to join whatever types of clusters AS7 enables. When initiating this work, we first need to figure out what cluster membership management capabilities are available. If there are gaps, JIRAs need to be raised on EAP6 to add this support. While the gaps are being filled, we should add support to the AS7 plugin for whatever cluster membership management capabilities are currently available.
- Assuming the instance joining the cluster has the same apps deployed as the other cluster members, it should be possible to execute the apps in a cluster-aware fashion once the script has run.

- CLI script to execute the desired changes. Inputs to the script:
1) Name of EAP6 instance to add to the cluster
2) Cluster identifier

Comment 1 Charles Crouch 2012-03-09 04:28:37 UTC
Assigning to Heiko for initial triage

Comment 2 Heiko W. Rupp 2012-03-13 21:12:58 UTC
To create the script we need to differentiate between the two operation modes of AS7.

In both cases, the socket-binding-group for the cluster needs to be the same, and the cluster bindings need to live on a public interface; this can be done per binding or by setting /socket-binding-group=x/default-interface=public

- domain mode:

A managed server is created in a specific server-group that cannot be changed afterwards. To use a different server-group (which must use the ha profile), the server needs to be removed and re-created.
Re-creation needs to use the server-group that is running the ha profile, which also needs to have the same socket bindings for the cluster subsystems if it lives on a different box. An app deployed to the server-group automatically gets deployed to the newly created managed server instance.

- standalone mode: here, more than one server in standalone mode needs to form the cluster. The server connection-property needs to be changed to standalone-ha.xml and the server then restarted. If the application was already deployed, it will now operate in ha-mode.

As the socket bindings may differ between the two servers (there is no common setup as in domain mode), the script may also need to copy those over.

Actually, there are more properties, such as the jgroups stack, that may need to be copied over - here it could make sense to take the respective part of standalone-ha.xml and literally copy it into the new server, and not fiddle with RHQ configuration objects at all.

Comment 3 Heiko W. Rupp 2012-03-14 09:40:55 UTC
In standalone mode it may also be necessary to copy the app over from the standalone.xml to the standalone-ha.xml configuration when the app was deployed via the API (which the rhq-plugin does); this is not needed for apps that are deployed via the file system - as7 unfortunately has an inconsistency here.

Comment 4 Heiko W. Rupp 2012-04-17 14:58:12 UTC
We need to know the internals of the cluster to join
- jgroups property
- infinispan transport 
- infinispan cache-containers
- stuff like sso cache-container and others that use clustering

So we read this from node 1 (an existing cluster node) and apply it to node 2.

Requirement: node 2 needs to run the same ha-profile as the existing node.

Deployments would be read from node 1's content subsystem representation and deployed to node 2.
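
As a rough illustration of the "read from node 1, apply to node 2" idea, here is a minimal JavaScript sketch. Plain objects stand in for the real RHQ configuration objects, and the function name copyClusterSettings and the section keys passed to it are hypothetical, chosen only for illustration; this is not how the eventual script is implemented.

```javascript
// Hypothetical sketch: plain objects stand in for RHQ configuration objects.
// Returns a copy of the joining node's config in which the named clustering
// sections have been replaced by those of an existing cluster member.
function copyClusterSettings(existingMember, joiningNode, sections) {
  var result = {};
  var key;
  // Start from a shallow copy so the joining node's own object is not mutated.
  for (key in joiningNode) {
    if (Object.prototype.hasOwnProperty.call(joiningNode, key)) {
      result[key] = joiningNode[key];
    }
  }
  sections.forEach(function (section) {
    // Fail loudly if the existing member lacks a section we were asked to copy.
    if (!Object.prototype.hasOwnProperty.call(existingMember, section)) {
      throw new Error("Existing member has no '" + section + "' section");
    }
    result[section] = existingMember[section];
  });
  return result;
}
```

The copy is shallow, so nested values are shared between the two objects; a real implementation would have to decide whether that is acceptable.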

Comment 5 Heiko W. Rupp 2012-04-17 19:15:16 UTC
We would like to see https://issues.jboss.org/browse/AS7-4501 implemented, but need to work around it, as the basic idea of as7 is not really to have standalone servers in clusters (even if supported).

Comment 6 Lukas Krejci 2012-04-27 20:25:46 UTC
The script is only implemented for the standalone servers, because there is no work to be done when a managed server is joining a server group that has clustering enabled in the domain mode.

Testing instructions will be provided in future comments - I need to compile some (relatively) easy to follow instructions/examples.

master http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commitdiff;h=70ea9ac32b250b0aa96db1215efe5f4720483745
Author: Lukas Krejci <lkrejci>
Date:   Fri Apr 27 22:20:40 2012 +0200

    [BZ 801638] - adding a sample CLI script that can make an AS7 standalone
    server join a cluster (and copy over the deployments to it).

Comment 7 Lukas Krejci 2012-05-02 15:22:22 UTC
commit 33f4a40cc30f44d0aedc5c70c2847ef8534e3029
Author: Lukas Krejci <lkrejci>
Date:   Wed May 2 17:20:51 2012 +0200

    [BZ 801638] - fixing a couple of small bugs in the clustering script +
    adding support for reading the AS7 configuration from the script args
    as opposed to the 'config' property.

Comment 8 Lukas Krejci 2012-05-03 18:56:19 UTC
Created attachment 581939
test web application

Comment 9 Lukas Krejci 2012-05-03 20:04:50 UTC
Repro steps:

1) Have 2 "fresh" installations of EAP6 ER6 ready (called svr1 and svr2)
2) Copy the attached test webapp into svr1/standalone/deployments
3) Start svr1 using:
   bin/standalone.sh -c standalone-ha.xml -Djboss.node.name=svr1
4) Edit svr2/standalone/configuration/standalone-ha.xml and 
   svr2/standalone/configuration/standalone.xml and edit the 
   "port-offset" attribute of the "socket-binding-group" element to:
4) Start svr2 as:
5) Inventory both svr1 and svr2 in RHQ
6) Run the operation "Install RHQ user" on both of them so that RHQ can start 
   managing them, wait for the child resources of the servers to get discovered
7) ... this is a workaround for bug 815899 ...
   Go to the "standard-sockets" resource under each of the imported servers 
   (hidden under the SocketBindingGroups category), change any value to another 
   value and save the config, then set the value back to the original and save 
   again. This forces the configuration to be available in the CLI by "normal" 
   means even if the system has not yet had enough time to come to a consistent 
   state with respect to config reading. It would not be necessary if it 
   weren't for the bug.
8) In the RHQ GUI, navigate to the svr1 and svr2 resources and copy their ids 
   from the URL (I will refer to those ids as <SVR1_ID> and <SVR2_ID> in the
   text below).
9) Start the CLI, log in as a user with enough privs to see the above servers 
   and create child resources.
10) on the CLI command line, do:
    exec -f samples/add-as7-standalone-server-to-cluster.js
    var svr1 = ProxyFactory.getResource(<SVR1_ID>)
    var svr2 = ProxyFactory.getResource(<SVR2_ID>)
    addToCluster(svr2, 'svr2', svr1, true)
11) The above step will print several lines of information to the output. 
    Note that it can report a timeout error while deploying the test webapp to 
    svr2. That can be ignored as long as the rest of the repro steps work. 
    Any other error in the output of that function is a bug.
12) In the shell, do:
    curl -v http://localhost:8080/cluster-demo/put.jsp
    This will output something along the lines of:
* About to connect() to localhost port 8080 (#0)
*   Trying connected
* Connected to localhost ( port 8080 (#0)
> GET /cluster-demo/put.jsp HTTP/1.1
> User-Agent: curl/7.21.7 (i386-redhat-linux-gnu) libcurl/7.21.7 NSS/ zlib/1.2.5 libidn/1.22 libssh2/1.2.7
> Host: localhost:8080
> Accept: */*
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< X-Powered-By: JSP/2.2
< Set-Cookie: JSESSIONID=dw9798zPgO22SCqAavFDxxBI; Path=/cluster-demo
< Content-Type: text/html;charset=ISO-8859-1
< Content-Length: 117
< Date: Thu, 03 May 2012 19:14:32 GMT
 <h2>Set Current Time</h2>
 The time is set to Thu May 03 21:14:32 CEST 2012

13) Copy the text between 'Set-Cookie: ' and the ';' to the clipboard (i.e. in 
    the above example output, you should copy 
    "JSESSIONID=dw9798zPgO22SCqAavFDxxBI"). I will refer to this value as 
    <SESSION_ID>.
14) Note down the time from the HTML output (i.e. "Thu May 03 21:14:32 CEST 
    2012" from above).
15) Now do:
    curl --cookie "<SESSION_ID>" http://localhost:8180/cluster-demo/get.jsp
    This will output something like:
 <h2>Get Time</h2>

 The time is Thu May 03 21:14:32 CEST 2012

16) Make sure that the time reported by the request to localhost:8080 is the same as the time reported when asking localhost:8180.
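
The manual comparison in steps 12-16 could also be scripted. The following is a minimal JavaScript sketch, assuming the curl output has already been captured as strings; the helper names (extractSessionCookie, extractTime, sessionReplicated) are hypothetical and are not part of the RHQ sample script.

```javascript
// Hypothetical helpers for checking steps 12-16 against captured curl output.

// Pull "JSESSIONID=..." out of the verbose curl output ("< Set-Cookie:" line).
function extractSessionCookie(curlOutput) {
  var match = /Set-Cookie:\s*(JSESSIONID=[^;\s]+)/.exec(curlOutput);
  return match ? match[1] : null;
}

// Pull the timestamp out of the put.jsp or get.jsp HTML body.
// put.jsp prints "The time is set to ...", get.jsp prints "The time is ...".
function extractTime(body) {
  var match = /The time is (?:set to )?(.+)/.exec(body);
  return match ? match[1].trim() : null;
}

// The session counts as replicated when both servers report the same time.
function sessionReplicated(putOutput, getOutput) {
  var putTime = extractTime(putOutput);
  var getTime = extractTime(getOutput);
  return putTime !== null && putTime === getTime;
}
```

Feeding it the put.jsp output from step 12 and the get.jsp output from step 15 should yield true when the time written through localhost:8080 is readable through localhost:8180.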

Note that the port-offset change is only needed to overcome the consequences of this over-simplified setup, which is only useful for testing. In a real environment, the two servers would most likely be on separate hosts and would not need such tricks.

Comment 10 Lukas Krejci 2012-05-04 11:08:35 UTC
I need to put this back to ON_DEV because bug 818673 is going to introduce a change that is going to break this script.

I will wait for bug 818673 to be implemented and fix this bug afterwards.

Comment 11 Lukas Krejci 2012-05-04 16:23:08 UTC
commit a40f010f63ff426d60d2b16c822c7aca9ca90db0
Author: Lukas Krejci <lkrejci>
Date:   Fri May 4 18:21:24 2012 +0200

    [BZ 801638] - updating the plugin name to 'JBossAS7'.

Comment 12 Libor Zoubek 2012-05-11 14:02:00 UTC
I ran two tests:

In the first, I did not follow your steps:

I have 2 hosts, each running EAP standalone with the standalone-full.xml profile. The script passed, and even the deployments were copied over.

In the second, I followed your steps:

I skipped the part with port-offset - I do not need it because I have 2 hosts.

When running the script I got this:
rhqadmin@localhost:7080$ addToCluster(node,'node',cluster,true);       
Reading config of the existing cluster member
TypeError: Cannot read property "list" from null (<Unknown source>#443)

That is the line: var portIterator = ports.list.iterator();
That means 'ports' is null and I cannot figure out why. I've restarted the RHQ server, agents and CLI, and re-imported the EAP running in HA, without success. In the UI I can successfully see all socket-bindings of the EAP server.

It is probably not an issue with the script itself.
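
The TypeError above comes from dereferencing a null property. A guard along the following lines would at least fail with a clearer message. This is only a sketch: requireList is a hypothetical helper that does not come from the sample script, and 'ports' merely mirrors the variable name quoted above.

```javascript
// Hypothetical guard: returns property.list, or throws a descriptive error
// when the property (or its list) has not been populated yet.
function requireList(property, name) {
  if (property === null || property === undefined || property.list == null) {
    throw new Error("Configuration property '" + name +
      "' is not available; the resource configuration may not be loaded yet");
  }
  return property.list;
}
```

With such a helper, the failing line would become something like `var portIterator = requireList(ports, 'ports').iterator();`.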

Comment 13 Lukas Krejci 2012-05-12 09:14:10 UTC
The server that is already a member of a cluster must be running with the standalone-ha.xml or standalone-full-ha.xml profile.

The server that is to join the cluster can run with any profile; it will be switched to match the profile of the existing member.

Comment 14 Libor Zoubek 2012-05-17 13:27:27 UTC
So .. the error I got in #c12 was not a script issue. I was getting null because of Bug 815899.

This means: before the script runs, both AS7 server resources must either
a) have been imported more than 15 minutes ago, or
b) have had their configuration manually updated in the UI, along with the configuration of their socket-binding groups (standard-sockets).

Otherwise you'll get null config properties as in #c12.

Comment 15 Lukas Krejci 2012-05-18 16:16:10 UTC
I'm putting this back to ON_DEV.

Even though bug 815899 has been fixed, I think we still need to modify the script to fetch the live configs to make absolutely sure the resource configuration is available.

The fix for bug 815899 doesn't guarantee that (and it really shouldn't - that would be a new feature that we never had, as captured by RFE bug 822968). But the fix makes sure that we can safely invoke the live-config-getting methods on the server (be it from the UI or the CLI) without corrupting the state of the resource.
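
A minimal sketch of what "fetch the live config and make sure it is there" could look like. The function liveConfigOrFail and its fetchLive callback are hypothetical. In the RHQ CLI the callback might wrap a call such as ConfigurationManager.getLiveResourceConfiguration(id, false), but that is an assumption that should be checked against the remote API of the RHQ version in use.

```javascript
// Hypothetical sketch. fetchLive is a caller-supplied function that takes a
// resource id and returns that resource's live configuration (or null).
// Assumed example for the RHQ CLI, to be verified against your version:
//   function (id) { return ConfigurationManager.getLiveResourceConfiguration(id, false); }
function liveConfigOrFail(fetchLive, resourceId) {
  var config = fetchLive(resourceId);
  if (config === null || config === undefined) {
    // Stop early instead of failing later on a null property access.
    throw new Error("No live configuration returned for resource " + resourceId);
  }
  return config;
}
```

Passing the fetch function in keeps the sketch independent of any particular RHQ API and makes the null handling easy to exercise on its own.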

Comment 16 Lukas Krejci 2012-05-23 13:03:54 UTC
master http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commitdiff;h=1a8cbc1c27c3376306845f3baa9bd1f428fff92f
Author: Lukas Krejci <lkrejci>
Date:   Wed May 23 14:56:10 2012 +0200

    [BZ 801638] - Modified the as7 cluster script to use the live configs of
    the servers in question to overcome the possibility of the server having
    stale data (or even having no config data at all).

release/jon3.1.x http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commitdiff;h=bc028441c486bdafacb3f69c3f8ce892e7631b32
Author: Lukas Krejci <lkrejci>
Date:   Wed May 23 14:56:10 2012 +0200

    [BZ 801638] - Modified the as7 cluster script to use the live configs of
    the servers in question to overcome the possibility of the server having
    stale data (or even having no config data at all).
    (cherry picked from commit 1a8cbc1c27c3376306845f3baa9bd1f428fff92f)

Comment 17 Lukas Krejci 2012-05-28 08:35:33 UTC
In JON 3.1.0 since ER5

Comment 18 Filip Brychta 2012-09-13 11:55:23 UTC
Verified on 3.1.1.CR2.

Comment 19 Heiko W. Rupp 2013-09-03 15:11:50 UTC
Bulk closing of old issues in VERIFIED state.
