Bug 801638 - As a JON user I want to use an example CLI script which will join an EAP6 instance into an existing cluster and then restart the instance
Status: CLOSED CURRENTRELEASE
Product: RHQ Project
Classification: Other
Component: Plugins
Priority: urgent
Severity: urgent
Target Release: JON 3.1.0
Assigned To: Lukas Krejci
QA Contact: Mike Foley
Depends On: 817631 815447 818673 829944 830865
Blocks: as7-plugin
Reported: 2012-03-08 23:27 EST by Charles Crouch
Modified: 2015-02-01 18:27 EST (History)
Doc Type: Bug Fix
Last Closed: 2013-09-03 11:11:50 EDT
Attachments:
test web application (2.75 KB, application/octet-stream)
2012-05-03 14:56 EDT, Lukas Krejci

Description Charles Crouch 2012-03-08 23:27:32 EST
- The concept of what it actually means to be in a "cluster" has always been complex in JBAS, e.g. it has been possible to cluster instances for replicating HTTP sessions and EJB SFSBs independently of one another. The scope of this feature is to allow instances to join whatever types of clusters AS7 enables. When starting this work we first need to figure out what cluster membership management capabilities are available. If there are gaps, JIRAs need to be raised on EAP6 to add this support. While the gaps are being filled, we should add support to the AS7 plugin for whatever cluster membership management capabilities are currently available.
- Assuming the instance joining the cluster has the same apps deployed as the other cluster members, it should be possible to execute the apps in a cluster-aware fashion once the script has run.


Deliverable
- CLI script to execute the desired changes. Inputs to the script:
1) Name of EAP6 instance to add to the cluster
2) Cluster identifier
Comment 1 Charles Crouch 2012-03-08 23:28:37 EST
Assigning to Heiko for initial triage
Comment 2 Heiko W. Rupp 2012-03-13 17:12:58 EDT
To create the script we need to differentiate between the two operation modes of AS7

In both cases, the socket-binding-group for the cluster needs to be the same, and the cluster bindings need to live on a public interface; this can be done per binding or by setting /socket-binding-group=x/default-interface=public


- domain mode: 

A managed server is created in a specific server-group that cannot be changed. To set a different server-group (which must use the ha profile), the server needs to be removed and re-created.
Re-creation will need to use the server-group that is running the ha profile, and that group also needs to have the same socket bindings for the cluster subsystems if it lives on a different box. An app deployed to the server-group automatically gets deployed to the newly created managed server instance.


- standalone mode: here more than one server in standalone mode needs to form the cluster. The server connection-property needs to be changed to standalone-ha.xml and the server then restarted. If the application was already deployed, it will now operate in ha-mode.

As the socket bindings may be different for the two servers (no common setup as in domain mode), the script may also need to copy those over.

Actually there are more properties, like the jgroups stack, that may need to be copied over. Here it could make sense to take the respective part of standalone-ha.xml and literally copy it into the new server, and not fiddle with RHQ configuration objects at all.
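
As a rough illustration of the default-interface setting mentioned above: the address /socket-binding-group=x/default-interface=public corresponds to a write-attribute operation in the AS7 management CLI. The helper below only prints the jboss-cli.sh command line instead of running it. The function name, the group name standard-sockets and the JBOSS_HOME fallback are assumptions made for this sketch, and the server would likely need a reload or restart for the change to take effect.

```shell
# Hypothetical helper: print (not run) the jboss-cli.sh invocation that would
# set the default interface of a socket binding group.
# $1: socket-binding-group name (assumed here to be standard-sockets)
# $2: interface name (e.g. public)
default_interface_cmd() {
  group="$1"
  iface="$2"
  printf '%s --connect --command="/socket-binding-group=%s:write-attribute(name=default-interface,value=%s)"\n' \
    "${JBOSS_HOME:-.}/bin/jboss-cli.sh" "$group" "$iface"
}

# Show the command for the stock group name and the public interface.
default_interface_cmd standard-sockets public
```

Printing the command keeps the sketch safe to run on a box without a server; pipe the output to sh only after checking it against the actual installation.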
Comment 3 Heiko W. Rupp 2012-03-14 05:40:55 EDT
In standalone mode it may also be necessary to copy the app over from the standalone.xml configuration to the standalone-ha.xml configuration when the app was deployed via the API (which is what the rhq-plugin does); this is not needed for apps that are deployed via the file system - as7 unfortunately has an inconsistency here.
Comment 4 Heiko W. Rupp 2012-04-17 10:58:12 EDT
We need to know the internals of the cluster to join:
- jgroups property
- infinispan transport
- infinispan cache-containers
- stuff like the sso cache-container and others that use clustering

So: read this configuration on node1 (an existing cluster node) and apply it to node2.

Requirement: node2 needs to run the same ha profile as the existing node.

Deployments would be read from node1's content subsystem representation and deployed into node2.
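
For inspecting the same data by hand, the jgroups and infinispan subsystems of node1 can be dumped through the AS7 management CLI with recursive read-resource operations. This is not necessarily how the RHQ CLI script obtains the configuration; it is only a sketch of where the values live. The function name is hypothetical, and the subsystem list is an assumption that covers only the first three bullets above.

```shell
# Hypothetical helper: print one read-resource operation per clustering-related
# subsystem, ready to paste into a jboss-cli.sh session connected to node1.
cluster_read_ops() {
  for subsystem in jgroups infinispan; do
    printf '/subsystem=%s:read-resource(recursive=true)\n' "$subsystem"
  done
}

# List the operations.
cluster_read_ops
```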
Comment 5 Heiko W. Rupp 2012-04-17 15:15:16 EDT
We would like to see https://issues.jboss.org/browse/AS7-4501 implemented, but need to work around it, as the basic idea of as7 is not really to have standalone servers in clusters (even if supported).
Comment 6 Lukas Krejci 2012-04-27 16:25:46 EDT
The script is only implemented for the standalone servers, because there is no work to be done when a managed server is joining a server group that has clustering enabled in the domain mode.

Testing instructions will be provided in future comments - I need to compile some (relatively) easy to follow instructions/examples.

master http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commitdiff;h=70ea9ac32b250b0aa96db1215efe5f4720483745
Author: Lukas Krejci <lkrejci@redhat.com>
Date:   Fri Apr 27 22:20:40 2012 +0200

    [BZ 801638] - adding a sample CLI script that can make an AS7 standalone
    server join a cluster (and copy over the deployments to it).
Comment 7 Lukas Krejci 2012-05-02 11:22:22 EDT
commit 33f4a40cc30f44d0aedc5c70c2847ef8534e3029
Author: Lukas Krejci <lkrejci@redhat.com>
Date:   Wed May 2 17:20:51 2012 +0200

    [BZ 801638] - fixing a couple of small bugs in the clustering script +
    adding support for reading the AS7 configuration from the script args
    as opposed to the 'config' property.
Comment 8 Lukas Krejci 2012-05-03 14:56:19 EDT
Created attachment 581939 [details]
test web application
Comment 9 Lukas Krejci 2012-05-03 16:04:50 EDT
Repro steps:

1) Have 2 "fresh" installations of EAP6 ER6 ready (called svr1 and svr2)
2) Copy the attached test webapp into svr1/standalone/deployments
3) Start svr1 using:
   bin/standalone.sh -c standalone-ha.xml -Djboss.node.name=svr1
4) In both svr2/standalone/configuration/standalone-ha.xml and 
   svr2/standalone/configuration/standalone.xml, set the 
   "port-offset" attribute of the "socket-binding-group" element to:
   "${jboss.socket.binding.port-offset:100}"
   Then start svr2 as:
   bin/standalone.sh
5) Inventory both svr1 and svr2 in RHQ
6) Run the operation "Install RHQ user" on both of them so that RHQ can start 
   managing them, wait for the child resources of the servers to get discovered
7) ... this is a workaround for bug 815899 ...
   Go to the "standard-sockets" resource under each of the imported servers 
   (hidden under the SocketBindingGroups category), change any value to another 
   value and save the config, then set the changed value back to the original 
   value and save again. This forces the configuration to be available in the 
   CLI using "normal" means even if the system has not had enough time to come 
   to a consistent state wrt config reading. It would not be necessary if it 
   weren't for the bug.
8) In the RHQ GUI, navigate to the svr1 and svr2 resources and copy their ids 
   from the URL (I will refer to those ids as <SVR1_ID> and <SVR2_ID> in the
   text below).
9) Start the CLI, log in as a user with enough privs to see the above servers 
   and create child resources.
10) on the CLI command line, do:
    exec -f samples/add-as7-standalone-server-to-cluster.js
    var svr1 = ProxyFactory.getResource(<SVR1_ID>)
    var svr2 = ProxyFactory.getResource(<SVR2_ID>)
    addToCluster(svr2, 'svr2', svr1, true)
11) The above step will spit out several lines of information on the output. 
    Note that it can report a timeout error while deploying the test webapp to 
    the svr2. That can be ignored as long as the rest of the repro steps work. 
    Any other error in the output of that function is a bug.
12) In the shell, do:
    curl -v http://localhost:8080/cluster-demo/put.jsp
    This will output something along the lines of:
 
* About to connect() to localhost port 8080 (#0)
*   Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /cluster-demo/put.jsp HTTP/1.1
> User-Agent: curl/7.21.7 (i386-redhat-linux-gnu) libcurl/7.21.7 NSS/3.13.1.0 zlib/1.2.5 libidn/1.22 libssh2/1.2.7
> Host: localhost:8080
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< X-Powered-By: JSP/2.2
< Set-Cookie: JSESSIONID=dw9798zPgO22SCqAavFDxxBI; Path=/cluster-demo
< Content-Type: text/html;charset=ISO-8859-1
< Content-Length: 117
< Date: Thu, 03 May 2012 19:14:32 GMT
< 
 <html>
 <body>
 <h2>Set Current Time</h2>
 
 The time is set to Thu May 03 21:14:32 CEST 2012
 </body>
 </html>

13) Copy the text between 'Set-Cookie: ' and the ';' to the clipboard (i.e. in 
    the above example output, you should copy 
    "JSESSIONID=dw9798zPgO22SCqAavFDxxBI"). I will refer to this value as 
    <SESSION_ID>.
14) Note down the time from the HTML output (i.e. "Thu May 03 21:14:32 CEST 
    2012" from above).
15) Now do:
    curl --cookie "<SESSION_ID>" http://localhost:8180/cluster-demo/get.jsp
    This will output something like:
 <html>
 <body>
 <h2>Get Time</h2>

 The time is Thu May 03 21:14:32 CEST 2012
 </body>
 </html>

16) Make sure that the time reported by the request to localhost:8080 is the same as the time reported when asking localhost:8180.
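
Steps 12 to 16 can also be checked without copying values by hand. The sketch below assumes the response formats shown above (a "< Set-Cookie:" header line as printed by curl -v, and a "The time is ..." line in the HTML); both function names are made up for this example.

```shell
# Pull the "NAME=value" pair out of the first Set-Cookie header found in
# `curl -v` style output read from stdin.
extract_session_cookie() {
  sed -n 's/^< Set-Cookie: \([^;]*\);.*$/\1/p' | head -n 1
}

# Pull the timestamp out of the demo pages' "The time is [set to] ..." line.
extract_time() {
  sed -n 's/^ *The time is \(set to \)\{0,1\}\(.*\)$/\2/p' | head -n 1
}

# Usage against the two servers from the repro steps (not run here).
# curl -v prints headers on stderr, hence the 2>&1:
#   put_out=$(curl -v http://localhost:8080/cluster-demo/put.jsp 2>&1)
#   cookie=$(printf '%s\n' "$put_out" | extract_session_cookie)
#   t1=$(printf '%s\n' "$put_out" | extract_time)
#   t2=$(curl --cookie "$cookie" http://localhost:8180/cluster-demo/get.jsp | extract_time)
#   [ "$t1" = "$t2" ] && echo "session replicated" || echo "times differ"
```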


Note that changing the port-offset is only necessary because this over-simplified test setup runs both servers on one machine. In a real environment, the two servers would most likely be on separate hosts and would not need this workaround.
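
The port-offset edit from step 4 can be scripted for repeated test runs. This is a minimal sketch using sed; it assumes the port-offset attribute sits on the same line as the opening socket-binding-group tag, and the function name is hypothetical. A .bak copy of each file is kept.

```shell
# Hypothetical helper: rewrite the port-offset attribute of the
# <socket-binding-group> element in an AS7 standalone config file.
# $1: path to the config file, $2: numeric default offset (e.g. 100)
set_port_offset() {
  file="$1"
  offset="$2"
  # Only touch the line holding the opening socket-binding-group tag.
  sed -i.bak \
    "/<socket-binding-group /s|port-offset=\"[^\"]*\"|port-offset=\"\${jboss.socket.binding.port-offset:${offset}}\"|" \
    "$file"
}

# Usage for the svr2 files from step 4 (not run here):
#   for f in svr2/standalone/configuration/standalone.xml \
#            svr2/standalone/configuration/standalone-ha.xml; do
#     set_port_offset "$f" 100
#   done
```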
Comment 10 Lukas Krejci 2012-05-04 07:08:35 EDT
I need to put this back to ON_DEV because bug 818673 is going to introduce a change that is going to break this script.

I will wait for bug 818673 to be implemented and fix this bug afterwards.
Comment 11 Lukas Krejci 2012-05-04 12:23:08 EDT
commit a40f010f63ff426d60d2b16c822c7aca9ca90db0
Author: Lukas Krejci <lkrejci@redhat.com>
Date:   Fri May 4 18:21:24 2012 +0200

    [BZ 801638] - updating the plugin name to 'JBossAS7'.
Comment 12 Libor Zoubek 2012-05-11 10:02:00 EDT
I ran two tests:

First I did not follow your steps: 

I have 2 hosts, each is running EAP standalone in standalone-full.xml profile. Script passed, even deployments were copied over.

Then I followed your steps:

I skipped the part with port-offset - I do not need it because I have 2 hosts.

When running the script I got this:
rhqadmin@localhost:7080$ addToCluster(node,'node',cluster,true);       
Reading config of the existing cluster member
null
TypeError: Cannot read property "list" from null (<Unknown source>#443)
addToCluster(node,'node',cluster,true); 
^

that is line: var portIterator = ports.list.iterator();
That means 'ports' is null and I cannot figure out why. I've restarted RHQ server agents and the CLI, and re-imported the EAP running in HA, without success. In the UI I can successfully see all socket-bindings of the EAP server.

It is probably not an issue with the script itself.
Comment 13 Lukas Krejci 2012-05-12 05:14:10 EDT
The server that is already a member of a cluster must be running in standalone-ha.xml or standalone-full-ha.xml profile.

The server that is to join the cluster can run in any profile, it will be switched to match the profile of the existing member.
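
A quick pre-check for the first requirement, as a sketch: given the config file name the existing member was started with, confirm it is one of the two ha profiles named above. The function name is made up for this example.

```shell
# Hypothetical helper: succeed only if the given config file is one of the
# ha profiles that an existing cluster member must be running.
# $1: config file name or path (e.g. standalone-ha.xml)
is_ha_profile() {
  case "$(basename "$1")" in
    standalone-ha.xml|standalone-full-ha.xml) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: report on a profile name.
if is_ha_profile standalone-full.xml; then
  echo "ok: ha profile"
else
  echo "not an ha profile: existing member must use standalone-ha.xml or standalone-full-ha.xml"
fi
```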
Comment 14 Libor Zoubek 2012-05-17 09:27:27 EDT
So the error I got in #c12 was not a script issue. I was getting null because of Bug 815899.

This means that before the script runs, both AS7 server resources must either
a) have been imported more than 15 minutes ago, or
b) have had their configuration manually updated in the UI, along with the configuration of their socket-binding groups (standard-sockets).

Otherwise you'll get null config properties as in #c12.
Comment 15 Lukas Krejci 2012-05-18 12:16:10 EDT
I'm putting this back to ON_DEV.

Even though bug 815899 has been fixed, I think we still need to modify the script to fetch the live configs to make absolutely sure the resource configuration is available.

The fix for bug 815899 doesn't guarantee that, and it really shouldn't: that would be a new feature that we never had (as captured by RFE bug 822968). But the fix makes sure that we can safely invoke the live-config-getting methods on the server (be it from the UI or CLI) without corrupting the state of the resource.
Comment 16 Lukas Krejci 2012-05-23 09:03:54 EDT
master http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commitdiff;h=1a8cbc1c27c3376306845f3baa9bd1f428fff92f
Author: Lukas Krejci <lkrejci@redhat.com>
Date:   Wed May 23 14:56:10 2012 +0200

    [BZ 801638] - Modified the as7 cluster script to use the live configs of
    the servers in question to overcome the possibility of the server having
    stale data (or even having no config data at all).

release/jon3.1.x http://git.fedorahosted.org/git/?p=rhq/rhq.git;a=commitdiff;h=bc028441c486bdafacb3f69c3f8ce892e7631b32
Author: Lukas Krejci <lkrejci@redhat.com>
Date:   Wed May 23 14:56:10 2012 +0200

    [BZ 801638] - Modified the as7 cluster script to use the live configs of
    the servers in question to overcome the possibility of the server having
    stale data (or even having no config data at all).
    (cherry picked from commit 1a8cbc1c27c3376306845f3baa9bd1f428fff92f)
Comment 17 Lukas Krejci 2012-05-28 04:35:33 EDT
In JON 3.1.0 since ER5
Comment 18 Filip Brychta 2012-09-13 07:55:23 EDT
Verified on 3.1.1.CR2.
Comment 19 Heiko W. Rupp 2013-09-03 11:11:50 EDT
Bulk closing of old issues in VERIFIED state.
