Bug 247772 - RFE: One service following another
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: rgmanager
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Keywords: FutureFeature
Depends On: 247980 250101
Blocks: 251044 367631
Reported: 2007-07-11 08:01 EDT by Mark Hlawatschek
Modified: 2009-04-16 16:22 EDT
CC: 5 users

Fixed In Version: RHBA-2008-0791
Doc Type: Enhancement
Last Closed: 2008-07-25 15:15:14 EDT

Attachments
basic idea of service following another (3.62 KB, patch)
2007-07-11 08:05 EDT, Mark Hlawatschek
Preliminary event parser specification. (4.36 KB, text/plain)
2007-08-29 14:04 EDT, Lon Hohberger
Updated specification w/ example script which is being tested (9.06 KB, text/plain)
2007-10-18 13:55 EDT, Lon Hohberger
Patch against RHEL5 (81.71 KB, patch)
2007-10-18 15:17 EDT, Lon Hohberger
Default catch-all script (4.02 KB, application/octet-stream)
2007-10-18 15:21 EDT, Lon Hohberger
Event scripting 0.7 - RHEL5 (109.35 KB, patch)
2007-11-09 14:20 EST, Lon Hohberger
Updated specification (9.58 KB, text/plain)
2007-11-09 14:20 EST, Lon Hohberger

Description Mark Hlawatschek 2007-07-11 08:01:28 EDT
Description of problem:
In order to make a SAP infrastructure highly available, the SAP "enqueue
server" ("enserver") must also be highly available.

The basic scenario is that an enqueue server (enserver) is running on node A
and the enqueue replication service is running on node B. The enqueue
replication service (enrepserver) connects to the enqueue server and
replicates all of its state into a local shared memory segment.
If the enqueue server fails, it has to be restarted on the node where the
replication service was running before (node B), so that it can attach to the
shared memory segment holding all of the stored state information.

The basic failover scenario would be:

1) enserver runs on node A, enrepserver on node B. enrepserver
continuously replicates the lock table from enserver into a shared
memory segment on node B.

2) node A fails, cluster software starts enserver on node B (where
enrepserver is running).

3) enserver attaches to the shared memory segment containing the lock table
replication, rebuilds its lock table from there, and shuts down the local
enrepserver.

4) cluster software starts enrepserver on another node, e.g. node C, where it
resumes replicating the lock table.

Additional info:

We had a discussion about that topic with Nils and Lon at the end of last
year, when we did a workshop with the SAP Linux Lab.
In that workshop, I also created a small rgmanager patch that applies to
cluster-1.03.00. This patch shows some basic ideas we had to enable such a
setup. I added two attributes to the service tag:
1. follow (e.g. <service domain="all" name="enqueue" follow="repenqueue"> ...)
That means, when service "enqueue" has to be moved to another node, it has to 
follow the service "repenqueue" if it is up and running.

2. avoid (e.g. <service domain="all" name="repenqueue" avoid="enqueue"> ...)
This means, when service "repenqueue" has to be moved to another node, it 
should not be started on the same node where service "enqueue" is running.
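
To illustrate, a cluster.conf fragment for the two services could look roughly
like this (only a sketch of the proposed syntax; the follow/avoid attributes
come from my patch and are not existing rgmanager configuration, and the
service and domain names are just examples):

  <rm>
    <service domain="all" name="repenqueue" avoid="enqueue">
      <!-- resources for the enqueue replication server -->
    </service>
    <service domain="all" name="enqueue" follow="repenqueue">
      <!-- resources for the enqueue server -->
    </service>
  </rm>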

I'll attach the patch to this bz.
Comment 1 Mark Hlawatschek 2007-07-11 08:05:05 EDT
Created attachment 158940 [details]
basic idea of service following another
Comment 2 Lon Hohberger 2007-07-11 16:14:10 EDT
Hi Mark, 

Going over this again, what I whiteboarded was something like:

     1         2         3
  +-----+   +-----+   +-----+
  |  A  |   |  B  |   |     |   A and B are on separate nodes
  |     |   |     |   |     |
  +-----+   +-----+   +-----+

          Node 1 dies.

  +- - -+   +-----+   +-----+
  |  A      |  B  |   |     |   B is running; A is on dead node 1
        |   |     |   |     |
  + - - +   +-----+   +-----+

 Node 1 is fenced.  Node 2 starts A

  +     +   +-----+   +-----+
            | A   |   |     |
            |   B |   |     |
  +     +   +-----+   +-----+

 After A's startup is complete, node 2 stops B

  +     +   +-----+   +-----+
            |  A  |   |     |
            |     |   |     |
  +     +   +-----+   +-----+

     Finally, node 3 starts B

  +     +   +-----+   +-----+
            |  A  |   |  B  |
            |     |   |     |
  +     +   +-----+   +-----+

Now - what I would like to know is: paint me a picture of what happens if
node 2 failed instead of node 1.  I imagine it's just "node 3 starts B".

Also, as far as I'm aware, in the particular instance we're concerned with
(SAP), this is mostly an optimization, correct?  It could be that we just start
'A' on node 3.  Restoring from the replication server can occur over the
network, but at a significant performance hit.
Comment 3 Lon Hohberger 2007-07-11 16:15:46 EDT
Also - the 'avoid' part of the patch can more or less be done with the
exclusive flag (or should be able to be) in most cases, unless there are more
services than nodes.
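
For example, something like this existing syntax approximates the 'avoid'
behavior (sketch only; "repenqueue" is just the name from Mark's example):

  <service domain="all" name="repenqueue" exclusive="1">
    <!-- exclusive="1" keeps any other service off the node running this one -->
  </service>

It's a blunter tool than a per-service 'avoid', of course, which is why it
breaks down when there are more services than nodes.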
Comment 4 Mark Hlawatschek 2007-07-11 17:38:18 EDT
Hi Lon,

the following picture shows the case where node 2 fails:

     1         2         3
  +-----+   +-----+   +-----+
  |  A  |   |  B  |   |     |   A and B are on separate nodes
  |     |   |     |   |     |
  +-----+   +-----+   +-----+

          Node 2 dies.

  +-----+   +- - -+   +-----+
  |  A  |   |  B      |     |   A is running; B is on dead node 2
  |     |         |   |     |
  +-----+   +- - -+   +-----+

 Node 2 is fenced.  Node 3 starts B

  +-----+   +     +   +-----+
  |  A  |             |  B  |   B is running on node 3
  |     |             |     |
  +-----+   +     +   +-----+

The enqueue service must be started on the node where the replication service 
is running. The enqueue service will then attach to the shared memory segment 
holding the data (lock tables). 
If the HA software does not support this feature, the "polling" concept must 
be used, i.e. the replication service must be started on all nodes in the 
failover domain. The drawback: multiple replication servers cause a 
significant performance loss for the enqueue service, as the replication is 
done synchronously. A performance hit on the enqueue service would cause a 
performance hit for the whole SAP application.

I assume that technically the exclusive flag could be used to prevent the 
replication service from being started on the same node where the enqueue 
service runs. But normally multiple cluster services are running on a SAP 
cluster, and the enqueue replication service should be able to share a node 
with other services. Normally it wouldn't be an option to keep exclusive 
servers for the enqueue replication service.
Comment 5 Nils Philippsen 2007-07-12 09:52:32 EDT
Note: bug #247776 is the same for RHCS5.
Comment 6 Russell Doty 2007-08-06 14:30:44 EDT
I'm not sure what version this should be targeted for - I set the flag for
cluster-4.6. If this is wrong, please set the flag properly.
Comment 8 Lon Hohberger 2007-08-29 14:04:53 EDT
Created attachment 179501 [details]
Preliminary event parser specification.
Comment 9 Lon Hohberger 2007-10-18 13:55:50 EDT
Created attachment 231331 [details]
Updated specification w/ example script which is being tested
Comment 10 Lon Hohberger 2007-10-18 13:58:07 EDT
Note: the example script included there is actually overly complex; it's doing
the work of 3 different event handlers:
  * main server start
  * replication queue server start
  * node transition (node up)
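
Split into separate registrations, the cluster.conf entries might look
something like this (a sketch only; the class/service/node_state matching
attributes reflect how I expect the event parser configuration to end up, and
the script names are placeholders):

  <events>
    <event name="main-start" class="service" service="enqueue"
           file="/usr/share/cluster/enqueue-start.sl"/>
    <event name="repl-start" class="service" service="repenqueue"
           file="/usr/share/cluster/repenqueue-start.sl"/>
    <event name="node-up" class="node" node_state="up"
           file="/usr/share/cluster/node-up.sl"/>
  </events>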
Comment 11 Lon Hohberger 2007-10-18 14:01:36 EDT
The script language, despite being fairly complex, allows a whole lot of
flexibility.  For example, 'follows-push-away' logic could now be added to
rgmanager trivially by customers.
Comment 12 Lon Hohberger 2007-10-18 15:17:52 EDT
Created attachment 231411 [details]
Patch against RHEL5
Comment 13 Lon Hohberger 2007-10-18 15:19:19 EDT

* User event processing (e.g. clusvcadm -r service)
* Relocate operation (relocate-or-migrate)
* Migration detection on service start
Comment 14 Lon Hohberger 2007-10-18 15:21:15 EDT
Created attachment 231421 [details]
Default catch-all script


* Make this the default catch-all.  Currently not part of the patch; install
  it in /usr/share/cluster and place:
    <event name="catchall" priority="100" file="/usr/share/cluster/default_event_script.sl"/>
  in cluster.conf
Comment 15 Lon Hohberger 2007-10-22 13:44:32 EDT
Possibility of adding email-notification API to script language
Comment 23 Lon Hohberger 2007-11-09 14:20:23 EST
Created attachment 253261 [details]
Event scripting 0.7 - RHEL5
Comment 24 Lon Hohberger 2007-11-09 14:20:54 EST
Created attachment 253271 [details]
Updated specification
Comment 25 Lon Hohberger 2007-11-09 14:37:24 EST
rgmanager event scripting "RIND" v0.7

RIND = "RIND Is Not Dependencies".

The patch is against the current RHEL5 branch of rgmanager and should apply.
Changes since 0.5 include:

* User request handling is centralized
* Recovery is centralized

Still to do:
* Migration
* More testing
* clusvcadm doesn't get correct return codes yet
* Copyright / license stuff.  It all falls under the GPL v2, though.


* You need to install slang and slang-devel to build with this patch.
Comment 26 Lon Hohberger 2008-04-15 11:07:12 EDT
Pushed to RHEL4 git branch
Comment 30 errata-xmlrpc 2008-07-25 15:15:14 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

