Bug 247772 - RFE: One service following another
Summary: RFE: One service following another
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: rgmanager
Version: 4
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
Keywords: FutureFeature
Depends On: 247980 250101
Blocks: 251044 367631
Reported: 2007-07-11 12:01 UTC by Mark Hlawatschek
Modified: 2009-04-16 20:22 UTC
CC List: 5 users

Fixed In Version: RHBA-2008-0791
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2008-07-25 19:15:14 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
basic idea of service following another (3.62 KB, patch)
2007-07-11 12:05 UTC, Mark Hlawatschek
Preliminary event parser specification. (4.36 KB, text/plain)
2007-08-29 18:04 UTC, Lon Hohberger
Updated specification w/ example script which is being tested (9.06 KB, text/plain)
2007-10-18 17:55 UTC, Lon Hohberger
Patch against RHEL5 (81.71 KB, patch)
2007-10-18 19:17 UTC, Lon Hohberger
Default catch-all script (4.02 KB, application/octet-stream)
2007-10-18 19:21 UTC, Lon Hohberger
Event scripting 0.7 - RHEL5 (109.35 KB, patch)
2007-11-09 19:20 UTC, Lon Hohberger
Updated specification (9.58 KB, text/plain)
2007-11-09 19:20 UTC, Lon Hohberger

External Trackers
Tracker: Red Hat Product Errata RHBA-2008:0791
Priority: normal
Status: SHIPPED_LIVE
Summary: rgmanager bug fix and enhancement update
Last Updated: 2008-07-25 19:14:58 UTC

Description Mark Hlawatschek 2007-07-11 12:01:28 UTC
Description of problem:
In order to make an SAP infrastructure highly available, the SAP "enqueue
server" ("enserver") must also be highly available.

The basic scenario is that an enqueue server (enserver) is running on node A
and the enqueue replication service (enrepserver) is running on node B. The
replication service connects to the enqueue service and replicates all of its
state into a local shared memory segment.
If the enqueue service fails, it has to be restarted on the node where the
replication service was running before (node B), so that it can attach to the
shared memory segment holding all of the state information.

The basic failover scenario would be:

1) enserver runs on node A, enrepserver on node B. enrepserver
continuously replicates the lock table from enserver into a shared
memory segment on node B.

2) node A fails, cluster software starts enserver on node B (where
enrepserver is running).

3) enserver attaches to the shared memory segment containing the lock table
replica and rebuilds its lock table from there; the local enrepserver shuts
down

4) cluster software starts enrepserver on e.g. node C, where it starts
with replicating the lock table.

Additional info:

We had a discussion about this topic with Nils and Lon at the end of last
year, when we did a workshop with the SAP Linux Lab.
In that workshop, I also created a small rgmanager patch that applies to
cluster-1.03.00. The patch shows some of the basic ideas we had for enabling
such behavior. I added two attributes to the service tag:

1. follow (e.g. <service domain="all" name="enqueue" follow="repenqueue"> ...)
This means that when service "enqueue" has to be moved to another node, it
has to follow the service "repenqueue" if that service is up and running.

2. avoid (e.g. <service domain="all" name="repenqueue" avoid="enqueue"> ...)
This means that when service "repenqueue" has to be moved to another node,
it should not be started on the same node where service "enqueue" is running.

I'll attach the patch to this bz.
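
To make the intended configuration concrete, a minimal cluster.conf fragment
using both proposed attributes could look like the sketch below. Note that
follow and avoid exist only in the attached patch, and the script resources
and paths are placeholders:

  <rm>
    <!-- enqueue follows the replication service when relocated -->
    <service domain="all" name="enqueue" follow="repenqueue">
      <script file="/etc/init.d/enserver" name="enserver"/>
    </service>
    <!-- repenqueue must not share a node with enqueue -->
    <service domain="all" name="repenqueue" avoid="enqueue">
      <script file="/etc/init.d/enrepserver" name="enrepserver"/>
    </service>
  </rm>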

Comment 1 Mark Hlawatschek 2007-07-11 12:05:05 UTC
Created attachment 158940 [details]
basic idea of service following another

Comment 2 Lon Hohberger 2007-07-11 20:14:10 UTC
Hi Mark, 

Going over this again, what I whiteboarded was something like:

     1         2         3
  +-----+   +-----+   +-----+
  |  A  |   |  B  |   |     |   A and B are on separate nodes
  |     |   |     |   |     |
  +-----+   +-----+   +-----+

          Node 1 dies.

  +- - -+   +-----+   +-----+
  |  A      |  B  |   |     |   B is running; A is on dead node 1
        |   |     |   |     |
  + - - +   +-----+   +-----+

 Node 1 is fenced.  Node 2 starts A

  +     +   +-----+   +-----+
            | A   |   |     |
            |   B |   |     |
  +     +   +-----+   +-----+

 After A's startup is complete, node 2 stops B

  +     +   +-----+   +-----+
            |  A  |   |     |
            |     |   |     |
  +     +   +-----+   +-----+

     Finally, node 3 starts B

  +     +   +-----+   +-----+
            |  A  |   |  B  |
            |     |   |     |
  +     +   +-----+   +-----+

Now, what I would like to know is: paint me a picture of what happens if
node 2 failed instead of node 1.  I imagine it's just "node 3 starts B".

Also, as far as I'm aware, in the particular instance we're concerned with
(SAP), this is mostly an optimization, correct?  It could be that we just start
'A' on node 3.  Restoring from the replication server can occur over the
network, but at a significant performance hit.

Comment 3 Lon Hohberger 2007-07-11 20:15:46 UTC
Also, the 'avoid' behavior can more or less be achieved with the exclusive
flag (or should be achievable) in most cases, unless there are more services
than nodes.

Comment 4 Mark Hlawatschek 2007-07-11 21:38:18 UTC
Hi Lon,

the following picture shows the case where node 2 fails:

     1         2         3
  +-----+   +-----+   +-----+
  |  A  |   |  B  |   |     |   A and B are on separate nodes
  |     |   |     |   |     |
  +-----+   +-----+   +-----+

          Node 2 dies.

  +-----+   +- - -+   +-----+
  |  A  |   |  B      |     |   A is running; B is on dead node 2
  |     |         |   |     |
  +-----+   +- - -+   +-----+

 Node 2 is fenced.  Node 3 starts B

  +-----+   +     +   +-----+
  |  A  |             |  B  |   B is running on node 3
  |     |             |     |
  +-----+   +     +   +-----+

The enqueue service must be started on the node where the replication service
is running. The enqueue service will then attach to the shared memory segment
holding the data (lock tables).
If the HA software does not support this feature, the "polling" concept must
be used, i.e. the replication service must be started on all nodes in the
failover domain. The drawback: multiple replication servers cause a
significant performance loss for the enqueue service, as the replication is
done synchronously. A performance hit for the enqueue service would mean a
performance hit for the whole SAP application.

I assume that technically the exclusive flag could be used to prevent the
replication service from starting on the same node where the enqueue service
runs. But normally multiple cluster services are running in an SAP cluster,
and the enqueue replication service should be able to share a node with other
services. It normally wouldn't be an option to reserve an exclusive node for
the enqueue replication service.

Comment 5 Nils Philippsen 2007-07-12 13:52:32 UTC
Note: bug #247776 is the same for RHCS5.

Comment 6 Russell Doty 2007-08-06 18:30:44 UTC
I'm not sure what version this should be targeted for - I set the flag for
cluster-4.6. If this is wrong, please set the flag properly.

Comment 8 Lon Hohberger 2007-08-29 18:04:53 UTC
Created attachment 179501 [details]
Preliminary event parser specification.

Comment 9 Lon Hohberger 2007-10-18 17:55:50 UTC
Created attachment 231331 [details]
Updated specification w/ example script which is being tested

Comment 10 Lon Hohberger 2007-10-18 17:58:07 UTC
Note: the example script included there is actually overly complex; it's doing
the work of 3 different event handlers:
  * main server start
  * replication queue server start
  * node transition (node up)

Comment 11 Lon Hohberger 2007-10-18 18:01:36 UTC
The script language, despite being fairly complex, allows a great deal of
flexibility.  For example, customers could now trivially add
'follows-push-away' logic to rgmanager.
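
As a rough illustration, such a handler could look like the sketch below.
The built-ins used here (service_status, service_start, service_stop,
nodes_online, subtract) and the event variables follow the draft
specification attached to this bug; names and signatures are provisional:

  % Sketch of a 'follows' handler, based on the draft event-script spec.
  % When the "enqueue" service needs recovery, start it on the node that
  % currently owns "repenqueue", then push the replication service away.
  if ((event_type == EVENT_SERVICE) and (service_name == "enqueue") and
      (service_state == "recovering"))
  {
      variable owner, state, nodes;

      % service_status() is assumed to return the owner node and state
      % among its return values, as in the spec's example script.
      (,,, owner, state) = service_status("repenqueue");

      if (owner >= 0) {
          % Start enqueue only on the node running the replication service.
          () = service_start("enqueue", owner);

          % Then relocate repenqueue to any other online node.
          nodes = subtract(nodes_online(), owner);
          () = service_stop("repenqueue");
          () = service_start("repenqueue", nodes);
      }
  }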

Comment 12 Lon Hohberger 2007-10-18 19:17:52 UTC
Created attachment 231411 [details]
Patch against RHEL5

Comment 13 Lon Hohberger 2007-10-18 19:19:19 UTC

The patch also handles:
* User event processing (e.g. clusvcadm -r service)
* Relocate operation (relocate-or-migrate)
* Migration detection on service start

Comment 14 Lon Hohberger 2007-10-18 19:21:15 UTC
Created attachment 231421 [details]
Default catch-all script


* Make this the default catch-all.  It is currently not part of the patch;
install the script in /usr/share/cluster and place the following in
cluster.conf:

  <event name="catchall" priority="100"
         file="/usr/share/cluster/default_event_script.sl"/>

Comment 15 Lon Hohberger 2007-10-22 17:44:32 UTC
Possibility of adding email-notification API to script language

Comment 23 Lon Hohberger 2007-11-09 19:20:23 UTC
Created attachment 253261 [details]
Event scripting 0.7 - RHEL5

Comment 24 Lon Hohberger 2007-11-09 19:20:54 UTC
Created attachment 253271 [details]
Updated specification

Comment 25 Lon Hohberger 2007-11-09 19:37:24 UTC
rgmanager event scripting "RIND" v0.7

RIND is a recursive acronym: "RIND Is Not Dependencies".

The patch is against the current RHEL5 branch of rgmanager and should apply
cleanly.  Changes since 0.5 include:

* User request handling is centralized
* Recovery is centralized

Still to do:
* Migration
* More testing
* clusvcadm doesn't get correct return codes yet
* Copyright / license stuff.  It all falls under the GPL v2, though.

Build note:
* You need to install slang and slang-devel to build with this patch.

Comment 26 Lon Hohberger 2008-04-15 15:07:12 UTC
Pushed to RHEL4 git branch

Comment 30 errata-xmlrpc 2008-07-25 19:15:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

