Bug 247772

Summary: RFE: One service following another
Product: [Retired] Red Hat Cluster Suite
Reporter: Mark Hlawatschek <hlawatschek>
Component: rgmanager
Assignee: Lon Hohberger <lhh>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4
CC: cluster-maint, grimme, helge.deller, nphilipp, rdoty
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: RHBA-2008-0791
Doc Type: Enhancement
Last Closed: 2008-07-25 19:15:14 UTC
Bug Depends On: 247980, 250101    
Bug Blocks: 251044, 367631    
Attachments:
* basic idea of service following another
* Preliminary event parser specification
* Updated specification w/ example script which is being tested
* Patch against RHEL5
* Default catch-all script
* Event scripting 0.7 - RHEL5
* Updated specification

Description Mark Hlawatschek 2007-07-11 12:01:28 UTC
Description of problem:
In order to make an SAP infrastructure highly available, the SAP "enqueue
server" ("enserver") must also be highly available.

The basic scenario is that an enqueue server (enserver) is running on node A
and the enqueue replication service is running on node B. The enqueue
replication service (enrepserver) connects to the enqueue service and
replicates all of its state into a local shared memory segment.
If the enqueue service fails, it has to be restarted on the node where the
replication service was running before (node B), so that it can attach to the
shared memory segment holding all of the stored state information.

The basic failover scenario would be:

1) enserver runs on node A, enrepserver on node B. enrepserver
continuously replicates the lock table from enserver into a shared
memory segment on node B.

2) node A fails, cluster software starts enserver on node B (where
enrepserver is running).

3) enserver attaches to shared memory segment containing lock table
replication, rebuilds its lock table from there and shuts down local
enrepserver.

4) cluster software starts enrepserver on e.g. node C, where it starts
with replicating the lock table.


Additional info:

We had a discussion about this topic with Nils and Lon at the end of last
year, when we did a workshop with the SAP Linux Lab.
In that workshop, I also created a small rgmanager patch that applies to
cluster-1.03.00. This patch shows some of the basic ideas we had to enable
such a feature:
I added two attributes to the service tag (a combined sketch follows below):
1. follow (e.g. <service domain="all" name="enqueue" follow="repenqueue"> ...)
That means, when service "enqueue" has to be moved to another node, it has to
follow the service "repenqueue" if it is up and running.

2. avoid (e.g. <service domain="all" name="repenqueue" avoid="enqueue"> ...)
This means that when service "repenqueue" has to be moved to another node, it
should not be started on the same node where service "enqueue" is running.
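
A minimal sketch of how both attributes might appear together in the <rm>
section of cluster.conf with the proposed patch applied (the child resources
are placeholders):

  <rm>
    <service domain="all" name="repenqueue" avoid="enqueue">
      <!-- enqueue replication server resources would go here -->
    </service>
    <service domain="all" name="enqueue" follow="repenqueue">
      <!-- enqueue server resources would go here -->
    </service>
  </rm>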

I'll attach the patch to this bz.

Comment 1 Mark Hlawatschek 2007-07-11 12:05:05 UTC
Created attachment 158940 [details]
basic idea of service following another

Comment 2 Lon Hohberger 2007-07-11 20:14:10 UTC
Hi Mark, 

Going over this again, what I whiteboarded was something like:

     1         2         3
  +-----+   +-----+   +-----+
  |  A  |   |  B  |   |     |   A and B are on separate nodes
  |     |   |     |   |     |
  +-----+   +-----+   +-----+

          Node 1 dies.

  +- - -+   +-----+   +-----+
  |  A      |  B  |   |     |   B is running; A is on dead node 1
        |   |     |   |     |
  + - - +   +-----+   +-----+

 Node 1 is fenced.  Node 2 starts A

  +     +   +-----+   +-----+
            | A   |   |     |
            |   B |   |     |
  +     +   +-----+   +-----+

 After A's startup is complete, node 2 stops B

  +     +   +-----+   +-----+
            |  A  |   |     |
            |     |   |     |
  +     +   +-----+   +-----+

     Finally, node 3 starts B

  +     +   +-----+   +-----+
            |  A  |   |  B  |
            |     |   |     |
  +     +   +-----+   +-----+

Now - what I would like to know, is... paint me a picture of what happens if
node 2 failed instead of node 1.  I imagine it's just "node 3 starts B".

Also, as far as I'm aware, in the particular instance we're concerned with
(SAP), this is mostly an optimization, correct?  It could be that we just start
'A' on node 3.  Restoring from the replication server can occur over the
network, but at a significant performance hit.

Comment 3 Lon Hohberger 2007-07-11 20:15:46 UTC
Also - the 'avoid' behavior can be more or less done with the exclusive flag
(or should be able to be) in most cases, unless there are more services than
nodes.
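
For illustration, a minimal sketch of Lon's suggestion using the existing
exclusive flag on the replication service; the surrounding service contents
are omitted and this is not part of the attached patch:

  <service domain="all" name="repenqueue" exclusive="1">
    ...
  </service>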

Comment 4 Mark Hlawatschek 2007-07-11 21:38:18 UTC
Hi Lon,

the following picture shows the case where node 2 fails:

     1         2         3
  +-----+   +-----+   +-----+
  |  A  |   |  B  |   |     |   A and B are on separate nodes
  |     |   |     |   |     |
  +-----+   +-----+   +-----+

          Node 2 dies.

  +-----+   +- - -+   +-----+
  |  A  |   |  B      |     |   A is running; B is on dead node 2
  |     |         |   |     |
  +-----+   +- - -+   +-----+

 Node 2 is fenced.  Node 3 starts B

  +-----+   +     +   +-----+
  |  A  |             |  B  |   B is running on node 3
  |     |             |     |
  +-----+   +     +   +-----+

The enqueue service must be started on the node where the replication service
is running. The enqueue service will then attach to the shared memory segment
holding the data (lock tables).
If the HA software does not support this feature, the "polling" concept must
be used, i.e. the replication service must be started on all nodes in the
failover domain. The drawback: multiple replication servers cause a
significant performance loss for the enqueue service, as the replication is
done synchronously. A performance hit on the enqueue service would cause a
performance hit for the whole SAP application.

I assume that technically the exclusive flag could be used to prevent the
replication service from starting on the same node where the enqueue service
runs. But normally multiple cluster services are running on an SAP cluster.
The enqueue replication service should be able to share a node with other
services. Normally it wouldn't be an option to reserve exclusive nodes for the
enqueue replication.

Comment 5 Nils Philippsen 2007-07-12 13:52:32 UTC
Note: bug #247776 is the same for RHCS5.

Comment 6 Russell Doty 2007-08-06 18:30:44 UTC
I'm not sure what version this should be targeted for - I set the flag for
cluster-4.6. If this is wrong, please set the flag properly.

Comment 8 Lon Hohberger 2007-08-29 18:04:53 UTC
Created attachment 179501 [details]
Preliminary event parser specification.

Comment 9 Lon Hohberger 2007-10-18 17:55:50 UTC
Created attachment 231331 [details]
Updated specification w/ example script which is being tested

Comment 10 Lon Hohberger 2007-10-18 17:58:07 UTC
Note: the example script included there is actually overly complex; it's doing
the work of 3 different event handlers (a rough split is sketched after this
list):
  
  * main server start
  * replication queue server start
  * node transition (node up)
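
A hedged sketch of how the three handlers might be registered separately in
cluster.conf; the class/service filtering attributes and the script paths are
assumptions based on the draft specification, not part of the posted patch:

  <!-- main (enqueue) server start -->
  <event name="enqueue-start" class="service" service="enqueue"
         file="/usr/share/cluster/enqueue_start.sl"/>
  <!-- replication queue server start -->
  <event name="repenqueue-start" class="service" service="repenqueue"
         file="/usr/share/cluster/repenqueue_start.sl"/>
  <!-- node transition (node up) -->
  <event name="node-up" class="node"
         file="/usr/share/cluster/node_up.sl"/>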

Comment 11 Lon Hohberger 2007-10-18 18:01:36 UTC
The script language, despite being fairly complex, allows a whole lot of
flexibility.  For example, a 'follows-push-away' logic could now be added
trivially to rgmanager by customers.


Comment 12 Lon Hohberger 2007-10-18 19:17:52 UTC
Created attachment 231411 [details]
Patch against RHEL5

Comment 13 Lon Hohberger 2007-10-18 19:19:19 UTC
TODO: 

* User event processing (e.g. clusvcadm -r service)
* Relocate operation (relocate-or-migrate)
* Migration detection on service start

Comment 14 Lon Hohberger 2007-10-18 19:21:15 UTC
Created attachment 231421 [details]
Default catch-all script

TODO:

* Make this the default catch-all.  Currently not part of the patch; install in
/usr/share/cluster and place the following in cluster.conf:

  <event name="catchall" priority="100"
         file="/usr/share/cluster/default_event_script.sl"/>

Comment 15 Lon Hohberger 2007-10-22 17:44:32 UTC
Possibility of adding email-notification API to script language

Comment 23 Lon Hohberger 2007-11-09 19:20:23 UTC
Created attachment 253261 [details]
Event scripting 0.7 - RHEL5

Comment 24 Lon Hohberger 2007-11-09 19:20:54 UTC
Created attachment 253271 [details]
Updated specification

Comment 25 Lon Hohberger 2007-11-09 19:37:24 UTC
rgmanager event scripting "RIND" v0.7

RIND is a recursive acronym: "RIND Is Not Dependencies".

The patch is against the current RHEL5 branch of rgmanager and should apply.
Changes since 0.5 include:

* User request handling is centralized
* Recovery is centralized

Todo:

* Migration
* More testing
* clusvcadm doesn't get correct return codes yet
* Copyright / license stuff.  It all falls under the GPL v2, though.

Requirements:

* You need to install slang and slang-devel to build with this patch.

Comment 26 Lon Hohberger 2008-04-15 15:07:12 UTC
Pushed to RHEL4 git branch

Comment 30 errata-xmlrpc 2008-07-25 19:15:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0791.html