Bug 247772 - RFE: One service following another
Summary: RFE: One service following another
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: rgmanager
Version: 4
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
Keywords: FutureFeature
Depends On: 247980 250101
Blocks: 251044 367631
Reported: 2007-07-11 12:01 UTC by Mark Hlawatschek
Modified: 2009-04-16 20:22 UTC
CC List: 5 users

Fixed In Version: RHBA-2008-0791
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2008-07-25 19:15:14 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
basic idea of service following another (3.62 KB, patch)
2007-07-11 12:05 UTC, Mark Hlawatschek
Preliminary event parser specification. (4.36 KB, text/plain)
2007-08-29 18:04 UTC, Lon Hohberger
Updated specification w/ example script which is being tested (9.06 KB, text/plain)
2007-10-18 17:55 UTC, Lon Hohberger
Patch against RHEL5 (81.71 KB, patch)
2007-10-18 19:17 UTC, Lon Hohberger
Default catch-all script (4.02 KB, application/octet-stream)
2007-10-18 19:21 UTC, Lon Hohberger
Event scripting 0.7 - RHEL5 (109.35 KB, patch)
2007-11-09 19:20 UTC, Lon Hohberger
Updated specification (9.58 KB, text/plain)
2007-11-09 19:20 UTC, Lon Hohberger

External Trackers
Tracker: Red Hat Product Errata RHBA-2008:0791
Priority: normal
Status: SHIPPED_LIVE
Summary: rgmanager bug fix and enhancement update
Last Updated: 2008-07-25 19:14:58 UTC

Description Mark Hlawatschek 2007-07-11 12:01:28 UTC
Description of problem:
In order to make an SAP infrastructure highly available, the SAP "enqueue
server" ("enserver") must also be highly available.

The basic scenario is that an enqueue server (enserver) is running on node A
and the enqueue replication service (enrepserver) is running on node B. The
replication service connects to the enqueue service and replicates all of its
state into a local shared memory segment.
If the enqueue service fails, it has to be restarted on the node where the
replication service was running before (node B), so that it can attach to the
shared memory segment holding all of the state information.

The basic failover scenario would be:

1) enserver runs on node A, enrepserver on node B. enrepserver
continuously replicates the lock table from enserver into a shared
memory segment on node B.

2) node A fails, cluster software starts enserver on node B (where
enrepserver is running).

3) enserver attaches to the shared memory segment containing the lock table
replica and rebuilds its lock table from there; the local enrepserver shuts
down

4) cluster software starts enrepserver on e.g. node C, where it starts
with replicating the lock table.

Additional info:

We had a discussion about this topic with Nils and Lon at the end of last
year, when we did a workshop with the SAP Linux Lab.
In that workshop, I also created a small rgmanager patch that applies to
cluster-1.03.00. The patch shows some of the basic ideas we had for enabling
such behavior. I added two attributes to the service tag:

1. follow (e.g. <service domain="all" name="enqueue" follow="repenqueue"> ...)
This means that when service "enqueue" has to be moved to another node, it
has to follow the service "repenqueue" if that service is up and running.

2. avoid (e.g. <service domain="all" name="repenqueue" avoid="enqueue"> ...)
This means that when service "repenqueue" has to be moved to another node,
it should not be started on the same node where service "enqueue" is running.

I'll attach the patch to this bz.
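
To make the intended configuration concrete, a minimal cluster.conf fragment
using both proposed attributes could look like the sketch below. Note that
follow and avoid exist only in the attached patch, and the script resources
and paths are placeholders:

  <rm>
    <!-- enqueue follows the replication service when relocated -->
    <service domain="all" name="enqueue" follow="repenqueue">
      <script file="/etc/init.d/enserver" name="enserver"/>
    </service>
    <!-- repenqueue must not share a node with enqueue -->
    <service domain="all" name="repenqueue" avoid="enqueue">
      <script file="/etc/init.d/enrepserver" name="enrepserver"/>
    </service>
  </rm>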

Comment 1 Mark Hlawatschek 2007-07-11 12:05:05 UTC
Created attachment 158940 [details]
basic idea of service following another

Comment 2 Lon Hohberger 2007-07-11 20:14:10 UTC
Hi Mark, 

Going over this again, what I whiteboarded was something like:

     1         2         3
  +-----+   +-----+   +-----+
  |  A  |   |  B  |   |     |   A and B are on separate nodes
  |     |   |     |   |     |
  +-----+   +-----+   +-----+

          Node 1 dies.

  +- - -+   +-----+   +-----+
  |  A      |  B  |   |     |   B is running; A is on dead node 1
        |   |     |   |     |
  + - - +   +-----+   +-----+

 Node 1 is fenced.  Node 2 starts A

  +     +   +-----+   +-----+
            | A   |   |     |
            |   B |   |     |
  +     +   +-----+   +-----+

 After A's startup is complete, node 2 stops B

  +     +   +-----+   +-----+
            |  A  |   |     |
            |     |   |     |
  +     +   +-----+   +-----+

     Finally, node 3 starts B

  +     +   +-----+   +-----+
            |  A  |   |  B  |
            |     |   |     |
  +     +   +-----+   +-----+

Now, what I would like to know is: paint me a picture of what happens if
node 2 failed instead of node 1.  I imagine it's just "node 3 starts B".

Also, as far as I'm aware, in the particular instance we're concerned with
(SAP), this is mostly an optimization, correct?  It could be that we just start
'A' on node 3.  Restoring from the replication server can occur over the
network, but at a significant performance hit.

Comment 3 Lon Hohberger 2007-07-11 20:15:46 UTC
Also, the 'avoid' behavior can more or less be achieved with the exclusive
flag (or should be achievable) in most cases, unless there are more services
than nodes.

Comment 4 Mark Hlawatschek 2007-07-11 21:38:18 UTC
Hi Lon,

the following picture shows the case where node 2 fails:

     1         2         3
  +-----+   +-----+   +-----+
  |  A  |   |  B  |   |     |   A and B are on separate nodes
  |     |   |     |   |     |
  +-----+   +-----+   +-----+

          Node 2 dies.

  +-----+   +- - -+   +-----+
  |  A  |   |  B      |     |   A is running; B is on dead node 2
  |     |         |   |     |
  +-----+   +- - -+   +-----+

 Node 2 is fenced.  Node 3 starts B

  +-----+   +     +   +-----+
  |  A  |             |  B  |   B is running on node 3
  |     |             |     |
  +-----+   +     +   +-----+

The enqueue service must be started on the node where the replication service
is running. The enqueue service will then attach to the shared memory segment
holding the data (lock tables).
If the HA software does not support this feature, the "polling" concept must
be used, i.e. the replication service must be started on all nodes in the
failover domain. The drawback: multiple replication servers cause a
significant performance loss for the enqueue service, as the replication is
done synchronously. A performance hit for the enqueue service would mean a
performance hit for the whole SAP application.

I assume that technically the exclusive flag could be used to prevent the
replication service from starting on the same node where the enqueue service
runs. But normally multiple cluster services are running in an SAP cluster,
and the enqueue replication service should be able to share a node with other
services. It normally wouldn't be an option to reserve an exclusive node for
the enqueue replication service.

Comment 5 Nils Philippsen 2007-07-12 13:52:32 UTC
Note: bug #247776 is the same for RHCS5.

Comment 6 Russell Doty 2007-08-06 18:30:44 UTC
I'm not sure what version this should be targeted for - I set the flag for
cluster-4.6. If this is wrong, please set the flag properly.

Comment 8 Lon Hohberger 2007-08-29 18:04:53 UTC
Created attachment 179501 [details]
Preliminary event parser specification.

Comment 9 Lon Hohberger 2007-10-18 17:55:50 UTC
Created attachment 231331 [details]
Updated specification w/ example script which is being tested

Comment 10 Lon Hohberger 2007-10-18 17:58:07 UTC
Note: the example script included there is actually overly complex; it's doing
the work of 3 different event handlers:
  * main server start
  * replication queue server start
  * node transition (node up)

Comment 11 Lon Hohberger 2007-10-18 18:01:36 UTC
The script language, despite being fairly complex, allows a great deal of
flexibility.  For example, customers could now trivially add
'follows-push-away' logic to rgmanager.
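
As a rough illustration, such a handler could look like the sketch below.
The built-ins used here (service_status, service_start, service_stop,
nodes_online, subtract) and the event variables follow the draft
specification attached to this bug; names and signatures are provisional:

  % Sketch of a 'follows' handler, based on the draft event-script spec.
  % When the "enqueue" service needs recovery, start it on the node that
  % currently owns "repenqueue", then push the replication service away.
  if ((event_type == EVENT_SERVICE) and (service_name == "enqueue") and
      (service_state == "recovering"))
  {
      variable owner, state, nodes;

      % service_status() is assumed to return the owner node and state
      % among its return values, as in the spec's example script.
      (,,, owner, state) = service_status("repenqueue");

      if (owner >= 0) {
          % Start enqueue only on the node running the replication service.
          () = service_start("enqueue", owner);

          % Then relocate repenqueue to any other online node.
          nodes = subtract(nodes_online(), owner);
          () = service_stop("repenqueue");
          () = service_start("repenqueue", nodes);
      }
  }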

Comment 12 Lon Hohberger 2007-10-18 19:17:52 UTC
Created attachment 231411 [details]
Patch against RHEL5

Comment 13 Lon Hohberger 2007-10-18 19:19:19 UTC

The patch also handles:
* User event processing (e.g. clusvcadm -r service)
* Relocate operation (relocate-or-migrate)
* Migration detection on service start

Comment 14 Lon Hohberger 2007-10-18 19:21:15 UTC
Created attachment 231421 [details]
Default catch-all script


* Make this the default catch-all.  It is currently not part of the patch;
install the script in /usr/share/cluster and place the following in
cluster.conf:

  <event name="catchall" priority="100"
         file="/usr/share/cluster/default_event_script.sl"/>

Comment 15 Lon Hohberger 2007-10-22 17:44:32 UTC
Possibility of adding email-notification API to script language

Comment 23 Lon Hohberger 2007-11-09 19:20:23 UTC
Created attachment 253261 [details]
Event scripting 0.7 - RHEL5

Comment 24 Lon Hohberger 2007-11-09 19:20:54 UTC
Created attachment 253271 [details]
Updated specification

Comment 25 Lon Hohberger 2007-11-09 19:37:24 UTC
rgmanager event scripting "RIND" v0.7

RIND is a recursive acronym: "RIND Is Not Dependencies".

The patch is against the current RHEL5 branch of rgmanager and should apply
cleanly.  Changes since 0.5 include:

* User request handling is centralized
* Recovery is centralized

Still to do:
* Migration
* More testing
* clusvcadm doesn't get correct return codes yet
* Copyright / license stuff.  It all falls under the GPL v2, though.

Build note:
* You need to install slang and slang-devel to build with this patch.

Comment 26 Lon Hohberger 2008-04-15 15:07:12 UTC
Pushed to RHEL4 git branch

Comment 30 errata-xmlrpc 2008-07-25 19:15:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

