Bug 492828 - RFE: priorities for services/virtual machines
Summary: RFE: priorities for services/virtual machines
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: rgmanager
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 714671
Depends On:
Blocks:
 
Reported: 2009-03-30 08:16 UTC by Marc Grimme
Modified: 2018-10-20 04:04 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 11:05:03 UTC
Target Upstream Version:
Embargoed:


Attachments
Add priority attribute to the service definition (565 bytes, patch)
2009-04-03 05:56 UTC, Marc Grimme
Optional patch for default_event_script.sl (5.96 KB, patch)
2009-04-03 05:57 UTC, Marc Grimme
priority_service.sl is the standalone implementation (6.61 KB, text/plain)
2009-04-03 05:59 UTC, Marc Grimme
Patch against current rhel5 branch (6.46 KB, patch)
2009-04-29 17:29 UTC, Lon Hohberger


Links
System: Red Hat Product Errata
ID: RHSA-2009:1339
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Low: rgmanager security, bug fix, and enhancement update
Last Updated: 2009-09-01 10:42:29 UTC

Description Marc Grimme 2009-03-30 08:16:02 UTC
Description of problem:
When multiple services/virtual machines are under cluster control, it would be useful to fail them over in a user-defined order when a node goes down.

That way, the more critical services are failed over first, while the most resources are still available.

This use case is best described with virtual machines, but it could also be extended to services.

Let's say you have a cluster of two nodes with different virtual machines running on each node. When one node goes down, you want to be sure that the most important virtual machines get started. The others should be failed over only if enough resources are still available.

Version-Release number of selected component (if applicable):
n.a.

How reproducible:
With multiple VMs, the order in which they are failed over when a problem occurs is not predictable.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
The failover behavior is not consistently predictable.

Expected results:


Additional info:
I have a customer who is requesting this feature, and I'm aware of the RIND implementation for rgmanager, so I think it can "easily" be implemented by extending the current failover policy.

My idea is to give every service/virtual machine a priority attribute and, on a NODE_DOWN event, fail over the services in that order (lowest priority first).

As I have to implement it anyway, I would like to make this feature officially available and discuss whether you think this is a good approach.

Marc.

Comment 1 Marc Grimme 2009-03-30 08:17:24 UTC
Lon, as you are on this bug what do you think?

Regards Marc.

Comment 2 Lon Hohberger 2009-03-30 13:15:44 UTC
In the simplest case, we could add an attribute to the service and sort based on this attribute before running through the service list in the event handler(s).

Comment 3 Marc Grimme 2009-03-30 13:56:50 UTC
Yes that was my idea as well. I'll add the patch when I'm done. Ok?

Comment 4 Lon Hohberger 2009-04-02 17:19:10 UTC
Sure. :)

Comment 5 Marc Grimme 2009-04-03 05:54:02 UTC
So this is my first try. I've tested it with two nodes and different services, with and without priorities. When I trigger a NODE_EVENT, the services are failed over in order. If no priority is specified, the service list keeps the same order as before.

The idea is that if no priority is specified, 0 is assumed. The services are ordered with the lowest priority value first. This means that a service/vm without a priority will always get the "highest" priority.
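
The ordering rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the S-Lang code from the attached patches; the `Service` type and the `failover_order` helper are hypothetical names made up for this example.

```python
from typing import List, NamedTuple, Optional


class Service(NamedTuple):
    """Hypothetical stand-in for a service/vm entry from cluster.conf."""
    name: str
    priority: Optional[int] = None  # None models a missing priority attribute


def failover_order(services: List[Service]) -> List[Service]:
    """Return services sorted by priority value, lowest first.

    A missing priority is treated as 0, so such services come first.
    sorted() is stable, so services with equal priority keep their
    original relative order.
    """
    return sorted(services, key=lambda s: 0 if s.priority is None else s.priority)


services = [
    Service("test1", 5),
    Service("test2", 4),
    Service("vm1"),        # no priority given, treated as 0
    Service("test5", 1),
]
ordered = [s.name for s in failover_order(services)]
print(ordered)
```

Because the sort is stable, a list in which no service has a priority comes back unchanged, which matches the observation that the service list stays as before in that case.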

The relevant parts of the cluster.conf look as follows:

  <rm central_processing="1" log_facility="local4" log_level="8">
    <events>
       <event name="node" class="node">
          notice("Event node triggered!");
          evalfile("/usr/local/cluster/priority_services.sl");
       </event>
    </events>
    <failoverdomains>
       <failoverdomain name="all">
          <failoverdomainnode name="axqa03-1" priority="1"/>
          <failoverdomainnode name="axqa03-1" priority="1"/>
       </failoverdomain>
    </failoverdomains>
    <service name="test1" domain="all" autostart="0" priority="5">
       <script name="/usr/local/test/test1.sh"/>
    </service>
    <service name="test2" domain="all" autostart="0" priority="4">
       <script name="/usr/local/test/test2.sh"/>
    </service>
    <service name="test3" domain="all" autostart="0" priority="3">
       <script name="/usr/local/test/test3.sh"/>
    </service>
    <service name="test4" domain="all" autostart="0" priority="2">
       <script name="/usr/local/test/test4.sh"/>
    </service>
    <service name="test5" domain="all" autostart="0" priority="1">
       <script name="/usr/local/test/test5.sh"/>
    </service>
    <vm name="axqad101_2" path="/etc/xen" domain="all" autostart="0"/>
    <resources/>
  </rm>

Patches follow.

Comment 6 Marc Grimme 2009-04-03 05:56:05 UTC
Created attachment 337979 [details]
Add priority attribute to the service definition

This patch adds the priority attribute to the /usr/share/cluster/service.sh in order to make it available to rgmanager and the service_property slang function.

Comment 7 Marc Grimme 2009-04-03 05:57:43 UTC
Created attachment 337980 [details]
Optional patch for default_event_script.sl

This patch is optional. If you want to make this concept available to the default behavior you can apply this to the default_event_script.sl.

Comment 8 Marc Grimme 2009-04-03 05:59:26 UTC
Created attachment 337981 [details]
priority_service.sl is the standalone implementation

This file can be used as a standalone implementation of this concept, as described in the previous comment.

Comment 9 Marc Grimme 2009-04-03 06:00:57 UTC
If you use this implementation in default_event_script.sl, the relevant part of cluster.conf might look as follows:

  <rm central_processing="1" log_facility="local4" log_level="8">
    <failoverdomains>
       <failoverdomain name="all">
          <failoverdomainnode name="axqa03-1" priority="1"/>
          <failoverdomainnode name="axqa03-1" priority="1"/>
       </failoverdomain>
    </failoverdomains>
    <service name="test1" domain="all" autostart="0" priority="5">
       <script name="/usr/local/test/test1.sh"/>
    </service>
    <service name="test2" domain="all" autostart="0" priority="4">
       <script name="/usr/local/test/test2.sh"/>
    </service>
    <service name="test3" domain="all" autostart="0" priority="3">
       <script name="/usr/local/test/test3.sh"/>
    </service>
    <service name="test4" domain="all" autostart="0" priority="2">
       <script name="/usr/local/test/test4.sh"/>
    </service>
    <service name="test5" domain="all" autostart="0" priority="1">
       <script name="/usr/local/test/test5.sh"/>
    </service>
    <vm name="axqad101_2" path="/etc/xen" domain="all" autostart="0"/>
    <resources/>
  </rm>
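
To preview the order such a fragment would produce, a small Python script can read the `<rm>` section and sort the entries by their priority attribute. This is a sketch under assumptions: the fragment below is a shortened copy of the configuration above, `preview_failover_order` is a hypothetical helper, and treating `<vm>` entries the same way as `<service>` entries is assumed here rather than confirmed by the patches.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <rm> section shown above (assumption: only the
# service/vm children matter for the ordering).
RM_FRAGMENT = """
<rm central_processing="1">
  <service name="test1" domain="all" autostart="0" priority="5"/>
  <service name="test2" domain="all" autostart="0" priority="4"/>
  <service name="test3" domain="all" autostart="0" priority="3"/>
  <vm name="axqad101_2" path="/etc/xen" domain="all" autostart="0"/>
</rm>
"""


def preview_failover_order(rm_xml: str) -> list:
    """List service/vm names in ascending priority order (missing priority = 0)."""
    rm = ET.fromstring(rm_xml)
    entries = [el for el in rm if el.tag in ("service", "vm")]
    entries.sort(key=lambda el: int(el.get("priority", "0")))
    return [el.get("name") for el in entries]


order = preview_failover_order(RM_FRAGMENT)
print(order)
```

With this fragment the vm has no priority attribute, so it sorts ahead of the services, followed by test3, test2 and test1.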

What do you think?

Marc.

Comment 10 Lon Hohberger 2009-04-03 13:11:48 UTC
Patch nuked freeze/unfreeze:

@@ -296,15 +434,7 @@
 
 		ret = service_stop(service_name);
 
-	} else if (user_request == USER_FREEZE) {
-
-		ret = service_freeze(service_name);
-
-	} else if (user_request == USER_UNFREEZE) {
-
-		ret = service_unfreeze(service_name);
-
-	}
+	} 
 
 	%
 	% todo - migrate

Aside from that, I'd say it's pretty good.  I'll try to test it today.

Comment 11 Lon Hohberger 2009-04-09 21:28:43 UTC
I haven't tested this yet due to other priorities.  My apologies.  I will test it as soon as I return from vacation.

Comment 12 Lon Hohberger 2009-04-29 17:16:19 UTC
I:

* added back the bits relating to FREEZE/UNFREEZE

* changed the description of the 'priority' field to the following to note that it only has an effect with central_processing turned on:

   Priority for the service.  In a failover scenario, this
   indicates the ordering of the service (1 is processed
   first, 2 is processed second, etc.).  This overrides the
   order presented in cluster.conf.  This option only has
   an effect if central processing within rgmanager is turned
   on.

* changed <content type="string"... to <content type="integer"... in the new service attribute

Testing went fine.  Note that administrators can achieve the same goal by sorting the services the way they want in cluster.conf directly.
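
The two points above (the attribute only matters with central processing, and the fallback is the order in cluster.conf) can be modelled roughly as follows. This is a simplified sketch, not rgmanager's actual logic: `effective_order` is a hypothetical helper, and treating `central_processing="1"` as the only enabling value is an assumption.

```python
import xml.etree.ElementTree as ET


def effective_order(rm_xml: str) -> list:
    """Model the processing order of services for a given <rm> section.

    Assumption: with central_processing="1" the priority attribute decides
    (lowest value first, missing = 0); otherwise the order in which the
    entries appear in cluster.conf is used.
    """
    rm = ET.fromstring(rm_xml)
    entries = [el for el in rm if el.tag in ("service", "vm")]
    if rm.get("central_processing") == "1":
        entries = sorted(entries, key=lambda el: int(el.get("priority", "0")))
    return [el.get("name") for el in entries]


TEMPLATE = """<rm central_processing="{cp}">
  <service name="test1" priority="2"/>
  <service name="test2" priority="1"/>
</rm>"""

with_cp = effective_order(TEMPLATE.format(cp="1"))
without_cp = effective_order(TEMPLATE.format(cp="0"))
print(with_cp, without_cp)
```

In this model, writing the services into cluster.conf already in the desired order gives the same result with or without the priority attribute.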

Comment 13 Lon Hohberger 2009-04-29 17:29:56 UTC
Created attachment 341796 [details]
Patch against current rhel5 branch

Comment 18 errata-xmlrpc 2009-09-02 11:05:03 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1339.html

Comment 21 Lon Hohberger 2011-06-21 13:43:05 UTC
*** Bug 714671 has been marked as a duplicate of this bug. ***

